HK40085792A - System and method for view optimized 360 degree virtual reality video streaming

Info

Publication number
HK40085792A
Application number
HK62023073613.9A
Authority
HK (Hong Kong)
Prior art keywords
current, viewport, buffered, play, threshold
Other languages
Chinese (zh)
Inventors
沙林·马汉卓, 杨晖, 刘杉, 徐萌, 封薇薇
Original Assignee
Tencent America LLC (腾讯美国有限责任公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Tencent America LLC
Publication of HK40085792A


Description

System and method for view-optimized 360-degree virtual reality video streaming
Cross-Reference to Related Applications
This application claims priority to U.S. application Ser. No. 17/097,604, filed on November 13, 2020, which is hereby incorporated by reference in its entirety.
Technical Field
The present disclosure relates generally to the field of data processing, and more particularly to video encoding and/or decoding with respect to view-optimized 360-degree Virtual Reality (VR) video streaming.
Background
360-degree VR video streaming presents several unique challenges compared to conventional video streaming pipelines. For example, the resolution of 360-degree video is typically very large, ranging from 4K to 16K. If the video is transmitted in stereo format, the effective resolution doubles. Such high-resolution video also requires a very high-bandwidth network to enable real-time streaming.
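As a rough, hedged illustration of the scale involved (the resolution, frame rate, and chroma format below are assumptions for the sketch, not figures from this disclosure), even a single-eye 8K equirectangular stream is enormous before compression:

```python
# Illustrative arithmetic only; resolution, frame rate, and chroma format
# are assumed for this sketch and do not come from the disclosure.
width, height = 7680, 3840   # 8K equirectangular frame
fps = 30
bits_per_pixel = 12          # 8-bit 4:2:0 sampling

raw_bps = width * height * fps * bits_per_pixel
print(f"{raw_bps / 1e9:.1f} Gbit/s uncompressed")  # ~10.6 Gbit/s; stereo doubles it
```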
Disclosure of Invention
Embodiments relate to methods, systems, and computer-readable media for streaming an encoded Virtual Reality (VR) video stream.
According to one aspect, a method for streaming an encoded VR video stream is provided. The method comprises: receiving a plurality of segments of the encoded VR video stream; storing the plurality of segments in a play-out buffer, wherein a buffered segment from the plurality of segments comprises a plurality of buffered tiles including at least one refined buffered tile corresponding to a previous viewport; determining whether a current play time of a VR video corresponding to the encoded VR video stream is within a threshold time of a play time of the buffered segment; determining whether a current duration of the play-out buffer is greater than a threshold duration; determining whether a current bandwidth is greater than a threshold bandwidth; determining whether a current viewport is different from the previous viewport; based on determining that the current play time is within the threshold time, that the current duration of the play-out buffer is greater than the threshold duration, that the current bandwidth is greater than the threshold bandwidth, and that the current viewport is different from the previous viewport, storing at least one refined tile corresponding to the current viewport into the play-out buffer; constructing a frame corresponding to the buffered segment based on the plurality of buffered tiles and the at least one refined tile corresponding to the current viewport; and decoding the encoded VR video stream based on the constructed frame.
According to one aspect, an apparatus for streaming an encoded VR video stream is provided. The apparatus comprises: at least one memory configured to store program code; and at least one processor configured to read the program code and to operate as instructed by the program code, the program code comprising: receiving code configured to cause the at least one processor to receive a plurality of segments of the encoded VR video stream; first storing code configured to cause the at least one processor to store the plurality of segments in a play-out buffer, wherein a buffered segment from the plurality of segments comprises a plurality of buffered tiles including at least one refined buffered tile corresponding to a previous viewport; first determining code configured to cause the at least one processor to determine whether a current play time of a VR video corresponding to the encoded VR video stream is within a threshold time of a play time of the buffered segment; second determining code configured to cause the at least one processor to determine whether a current duration of the play-out buffer is greater than a threshold duration; third determining code configured to cause the at least one processor to determine whether a current bandwidth is greater than a threshold bandwidth; fourth determining code configured to cause the at least one processor to determine whether a current viewport is different from the previous viewport; second storing code configured to cause the at least one processor to store, based on determining that the current play time is within the threshold time, that the current duration of the play-out buffer is greater than the threshold duration, that the current bandwidth is greater than the threshold bandwidth, and that the current viewport is different from the previous viewport, at least one refined tile corresponding to the current viewport into the play-out buffer; first constructing code configured to cause the at least one processor to construct a frame corresponding to the buffered segment based on the plurality of buffered tiles and the at least one refined tile corresponding to the current viewport; and decoding code configured to cause the at least one processor to decode the encoded VR video stream based on the constructed frame.
According to one aspect, a non-transitory computer-readable medium for streaming an encoded VR video stream is provided. The computer-readable medium may store instructions comprising: one or more instructions that, when executed by one or more processors of a device for receiving an encoded Virtual Reality (VR) video stream, cause the one or more processors to: receive a plurality of segments of the encoded VR video stream; store the plurality of segments in a play-out buffer, wherein a buffered segment from the plurality of segments comprises a plurality of buffered tiles including at least one refined buffered tile corresponding to a previous viewport; determine whether a current play time of a VR video corresponding to the encoded VR video stream is within a threshold time of a play time of the buffered segment; determine whether a current duration of the play-out buffer is greater than a threshold duration; determine whether a current bandwidth is greater than a threshold bandwidth; determine whether a current viewport is different from the previous viewport; based on determining that the current play time is within the threshold time, that the current duration of the play-out buffer is greater than the threshold duration, that the current bandwidth is greater than the threshold bandwidth, and that the current viewport is different from the previous viewport, store at least one refined tile corresponding to the current viewport into the play-out buffer; construct a frame corresponding to the buffered segment based on the plurality of buffered tiles and the at least one refined tile corresponding to the current viewport; and decode the encoded VR video stream based on the constructed frame.
Drawings
These and other objects, features and advantages will become apparent from the following detailed description of illustrative embodiments which is to be read in connection with the accompanying drawings. The various features of the drawings are not to scale, as the illustrations are for the purpose of facilitating a clear understanding by a person skilled in the art in conjunction with the detailed description. In the drawings:
FIG. 1 illustrates a networked computer environment, according to at least one embodiment;
FIG. 2 illustrates an example of a viewport in a frame of a 360-degree VR video in accordance with at least one embodiment;
FIGS. 3A-3D illustrate examples of tile-based VR video streaming in accordance with at least one embodiment;
FIG. 4 illustrates an example of filling a play-out buffer of a video client when a viewport of a user changes, in accordance with at least one embodiment;
FIG. 5 illustrates an example of a segment download and refinement module in accordance with at least one embodiment;
FIG. 6 illustrates a tile merge module in accordance with at least one embodiment;
FIG. 7 illustrates an example comparison between a user position and a viewport position in accordance with at least one embodiment;
FIG. 8 illustrates an operational flow diagram of steps performed by a program for encoding video data in accordance with at least one embodiment;
FIG. 9 is a block diagram of internal and external components of the computer and server shown in FIG. 1, according to at least one embodiment;
FIG. 10 is a block diagram of an illustrative cloud computing environment for the computer system shown in FIG. 1 in accordance with at least one embodiment; and
FIG. 11 is a block diagram of functional layers of the illustrative cloud computing environment of FIG. 10 in accordance with at least one embodiment.
Detailed Description
Detailed embodiments of the claimed structures and methods are disclosed herein; however, it is to be understood that the disclosed embodiments are merely illustrative of the claimed structures and methods, which may be embodied in many different forms and should not be construed as limited to the exemplary embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art. In the description, details of well-known features and techniques may be omitted to avoid unnecessarily obscuring the presented embodiments.
Embodiments are generally directed to buffer management techniques for playing view-optimized 360-degree VR video on a client device. For example, embodiments may relate to systems and methods that decouple play-out buffer size from viewport switching latency in view-optimized 360-degree VR video streaming.
Aspects are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer-readable media according to various embodiments. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
Referring now to FIG. 1, a functional block diagram of a networked computer environment illustrates a video encoding system 100 (hereinafter "system") for encoding and/or decoding video data according to an exemplary embodiment as described herein. It will be appreciated that FIG. 1 provides only an illustration of one implementation and is not intended to suggest any limitation as to the environments in which different embodiments may be implemented. Many modifications to the depicted environments may be made as required by design and implementation requirements.
The system 100 may include a computer 102 and a server computer 114. The computer 102 may communicate with the server computer 114 over a communication network 110 (hereinafter "network"). The computer 102 may include a processor 104 and a software program 108 that is stored on a data storage device 106 and is capable of interfacing with a user and communicating with the server computer 114. As will be discussed below with reference to FIG. 9, the computer 102 may include internal components 800A and external components 900A, and the server computer 114 may include internal components 800B and external components 900B. The computer 102 may be, for example, a mobile device, a telephone, a personal digital assistant, a netbook, a laptop computer, a tablet computer, a desktop computer, or any type of computing device capable of running a program, accessing a network, and accessing a database.
As discussed below with respect to FIGS. 10 and 11, the server computer 114 may also operate in a cloud computing service model, such as software as a service (SaaS), platform as a service (PaaS), or infrastructure as a service (IaaS). The server computer 114 may also be located in a cloud computing deployment model, such as a private cloud, a community cloud, a public cloud, or a hybrid cloud.
A server computer 114, which is operable to encode video data, is enabled to run a video encoding program 116 (hereinafter "program") that can interact with the database 112. In one embodiment, the computer 102 may operate as an input device including a user interface, and the program 116 may run primarily on the server computer 114. In alternative embodiments, the program 116 may run primarily on one or more computers 102, while the server computer 114 may be used to process and store data used by the program 116. It may be noted that the program 116 may be a stand-alone program or may be integrated into a larger video encoding program.
However, it may be noted that in some cases, the processing of program 116 may be shared between computer 102 and server computer 114 in any proportion. In another embodiment, the program 116 may operate on multiple computers, server computers, or a combination thereof (e.g., multiple computers 102 communicating with a single server computer 114 over the network 110). In another embodiment, for example, the program 116 may operate on multiple server computers 114 in communication with multiple client computers over the network 110. Alternatively, the program may operate on a network server in communication with a server and a plurality of client computers via a network.
Network 110 may include wired connections, wireless connections, fiber optic connections, or some combination thereof. In general, the network 110 may be any combination of connections and protocols that support communication between the computer 102 and the server computer 114. Network 110 may include various types of networks, such as a Local Area Network (LAN), a Wide Area Network (WAN) (e.g., the internet), a telecommunications network (e.g., the Public Switched Telephone Network (PSTN)), a wireless network, a public switched network, a satellite network, a cellular network (e.g., a fifth generation (5G) network, a Long Term Evolution (LTE) network, a third generation (3G) network, a Code Division Multiple Access (CDMA) network, etc.), a Public Land Mobile Network (PLMN), a Metropolitan Area Network (MAN), a private network, an ad hoc network, an intranet, a fiber-based network, etc., and/or a combination of these or other types of networks.
The number and arrangement of devices and networks shown in FIG. 1 are provided as examples. In practice, there may be additional devices and/or networks, fewer devices and/or networks, different devices and/or networks, or devices and/or networks that are located differently than shown in FIG. 1. Further, two or more of the devices shown in FIG. 1 may be implemented within a single device, or a single device shown in FIG. 1 may be implemented as multiple distributed devices. Additionally or alternatively, one set of devices (e.g., one or more devices) of system 100 may perform one or more functions described as being performed by another set of devices of system 100.
A number of techniques may be used to address the large bandwidth requirements of 360-degree VR video. According to an embodiment, for example, as shown in FIG. 2, some techniques may take advantage of the fact that when a 360-degree VR video is viewed on a client device, the user 202 sees only a portion of the entire 360-degree sphere 204. This portion of the sphere 204 within the field of view (FOV) of the user 202 may be referred to as the viewport 206. Because the user does not view the entire sphere 204 of the 360-degree video at the same time, the regions that are not visible to the user need not be transmitted at the highest quality. Embodiments related to this VR 360 video streaming optimization technique may be referred to as view optimization or viewport-based streaming. Using this optimization technique, the region under the user's viewport (e.g., viewport 206) can be transmitted at the highest quality, while the remaining regions are transmitted at a relatively lower quality.
There are several ways in which this viewport-based streaming can be implemented, depending on the embodiment.
For example, in an embodiment, multiple representations of a 360-degree scene may be created, each representation covering a unique region of the scene with high quality and the remainder of the scene with low quality.
As another example, in an embodiment, tile-based streaming may be used. FIGS. 3A-3D illustrate examples of tile-based streaming according to embodiments. For example, as shown in FIG. 3A, a 360-degree scene may be divided into multiple tiles, and multiple such representations may be created; for example, representation Q1 may have low quality or low resolution, representation Q2 may have higher quality or higher resolution, and representation Q3 may have the highest quality and highest resolution. The video client may dynamically merge high-quality tiles, e.g., the tiles designated 3 and 2 in FIGS. 3B and 3C, for the area of the sphere 204 corresponding to the viewport 206, with low-quality tiles, e.g., the tiles designated 1 in FIGS. 3B and 3C, for the non-viewport area of the sphere 204, to create an entire frame of the 360-degree VR video. As shown in FIG. 3C, using a larger number of high-quality tiles may increase the required bandwidth.
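The following minimal sketch illustrates this kind of per-tile quality selection. It is not code from this disclosure; the function, tile IDs, and the mapping to the Q1/Q3 labels are illustrative assumptions:

```python
# Hypothetical per-tile quality selection for tile-based streaming: fetch
# viewport tiles from the highest-quality representation (e.g., Q3) and the
# remaining tiles from the lowest (e.g., Q1). Names are illustrative.
VIEWPORT_QUALITY = "Q3"
BACKGROUND_QUALITY = "Q1"

def select_tile_qualities(all_tiles, viewport_tiles):
    """Map each tile ID to the representation it should be fetched from."""
    return {
        tile: VIEWPORT_QUALITY if tile in viewport_tiles else BACKGROUND_QUALITY
        for tile in all_tiles
    }

# Example: an 8-tile division of the sphere where tiles 2 and 3 cover the viewport.
print(select_tile_qualities(all_tiles=range(8), viewport_tiles={2, 3}))
```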
While such techniques may help optimize client-side bandwidth requirements, challenges arise when the client viewport changes within the 360-degree virtual reality environment. As the user moves around, the client can download new video frames with high-quality tiles/streams for the user's new viewport. However, the client will continue to play the video frames that were already buffered for the previous viewport.
FIG. 4 shows an example of filling a play-out buffer of a video client when a viewport of a user changes. For example, at time t, the current viewport may be viewport VP1. While the current viewport is still viewport VP1, the play-out buffer may be filled with segments optimized for viewport VP1, such as segment n, segment n+1, segment n+2, and segment n+3.
Then, at time t+i, the current viewport may change to viewport VP2. Thus, the client may start downloading new segments optimized for VP2, such as segment n+4 and segment n+5. However, the client may continue to play back the VP1 segments that are already available in the play-out buffer, e.g., segment n, segment n+1, segment n+2, and segment n+3 optimized for viewport VP1. Thus, the user experience may be sub-optimal until the client starts playing the newly downloaded video frames. The time difference from the moment the user starts viewing the new viewport VP2 to the moment they actually start seeing the highest possible quality tiles/streams in the new viewport may be referred to as the viewport switching delay. For example, the viewport switching delay may be proportional to the sum of the play-out buffer duration and the segment duration.
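A one-line model makes the stated proportionality concrete (the function name and numbers are illustrative assumptions, not values from this disclosure):

```python
# Hedged sketch: in the worst case, everything already buffered for VP1 must
# play out, and the first VP2-optimized segment must finish downloading,
# before the user sees VP2 at full quality. Numbers are illustrative.
def worst_case_switch_delay(buffer_duration_s: float, segment_duration_s: float) -> float:
    return buffer_duration_s + segment_duration_s

print(worst_case_switch_delay(buffer_duration_s=6.0, segment_duration_s=2.0))  # 8.0 s
```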
One technique to optimize such viewport switching is to keep the buffering duration at the client very short. While this helps reduce viewport switching delay, it makes it harder to protect the player buffer against varying network conditions on the client side. If the client bandwidth drops, playback may stall until all required tiles/streams are downloaded. Conversely, even if the client bandwidth is very large, the player cannot pre-buffer enough segments to keep the player buffer from running empty during periods of low bandwidth.
Thus, viewport-based streaming may be an ideal way to deliver the highest quality video to the client; however, the Quality of Experience (QoE) can be further improved by minimizing viewport switching delay and making full use of the total available bandwidth for smooth playback.
Accordingly, embodiments relate to a new method that can decouple player-side buffer length from viewport switching delay. The method enables the client to reserve a larger buffer to protect the player from network jitter without affecting viewport switching delay. This may also better utilize the overall bandwidth of the client and provide a more optimized viewport for the FOV of the user.
To accomplish this, an embodiment may include two new modules in the 360-degree video playback pipeline, specifically a Segment Download and Refinement Module (SDRM) and a Tile Merge Module (TMM).
FIG. 5 illustrates an SDRM 502, which may be an example of an SDRM according to an embodiment. In addition to downloading future segments in linear order using a future tile downloader 504, e.g., in a manner similar to related-art Adaptive Bitrate (ABR) techniques, the SDRM 502 may have the capability to refine the quality of already downloaded segments by fetching specific segment tiles in non-linear order using, e.g., a segment refinement downloader 506. In an embodiment, when the current play time approaches the timestamp of a previously downloaded and buffered segment, the SDRM 502 can check whether the current viewport is the same as the viewport used when the segment was originally downloaded. If the viewport has changed, the SDRM 502 can attempt to download a number of new tiles/streams that are more appropriate for the new viewport, based on the available bandwidth and the current occupancy of the play-out buffer. One example of this is discussed in more detail below with reference to FIG. 8. In an embodiment, the SDRM 502 may be aware of the current play timeline and may also have the ability to download individual tiles of a segment in a non-linear order.
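A minimal sketch of this refinement check follows. The class, field names, helper function, and threshold values are illustrative assumptions, not an API defined by this disclosure:

```python
# Hypothetical sketch of the SDRM refinement decision: refine a buffered
# segment only when its play time is near, the buffer and bandwidth leave
# headroom, and the viewport has changed since the segment was downloaded.
from dataclasses import dataclass

@dataclass
class BufferedSegment:
    play_time: float   # when this segment is due to play (seconds)
    viewport: str      # viewport the segment's tiles were fetched for
    tiles: list        # tile IDs already in the play-out buffer

def tile_in_viewport(tile_id, viewport) -> bool:
    ...  # placeholder geometry test mapping a tile to the viewport region

def tiles_to_refine(segment, now, current_viewport, buffer_duration, bandwidth,
                    threshold_time=2.0, threshold_duration=4.0, threshold_bw=10e6):
    if (segment.play_time - now <= threshold_time
            and buffer_duration > threshold_duration
            and bandwidth > threshold_bw
            and current_viewport != segment.viewport):
        # Fetch only the tiles covering the new viewport, out of linear order,
        # instead of re-downloading the whole segment.
        return [t for t in segment.tiles if tile_in_viewport(t, current_viewport)]
    return []
```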
In an embodiment, the SDRM 502 may facilitate pre-buffering of video segments under good bandwidth conditions, so that in case of client bandwidth fluctuations the player can still continue playing. Another important advantage of this module is that it can track the current play timeline and refine the viewport quality of previously downloaded segments to closely match what the viewer is watching. This enables it to obtain more optimal tiles for the user's viewport without having to re-download all tiles.
FIG. 6 illustrates an example of a TMM 602, which may be a TMM according to an embodiment. Unlike related-art playback pipelines that read video frames from the play-out buffer in first-in-first-out (FIFO) order, in an embodiment the TMM 602 can read the downloaded segments from the play-out buffer 604 in a non-FIFO order. In an embodiment, the TMM 602 may decide which tiles/streams will be part of the next video segment to be decoded at the last possible time, e.g., while reading the first Instantaneous Decoder Refresh (IDR) frame of the segment. This means that the TMM 602 can also access the user's current viewport to decide which tiles to merge to construct the frame. Because the tiles/streams of a given segment may be spread out across the play-out buffer 604, e.g., as shown in FIG. 6, they may be assembled together by the TMM 602 to create a single decodable frame that can be fed into a decoder queue. Thus, the TMM 602 may have the ability to read buffered data in a non-FIFO manner.
In an embodiment, having the TMM 602 decide which tiles will form the final frame may help create the highest quality viewport for the viewer. Since the TMM 602 is invoked just before the frame is displayed to the viewer, the TMM 602 can use the most current and accurate FOV information to deliver the highest quality viewport. As an example, as shown in FIG. 6, when constructing a frame corresponding to segment N, the TMM 602 may merge a set of tiles 606 for segment N and two sets of refined tiles 608 and 610 for segment N, all of which were downloaded and buffered at different times and stored at different locations in the play-out buffer 604.
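A short sketch of this late merge is given below; the data layout (one base tile set plus refinement batches) is an illustrative assumption modeled on the segment-N example above:

```python
# Hypothetical TMM-style late merge: start from the tiles fetched when the
# segment was first downloaded (e.g., tiles 606) and overlay refinement
# batches fetched later for newer viewport estimates (e.g., tiles 608, 610).
def merge_tiles_for_segment(base_tiles: dict, refinement_batches: list) -> dict:
    frame_tiles = dict(base_tiles)      # tile position -> tile data
    for batch in refinement_batches:    # oldest refinement first
        # A later batch reflects a more recent viewport estimate, so it
        # overrides earlier tiles at the same positions.
        frame_tiles.update(batch)
    return frame_tiles  # assembled into one decodable frame for the decoder queue
```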
FIG. 7 illustrates an example comparison between a location 702 of an unoptimized viewport, a location 704 of a viewport optimized according to an embodiment, and a user location 700. As can be seen in FIG. 7, as the user moves in the 360-degree environment, embodiments can quickly deliver high-quality tiles into the viewport of the user.
Referring now to the operational flow diagram of FIG. 8, steps of a method 8000 for streaming (e.g., receiving) an encoded virtual reality video stream are shown. In some implementations, one or more of the process blocks of FIG. 8 may be performed by the computer 102 (FIG. 1) and the server computer 114 (FIG. 1). In some implementations, one or more of the process blocks of FIG. 8 may be performed by another device or group of devices separate from or including the computer 102, the server computer 114, the SDRM 502, and the TMM 602.
At 8010, method 8000 includes receiving a plurality of fragments of an encoded VR video stream.
At 8020, method 8000 includes storing the plurality of segments in a playout buffer. In an embodiment, the buffered segment from the plurality of segments comprises a plurality of buffered tiles including at least one refined buffered tile corresponding to the previous viewport. In an embodiment, the previous viewport may correspond to, for example, VP1 discussed above, the play buffer may correspond to play buffer 604 discussed above, and the buffered segment may correspond to, for example, segment N discussed above.
At 8030, method 8000 includes determining whether a current play time of a VR video corresponding to the encoded VR video stream is close to a play time of the buffered segment. In an embodiment, when the current play time is within a threshold time of the play time of the buffered clip, it may be determined that the current play time is close to the play time of the buffered clip.
If it is determined that the current playtime is close to the playtime of the buffered segment (yes at 8030), method 8000 may continue to 8040. If it is determined that the current playtime is not near the playtime of the buffered segment (no at 8030), method 8000 may continue to 8100.
At 8040, method 8000 includes determining whether a current duration of the playout buffer is greater than a threshold duration.
If it is determined that the current duration of the playout buffer is greater than the threshold duration (yes at 8040), then method 8000 may continue to 8050. If it is determined that the current duration of the playout buffer is not greater than the threshold duration (no at 8040), method 8000 may continue to 8100.
At 8050, method 8000 includes determining whether the current bandwidth is greater than a threshold bandwidth.
If it is determined that the current bandwidth is greater than the threshold bandwidth (yes at 8050), method 8000 may continue to 8060. If it is determined that the current bandwidth is not greater than the threshold bandwidth (NO at 8050), method 8000 may continue to 8100.
At 8060, method 8000 includes determining whether the current viewport is different from the previous viewport.
If it is determined that the current viewport is different from the previous viewport (yes at 8060), method 8000 may continue to 8070. If it is determined that the current viewport is the same as the previous viewport (no at 8060), method 8000 may continue to 8100.
In an embodiment, elements 8030, 8040, 8050, and 8060 can be performed by, for example, SDRM 502.
In embodiments, elements 8030, 8040, 8050, and 8060 may be rearranged and performed in any order within method 8000.
Regardless of the order in which they are performed, based on determining that the current play time is within the threshold time, the current duration of the play-out buffer is greater than the threshold duration, the current bandwidth is greater than the threshold bandwidth, and the current viewport is different from the previous viewport, at 8070, method 8000 includes: storing at least one refined tile corresponding to the current viewport in a play-out buffer.
At 8080, method 8000 includes: constructing a frame corresponding to the buffered segment based on the plurality of buffered tiles and the at least one refined tile corresponding to the current viewport. In an embodiment, a frame may be constructed by TMM 602.
At 8090, the method comprises: decoding the encoded VR video stream based on the constructed frames.
Based on determining at least one of: that the current play time is outside the threshold time; that the current duration of the play-out buffer is less than the threshold duration; that the current bandwidth is less than the threshold bandwidth; or that the current viewport is not different from the previous viewport, at 8100, method 8000 includes: storing the next segment of the encoded VR video stream in the play-out buffer. In an embodiment, this means that the SDRM 502 can determine the latest segment stored in the play-out buffer, download the segment that sequentially follows that latest segment, and store it in the play-out buffer.
In an embodiment, based on determining at least one of: that the current play time is outside the threshold time; that the current duration of the play-out buffer is less than the threshold duration; that the current bandwidth is less than the threshold bandwidth; or that the current viewport is not different from the previous viewport, method 8000 may include: constructing the frame corresponding to the buffered segment based on the plurality of buffered tiles.
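Put together, blocks 8030-8100 amount to the following decision sketch (the helper callbacks and threshold defaults are illustrative assumptions, not values from this disclosure):

```python
# Hedged end-to-end sketch of the FIG. 8 flow: refine the imminent buffered
# segment when all four conditions hold (8030-8060), otherwise fall back to
# downloading the next segment in linear order (8100).
def schedule_next_download(current_play_time, segment_play_time, segment_viewport,
                           buffer_duration, bandwidth, current_viewport,
                           download_refined_tiles, download_next_segment,
                           threshold_time=2.0, threshold_duration=4.0,
                           threshold_bandwidth=10e6):
    near_play_time = (segment_play_time - current_play_time) <= threshold_time  # 8030
    buffer_ok = buffer_duration > threshold_duration                            # 8040
    bandwidth_ok = bandwidth > threshold_bandwidth                              # 8050
    viewport_moved = current_viewport != segment_viewport                       # 8060
    if near_play_time and buffer_ok and bandwidth_ok and viewport_moved:
        return download_refined_tiles(current_viewport)                         # 8070
    return download_next_segment()                                              # 8100
```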
In an embodiment, the at least one refined tile corresponding to the current viewport may have at least one of: a higher video quality than at least one of the plurality of buffered tiles, and a higher resolution than at least one of the plurality of buffered tiles.
In an embodiment, the current viewport may correspond to a field of view (FOV) of the user at the current play time, and the previous viewport may correspond to the FOV of the user at a previous time when the buffered segment was stored in the play-out buffer.
In an embodiment, the frame may be constructed by merging the plurality of buffered tiles and the at least one refined tile corresponding to the current viewport. In an embodiment, the merging may be performed by TMM 602.
In an embodiment, the plurality of buffered tiles and the at least one refined tile corresponding to the current viewport are obtained from the play-out buffer in a non-FIFO manner.
In an embodiment, a frame may be constructed based on a frame read request corresponding to a buffered segment. In an embodiment, the frame read request may be received by TMM 602.
Embodiments can greatly improve the playback experience for 360-degree VR video streams under different bandwidth conditions.
Embodiments can minimize the likelihood of playback stutter during viewport-optimized 360-degree VR video streaming by enabling clients to use larger play-out buffers without affecting viewport switching delay. Playback stutter can be particularly detrimental to the user experience when VR video is viewed on a head-mounted device (HMD), where the user is forced to wait inside the virtual environment.
In an embodiment, the SDRM 502 may better utilize the client's bandwidth over time and, in doing so, may provide streaming of more uniform quality rather than frequently changing the quality of the video as the bandwidth changes. Uniform quality playback may be a key metric for measuring playback QoE, and embodiments may achieve it by pre-buffering tiles during high-bandwidth periods.
The SDRM 502 may also help refine the quality of the viewport by enabling the client to download high-fidelity tiles closer to the actual play time of a segment. This enables the SDRM 502 to pre-buffer some tiles and also to download some tiles closer to the play time.
Viewport-optimized streaming can provide the highest quality video in the user's field of view. By performing late merging of tiles in the TMM 602 just before they are transferred to the decoder queue, embodiments may enable users to enjoy a viewport optimized for their FOV.
In general, embodiments may significantly improve the quality of the playback experience. Embodiments may ensure that users have the highest quality, stutter-free playback experience under different network conditions. Furthermore, embodiments may be equally applicable to all types of viewport-optimized streaming solutions, such as stream-based solutions or tile-based solutions.
It will be appreciated that fig. 8 provides only an illustration of one implementation and is not meant to imply any limitations on how the different embodiments may be implemented. Many modifications to the depicted environments may be made as required by design and implementation requirements.
FIG. 9 is a block diagram 9000 of internal and external components of the computer and server shown in FIG. 1, in accordance with an illustrative embodiment. It will be appreciated that FIG. 9 provides only an illustration of one implementation and is not intended to suggest any limitation as to the environments in which the different embodiments may be implemented. Many modifications to the depicted environments may be made as required by design and implementation requirements.
The computer 102 (FIG. 1) and the server computer 114 (FIG. 1) may include respective sets of internal components 800A, 800B and external components 900A, 900B as shown in FIG. 9. Each set of internal components 800 includes one or more processors 820, one or more computer-readable RAMs 822 and one or more computer-readable ROMs 824 on one or more buses 826, one or more operating systems 828, and one or more computer-readable tangible storage devices 830.
The processor 820 is implemented in hardware, firmware, or a combination of hardware and software. Processor 820 is a Central Processing Unit (CPU), graphics Processing Unit (GPU), accelerated Processing Unit (APU), microprocessor, microcontroller, digital Signal Processor (DSP), field Programmable Gate Array (FPGA), application Specific Integrated Circuit (ASIC), or another type of processing component. In some implementations, the processor 820 includes one or more processors that can be programmed to perform functions. Bus 826 includes components that allow communication between internal components 800A, 800B.
One or more operating systems 828, the software program 108 (FIG. 1), and the video encoding program 116 (FIG. 1) on the server computer 114 (FIG. 1) are stored on one or more of the respective computer-readable tangible storage devices 830 for execution by one or more of the respective processors 820 via one or more of the respective RAMs 822 (which typically include cache memory). In the embodiment shown in FIG. 9, each computer-readable tangible storage device 830 is a magnetic disk storage device of an internal hard drive. Alternatively, each computer-readable tangible storage device 830 is a semiconductor storage device, such as ROM 824, an EPROM, flash memory, an optical disc, a magneto-optic disk, a solid state disk, a compact disc (CD), a digital versatile disc (DVD), a floppy disk, a cartridge, a magnetic tape, and/or another type of non-transitory computer-readable tangible storage device that can store a computer program and digital information.
Each set of internal components 800A, 800B also includes an R/W drive or interface 832 that reads from or writes to one or more portable computer-readable tangible storage devices 936, such as a CD-ROM, DVD, memory stick, magnetic tape, magnetic disk, optical disk, or semiconductor memory device, etc. Software programs, such as the software program 108 (fig. 1) and the video encoding program 116 (fig. 1), can be stored on one or more of the respective portable computer-readable tangible storage devices 936, read by the respective R/W drive or interface 832 and loaded into the respective hard disk drive 830.
Each set of internal components 800A, 800B also includes a network adapter or interface 836, such as a TCP/IP adapter card, a wireless Wi-Fi interface card, or a 3G, 4G, or 5G wireless interface card or other wired or wireless communication link. The software program 108 (fig. 1) and the video encoding program 116 (fig. 1) on the server computer 114 (fig. 1) may be downloaded from external computers to the computer 102 (fig. 1) and the server computer 114 over a network (e.g., the internet, a local area network, or other wide area network) and corresponding network adapters or interfaces 836. From the network adapter or interface 836, the software program 108 and the video encoding program 116 on the server computer 114 are loaded into the respective hard disk drive 830. The network may include: copper wire, fiber optics, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
Each set of external components 900A, 900B may include a computer display monitor 920, a keyboard 930, and a computer mouse 934. The external components 900A, 900B may also include touch screens, virtual keyboards, touch pads, pointing devices, and other human interface devices. Each set of internal components 800A, 800B also includes device drivers 840 to interface to the computer display monitor 920, the keyboard 930, and the computer mouse 934. The device drivers 840, the R/W drive or interface 832, and the network adapter or interface 836 comprise hardware and software (stored in the storage device 830 and/or ROM 824).
It should be understood in advance that although this disclosure includes detailed descriptions regarding cloud computing, implementations of the teachings described herein are not limited to cloud computing environments. Rather, some embodiments can be implemented in connection with any other type of computing environment, whether now known or later developed.
Cloud computing is a service delivery model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, network bandwidth, servers, processes, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal administrative effort or interaction with a service provider. The cloud model may include at least five features, at least three service models, and at least four deployment models.
The characteristics are as follows:
self-service as required: cloud consumers can unilaterally provide computing capabilities such as server time and network storage automatically as needed without requiring manual interaction with the service provider.
Wide network access: capabilities may be obtained over a network and accessed through standard mechanisms that may facilitate the use of heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs).
Resource pool: the provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, dynamically allocating and reallocating different physical and virtual resources as needed. There is a sense of location independence in that consumers typically do not have control over or knowledge of the exact location of the resources provided, but can specify locations at higher levels of abstraction (e.g., country, state, or data center).
Quick elasticity: the ability to expand quickly and resiliently, and in some cases automatically, is provided for and quickly released to expand quickly. The capabilities available for offering generally appear unlimited to consumers and can be purchased in any number at any time.
Measurement service: cloud systems automatically control and optimize resource usage by leveraging metering capabilities at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be detected, controlled and reported, providing transparency to the provider and user of the service being used.
The service model is as follows:
software as a service (SaaS): the ability to provide consumers is to use the vendor's applications running on the cloud infrastructure. These applications may be accessed from a variety of client devices through a thin client interface such as a web browser (e.g., web-based email). The consumer does not manage or control the underlying cloud infrastructure, which includes: network, server, operating system, storage, and even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
Platform as a service (PaaS): the ability to provide to the consumer is to deploy onto the cloud infrastructure applications created or acquired by the consumer, created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure, which includes: a network, a server, an operating system, or storage, but has control over deployed applications and application hosting environment configurations.
Infrastructure as a service (IaaS): the ability to provide consumers is to provide processing, storage, networking, and other basic computing resources in which consumers can deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure, but has control over the operating system, storage, deployed applications, and possibly limited control over selected network components (e.g., host firewalls).
The deployment model is as follows:
private cloud: the cloud infrastructure operates only for organizations. The private cloud may be managed by an organization or a third party, either on-site or off-site.
Community cloud: the cloud infrastructure is shared by multiple organizations and supports specific communities with common concerns (e.g., tasks, security requirements, policies, and compliance considerations). It may be administered by an organization or a third party, either on-site or off-site.
Public cloud: the cloud infrastructure is provided to the general public or large industry groups and owned by an organization selling cloud services.
Hybrid cloud: The cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).
Cloud computing environments are service-oriented with the emphasis on stateless, low-coupling, modular, and semantic interoperability. At the heart of cloud computing is an infrastructure consisting of a network of interconnected nodes.
Referring to FIG. 10, a cloud computing environment 1000 suitable for implementing certain embodiments of the disclosed subject matter is illustrated. As shown, cloud computing environment 1000 includes one or more cloud computing nodes 10 with which local computing devices used by cloud consumers (e.g., personal digital assistants (PDAs) or cellular telephones 54A, desktop computers 54B, laptop computers 54C, and/or automobile computer systems 54N) may communicate. The cloud computing nodes 10 may communicate with each other. These cloud computing nodes may be grouped (not shown) physically or virtually in one or more networks, such as private, community, public, or hybrid clouds as described above, or a combination thereof. This enables the cloud computing environment 1000 to provide infrastructure, platforms, and/or software as services for which cloud consumers do not need to maintain resources on local computing devices. It is understood that the types of computing devices 54A-N shown in FIG. 10 are for illustration only, and that cloud computing node 10 and cloud computing environment 1000 can communicate with any type of computerized device over any type of network and/or network addressable connection (e.g., using a web browser).
Referring to FIG. 11, a set of functional abstraction layers 1100 provided by the cloud computing environment 1000 (FIG. 10) is shown. It should be understood in advance that the components, layers, and functions shown in FIG. 11 are intended to be illustrative only, and embodiments are not limited thereto. As shown, the following layers and corresponding functions are provided:
the hardware and software layer 60 includes hardware and software components. Examples of hardware components include: mainframe 61: a Reduced Instruction Set Computer (RISC) architecture based server 62, a server 63, a blade server 64, a storage device 65, and network components 66. In some embodiments, the software components include web application server software 67 and database software 68.
The virtualization layer 70 provides an abstraction layer from which the following examples of virtual entities may be provided: virtual server 71, virtual memory 72, virtual network 73 comprising a virtual private network, virtual applications and operating system 74, and virtual client 75.
In one example, the management layer 80 may provide the functionality described below. Resource provisioning 81 provides dynamic procurement of computing resources and other resources for performing tasks within the cloud computing environment. When resources are used in a cloud computing environment, metering and pricing 82 provides cost records and bills or invoices for the consumption of these resources. In one example, these resources may include application software licenses. Security provides authentication for cloud consumers and tasks, as well as protection for data and other resources. The user portal 83 provides access to the cloud computing environment for consumers and system administrators. Service level management 84 provides cloud computing resource allocation and management such that the required service level is met. Service Level Agreement (SLA) planning and fulfillment 85 provides prearrangement and procurement of cloud computing resources for which future demands are anticipated according to the SLA.
Workload layer 90 provides an example of the functionality with which a cloud computing environment may be used. Examples of workloads and functions that may be provided from the workload layer include: mapping and navigation 91, software development and lifecycle management 92, virtual classroom education delivery 93, data analytics processing 94, transaction processing 95, and video encoding/decoding 96. Video encoding/decoding 96 may encode/decode video data in accordance with the view-optimized 360-degree VR video streaming techniques described herein.
Some embodiments may be directed to systems, methods, and/or computer-readable media integrated at any possible level of technical detail. The computer-readable medium may include a computer-readable non-transitory storage medium (or media) having computer-readable program instructions thereon for causing a processor to perform operations.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer-readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device (e.g., a punch card or a raised structure in a recess having instructions recorded thereon), and any suitable combination of the foregoing. The computer-readable storage medium used herein should not be interpreted as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or an electrical signal propagating through an electrical wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a variety of computing/processing devices, or to an external computer or external storage device via a network (e.g., the internet, a local area network, a wide area network, and/or a wireless network). The network may include: copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium within the respective computing/processing device.
The computer-readable program code/instructions for performing operations may be assembler instructions, Instruction-Set-Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, configuration data for an integrated circuit, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk, C++, or the like, and a procedural programming language such as the "C" programming language or a similar programming language. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit including, for example, programmable logic circuitry, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA) may perform various aspects or operations by executing computer-readable program instructions using state information of the computer-readable program instructions to personalize the electronic circuit.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable storage medium storing the instructions comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer-readable media according to various embodiments. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). Methods, computer systems, and computer-readable media may include additional blocks, fewer blocks, different blocks, or blocks arranged differently than the blocks shown in the figures. In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed concurrently or substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Further, according to embodiments, any of the components, elements, modules or units described herein may be embodied as various numbers of hardware, software and/or firmware structures performing the respective functions described above. For example, these components, elements, modules or units may use direct circuit structures, such as memories, processors, logic, look-up tables, etc., that may perform corresponding functions under the control of one or more microprocessors or other control devices. Further, these components, elements, modules or units may be embodied by a program or a portion of code that contains one or more executable instructions for performing specified logical functions. Furthermore, at least one of these components, elements, modules or units may also include a processor, such as a Central Processing Unit (CPU), microprocessor, or the like, that performs the corresponding function.
It is to be understood that the systems and/or methods described herein may be implemented in various forms of hardware, firmware, or combinations of hardware and software. The actual specialized control hardware or software code used to implement these systems and/or methods is not limiting of implementations. Thus, the operation and behavior of the systems and/or methods were described herein without reference to the specific software code- -it being understood that software and hardware may be designed to implement the systems and/or methods based on the description herein.
No element, act, or instruction used herein should be construed as critical or essential unless explicitly described as such. In addition, the articles "a" and "an" as used herein are intended to include one or more items, and may be used interchangeably with "one or more". Furthermore, the term "set" as used herein is intended to include one or more items (e.g., related items, unrelated items, combinations of related and unrelated items, etc.) and may be used interchangeably with "one or more". Where only one item is intended, the term "one" or similar language is used. Furthermore, the terms "having," "containing," "including," and the like, as used herein, are intended to be open-ended terms. Moreover, unless expressly stated otherwise, the phrase "based on" is intended to mean "based, at least in part, on".
The description of the various aspects and embodiments is presented for purposes of illustration and is not intended to be exhaustive or limited to the disclosed embodiments. Even though combinations of features are disclosed in the claims and/or in the description, these combinations are not intended to limit the possible implementations of the present application. In fact, many of these features may be combined in ways not specifically recited in the claims and/or disclosed in the specification. Although each dependent claim listed below may directly refer to only one claim, the disclosure of possible implementations includes each dependent claim in combination with every other claim in the claim set. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope of the described embodiments. The terms used herein were chosen in order to best explain the principles of the embodiments, the practical application or technical improvements to the technology found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Selected acronyms:
VR: virtual reality
HMD: head-mounted device
QoE: quality of experience
FOV: field of view
ABR: adaptive bitrate
IDR: instantaneous decoder refresh
SDRM: segment download and refinement module
TMM: tile merge module
FIFO: first in, first out

Claims (20)

1. A method for receiving, using at least one processor, an encoded Virtual Reality (VR) video stream, the method comprising:
receiving a plurality of segments of the encoded VR video stream;
storing the plurality of segments in a play-out buffer, wherein a buffered segment from the plurality of segments comprises a plurality of buffered tiles including at least one refined buffered tile corresponding to a previous viewport;
determining whether a current playback time of a VR video corresponding to the encoded VR video stream is within a threshold time of a playback time of the buffered segment;
determining whether a current duration of the play-out buffer is greater than a threshold duration;
determining whether a current bandwidth is greater than a threshold bandwidth;
determining whether a current viewport is different from the previous viewport;
based on determining that the current playback time is within the threshold time, that the current duration of the play-out buffer is greater than the threshold duration, that the current bandwidth is greater than the threshold bandwidth, and that the current viewport is different from the previous viewport, storing at least one refined tile corresponding to the current viewport into the play-out buffer;
constructing a frame corresponding to the buffered segment based on the plurality of buffered tiles and the at least one refined tile corresponding to the current viewport; and
decoding the encoded VR video stream based on the constructed frame.
2. The method of claim 1, wherein the at least one refined tile corresponding to the current viewport has at least one of: a higher video quality than at least one of the plurality of buffered tiles, and a higher resolution than at least one of the plurality of buffered tiles.
3. The method of claim 1, further comprising:
based on determining that at least one of the current playback time is outside the threshold time, the current duration of the play-out buffer is less than the threshold duration, the current bandwidth is less than the threshold bandwidth, and the current viewport is not different from the previous viewport, storing a next segment of the encoded VR video stream into the play-out buffer.
4. The method of claim 1, further comprising:
based on determining that at least one of the current playback time is outside the threshold time, the current duration of the play-out buffer is less than the threshold duration, the current bandwidth is less than the threshold bandwidth, and the current viewport is not different from the previous viewport, constructing the frame corresponding to the buffered segment based on the plurality of buffered tiles.
5. The method of claim 1, wherein the current viewport corresponds to a field of view (FOV) of a user at the current playback time, and
wherein the previous viewport corresponds to the FOV of the user at a previous time when the buffered segment is stored in the play-out buffer.
6. The method of claim 1, wherein the frame is constructed by merging the plurality of buffered tiles and the at least one refined tile corresponding to the current viewport.
7. The method of claim 1, wherein the plurality of buffered tiles and the at least one refined tile corresponding to the current viewport are not obtained from the play-out buffer in a first-in-first-out (FIFO) manner.
8. The method of claim 1, wherein the frame is constructed based on a frame read request corresponding to the buffered segment.
9. An apparatus for receiving an encoded Virtual Reality (VR) video stream, the apparatus comprising:
at least one memory configured to store program code; and
at least one processor configured to read the program code and operate as instructed by the program code, the program code comprising:
receiving code configured to cause the at least one processor to receive a plurality of segments of the encoded VR video stream;
first storing code configured to cause the at least one processor to store the plurality of segments in a play-out buffer, wherein a buffered segment from the plurality of segments comprises a plurality of buffered tiles including at least one refined buffered tile corresponding to a previous viewport;
first determining code configured to cause the at least one processor to determine whether a current playback time of a VR video corresponding to the encoded VR video stream is within a threshold time of a playback time of the buffered segment;
second determining code configured to cause the at least one processor to determine whether a current duration of the play-out buffer is greater than a threshold duration;
third determining code configured to cause the at least one processor to determine whether a current bandwidth is greater than a threshold bandwidth;
fourth determining code configured to cause the at least one processor to determine whether a current viewport is different from the previous viewport;
second storing code configured to cause the at least one processor to store, based on determining that the current playback time is within the threshold time, that the current duration of the play-out buffer is greater than the threshold duration, that the current bandwidth is greater than the threshold bandwidth, and that the current viewport is different from the previous viewport, at least one refined tile corresponding to the current viewport into the play-out buffer;
first constructing code configured to cause the at least one processor to construct a frame corresponding to the buffered segment based on the plurality of buffered tiles and the at least one refined tile corresponding to the current viewport; and
decoding code configured to cause the at least one processor to decode the encoded VR video stream based on the constructed frame.
10. The apparatus of claim 9, wherein the at least one refined tile corresponding to the current viewport has at least one of: a higher video quality than at least one of the plurality of buffered tiles, and a higher resolution than at least one of the plurality of buffered tiles.
11. The apparatus of claim 9, wherein the program code further comprises third storing code configured to cause the at least one processor to store a next segment of the encoded VR video stream into the play-out buffer based on determining that at least one of the current playback time is outside the threshold time, the current duration of the play-out buffer is less than the threshold duration, the current bandwidth is less than the threshold bandwidth, and the current viewport is not different from the previous viewport.
12. The apparatus of claim 9, wherein the program code further comprises second constructing code configured to cause the at least one processor to construct the frame corresponding to the buffered segment based on the plurality of buffered tiles, based on determining that at least one of the current playback time is outside the threshold time, the current duration of the play-out buffer is less than the threshold duration, the current bandwidth is less than the threshold bandwidth, and the current viewport is not different from the previous viewport.
13. The apparatus of claim 9, wherein the current viewport corresponds to a field of view (FOV) of a user at the current playback time, and
wherein the previous viewport corresponds to the FOV of the user at a previous time when the buffered segment is stored in the play-out buffer.
14. The apparatus of claim 9, wherein the frame is constructed by merging the plurality of buffered tiles and the at least one refined tile corresponding to the current viewport.
15. The apparatus of claim 9, wherein the plurality of buffered tiles and the at least one refined tile corresponding to the current viewport are not obtained from the play-out buffer in a first-in-first-out (FIFO) manner.
16. The apparatus of claim 9, wherein the frame is constructed based on a frame read request corresponding to the buffered segment.
17. A non-transitory computer-readable medium storing instructions, the instructions comprising: one or more instructions that, when executed by one or more processors of an apparatus for receiving an encoded Virtual Reality (VR) video stream, cause the one or more processors to:
receiving a plurality of segments of the encoded VR video stream;
storing the plurality of segments in a play-out buffer, wherein a buffered segment from the plurality of segments comprises a plurality of buffered tiles including at least one refined buffered tile corresponding to a previous viewport;
determining whether a current playback time of a VR video corresponding to the encoded VR video stream is within a threshold time of a playback time of the buffered segment;
determining whether a current duration of the play-out buffer is greater than a threshold duration;
determining whether a current bandwidth is greater than a threshold bandwidth;
determining whether a current viewport is different from the previous viewport;
based on determining that the current playback time is within the threshold time, that the current duration of the play-out buffer is greater than the threshold duration, that the current bandwidth is greater than the threshold bandwidth, and that the current viewport is different from the previous viewport, storing at least one refined tile corresponding to the current viewport into the play-out buffer;
constructing a frame corresponding to the buffered segment based on the plurality of buffered tiles and the at least one refined tile corresponding to the current viewport; and
decoding the encoded VR video stream based on the constructed frame.
18. The non-transitory computer-readable medium of claim 17, wherein the at least one refined tile corresponding to the current viewport has at least one of: a higher video quality than at least one of the plurality of buffered tiles, and a higher resolution than at least one of the plurality of buffered tiles.
19. The non-transitory computer-readable medium of claim 17, wherein the one or more instructions further cause the one or more processors to: based on determining that at least one of the current playback time is outside the threshold time, the current duration of the play-out buffer is less than the threshold duration, the current bandwidth is less than the threshold bandwidth, and the current viewport is not different from the previous viewport, store a next segment of the encoded VR video stream into the play-out buffer.
20. The non-transitory computer-readable medium of claim 17, wherein the one or more instructions further cause the one or more processors to: based on determining that at least one of the current playback time is outside the threshold time, the current duration of the play-out buffer is less than the threshold duration, the current bandwidth is less than the threshold bandwidth, and the current viewport is not different from the previous viewport, construct the frame corresponding to the buffered segment based on the plurality of buffered tiles.
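By way of illustration only, the refine-or-download decision recited in claim 1 above (with its fallback in claim 3) can be sketched in Python. This is a minimal, non-limiting sketch: the identifiers (BufferedSegment, should_refine), the reading of "within a threshold time" as a bounded time difference, and all threshold values are assumptions of the sketch, not definitions taken from the claims.

```python
from dataclasses import dataclass

@dataclass
class BufferedSegment:
    index: int                # position of the segment in the stream
    play_time: float          # time (s) at which this segment is due to play
    refined_viewport: int     # viewport id its refined tiles were fetched for

def should_refine(segment: BufferedSegment,
                  playback_time: float,      # current playback time (s)
                  buffer_duration: float,    # media currently buffered (s)
                  bandwidth_bps: float,      # current estimated bandwidth
                  current_viewport: int,
                  threshold_time: float = 2.0,
                  threshold_duration: float = 4.0,
                  threshold_bandwidth_bps: float = 25e6) -> bool:
    """True: fetch refined tiles for the current viewport and store them
    into the play-out buffer (claim 1). False: download the next segment
    instead (claim 3). Threshold values are illustrative only."""
    return (segment.play_time - playback_time <= threshold_time  # about to play
            and buffer_duration > threshold_duration             # buffer is healthy
            and bandwidth_bps > threshold_bandwidth_bps          # spare capacity
            and current_viewport != segment.refined_viewport)    # user looked away

# Example: a segment due in 1.5 s, 6 s of buffered media, 40 Mb/s of
# bandwidth, and a viewport change from id 3 to id 7 trigger refinement.
seg = BufferedSegment(index=42, play_time=11.5, refined_viewport=3)
assert should_refine(seg, playback_time=10.0, buffer_duration=6.0,
                     bandwidth_bps=40e6, current_viewport=7)
```

All four conditions must hold before bandwidth is spent on refinement; if any fails, the sketch falls back to ordinary segment download, mirroring the conjunction in claim 1 and the disjunction in claim 3.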
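The frame construction of claims 6 and 7, in which the plurality of buffered tiles is merged with the refined tiles for the current viewport and tiles are accessed by position rather than in FIFO order, can likewise be sketched under the same caveats; the Tile class, the (row, col) grid keys, and the byte payloads are hypothetical assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class Tile:
    position: Tuple[int, int]   # (row, col) in the projected-frame tile grid
    data: bytes = b""           # encoded tile payload

@dataclass
class BufferedSegmentTiles:
    base_tiles: Dict[Tuple[int, int], Tile]          # tiles as first buffered
    refined_tiles: Dict[Tuple[int, int], Tile] = field(default_factory=dict)

def construct_frame(seg: BufferedSegmentTiles) -> Dict[Tuple[int, int], Tile]:
    """Serve a frame read request (claim 8): refined tiles replace the
    co-located buffered tiles, and positions without a refined tile keep
    the originally buffered tile (the claim 4 fallback is the degenerate
    case with no refined tiles). Keyed access models the non-FIFO
    retrieval of claim 7."""
    merged = dict(seg.base_tiles)
    merged.update(seg.refined_tiles)
    return merged

# Usage: a 1x2 grid where only tile (0, 0) was refined for the current
# viewport; the merged frame keeps the base tile at (0, 1).
seg = BufferedSegmentTiles(
    base_tiles={(0, 0): Tile((0, 0), b"low"), (0, 1): Tile((0, 1), b"low")},
    refined_tiles={(0, 0): Tile((0, 0), b"high")},
)
frame = construct_frame(seg)
assert frame[(0, 0)].data == b"high" and frame[(0, 1)].data == b"low"
```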
HK62023073613.9A 2020-11-13 2021-05-20 System and method for view optimized 360 degree virtual reality video streaming HK40085792A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US17/097,604 2020-11-13 2020-11-13 System and method for view optimized 360 degree virtual reality video streaming

Publications (1)

Publication Number Publication Date
HK40085792A (en) 2023-08-11
