
CN113905221B - Stereoscopic panoramic video asymmetric transport stream self-adaption method and system - Google Patents


Info

Publication number
CN113905221B
CN113905221B (application CN202111165065.8A)
Authority
CN
China
Prior art keywords
quality
code rate
slice
view
video
Prior art date
Legal status
Active
Application number
CN202111165065.8A
Other languages
Chinese (zh)
Other versions
CN113905221A (en)
Inventor
兰诚栋
梁昊霖
饶迎节
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN202111165065.8A
Publication of CN113905221A
Application granted
Publication of CN113905221B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/194 Transmission of image signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/146 Data rate or code amount at the encoder output
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The invention relates to an adaptive method and system for asymmetric transport streams of stereoscopic panoramic video, wherein the method comprises the following steps: S1, a server cuts video data into segments in time and cuts the segments into slices in space; S2, the cut video is cached in an HTTP server at different video qualities and code rates; S3, probability prediction is carried out by combining a 3DCNN network and an LSTM network; S4, joint code rate control of the left and right viewpoints is performed with AC-based multi-agent reinforcement learning, so as to balance the mutual influence of single-view quality and overall quality; S5, a reward function is designed so that the system selects a more suitable code rate; and S6, the downloaded data are decoded, spliced and stitched, stored in the play buffer of the client, and rendered and played by playback software. The method helps improve the joint code rate control of stereoscopic panoramic video and the quality of experience of users under limited bandwidth.

Description

Stereoscopic panoramic video asymmetric transport stream self-adaption method and system
Technical Field
The invention belongs to the field of stereoscopic panoramic video transmission, and particularly relates to a stereoscopic panoramic video asymmetric transmission stream self-adaption method and system based on multi-agent reinforcement learning.
Background
Virtual Reality (VR) is a new computer technology developed by integrating computer graphics, multimedia, sensor, human-computer interaction, networking, stereoscopic display and simulation technologies. Cisco predicted that by 2021 the internet traffic generated by immersive applications would increase 20-fold, so current network bandwidth cannot keep up with the development of VR video. While monoscopic 360° videos are perhaps the most popular type of current VR content, they lack 3D information and therefore cannot be viewed with full six degrees of freedom (DoF). Stereoscopic panoramic video has received attention as a way to further enhance immersion beyond 3DoF 360° video. In conventional panoramic video transmission, Chakareski J et al. proposed a rate-distortion model mapping the relation between the quantization parameter (QP) and bit rate, and on that basis developed a tile-based bit stream allocation algorithm. Van der Hooft J et al. allocate the bit stream according to the distance from the tile center to the view center: with each tile guaranteed the lowest quality, the surplus bandwidth assigns more bits to nearer tiles and fewer to farther ones. Xie L et al. built a tile probability prediction model with a mathematical method and then selected different code rates for each tile with a bit stream adaptation strategy based on a target buffer. Reinforcement learning methods can obtain optimal decisions over a long time horizon, and rate selection for panoramic video using reinforcement learning has been studied by many researchers.
Jiang X et al. used the A3C reinforcement learning algorithm to select code rates for the viewport region, neighboring region and outer region from inputs such as the previous bandwidth, the previous prediction accuracy and the current bandwidth; this has become a classical algorithm for reinforcement-learning-based panoramic video region code rate selection. Kan N et al. also used the A3C algorithm for bit stream allocation over three regions. They argued that the buffer should not be made excessively large, so as to preserve viewport prediction accuracy, and used the buffer size in the reward function to encourage the algorithm toward a buffer size that balances prediction accuracy against playback stalling; this shows that the design of the reward function has an important influence on system operation. Zhang Y et al. used the AC algorithm and embedded an LSTM network into the state transitions, using the LSTM's predictive ability to adjust states in an orderly way, reducing the search space and aiding decision making. At present, stereoscopic panoramic video transmission is little studied. Based on the binocular suppression principle, Naik D et al. evaluated the quality of asymmetric stereoscopic panoramic video, listing DMOS values under various QPs and spatial scaling ratios; they concluded that binocular suppression also applies to stereoscopic panoramic video, and that scaling the spatial resolution of one view can save 25%-50% of bandwidth under acceptable conditions. Xu G et al. downsample one view horizontally and vertically and upsample it at the decoding end, while the other view remains unchanged, to achieve asymmetric transmission.
These methods all perform asymmetric coding with a fixed code rate or by downsampling, and do not fully consider the influence of real-time changes in network bandwidth on QoE.
Disclosure of Invention
The invention aims to provide an adaptive method and system for asymmetric transport streams of stereoscopic panoramic video that improve the joint rate control of stereoscopic panoramic video.
In order to achieve the above purpose, the invention adopts the following technical scheme: a stereoscopic panoramic video asymmetric transport stream self-adaption method comprises the following steps:
s1, a server side cuts video data into fragments in time and cuts the fragments into slices in space;
s2, caching the cut video in an HTTP server according to different video quality and different code rates;
s3, carrying out probability prediction by combining a 3DCNN network and an LSTM network;
s4, performing joint code rate control on the left and right view points by utilizing multi-agent reinforcement learning based on the AC so as to balance the mutual influence of the quality of the single-path view point and the overall quality;
s5, designing a reward function so that the system can select a more proper code rate;
and S6, decoding, splicing and stitching the downloaded data, storing the data in a play buffer of the client, and rendering and playing the data through playing software.
Further, in the step S3, feature extraction is performed on the obtained static saliency information, dynamic saliency information and binocular disparity information of the main-viewpoint sequence slices by using a 3DCNN network; meanwhile, the LSTM network is used to predict head motion data, which is then concatenated and fused with the feature information extracted by the 3DCNN network; finally, the fused result is fed into several fully connected layers to obtain the viewing probabilities with which the left and right viewpoints attend to different information; the viewing probability of the i-th slice obtained by this probability prediction method is denoted P_i.
Further, the step S4 specifically includes the following steps:
the left and right viewpoints of the panoramic video are each divided temporally into N segments of length T, each segment comprises K slices, and each slice has M code rate levels; each slice in each segment selects a code rate a_i, where i ∈ {0, …, M−1}; q(a_i) represents the mapping from code rate to perceived quality; the viewing probabilities of each slice of the left and right viewpoints are P_i^L and P_i^R, respectively; using Actor-Critic-based multi-agent reinforcement learning, each slice is taken as an agent, the agents share a state and act jointly, thereby realizing code rate allocation;
when multi-agent reinforcement learning is adopted for slice code rate allocation within the left and right viewpoints, the reward of each agent mixes the local reward obtained from the environment by the joint action of the agents within a single viewpoint with the global reward obtained when the agents of the left and right viewpoints are combined;
the global rewards and the local rewards are separated and optimized respectively by introducing a Global-Critic for supervision, so as to ensure the stability of the model; the modified policy gradient of each agent is:
∇_θ J(θ) = E_ep[ ∇_θ log π_θ(a_i | o_i) · (Q_l(o_i, a_i) + Q_g(s, a_1, …, a_K)) ]
where ep denotes the sample replay buffer, o_i is the local environment, i.e. the environment of the agent, a_i is the code rate selected by the agent, s is the overall environment, i.e. the combination of the environment states of all agents, θ is the parameter trained by the network model, Q_l(o_i, a_i) is the local value function of each agent, and Q_g(s, a_1, …, a_K) is the global value function composed of all the agents;
the loss function of Q_l(o_i, a_i) is:
L_l = E[ (y_l − Q_l(o_i, a_i))² ],  y_l = r_l + γ · Q_l(o_i′, a_i′)
wherein y_l is the estimate of the local value function, r_l is the local reward, and γ is the discount factor;
the loss function of Q_g(s, a_1, …, a_K) is:
L_g = E[ (y_g − Q_g(s, a_1, …, a_K))² ]
wherein y_g is the estimate of the global value function and r_g is the global reward; y_g, the Q value that drives the agents to take the jointly optimal action in the global state composed of the left and right viewpoints, is expressed as:
y_g = r_g + γ · max_{a′} Q_g(s′, a_1′, …, a_K′)
further, the step S5 specifically includes the following steps:
assuming that each agent shares a state at each moment in the left and right viewpoints, the input states are respectively:
wherein,representing network throughput of past k segments; />Representing an optional code rate set; b t Representing the current buffer size; z t Representing the average code rate of the last segment; />And->Download time of k past clips respectively representing left and right viewpoints; />And->The viewing probability of each slice of the left and right viewpoints is respectively represented; />And->Respectively representing the set of code rates selected by the slices of the left and right view points of the last segment;
the size of the watching probability of each slice determines the contribution degree of the whole video quality; when the slice is in the viewport region,1, otherwise 0; therefore, the average quality of view ports of the left and right viewpoint segments is:
spatial quality change:
wherein,and->Respectively represent the spatial domain quality change of the viewing port of the left and right view points, q (a) i ) Representing code rateMapping to perceived quality;
the average quality change of the left and right viewpoint viewport regions at the front and rear moments reflects the fluctuation of video quality in the time domain; time domain quality change:
wherein,and->Respectively representing time domain quality changes of the left and right view points and view ports;
the fragments of the left and right viewpoints are continuously downloaded, the fragments form the final downloading time, and the left and right viewpoints together influence the buffering time of the system; meanwhile, the agents with different code rates are selected for the left and right view points to be in a cooperative relationship; when the requested data is completely downloadedData size b of buffer memory larger than sending request time t-1 When the data is not completely downloaded and the buffer memory is exhausted, a buffer phenomenon occurs; the buffer time is as follows:
the quality difference of the slices at the corresponding positions of the left and right viewpoints is too large, and the QoE is seriously reduced when the quality difference exceeds a set range; and the symmetric coding has better performance when the quality of the left and right view points is smaller; in order to avoid the too large quality difference of the slices corresponding to the left and right viewpoints, a punishment item A is designed t To limit the code rate difference of the corresponding slices of the left and right view pointsSize of:
wherein,representing right view slice quality, < >>Representing a difference in left and right view slice quality; when->When larger, the person is in need of->Allow a variation in a larger range, but +.>Does not change significantly; when->Less time, ->In the case of large range changes, +.>Can vary significantly; therefore, the penalty term constrains that the left and right view quality is poor, but has higher acceptance for asymmetric coding in the case of high right view slice quality;
the local rewards aim at single view points, and in order to make the space domain and time domain changes in the view points as small as possible, the space domain and time domain changes are set as negative rewards; the global rewards are aimed at the whole formed by the left viewpoint and the right viewpoint, and the average quality is set as positive rewards in order to obtain higher average quality; to reduce the buffering time and avoid too large difference in quality between left and right view points, the buffering time is reducedAnd the left and right viewpoint quality difference constraint term is a negative reward; setting left and right viewpoint local rewards r t L,l ,r t R,l And global rewards r t g The function expressions are respectively as follows:
wherein λ and η are weights;
and by utilizing head motion data acquired from the playing equipment, selecting different code rates for the intra-view and extra-view region slices by means of view prediction and combining current bandwidth data, reducing the code rate of the slice with lower significance in each path of view, improving the code rate of the slice with higher significance in each path of view, and reasonably distributing network bandwidth data.
The invention also provides a stereoscopic panoramic video asymmetric transport stream adaptive system, comprising a memory, a processor, and computer program instructions stored on the memory and executable by the processor, the computer program instructions, when executed by the processor, implementing the steps of the above method.
Compared with the prior art, the invention has the following beneficial effects: the method and system take into account the saliency differences among the slices within the stereoscopic panoramic video viewpoints, i.e. the different contributions of the corresponding left-view and right-view slices to subjective quality; they reasonably reduce the code rate of low-saliency slices in each view, raise the code rate of high-saliency slices, allocate network bandwidth through reinforcement learning, and set an appropriate reward function according to the binocular suppression principle, thereby improving the overall video quality. The invention uses multi-agent reinforcement learning to select the code rate of each slice of the left and right viewpoints separately, avoiding the action-space explosion caused by selecting code rates for many slices with conventional reinforcement learning. Finally, to ensure the effectiveness of the system, a step-by-step update strategy is employed to balance the overall rewards with the local rewards of the left and right views.
Drawings
FIG. 1 is a schematic diagram of a multi-agent reinforcement learning model in an embodiment of the invention;
FIG. 2 is a block diagram of an adaptive system for stereoscopic panoramic video asymmetric transport stream in accordance with an embodiment of the invention;
FIG. 3 is a schematic view of a tile-based view prediction probability model in an embodiment of the present invention;
FIG. 4 is a diagram of a joint rate control method architecture based on multi-agent reinforcement learning in an embodiment of the present invention;
FIG. 5 is a 4G and 5G bandwidth trace in an embodiment of the invention;
FIG. 6 is a graph comparing the performance of the methods in an embodiment of the invention;
in fig. 6, (a) 4K video is transmitted for 4G bandwidth, (b) 8K video is transmitted for 4G bandwidth, (c) 4K video is transmitted for 5G bandwidth, and (d) 8K video is transmitted for 5G bandwidth;
FIG. 7 is a comparative CDF chart for various methods in an embodiment of the invention;
in FIG. 7, (a) is the average QoE value measured at 4G-4K, (b) is the average QoE value measured at 4G-8K, (c) is the average QoE value measured at 5G-4K, and (d) is the average QoE value measured at 5G-8K.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
As shown in fig. 1-4, the present embodiment provides a stereoscopic panoramic video asymmetric transport stream adaptive method, which includes the following steps:
s1, a server side cuts video data into segment segments in time and cuts the segment segments into slice tiles in space;
s2, caching the cut video in an HTTP server according to different video quality and different code rates;
s3, carrying out probability prediction by combining a 3DCNN network (3Dimension convolution neuron network,3-dimensional convolutional neural network) and an LSTM network (Long Short-Term Memory network);
s4, performing joint code rate control on the left and right view points by utilizing multi-agent reinforcement learning based on an AC (Actor-Critic) so as to balance the mutual influence of the quality of the single-path view point and the overall quality;
s5, designing a reward function so that the system can select a more proper code rate;
and S6, decoding, splicing and stitching the downloaded data, storing the data in a play buffer of the client, and rendering and playing the data through playing software such as head-mounted equipment.
In this embodiment, the specific implementation method of steps S1 to S2 is as follows:
the panoramic video is cut into segments in time and tiles in space by a specific tool, and meanwhile, a media description file MPD is generated. When in transmission, the MPD file is transmitted preferentially, and the client analysis module analyzes the MPD file so as to analyze the information such as the code rate, resolution, frame rate, download address and the like of the cut video clips. After the client control module analyzes the information of the available video, in order to ensure that video data in the future time viewing area can select a high code rate, an estimation must be made on the future time viewing port. The code rate selection needs to make optimal code rate decisions for tiles in different areas according to the current network situation and the future view port positions. And the client selects proper code rates for tiles in the view field and tiles outside the view field at future time according to the network bandwidth condition and the predicted view port position. And the client sends a downloading request to the server through the HTTP module, and finally downloads the streaming media file according to the URL address. After the client downloads the requested file, the file can be decoded and played.
In this example, experimental verification was performed using the simulated transmission platform of the literature [Jiang X, Chiang Y-H, Zhao Y, et al. Plato: Learning-based Adaptive Streaming of 360-Degree Videos [C]. 2018 IEEE 43rd Conference on Local Computer Networks (LCN), 2018: 393-400]. The platform assumes that the client communicates with the server over HTTP/2; when the server receives a client request, it sends all tiles contained in a segment. A packet load rate of 95% and a round-trip time of 80 ms are assumed. Following the default repeat-request time of the DASH player, the re-request time is set to 500 ms when the buffer is full.
The play buffer size is set to 3 s. The main framework is implemented in Python and PyTorch.
As no stereoscopic panoramic video data set is publicly available, four stereoscopic panoramic videos with resolutions of 4K and 8K were downloaded from YouTube for this embodiment. The data set was sliced in the spatial and temporal domains with FFmpeg and encoded with HEVC, and the encoded data was packaged with MP4Box. The data reflecting actual user head movements come from the published data set in the literature [Corbillon X, De Simone F, Simon G. 360-Degree Video Head Movement Dataset [C]. Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys'17), 2017: 199-204]. As 5G technology becomes increasingly widespread, using it to transmit high-definition video is a foreseeable choice. We therefore verify the effect of transmitting 4K and 8K stereoscopic panoramic video with different algorithms at 4G and 5G bandwidths, respectively. The bandwidth data sets are the 4G bandwidth data set measured in Belgium by van der Hooft J et al. and the 5G bandwidth data set measured by Darijo Raca. The bandwidth traces are shown in FIG. 5. The viewport size is 110° horizontally and 90° vertically. The common ERP projection is adopted, with a 6x4 tile layout, i.e. K is 24.
Following the recommendation of YouTube, the world's largest video website, the code rate selection range is set to [40, 16, 8, 5, 2.5, 1] Mbps, i.e. M is 6. Following the proposal in the literature [Jiang X, Chiang Y-H, Zhao Y, et al. Plato: Learning-based Adaptive Streaming of 360-Degree Videos [C]. 2018 IEEE 43rd Conference on Local Computer Networks (LCN), 2018: 393-400], the mapping q(a_i) from code rate to quality can be set as follows:
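The concrete mapping is not reproduced in this text; as an illustration, the following sketch assumes a logarithmic quality law over the six code rate levels listed above. The log form and the `r_min` normalisation are assumptions of this sketch (a common choice in rate-adaptation work), not the patent's exact mapping.

```python
import math

RATES_MBPS = [1, 2.5, 5, 8, 5 + 3, 16, 40][0:1] + [2.5, 5, 8, 16, 40]  # M = 6 levels from the text

def q(rate_mbps: float, r_min: float = 1.0) -> float:
    """Map a code rate to a perceived-quality score (assumed logarithmic form)."""
    return math.log(rate_mbps / r_min + 1.0)

# Quality should grow monotonically with code rate, with diminishing returns.
qualities = [q(r) for r in RATES_MBPS]
```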
in the step S3, feature extraction is performed on the obtained static significant information, dynamic significant information and parallax information of the binocular viewpoint of the main viewpoint sequence slice by using a 3DCNN network; meanwhile, the LSTM network is utilized to predict head motion data, and then the head motion data is spliced and fused with the characteristic information extracted by the 3DCNN network; finally, inputting the spliced and fused results into a plurality of full-connection layers to respectively acquire the viewing probabilities of the left and right view points focusing on different information; the viewing probability of the ith slice is recorded as P by the probability prediction method i
In this embodiment, OpenCV is used to obtain the static saliency map, dynamic saliency map and disparity map of the two video streams, and two fully connected layers are used to predict the left-view and right-view probabilities, respectively.
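The probability-prediction architecture described above can be sketched in PyTorch as follows. The channel counts, layer sizes, and the yaw/pitch/roll encoding of the head-motion trace are illustrative assumptions, not the patent's exact configuration; only the overall structure (3D CNN on stacked saliency/disparity maps, LSTM on head motion, concatenation, per-eye fully connected heads) follows the text.

```python
import torch
import torch.nn as nn

class ViewProbNet(nn.Module):
    """Sketch: 3D CNN extracts spatio-temporal features from stacked static
    saliency, dynamic saliency and disparity maps; an LSTM encodes the head
    motion trace; the fused vector feeds two FC heads, one per eye."""
    def __init__(self, num_tiles: int = 24):  # 6x4 layout, K = 24 tiles
        super().__init__()
        # Input: (B, 3, T, H, W) = static saliency, dynamic saliency, disparity
        self.cnn3d = nn.Sequential(
            nn.Conv3d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten())            # -> (B, 8)
        self.lstm = nn.LSTM(input_size=3, hidden_size=16,     # head motion: yaw/pitch/roll
                            batch_first=True)
        self.head_l = nn.Linear(8 + 16, num_tiles)            # left-view probabilities
        self.head_r = nn.Linear(8 + 16, num_tiles)            # right-view probabilities

    def forward(self, sal, head):
        f = self.cnn3d(sal)                  # saliency/disparity features
        _, (h, _) = self.lstm(head)          # last hidden state of the head trace
        fused = torch.cat([f, h[-1]], dim=1)
        p_l = torch.softmax(self.head_l(fused), dim=1)
        p_r = torch.softmax(self.head_r(fused), dim=1)
        return p_l, p_r

net = ViewProbNet()
p_l, p_r = net(torch.randn(2, 3, 4, 16, 32), torch.randn(2, 10, 3))
```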
In this embodiment, the step S4 specifically includes the following steps:
The left and right viewpoints of the panoramic video are each divided temporally into N segments of length T, each segment comprises K slices, and each slice has M code rate levels; each slice in each segment selects a code rate a_i, where i ∈ {0, …, M−1}. In the single-viewpoint case, selecting a code rate for every tile by reinforcement learning yields M^K possibilities at each time step; such a huge action space is infeasible in practice. Actor-Critic-based multi-agent reinforcement learning is therefore used: each slice is taken as an agent, the agents share a state and act jointly, thereby realizing code rate allocation.
When multi-agent reinforcement learning is adopted for slice code rate selection within the left and right viewpoints, the reward of each agent mixes the local rewards (e.g., Q_avg, Q_sv, Q_tv) obtained from the environment by the joint action of the agents within a single viewpoint with the global rewards (e.g., T_rb) obtained when the agents of the left and right viewpoints are combined.
The global rewards and the local rewards are separated and optimized respectively by introducing a Global-Critic for supervision, so as to ensure the stability of the model; the modified policy gradient of each agent is:
∇_θ J(θ) = E_ep[ ∇_θ log π_θ(a_i | o_i) · (Q_l(o_i, a_i) + Q_g(s, a_1, …, a_K)) ]
where ep denotes the sample replay buffer, o_i is the local environment, i.e. the environment of the agent, a_i is the code rate selected by the agent, s is the overall environment, i.e. the combination of the environment states of all agents, θ is the parameter trained by the network model, Q_l(o_i, a_i) is the local value function of each agent, and Q_g(s, a_1, …, a_K) is the global value function composed of all the agents.
The loss function of Q_l(o_i, a_i) is:
L_l = E[ (y_l − Q_l(o_i, a_i))² ],  y_l = r_l + γ · Q_l(o_i′, a_i′)
wherein y_l is the estimate of the local value function, r_l is the local reward, and γ is the discount factor.
The loss function of Q_g(s, a_1, …, a_K) is:
L_g = E[ (y_g − Q_g(s, a_1, …, a_K))² ]
wherein y_g is the estimate of the global value function and r_g is the global reward; y_g, the Q value that drives the agents to take the jointly optimal action in the global state composed of the left and right viewpoints, is expressed as:
y_g = r_g + γ · max_{a′} Q_g(s′, a_1′, …, a_K′)
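As an illustration of the two critic targets just described, the following toy sketch computes the local TD target, the max-based global target, and the squared TD error with scalar value estimates; the numbers are illustrative, not outputs of the patent's networks.

```python
# Toy numeric sketch of the critic targets, assuming scalar value estimates:
#   y_l = r_l + gamma * Q_l(o', a')          (local critic, bootstrap)
#   y_g = r_g + gamma * max_a' Q_g(s', a')   (Global-Critic, best joint action)
gamma = 0.99  # discount factor, as in the embodiment

def local_target(r_l: float, q_l_next: float) -> float:
    return r_l + gamma * q_l_next

def global_target(r_g: float, q_g_next_all) -> float:
    # The Global-Critic bootstraps from the best joint action in the next state.
    return r_g + gamma * max(q_g_next_all)

def critic_loss(y: float, q: float) -> float:
    return (y - q) ** 2  # squared TD error

y_l = local_target(r_l=0.5, q_l_next=1.2)
y_g = global_target(r_g=1.0, q_g_next_all=[0.8, 1.5, 0.3])
```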
in this embodiment, the step S5 specifically includes the following steps:
the reward function determines the reinforcement learning direction, so that the proper reward function is designed to determine the working performance of the system. Assuming that each agent shares a state at each moment in the left and right viewpoints, the input states are respectively:
wherein,representing network throughput of past k segments; />Representing an optional code rate set; b t Representing the current buffer size; z t Representing the average code rate of the last segment; />And->Download time of k past clips respectively representing left and right viewpoints; />And->The viewing probability of each slice of the left and right viewpoints is respectively represented; />And->Respectively representing the set of code rates selected by the slices of the left and right view of the last slice.
The size of the watching probability of each slice determines the contribution degree of the whole video quality; when the slice is in the viewport region,1, otherwise 0. Therefore, the average quality of view ports of the left and right viewpoint segments is:
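A minimal numeric sketch of this viewport-average computation, assuming a probability-weighted mean over the tiles whose viewport indicator is 1 (the exact normalisation is an assumption of this sketch):

```python
# Viewport average quality: tiles outside the viewport (indicator 0) are ignored;
# inside tiles contribute their perceived quality weighted by viewing probability.
def viewport_avg_quality(in_viewport, probs, qualities):
    num = sum(d * p * q for d, p, q in zip(in_viewport, probs, qualities))
    den = sum(d * p for d, p in zip(in_viewport, probs))
    return num / den if den > 0 else 0.0

q_avg = viewport_avg_quality(
    in_viewport=[1, 1, 0, 0],       # first two tiles lie in the viewport
    probs=[0.5, 0.3, 0.1, 0.1],     # per-tile viewing probabilities P_i
    qualities=[4.0, 2.0, 1.0, 1.0]) # per-tile perceived qualities q(a_i)
```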
the spatial quality of the left and right viewpoint views varies:
wherein,and->Respectively represent the spatial domain quality change of the viewing port of the left and right view points, q (a) i ) A mapping representing code rate to perceived quality;
the average quality change of the left and right viewpoint viewport regions at the front and rear moments reflects the fluctuation of video quality in the time domain; time domain quality change:
wherein,and->Respectively show left and right viewsTime domain quality change of point view port;
the segments of the left and right views can be considered as continuous downloads, which form the final download time, and the left and right views together affect the buffer time of the system; meanwhile, the intelligent agents with different code rates are selected for the left and right view points to be in a complete cooperation relationship; when the requested data is completely downloadedData size b of buffer memory larger than sending request time t-1 When the data is not completely downloaded and the buffer memory is exhausted, a buffer phenomenon occurs; the buffer time is as follows:
the quality difference of the slices at the corresponding positions of the left and right viewpoints is too large, and the QoE is seriously reduced when the quality difference exceeds a set range; and the symmetric coding has better performance when the quality of the left and right view points is smaller; in order to avoid the too large quality difference of the slices corresponding to the left and right viewpoints, a punishment item A is designed t To limit the code rate gap size of the corresponding slices of the left and right view points:
The penalty is a function of the right-view slice quality and the left-right slice quality difference. When the right-view slice quality is high, the difference is allowed to vary over a larger range without the penalty changing significantly; when the right-view slice quality is low, a large difference makes the penalty change significantly. The penalty term therefore constrains the left-right quality difference while remaining tolerant of asymmetric coding when the right-view slice quality is high. A parameter β is used to appropriately reduce the effect of the viewing probability on the quality-gap penalty for intra-viewport slices.
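One shape that matches this behavior is to discount the quality gap by the right-view quality, with the viewing probability raised to β softening its influence. The original formula for A t is not reproduced here, so the function below is an illustrative stand-in, not the patent's exact expression:

```python
def quality_gap_penalty(q_right: float, q_diff: float, view_prob: float,
                        beta: float = 0.7, eps: float = 1e-6) -> float:
    """Illustrative penalty A_t: the left-right gap q_diff is discounted by
    the right-view quality, so a large gap is tolerated when q_right is high
    and punished when q_right is low; view_prob**beta moderates the effect
    of the viewing probability for in-viewport slices."""
    return (view_prob ** beta) * q_diff / (q_right + eps)
```

With the same gap and viewing probability, a higher right-view quality yields a smaller penalty, which is exactly the asymmetric-coding tolerance described above.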
The local rewards target a single viewpoint: to keep the spatial and temporal variations within a viewpoint as small as possible, both are set as negative rewards. The global reward targets the whole formed by the left and right viewpoints: to obtain higher average quality, the average quality is set as a positive reward, while the buffering time and the left-right quality-difference constraint terms are set as negative rewards to shorten stalls and avoid an excessive quality gap. The left and right local rewards r t L,l , r t R,l and the global reward r t g are weighted by λ and η; the reward thus comprises two local reward functions and one global reward function.
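The sign structure of the rewards can be sketched directly; the placement of the weights λ and η below is an assumption (the patent only states that they are weights), and the penalty argument stands in for the constraint term A t :

```python
def local_reward(spatial_change: float, temporal_change: float) -> float:
    """Local reward for one viewpoint: in-viewport spatial and temporal
    fluctuations enter as negative terms (sketch)."""
    return -(spatial_change + temporal_change)

def global_reward(avg_quality: float, rebuffer: float, gap_penalty: float,
                  lam: float = 15.0, eta: float = 11.2) -> float:
    """Global reward for the left/right pair: average quality is positive;
    buffering time and the left-right quality-gap penalty are negative,
    weighted by lam and eta (weight placement assumed)."""
    return avg_quality - lam * rebuffer - eta * gap_penalty
```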
In this embodiment, an experience replay mechanism (Experience Replay) is used to train the multi-agent reinforcement learning model. β is set to 0.7, λ to 15.0, η to 11.2, the discount factor γ to 0.99, k to 8, and T to 1.
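A minimal replay buffer for this training loop might look as follows; the transition tuple layout (state, joint action, per-agent local rewards, global reward, next state) is an assumption consistent with the reward structure above:

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay buffer for the multi-agent actor-critic (sketch)."""

    def __init__(self, capacity: int = 10000):
        self.buf = deque(maxlen=capacity)  # old transitions fall off the left

    def push(self, state, joint_action, local_rewards, global_reward, next_state):
        """Store one transition for later off-policy updates."""
        self.buf.append((state, joint_action, local_rewards, global_reward, next_state))

    def sample(self, batch_size: int):
        """Draw a uniform random mini-batch (capped at the buffer size)."""
        return random.sample(self.buf, min(batch_size, len(self.buf)))

    def __len__(self):
        return len(self.buf)
```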
Using head motion data acquired from the playback device, viewport prediction is combined with current bandwidth data to select different code rates for slices inside and outside the viewport region: the code rate of less salient slices in each viewpoint is reduced, the code rate of more salient slices is raised, and network bandwidth is thereby allocated reasonably.
This embodiment also provides a stereoscopic panoramic video asymmetric transport stream adaptation system comprising a memory, a processor, and computer program instructions stored on the memory and executable by the processor; when the processor runs the computer program instructions, the method steps above are implemented.
In order to verify the effectiveness of the present invention, comparative experiments were performed as follows.
The method of the present invention was compared with the following 2 methods:
(1) Adaptive streaming based on a reinforcement learning method [Jiang X, Chiang Y-H, Zhao Y, et al. Plato: Learning-based Adaptive Streaming of 360-Degree Videos [C]. 2018 IEEE 43rd Conference on Local Computer Networks (LCN), 2018: 393-400].
(2) Adaptive streaming method based on conventional theory [Nguyen D V, Tran H T, Pham A T, et al. An Optimal Tile-Based Approach for Viewport-Adaptive 360-Degree Video Streaming [J]. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2019, 9(1): 29-42].
The reinforcement-learning-based adaptive streaming method adopts fixed area expansion, while the adaptive streaming method based on conventional theory adopts real-time area expansion. In both method (1) and method (2), the code rates of the corresponding areas of the left and right views are equal.
According to the literature [Saygili G, Gurler C G, Tekalp A M. Evaluation of Asymmetric Stereo Video Coding and Rate Scaling for Adaptive 3D Video Streaming [J]. IEEE Transactions on Broadcasting, 2011, 57(2): 593-601], the 3D perceived quality of asymmetric coding is better than that of symmetric coding when the left and right view PSNR is above a 32 dB threshold; below the 32 dB threshold, the perceived quality of symmetric coding is better than that of asymmetric coding. The objective quality of stereoscopic video can therefore be measured by combining the left and right view PSNR values according to this threshold.
the average PSNR of the stereoscopic panoramic video viewpoint area is:
wherein,kth tile, which is the view area of the nth segment,/for the view area>The size of the view region tile set is indicated.
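The viewport averaging and the 32 dB threshold rule can be sketched together. The combination rule below is an illustrative reading of the cited Saygili et al. result (binocular suppression lets the higher-quality view dominate above the threshold, while perception follows the lower-quality view below it); the patent's exact combining formula is not reproduced in this text:

```python
THRESH_DB = 32.0

def viewport_avg_psnr(tile_psnrs):
    """Average PSNR over the viewport tile set of one segment."""
    return sum(tile_psnrs) / len(tile_psnrs)

def stereo_quality(psnr_left: float, psnr_right: float,
                   thresh: float = THRESH_DB) -> float:
    """Illustrative threshold rule: when both views exceed the 32 dB
    threshold, binocular suppression makes the better view dominate;
    otherwise perceived quality is driven by the worse view."""
    if min(psnr_left, psnr_right) >= thresh:
        return max(psnr_left, psnr_right)
    return min(psnr_left, psnr_right)
```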
Under the same objective experimental environment, the buffering time, the temporal and spatial smoothness of the three algorithms, and the viewport-region PSNR value built from the average viewport-region PSNR of the stereoscopic panoramic video are obtained, reflecting overall perceived quality.
Fig. 6 compares the data of the three methods under 4G and 5G network bandwidths: Figs. 6(a), 6(b) and Figs. 6(c), 6(d) show the results for 4K and 8K stereoscopic panoramic video transmission under 4G and 5G bandwidths, respectively. The experimental results show that overall performance under 5G bandwidth is better than under 4G bandwidth, which accords with common understanding and objective fact. However, under 5G bandwidth the buffering time of the methods increases, because 5G bandwidth fluctuates more severely than 4G, as shown by the 4G and 5G bandwidth traces in Fig. 5. Severely fluctuating bandwidth poses a greater challenge to an algorithm's code rate selection strategy: buffering time increases and temporal smoothness decreases. By continually learning from past decisions and their future effects, the reinforcement-learning-based code rate selection method performs well under such complex conditions and makes correct decisions.
The per-tile code rate allocation of the spatially cut slices causes a certain degree of non-smoothness in the time and space domains, which is why the method's temporal and spatial smoothness is weaker. However, the proposed method, with its asymmetric transmission mechanism based on binocular suppression, achieves the highest perceived-quality value of the three algorithms and also the relatively lowest buffering time. Once a certain threshold is exceeded, asymmetric coding outperforms symmetric coding, and compared with transmitting left and right tiles at equal code rates, asymmetric transmission reduces the bandwidth consumed and thus the buffering time. The effect is more pronounced in worse environments, for example under 4G rather than 5G, and when transmitting 8K stereoscopic panoramic video.
Fig. 7 shows the QoE CDF curves of the different algorithms when transmitting 4K and 8K stereoscopic panoramic video under 4G and 5G bandwidths. The figure shows that the proposed method achieves a good balance: compared with the other two algorithms, the average QoE improves by 20% on average under 4G bandwidth and by 12% on average under 5G bandwidth. A 5G network can, to some extent, alleviate the quality degradation incurred when transmitting stereoscopic panoramic video over a 4G network, and the asymmetric transmission method further improves the overall video quality.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the invention in any way, and any person skilled in the art may make modifications or alterations to the disclosed technical content to the equivalent embodiments. However, any simple modification, equivalent variation and variation of the above embodiments according to the technical substance of the present invention still fall within the protection scope of the technical solution of the present invention.

Claims (4)

1. The method for adapting the asymmetric transmission stream of the stereoscopic panoramic video is characterized by comprising the following steps of:
s1, a server side cuts video data into fragments in time and cuts the fragments into slices in space;
s2, caching the cut video in an HTTP server according to different video quality and different code rates;
s3, carrying out probability prediction by combining a 3DCNN network and an LSTM network;
s4, performing joint code rate control on the left and right view points by utilizing multi-agent reinforcement learning based on Actor-critic so as to balance the interaction between the quality of the single-path view point and the overall quality;
s5, designing a reward function so that the system can select a more proper code rate;
s6, decoding, splicing and stitching the downloaded data, storing the data in a play cache of the client, and rendering and playing the data through playing software;
the step S4 specifically includes the following steps:
the left and right viewpoints of the panoramic video are each divided into N segments in time, each segment has length T, each segment comprises K slices, and each slice has M code rate levels; the code rate selected for each slice in each segment is a i , where i ∈ {0, ..., M-1}; q (a i ) denotes the mapping from code rate to perceived quality; the viewing probabilities of each slice of the left and right viewpoints are given for the two views; using Actor-Critic-based multi-agent reinforcement learning, each slice is treated as an agent, and the agents share a state and act jointly, thereby realizing the code rate allocation;
when multi-agent reinforcement learning is adopted for slice code rate allocation within the left and right viewpoints, the reward of each agent mixes the local reward obtained from the environment by the joint action of the agents within a single viewpoint with the global reward obtained when the agents of the left and right viewpoints are combined;
the Global rewards and the local rewards are separated and optimized respectively by introducing Global-Critic for supervision so as to ensure the stability of the model; the policy gradient of each agent after modification is:
where ep denotes the sample replay buffer, o i is the local environment, i.e., the agent's own observation, a i is the code rate selected by an agent, s is the overall environment, i.e., the combination of the environment states of all agents, θ i is a network model parameter, each agent has its own local value function, and all agents together form a global value function;
the loss function of (2) is:
wherein y^l is the estimate of the local value function, r^l is the local reward, and γ is the discount factor;
the loss function of (2) is:
wherein y^g is the estimate of the global value function and r^g is the global reward; the Q value that drives the agents to jointly take the optimal action in the global state formed by the left and right viewpoints is expressed accordingly.
2. The method according to claim 1, wherein in step S3, the 3DCNN network is used to extract features of the static saliency information, the dynamic saliency information, and the binocular parallax information from the obtained main viewpoint sequence slices; meanwhile, the LSTM network is used to predict head motion data, which is then concatenated and fused with the feature information extracted by the 3DCNN network; finally, the concatenated and fused result is input into a plurality of fully connected layers to obtain the viewing probabilities with which the left and right viewpoints attend to different information; the viewing probability of the i-th slice obtained by this probability prediction method is denoted P i .
3. The method for adapting a stereoscopic panoramic video asymmetric transport stream according to claim 1, wherein said step S5 specifically comprises the steps of:
assuming that the agents within the left and right viewpoints share a state at each moment, the input state comprises: the network throughput of the past k segments; the optional code rate set; b t , the current buffer size; z t , the average code rate of the last segment; the download times of the past k segments of the left and right viewpoints; the viewing probabilities of each slice of the left and right viewpoints; and the sets of code rates selected by the slices of the left and right viewpoints of the last segment;
the viewing probability of each slice determines its contribution to the overall video quality; the probability is 1 when the slice lies in the viewport region and 0 otherwise; the average viewport quality of the left and right viewpoint segments is therefore the viewing-probability-weighted average of the per-slice perceived qualities;
the spatial quality change of the left and right viewpoint viewports measures how the perceived quality q (a i ), the mapping from code rate to perceived quality, varies across the slices within a viewport;
the change in average quality of the left and right viewport regions between consecutive moments reflects the fluctuation of video quality in the time domain; the temporal quality change of each viewpoint's viewport is the difference between these consecutive average qualities;
the segments of the left and right viewpoints are downloaded consecutively, so together they determine the final download time and the buffering time of the system; the agents selecting code rates for the left and right viewpoints are in a cooperative relationship; if the requested data finishes downloading before the buffer held at request time, b t-1 , is drained, no stall occurs, and if the buffer is exhausted before the download completes, rebuffering occurs; the buffering time is the portion of the download time that exceeds b t-1 ;
an excessive quality difference between slices at corresponding positions of the left and right viewpoints severely degrades QoE once it exceeds a set range, and symmetric coding performs better when the left and right viewpoint quality is low; to avoid an excessive quality difference between corresponding slices, a penalty term A t is designed to limit the code rate gap between corresponding left- and right-viewpoint slices; the penalty is a function of the right-view slice quality and the left-right slice quality difference: when the right-view slice quality is high, the difference may vary over a larger range without the penalty changing significantly, while when the right-view slice quality is low, a large difference makes the penalty change significantly; the penalty term therefore constrains the left-right quality difference while remaining tolerant of asymmetric coding when the right-view slice quality is high;
the local rewards target a single viewpoint: to keep the spatial and temporal changes within a viewpoint as small as possible, both are set as negative rewards; the global reward targets the whole formed by the left and right viewpoints: to obtain higher average quality, the average quality is set as a positive reward, and to shorten the buffering time and avoid an excessive left-right quality difference, the buffering-time and left-right quality-difference constraint terms are set as negative rewards; the left and right local rewards r t L,l , r t R,l and the global reward r t g are weighted by λ and η;
and by utilizing head motion data acquired from the playback device, viewport prediction is combined with current bandwidth data to select different code rates for slices inside and outside the viewport region, reducing the code rate of less salient slices in each viewpoint, raising the code rate of more salient slices, and reasonably allocating network bandwidth.
4. A stereoscopic panoramic video asymmetric transport stream adaptation system comprising a memory, a processor and computer program instructions stored on the memory and executable by the processor, which when executed by the processor, are capable of carrying out the method steps of any one of claims 1 to 3.
CN202111165065.8A 2021-09-30 2021-09-30 Stereoscopic panoramic video asymmetric transport stream self-adaption method and system Active CN113905221B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111165065.8A CN113905221B (en) 2021-09-30 2021-09-30 Stereoscopic panoramic video asymmetric transport stream self-adaption method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111165065.8A CN113905221B (en) 2021-09-30 2021-09-30 Stereoscopic panoramic video asymmetric transport stream self-adaption method and system

Publications (2)

Publication Number Publication Date
CN113905221A CN113905221A (en) 2022-01-07
CN113905221B true CN113905221B (en) 2024-01-16

Family

ID=79189919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111165065.8A Active CN113905221B (en) 2021-09-30 2021-09-30 Stereoscopic panoramic video asymmetric transport stream self-adaption method and system

Country Status (1)

Country Link
CN (1) CN113905221B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114979089B (en) 2022-04-25 2023-03-24 北京邮电大学 System and method for transmitting panoramic video in real time
CN114979799B (en) * 2022-05-20 2024-07-26 北京字节跳动网络技术有限公司 Panoramic video processing method, device, equipment and storage medium
CN115022546B (en) * 2022-05-31 2023-11-14 咪咕视讯科技有限公司 Panoramic video transmission method, device, terminal equipment and storage medium
CN115037962B (en) * 2022-05-31 2024-03-12 咪咕视讯科技有限公司 Video self-adaptive transmission method, device, terminal equipment and storage medium
CN114900506B (en) * 2022-07-12 2022-09-30 中国科学技术大学 User experience quality-oriented 360-degree video viewport prediction method
CN117768669A (en) * 2022-09-19 2024-03-26 腾讯科技(深圳)有限公司 Data transmission method, device, electronic equipment and storage medium
CN117156175B (en) * 2023-10-30 2024-01-30 山东大学 QoE optimization method for panoramic video streaming based on viewport prediction distance control

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020043126A1 (en) * 2018-08-29 2020-03-05 中兴通讯股份有限公司 Video data processing and transmission methods and apparatus, and video data processing system
CN111711810A (en) * 2020-06-30 2020-09-25 福州大学 Stereoscopic video transmission method based on asymmetric bit rate allocation
CN112584119A (en) * 2020-11-24 2021-03-30 鹏城实验室 Self-adaptive panoramic video transmission method and system based on reinforcement learning
CN112822564A (en) * 2021-01-06 2021-05-18 鹏城实验室 Viewpoint-based panoramic video adaptive streaming media transmission method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9846960B2 (en) * 2012-05-31 2017-12-19 Microsoft Technology Licensing, Llc Automated camera array calibration
WO2016123721A1 (en) * 2015-02-07 2016-08-11 Zhou Wang Method and system for smart adaptive video streaming driven by perceptual quality-of-experience estimations
US20170195561A1 (en) * 2016-01-05 2017-07-06 360fly, Inc. Automated processing of panoramic video content using machine learning techniques

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020043126A1 (en) * 2018-08-29 2020-03-05 中兴通讯股份有限公司 Video data processing and transmission methods and apparatus, and video data processing system
CN111711810A (en) * 2020-06-30 2020-09-25 福州大学 Stereoscopic video transmission method based on asymmetric bit rate allocation
CN112584119A (en) * 2020-11-24 2021-03-30 鹏城实验室 Self-adaptive panoramic video transmission method and system based on reinforcement learning
CN112822564A (en) * 2021-01-06 2021-05-18 鹏城实验室 Viewpoint-based panoramic video adaptive streaming media transmission method and system

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
A CNN-based Quality Model for Image Interpolation; Yuting Lin et al.; 2020 Cross Strait Radio Science & Wireless Technology Conference; full text *
Reinforcement Learning Based Rate Adaptation for 360-Degree Video Streaming; Zhiqian Jiang et al.; IEEE Transactions on Broadcasting; Vol. 67, No. 2; full text *
3D video quality assessment based on HTTP adaptive streaming; Zhai Yuxuan et al.; Journal of Beijing University of Aeronautics and Astronautics; Vol. 45, No. 12; full text *
A machine-learning-based stereoscopic panoramic video adaptive streaming system; Rao Yingjie et al.; Video Engineering; Vol. 44, No. 12; full text *
Viewpoint-based panoramic video coding and transmission optimization; Xie Wenjing; Wang Yue; Zhang Xinfeng; Wang Shanshe; Ma Siwei; Journal of Yangzhou University (Natural Science Edition) (02); full text *

Also Published As

Publication number Publication date
CN113905221A (en) 2022-01-07

Similar Documents

Publication Publication Date Title
CN113905221B (en) Stereoscopic panoramic video asymmetric transport stream self-adaption method and system
Xie et al. 360ProbDASH: Improving QoE of 360 video streaming using tile-based HTTP adaptive streaming
Sun et al. Flocking-based live streaming of 360-degree video
Yuan et al. Spatial and temporal consistency-aware dynamic adaptive streaming for 360-degree videos
TWI511544B (en) Techniques for adaptive video streaming
Yaqoob et al. A combined field-of-view prediction-assisted viewport adaptive delivery scheme for 360° videos
Liu et al. JET: Joint source and channel coding for error resilient virtual reality video wireless transmission
US20140292751A1 (en) Rate control bit allocation for video streaming based on an attention area of a gamer
Park et al. Volumetric media streaming for augmented reality
CN115037962B (en) Video self-adaptive transmission method, device, terminal equipment and storage medium
US20250097399A1 (en) Processing system for streaming volumetric video to a client device
US11575894B2 (en) Viewport-based transcoding for immersive visual streams
US11373380B1 (en) Co-viewing in virtual and augmented reality environments
Park et al. Navigation graph for tiled media streaming
WO2021092821A1 (en) Adaptively encoding video frames using content and network analysis
CN115633143B (en) An adaptive video streaming transmission system with edge-to-edge collaborative super-resolution
Aksu et al. Viewport-driven rate-distortion optimized scalable live 360° video network multicast
US20240283986A1 (en) Live Streaming Media
WO2018133709A1 (en) Method, device and system for streaming media transmission, server and terminal
Zong et al. Progressive frame patching for FoV-based point cloud video streaming
Tanjung et al. Qoe optimization in dash-based multiview video streaming
CN119172571A (en) Multi-channel collaborative acceleration method for set-top box data processing and transmission
CN117714700B (en) Video coding method, device, equipment, readable storage medium and product
Xie et al. Perceptually optimized quality adaptation of viewport-dependent omnidirectional video streaming
Zhang et al. Exploiting layer and spatial correlations to enhance SVC and tile based 360-degree video streaming

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant