
CN119815042A - Network high-definition player streaming media decoding method, device and storage medium - Google Patents

Network high-definition player streaming media decoding method, device and storage medium

Info

Publication number
CN119815042A
CN119815042A
Authority
CN
China
Prior art keywords
decoding
buffer
network
frame
streaming media
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202510273088.2A
Other languages
Chinese (zh)
Other versions
CN119815042B (en)
Inventor
杨波
姜赛
况君禄
石常和
冯烽
毛飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Amedia Technology Co ltd
Original Assignee
Shenzhen Amedia Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Amedia Technology Co ltd filed Critical Shenzhen Amedia Technology Co ltd
Priority to CN202510273088.2A priority Critical patent/CN119815042B/en
Publication of CN119815042A publication Critical patent/CN119815042A/en
Application granted granted Critical
Publication of CN119815042B publication Critical patent/CN119815042B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to the technical field of streaming media decoding, and discloses a method, device and storage medium for decoding streaming media of a network high-definition player. The method comprises: performing multi-threaded parallel acquisition and storage on real-time temperature data of a decoding chip in the network high-definition player, network bandwidth jitter data and video bit rate fluctuation data to obtain a streaming media dynamic feature data set; performing classification training to obtain an inter-frame correlation prediction model; constructing a multi-level streaming media cache system based on the inter-frame correlation prediction model; performing real-time monitoring on the cache state, network state and chip state in the multi-level streaming media cache system to obtain a decoding event sequence; and adjusting decoding parameters in real time according to the decoding event sequence to obtain a streaming media decoding control instruction. The invention realizes the coordinated optimization of the network layer, the decoding layer and the display layer, reduces the playback startup delay and improves the overall stability of the network high-definition player.

Description

Method, device and storage medium for decoding streaming media of network high-definition player
Technical Field
The present invention relates to the field of streaming media decoding technologies, and in particular, to a method, an apparatus, and a storage medium for decoding streaming media of a network high-definition player.
Background
With the popularization of 5G networks and the continuous improvement of video content quality, 4K/8K ultra-high-definition video streaming services have become an important application scenario for network high-definition players. However, in the actual decoding process, uncertainty factors such as network bandwidth fluctuation, dynamic changes in video code rate, and heating of the decoding chip readily cause playback stuttering, mosaic artifacts, and audio-video desynchronization, seriously degrading the user's viewing experience.
Existing decoding technology mainly adopts fixed-parameter decoding schemes and MMSE-based inter-frame prediction methods. When processing ultra-high-definition video streams encoded with new-generation standards such as H.265/HEVC, complex inter-frame dependencies and the limited computing power of playback devices often increase decoding delay. In addition, the traditional fixed caching strategy struggles to cope with network fluctuation and video code rate changes, and cannot meet the requirements of high-quality streaming media playback.
Disclosure of Invention
The invention provides a method, a device and a storage medium for decoding streaming media of a network high-definition player, which realize cooperative optimization of a network layer, a decoding layer and a display layer, reduce play start delay and improve the overall stability of the network high-definition player.
In a first aspect, the present invention provides a method for decoding streaming media of a network high-definition player, where the method for decoding streaming media of a network high-definition player includes:
The method comprises the steps of performing multithreading parallel acquisition and storage on real-time temperature data, network bandwidth jitter data and video code rate fluctuation data of a decoding chip in a network high-definition player to obtain a streaming media dynamic characteristic data set;
classifying and training the data in the stream media dynamic characteristic data set according to the inter-frame prediction difficulty to obtain an inter-frame correlation prediction model;
constructing a multi-level streaming media cache system based on the inter-frame correlation prediction model;
monitoring the buffer status, the network status and the chip status in the multi-level streaming media buffer system in real time to obtain a decoding event sequence;
and adjusting decoding parameters in real time according to the decoding event sequence to obtain a streaming media decoding control instruction.
In a second aspect, the present invention provides a network high-definition player streaming media decoding device, where the network high-definition player streaming media decoding device includes:
The parallel acquisition module is used for carrying out multithread parallel acquisition and storage on real-time temperature data, network bandwidth jitter data and video code rate fluctuation data of a decoding chip in the network high-definition player to obtain a streaming media dynamic characteristic data set;
The classification training module is used for carrying out classification training on the data in the stream media dynamic characteristic data set according to the inter-frame prediction difficulty to obtain an inter-frame correlation prediction model;
the construction module is used for constructing a multi-level streaming media cache system based on the inter-frame correlation prediction model;
the real-time monitoring module is used for monitoring the cache state, the network state and the chip state in the multi-stage streaming media cache system in real time to obtain a decoding event sequence;
And the real-time adjustment module is used for adjusting the decoding parameters in real time according to the decoding event sequence to obtain a streaming media decoding control instruction.
A third aspect of the present invention provides a computer readable storage medium having instructions stored therein that, when executed on a computer, cause the computer to perform the above-described network high definition player streaming media decoding method.
According to the technical scheme, decoding prediction accuracy for different types of video frames is remarkably improved by constructing a dual-branch neural network structure that combines temporal feature extraction with spatial feature extraction. The layered design of the multi-level streaming media buffer system achieves coordinated optimization of the network layer, decoding layer and display layer, reducing playback startup delay. An adaptive decoding control strategy built on an event-trigger mechanism reduces system resource occupation and the playback stutter rate. An attention fusion module improves the accuracy of inter-frame correlation prediction, making decoding resource allocation more reasonable and effectively avoiding overheating of the decoding chip. Training optimized with a multi-task loss function can simultaneously account for temperature control, performance optimization and resource allocation, remarkably improving overall system stability. Finally, a dynamic scheduling mechanism based on event priority lets the system respond promptly to various abnormal events, guaranteeing continuous and stable playback of ultra-high-definition video streams.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained based on these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of steps of a method for decoding streaming media of a network high definition player according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a streaming media decoding device of a network high definition player according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a method and a device for decoding streaming media of a network high-definition player and a storage medium. The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For easy understanding, the following describes a specific flow of an embodiment of the present invention, referring to fig. 1, and an embodiment of a method for decoding streaming media of a network high definition player in an embodiment of the present invention includes:
Step S1, performing multithread parallel acquisition and storage on real-time temperature data, network bandwidth jitter data and video code rate fluctuation data of a decoding chip in a network high-definition player to obtain a streaming media dynamic characteristic data set;
It can be understood that the execution body of the present invention may be a network high-definition player streaming media decoding device, and may also be a terminal or a server, which is not limited herein. The embodiment of the invention is described by taking a server as an execution main body as an example.
Specifically, a data acquisition mechanism is designed in the player system to ensure that real-time temperature data, network bandwidth jitter data, video code rate fluctuation data and system resource state data of the decoding chip are acquired and stored in parallel in a multithreading mode, so that a streaming media dynamic characteristic data set is constructed. The temperature data of the decoding chip is obtained by a periodic sampling mode, wherein the temperature data comprises the measurement of the core temperature and the surface temperature. The core temperature is directly read from the internal sensor of the decoding chip, and the surface temperature is additionally monitored through the external temperature sensor, so that the heat dissipation condition and the heat stability of the decoding chip are reflected. During the data acquisition process, each sampling is accompanied by an accurate time stamp to ensure the accuracy of subsequent data alignment. And meanwhile, detecting a sliding window of the network bandwidth data to obtain the bandwidth jitter condition. Sliding window detection is a data processing method, continuously calculates the change trend of bandwidth in a specified time window, and can capture instantaneous fluctuation, so as to obtain network bandwidth jitter data flow. The data stream comprises two key parameters of bandwidth change rate and packet loss rate, the former reflects the dynamic change condition of the network transmission rate, the latter measures the severity of data loss, and the two parameters together determine the fluency of stream media playing. These parameters are calculated by periodically grabbing network traffic packets and trend prediction is performed in conjunction with historical data to take appropriate preventive action when the network condition is showing signs of instability. Meanwhile, the code rate of the input video stream is dynamically monitored. 
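As a minimal illustration of the sliding-window detection described above, the following Python sketch maintains a window of bandwidth samples and derives the two key parameters named in the text: the bandwidth change rate and the packet loss rate. The window length, the sample format, and the use of mean absolute relative change as the change-rate metric are assumptions for illustration, not details disclosed in the patent.

```python
from collections import deque


class BandwidthJitterMonitor:
    """Sliding-window monitor for bandwidth change rate and packet loss rate.

    Window length and sample format are illustrative assumptions.
    """

    def __init__(self, window_size=8):
        # Each sample: (bandwidth_kbps, packets_lost, packets_total)
        self.samples = deque(maxlen=window_size)

    def add_sample(self, bandwidth_kbps, packets_lost, packets_total):
        self.samples.append((bandwidth_kbps, packets_lost, packets_total))

    def bandwidth_change_rate(self):
        """Mean absolute relative change between consecutive samples."""
        bw = [s[0] for s in self.samples]
        if len(bw) < 2:
            return 0.0
        changes = [abs(b - a) / a for a, b in zip(bw, bw[1:]) if a > 0]
        return sum(changes) / len(changes) if changes else 0.0

    def packet_loss_rate(self):
        """Fraction of packets lost over the whole window."""
        lost = sum(s[1] for s in self.samples)
        total = sum(s[2] for s in self.samples)
        return lost / total if total else 0.0
```

Trend prediction against historical data, as the text describes, would then consume these two window statistics.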
Because the code rate fluctuation of the video stream directly affects the computational load of the decoding chip and the network transmission pressure, the video code rate is analyzed in real time and its change trend is recorded. A method based on data packet analysis extracts the code rate information of each segment of video data from the transport stream, or the decoder module of the player directly acquires the video code rate data before decoding, forming a video code rate fluctuation data stream. The CPU occupancy rate, memory utilization rate and GPU utilization rate are also monitored in real time. The CPU occupancy rate is monitored through the process management module of the operating system, the memory utilization rate is calculated by analyzing memory allocation and cache usage, and the GPU utilization rate is queried through the API interface provided by the graphics card driver; together these data form a system resource state data stream that helps identify resource bottlenecks in the decoding process. The real-time temperature data stream, network bandwidth jitter data stream, video code rate fluctuation data stream and system resource state data stream are then timestamp-aligned, ensuring that data from different sources are matched on the same time axis, and a multidimensional state data matrix is constructed. The multidimensional state data matrix is stored in a ring buffer to obtain the streaming media dynamic feature data set. The ring buffer is a storage structure that maintains high access efficiency under large, continuously updated data volumes and avoids memory overflow.
With ring buffer storage, old data is automatically overwritten by new data, so the data set always holds the latest state without occupying excessive storage resources.
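A minimal sketch of the ring buffer storage described above, assuming a row-oriented store of the multidimensional state data matrix; the capacity and row format are illustrative assumptions.

```python
class RingBuffer:
    """Fixed-capacity ring buffer: new rows overwrite the oldest ones, so the
    stored data set always holds the most recent samples without growing."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = [None] * capacity
        self.head = 0       # next write position
        self.count = 0      # number of valid rows

    def push(self, row):
        """Store one timestamp-aligned state row, overwriting the oldest."""
        self.buf[self.head] = row
        self.head = (self.head + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)

    def snapshot(self):
        """Return stored rows in chronological order (oldest first)."""
        if self.count < self.capacity:
            return self.buf[:self.count]
        return self.buf[self.head:] + self.buf[:self.head]
```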
S2, carrying out classification training on data in the stream media dynamic characteristic data set according to the inter-frame prediction difficulty to obtain an inter-frame correlation prediction model;
Specifically, the streaming media dynamic feature data set is subjected to data preprocessing, and the temperature feature, the network feature and the video code rate feature of the decoding chip are standardized to form a temperature feature vector, a network feature vector and a code rate feature vector respectively. The temperature characteristic vector consists of core temperature and surface temperature data of the decoding chip, and a mean value normalization method is adopted, so that all data are in the same scale range. The network feature vector is composed of network bandwidth jitter data and packet loss rate, data noise is reduced through a sliding window mean value filtering method, meanwhile, the time sequence characteristic of the data is maintained, the code rate feature vector is composed of video code rate fluctuation data, and the code rate features of different videos are ensured to be compared in the same numerical range through normalization transformation. The data is classified according to the h.265/HEVC coding standard to distinguish between different characteristics of I frames, P frames, and B frames. I frames are completely independent frames containing complete image information and are independent of other frames for decoding, the coding complexity is usually low, P frames are predicted based on the preceding frames and contain motion vector information, the coding complexity is relatively high, and B frames are dependent on bi-directional prediction of the preceding and following frames, so that the motion vector calculation is more complex. To quantify this complexity, the entropy coding complexity of each frame type is calculated, measured by counting the dispersion of pixel blocks within the frame, and the motion vector distribution is obtained by analyzing the motion direction and amplitude of the intra-predicted blocks, which information constitutes a frame type feature matrix. 
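The three standardization steps described above (mean normalization for the temperature features, sliding-window mean filtering for the network features, and normalization of the code rate features into a common range) might be sketched as follows; the window length and the choice of min-max scaling for the code rate are illustrative assumptions.

```python
def mean_normalize(values):
    """Mean normalization: center on the mean, scale by the value range,
    so core and surface temperature land on the same scale."""
    mean = sum(values) / len(values)
    span = max(values) - min(values)
    return [(v - mean) / span if span else 0.0 for v in values]


def sliding_mean_filter(values, window=3):
    """Smooth a sequence with a trailing-window mean, reducing noise
    while preserving its ordering in time."""
    out = []
    for i in range(len(values)):
        w = values[max(0, i - window + 1):i + 1]
        out.append(sum(w) / len(w))
    return out


def min_max_normalize(values):
    """Map values into [0, 1] so code rate features of different videos
    are comparable in the same numerical range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]
```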
On the basis of the frame type feature matrix, a double-branch neural network structure is constructed, and the structure consists of a time sequence feature extraction branch and a space feature extraction branch. The time sequence feature extraction branch adopts a three-layer bidirectional LSTM network, each layer comprises 256 LSTM units, and the structure captures the time dependence relationship of the forward direction and the backward direction at the same time, so that the network can better understand the dynamic change mode among video frames. The spatial feature extraction branch adopts ResNet structures, the branch comprises four residual block groups, each residual block group is formed by three layers of convolution layers and jump connection, the convolution layers can extract spatial features in frames, the design of the jump connection can relieve the gradient disappearance problem, and the training stability of a deep network is improved. The combination of the double branch structure enables the model to learn the time correlation between frames and extract the spatial characteristics in the frames, thereby enhancing the accuracy of decoding prediction. And an attention fusion module is added at the output end of the double-branch network, and the module consists of a channel attention sub-module and a space attention sub-module. The channel attention sub-module enhances the attention to the key features by calculating the importance weight of each feature channel, while the spatial attention sub-module highlights the recognition capability to the target region by calculating the weighted distribution of the spatial features. After the attention fusion module is processed, the obtained fusion feature map can more effectively express the dynamic change mode among frames in the decoding process. 
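As a rough illustration of the attention fusion module, the sketch below weights each channel by a sigmoid of its mean activation (channel attention) and each spatial position by a sigmoid of its cross-channel mean (spatial attention), then multiplies both weight maps into the feature map. This sigmoid-of-mean form is a common simplification of channel/spatial attention and is an assumption here; the patent does not give the module's exact formulation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_fuse(feature_map):
    """feature_map: list of C channels, each an H x W list of lists.

    Channel attention: one weight per channel from its mean activation.
    Spatial attention: one weight per position from the cross-channel mean.
    Output: element-wise product of the input with both weight maps.
    """
    C = len(feature_map)
    H, W = len(feature_map[0]), len(feature_map[0][0])
    chan_w = [sigmoid(sum(sum(row) for row in ch) / (H * W))
              for ch in feature_map]
    spat_w = [[sigmoid(sum(feature_map[c][i][j] for c in range(C)) / C)
               for j in range(W)] for i in range(H)]
    return [[[feature_map[c][i][j] * chan_w[c] * spat_w[i][j]
              for j in range(W)] for i in range(H)] for c in range(C)]
```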
The fusion feature map is input into a decoding prediction head network consisting of three parallel branches, used respectively for predicting inter-frame correlation, decoding resource requirements and the cache allocation strategy. The inter-frame correlation prediction branch estimates the similarity of the current frame to the preceding and following frames, guiding the weight adjustment of inter-frame prediction during decoding; the decoding resource demand prediction branch estimates the computing resources required in the decoding process, including CPU occupancy rate, memory demand and GPU load, so that decoding parameters can be dynamically optimized; and the cache allocation strategy prediction branch calculates a reasonable buffer management scheme to avoid stuttering caused by insufficient or overloaded buffering during decoding. All three branches use a two-layer fully connected network for prediction: the first fully connected layer has 512 neurons to provide sufficient feature expression capability, and the number of neurons in the second fully connected layer matches each branch's prediction target dimension to ensure accurate results. During training, a multi-task loss function is adopted to optimize the decoding prediction head network: the inter-frame correlation loss measures the predicted inter-frame similarity error, the resource prediction loss evaluates the accuracy of decoding resource demand prediction, and the cache optimization loss optimizes the rationality of the cache management strategy. These loss terms are combined with weights, and gradient updates use the Adam optimizer to ensure that the model converges efficiently and generalizes well.
Through multiple rounds of training and optimization iteration, the model can accurately predict inter-frame correlation, decoding resource requirements and the cache allocation strategy, effectively guiding the decoding process of the streaming media player and improving decoding efficiency and playback stability.
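The weighted combination of the three loss terms can be sketched as below; the use of mean squared error for each term and the 1.0/0.5/0.5 weights are illustrative assumptions, since the patent does not disclose concrete loss forms or weights.

```python
def multitask_loss(pred, target, weights=(1.0, 0.5, 0.5)):
    """Weighted sum of the three head losses: inter-frame correlation,
    decoding resource demand, and cache allocation.

    MSE per term and the default weights are illustrative assumptions.
    """
    def mse(p, t):
        return sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)

    corr_loss = mse(pred["correlation"], target["correlation"])
    res_loss = mse(pred["resources"], target["resources"])
    cache_loss = mse(pred["cache"], target["cache"])
    w1, w2, w3 = weights
    return w1 * corr_loss + w2 * res_loss + w3 * cache_loss
```

In an actual training loop this scalar would be backpropagated through the shared dual-branch trunk, so all three heads shape the same features.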
S3, constructing a multi-level streaming media cache system based on an inter-frame correlation prediction model;
Specifically, the inter-frame correlation, the decoding resource requirement and the buffer allocation strategy output by the inter-frame correlation prediction model are mapped in a layered manner, and the multi-level buffer configuration parameters are obtained. The multi-level buffer configuration parameters comprise a network layer buffer size, a decoding layer frame type buffer proportion and a display layer buffer capacity, and the parameters determine the resource allocation modes of the buffer system on different levels, so that the stability and the efficiency of the whole streaming media decoding are influenced. And performing buffer space allocation on the network layer based on the multi-level buffer configuration parameters to form a network layer buffer structure. The size of the network layer buffer structure is dynamically adjusted according to the current video code rate, and the capacity range of the network layer buffer structure is set to be between two times and four times of the video code rate, so that enough buffer space is provided when the network fluctuates, and the probability of packet loss or blocking is reduced. The specific adjustment mode of the network layer buffer depends on the real-time network bandwidth jitter condition, when the network is stable, buffer occupation is reduced to reduce data delay, and when the network jitter is larger or the packet loss rate is higher, the buffer capacity is properly increased to enhance the anti-jitter capability. Meanwhile, the data storage strategy of the network layer buffer memory needs to be matched with the data acquisition mechanism of the decoding layer, and an intelligent buffer memory replacement strategy, such as a discarding strategy based on time sequence priority, is adopted to ensure that important frames are reserved preferentially without affecting the continuity of streaming media playing. 
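A sketch of the network-layer buffer sizing rule described above: capacity stays between two and four times one second of the current video code rate, and grows as network conditions degrade. Folding jitter and packet loss into a single linear instability score is an illustrative assumption.

```python
def network_buffer_size(bitrate_bps, jitter_ratio, loss_rate):
    """Size the network-layer buffer between 2x and 4x one second of the
    current video code rate, growing with jitter and packet loss.

    The linear blend of jitter and loss into one instability score, and
    the 10x weighting of loss, are illustrative assumptions.
    """
    instability = min(1.0, jitter_ratio + 10.0 * loss_rate)
    multiplier = 2.0 + 2.0 * instability  # 2x stable .. 4x unstable
    return int(multiplier * bitrate_bps / 8)  # bytes for one second
```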
After the network layer cache allocation is completed, the decoding layer is subjected to cache division to form a decoding layer cache structure. The buffer structure of the decoding layer is composed of an I frame buffer area, a P frame buffer area and a B frame buffer area, and the capacity of the decoding layer is dynamically configured according to the analysis result of the inter-frame correlation prediction model. Since the I-frames are independently decoded key frames, the priority of the buffer area is highest, so that the capacity of the I-frame buffer area is set to a first target value, so as to ensure that the I-frames can be stored and read in time. The P frame is decoded depending on the previous frame, the capacity of the buffer is set to the second target value, and the B frame is decoded with reference to the previous and subsequent frames at the same time, and the buffer is required to be low, but in order to prevent decoding abnormality due to inter prediction error, the capacity of the B frame buffer is set to the third target value. The partition of the decoding layer buffer memory not only needs to consider the buffer memory requirements of different frame types, but also needs to combine the prediction result of the decoding resource requirement to adjust the buffer memory strategy when the chip resource is limited, for example, when the decoding pressure is larger, the capacity of the B frame buffer memory area is reduced, so as to reduce the decoding calculation burden and improve the decoding efficiency. And carrying out buffer configuration on the display layer based on the multi-level buffer configuration parameters so as to ensure that the video frames are smoothly transmitted to the display unit after decoding is completed. 
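The decoding-layer partition can be sketched as follows, with the I-frame region largest and the B-frame region shrinking under decoding pressure, as described above. The 0.5/0.3/0.2 base split is an illustrative assumption; the patent only names first, second and third target values.

```python
def partition_decoder_cache(total_bytes, decode_pressure=0.0):
    """Split the decoding-layer cache into I/P/B regions, I largest.

    The base split and the pressure-driven shrink of the B-frame region
    (reclaimed space handed to the I-frame region) are assumptions.
    """
    i_share, p_share, b_share = 0.5, 0.3, 0.2
    pressure = min(1.0, max(0.0, decode_pressure))
    shift = b_share * 0.5 * pressure  # shrink B by up to half under load
    b_share -= shift
    i_share += shift
    return {
        "I": round(total_bytes * i_share),
        "P": round(total_bytes * p_share),
        "B": round(total_bytes * b_share),
    }
```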
The core of the display layer buffer structure is two buffer areas, each sized at 1.5 times the pixel data required by the current display resolution; this design provides enough buffer space when display frames are switched, avoiding picture tearing or frame loss. The display layer buffer works cooperatively with the decoding layer buffer: when the decoding layer's buffer space is tight, part of the display layer buffer is released preferentially to keep decoding layer data flowing smoothly, and once display frames are stable, the double-buffer mechanism performs seamless switching of display frames to reduce picture stuttering and improve playback fluency. After the buffer structures of each layer are configured, the data transmission channels among the network layer, decoding layer and display layer are optimized to form an efficient inter-layer data transmission strategy. The design of the inter-layer data transmission strategy considers the priority of the data streams: in general, I frame data has the highest transmission priority, P frames come next, and B frames have the lowest priority, and the data channel dynamically adjusts the transmission scheduling order according to the inter-frame correlation prediction result to ensure that key data is transmitted first during decoding. Meanwhile, the data transmission strategy takes the decoding resource state into account to avoid buffer overflow caused by insufficient decoding layer processing capacity when data accumulates; in implementation, an adaptive flow control mechanism dynamically adjusts the transmission rate of the data stream by monitoring the decoding queue length, ensuring the decoder runs stably.
Meanwhile, the data transmission of the display layer adopts an intelligent scheduling mechanism to reduce the phenomenon of frame loss, for example, when the display layer buffer memory is about to be exhausted, the next frame of data is scheduled from the decoding layer buffer memory in advance, so that smooth playing is maintained. The multi-level streaming media buffer system is constructed by integrating a network layer buffer structure, a decoding layer buffer structure, a display layer buffer structure and an interlayer data transmission strategy.
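The display-layer double buffer, each region sized at 1.5 times the pixel data of the current resolution as stated above, can be sketched as below; the 4-bytes-per-pixel format and the swap-based presentation are illustrative assumptions.

```python
class DoubleBuffer:
    """Two display buffers: frames are written to the back buffer and
    made visible by an atomic swap, avoiding tearing during switches.

    The bytes-per-pixel default is an illustrative assumption.
    """

    def __init__(self, width, height, bytes_per_pixel=4):
        # 1.5x the pixel data required by the current resolution
        self.buffer_bytes = int(1.5 * width * height * bytes_per_pixel)
        self.front = None  # frame currently shown
        self.back = None   # frame being prepared

    def write(self, frame):
        """Prepare the next frame without touching what is on screen."""
        self.back = frame

    def swap(self):
        """Atomically present the prepared frame; return what is shown."""
        self.front, self.back = self.back, self.front
        return self.front
```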
And analyzing the buffer proportion of the decoding layer frame types and the coding parameters in the multi-level buffer configuration parameters, so as to determine the buffer requirement of each frame type. The buffer ratio of the frame types of the decoding layer reflects the distribution ratio of I frames, P frames and B frames in the buffer, and the coding parameters comprise the size characteristics and the decoding complexity of different frame types under the H.265/HEVC coding standard. In the analysis process, the analysis result of the inter-frame correlation prediction model is combined to calculate frame grouping weight values of different frame types, wherein the weight values are used for measuring the importance of various frames in the buffer memory and guiding the subsequent buffer memory allocation strategy. Since the I frame is a key frame, the decoding independence is strong and the storage volume is large, so that a higher buffer weight value is allocated, the buffer requirement of the P frame is moderate due to the higher time correlation, the B frame depends on the front frame and the back frame to carry out bidirectional prediction, and the storage space is properly reduced during buffer allocation so as to optimize the overall buffer efficiency. And calculating the buffer capacity according to the frame grouping weight value and the total memory capacity of the system, and generating a frame type buffer allocation scheme. The core of the buffer capacity calculation is to ensure that the system memory can be reasonably allocated to different types of frames, and simultaneously ensure that the data loss or decoding blocking caused by insufficient buffer in the decoding process can not be caused. 
And comprehensively considering the maximum available memory of the decoding chip, the dynamic memory requirement during the system operation and the code rate characteristic of the current video stream in the calculation process, and carrying out dynamic adjustment by combining the historical data provided by the inter-frame correlation prediction model. For example, in the case of high-rate video streams, the capacity of the I-frame buffer is increased appropriately, while in low-rate scenarios, the I-frame buffer is decreased and more storage space is reserved for P-frames and B-frames. By the cache capacity calculation mode, reasonable allocation of cache space of different frame types is ensured, and decoding efficiency is maximized within the allowable range of a system memory. And carrying out partition management on the system memory based on the frame type buffer allocation scheme to generate a buffer physical address table. The partition management of the system memory considers the data alignment and the high efficiency of the cache access, so that the partition is performed by adopting memory blocks with fixed sizes, and each partition block corresponds to a specific cache area and is indexed by a physical address table. For example, the I-frame buffer is mapped to a high priority memory region to ensure its access speed, while the P-frame and B-frame buffers are mapped to medium priority and low priority memory regions, respectively, to optimize memory utilization. And constructing a multi-level page table structure based on the buffer physical address table to form a buffer access control table. The multi-level page table structure is used for providing a more flexible and efficient cache management mechanism, so that a decoding chip can perform quick address conversion when reading cache data, and the problem of memory fragmentation is reduced. 
The multi-level page table structure comprises a global page table, a first-level page table and a second-level page table, wherein the global page table manages the allocation of the whole cache space, the first-level page table maps the frame type cache regions, and the second-level page table indexes specific storage blocks. In this way, efficient address lookup is achieved during decoding and memory access latency is reduced, while the dynamic management capability of the cache space is preserved so that the cache space can be reallocated according to the load during system operation. A cache scheduling scheme is then designed based on the cache access control table to realize efficient reading and storing of cached data. The core of the cache scheduling scheme is to optimize the access order of data and reduce unnecessary cache replacement operations, thereby improving decoding continuity. Since I frame data has the highest access priority, the cache scheduling scheme must ensure that I frame data can be loaded into the decoder as early as possible and that its buffer is released promptly after decoding, reducing the risk of buffer overflow. The cache access scheduling of P frames must consider the temporal correlation between P frames and I frames, loading the P frame data to be used in advance through a prediction algorithm, while the cache access scheduling of B frames adopts a lazy loading strategy to reduce system memory occupation. To further improve scheduling efficiency, a cache prefetching technique is combined: data likely to be accessed in the future is predicted by analyzing the playing pattern of the video stream and loaded into the cache in advance, reducing stalls caused by data that does not arrive in time during decoding.
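As a minimal sketch, the global/first-level/second-level lookup described above can be modeled as a toy two-level address table. The class name, block size, and region labels below are illustrative assumptions, not part of the patent:

```python
class CacheAddressTable:
    """Toy two-level page table: region -> block -> physical address."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.global_table = {}  # global page table: region name -> block table

    def map_region(self, region, base_addr, n_blocks):
        # First level selects the frame-type region; second level maps
        # each fixed-size block to its physical base address.
        self.global_table[region] = {
            b: base_addr + b * self.block_size for b in range(n_blocks)
        }

    def translate(self, region, offset):
        # Split the offset into a block index plus an intra-block offset,
        # the way a real page table walk would.
        block, intra = divmod(offset, self.block_size)
        return self.global_table[region][block] + intra
```

For example, mapping an I-frame region of four 4 KiB blocks at base `0x100000` and translating offset 4097 lands one byte into the second block.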
And creating a decoding layer buffer structure based on the buffer physical address table, the buffer access control table and the buffer scheduling scheme, wherein the buffer structure comprises an I frame buffer with the capacity of a first target value, a P frame buffer with the capacity of a second target value and a B frame buffer with the capacity of a third target value.
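The weight-driven capacity split described above can be sketched as follows. The weight values and helper name are assumptions for illustration; the patent's first, second and third target values are not fixed numbers:

```python
def allocate_frame_buffers(total_bytes, weights):
    """Split the total cache memory across frame types in
    proportion to their frame grouping weight values."""
    total_weight = sum(weights.values())
    return {ftype: int(total_bytes * w / total_weight)
            for ftype, w in weights.items()}

# I frames get the largest share, B frames the smallest.
plan = allocate_frame_buffers(120_000_000, {"I": 5, "P": 4, "B": 3})
```

With these assumed weights the 120 MB pool splits 50/40/30 MB across the I, P and B frame buffers.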
Step S4, monitoring the cache state, the network state and the chip state in the multi-level streaming media cache system in real time to obtain a decoding event sequence;
Specifically, the cache layer states in the multi-level streaming media cache system are sampled in a distributed manner to obtain multi-level cache state data. Because the cache system involves multiple levels, including the network layer cache, the decoding layer cache and the display layer cache, a distributed sampling strategy is adopted to ensure the completeness and timeliness of the data. Sampling of the network layer cache state monitors the queuing length, data loss rate and buffer utilization of the buffer queue; sampling of the decoding layer cache state focuses on the buffer occupancy and cache hit rate of I frames, P frames and B frames; and sampling of the display layer cache state tracks the frame buffer switching rate and frame refresh delay. Through distributed sampling, high-precision multi-level cache state data is obtained at a low computational cost. Multi-dimensional threshold analysis is then performed on the multi-level cache state data to identify potential anomalies and determine event triggering conditions. Since different video coding standards and hardware resource configurations affect the stability of the decoding process, the threshold analysis requires dynamic adjustment in combination with video coding characteristics and hardware resource limitations. A multi-dimensional feature space is established for the cache state data, and the optimal operating ranges of different parameters are derived statistically from historical decoding data, thereby determining reasonable hierarchical trigger thresholds.
For example, under the H.265/HEVC coding standard, the cache hit rate of I frames should be kept within a relatively high interval, while the cache availability of P frames and B frames is adaptively adjusted according to different code rates. In terms of hardware resources, the temperature of the decoding chip, the CPU occupancy and the GPU load also need to be considered in the threshold analysis, so that the system will not degrade decoding quality due to overheating or resource overload. In this way, event trigger conditions are determined dynamically from the cache state data, and the threshold settings can adapt to different video streams and hardware environments. An adaptive priority scheduling mechanism is then constructed based on the event triggering conditions, so that priorities are assigned dynamically according to the degree to which each event affects decoding quality. Because many different types of events can occur during decoding, such as buffer overflow, network jitter, frame loss and excessive decoding load, a flexible priority scheduling mechanism is established to adjust the decoding strategy dynamically under different conditions. A weighted scoring method is adopted to calculate a priority score for each event from its impact range, duration and historical impact degree, generating a multi-level event priority table. In this process, high-impact events (e.g., I frame buffer loss or decoder overload) are given higher priority, while low-impact events (e.g., short network fluctuations or slight buffer jitter) are placed at lower priority, ensuring that the system handles critical events first and does not impair overall decoding performance by spending excessive computing resources on secondary events.
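The weighted scoring step above can be sketched as follows. The weight values, event names and score fields are illustrative assumptions; the patent does not fix concrete numbers:

```python
def event_priority(impact_range, duration_s, historical_impact,
                   weights=(0.5, 0.2, 0.3)):
    """Weighted priority score from impact range, duration, and
    historical impact degree; higher scores are handled first."""
    w_range, w_dur, w_hist = weights
    return w_range * impact_range + w_dur * duration_s + w_hist * historical_impact

# Build a multi-level event priority table (descending score).
events = {
    "i_frame_buffer_loss": event_priority(0.9, 2.0, 0.8),
    "short_network_fluctuation": event_priority(0.2, 0.5, 0.1),
    "decoder_overload": event_priority(0.8, 3.0, 0.7),
}
priority_table = sorted(events, key=events.get, reverse=True)
```

Under these assumed weights the two high-impact events outrank the short network fluctuation, matching the ordering the text describes.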
The multi-level event priority table is then rearranged in time order, sorting by the timestamp information and priority weight of each event to obtain a weighted event sequence. A time-weighted ranking algorithm is employed, i.e., a time factor is considered when calculating event priority, so that newly occurring high-priority events receive a faster response while historical events are adjusted appropriately according to their residual impact. The weighted event sequence is input into a policy mapping network to perform policy matching for the various event types using historical decoding control experience. Based on deep learning or rule matching, the policy mapping network learns optimal event response schemes from past decoding logs and control strategies, and provides an optimal handling strategy for the current event sequence. For example, when the system detects a sudden drop in network bandwidth, the policy mapping network automatically lowers the code rate or increases the buffer capacity according to historical experience to reduce playback stutter; when the system detects that the decoding chip temperature exceeds the safe range, it lowers the decoding frequency or switches to a hardware acceleration mode to reduce the temperature. The policy mapping network also needs adaptive learning capability, i.e., the policy matching scheme can be optimized dynamically in a continuously changing decoding environment to improve the accuracy and efficiency of responses. Finally, spatio-temporal correlation analysis is performed on the weighted event sequence and the event handling strategy group to identify the temporal dependency and spatial coupling effects among different events, yielding the decoding event sequence.
The time sequence dependency relationship is reflected in the occurrence sequence and the influence duration of the events, for example, the buffer overflow event often causes the accumulation of a decoding queue, and further causes the frame loss phenomenon, so that the causal relationship is established in the event sequence, and the key events causing the chain reaction are preferentially processed. The spatial coupling effect is reflected by the mutual influence among different decoding resources, for example, the CPU resources are tensed due to the excessively high load of the GPU, so that the execution efficiency of the decoding task is influenced, and therefore, in the process of constructing an event sequence, the influence ranges of different events are subjected to cross analysis to identify the potential resource competition problem and adjust the decoding scheduling strategy when necessary. By combining time sequence analysis and space correlation analysis, the decoding event sequence can be ensured to accurately reflect the dynamic evolution condition of various events in the decoding process.
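A minimal sketch of the time-weighted ranking used to build the weighted event sequence follows. The exponential decay and the time constant are assumptions; the patent only states that newer high-priority events should rank higher while historical events are down-weighted by residual impact:

```python
import math

def effective_priority(base_priority, age_seconds, tau=30.0):
    """Decay an event's priority with age so that fresh
    high-priority events outrank stale ones."""
    return base_priority * math.exp(-age_seconds / tau)

def weighted_event_sequence(events, now):
    """events: list of (name, base_priority, timestamp).
    Returns the events sorted by time-decayed priority."""
    return sorted(
        events,
        key=lambda e: effective_priority(e[1], now - e[2]),
        reverse=True,
    )
```

For instance, a fresh medium-priority jitter event can overtake a buffer overflow event observed 100 seconds ago once the latter's influence has decayed.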
And S5, adjusting decoding parameters in real time according to the decoding event sequence to obtain a streaming media decoding control instruction.
Specifically, the events in the decoded event sequence are classified and analyzed, so that corresponding control parameters are extracted for different types of events. The decoding events are classified into temperature events, performance events, network events, cache events and code rate events, and each type of event has different degrees of influence on the decoding process, so that key characteristics of the decoding events are extracted in the analysis process, and a corresponding decoding regulation target matrix is established. Temperature events are mainly related to the trend of variation of core temperature and surface temperature of the decoding chip, and these information are used to adjust the frequency and voltage of the decoding chip to prevent performance degradation or system protection shutdown caused by overheating. The performance event pays attention to the utilization rate of the CPU, the GPU and the memory, and determines the execution capacity of decoding tasks and the resource scheduling mode. The network event relates to parameters such as bandwidth jitter, packet loss rate, data delay and the like, the information is important to the self-adaptive code rate adjustment and the buffer regulation, and the buffer event reflects the state of a multi-level buffer system, including buffer occupancy rate of I frames, P frames and B frames and buffer overflow condition, and the data is used for optimizing a buffer management strategy. Code rate events are mainly related to dynamic code rate changes of the input video stream, affect decoding complexity and computational resource requirements, and thus require targeted adjustment of decoding strategies. And analyzing all the events and constructing a decoding regulation target matrix to form a decoding control parameter set. And performing hardware resource control based on the decoding regulation target matrix to optimize resource allocation and performance management in the decoding process. 
The frequency and voltage of the decoding chip are adjusted to reduce power consumption at low load and increase computing power at high load, thereby balancing performance and energy efficiency. The number of decoding threads is also adjusted dynamically to match the demands of different decoding tasks; for example, parallel decoding threads are added when playing high-resolution video to improve decoding speed, and threads are reduced when playing low-bit-rate video to save computing resources. Meanwhile, the buffer allocation proportion is adjusted dynamically according to the needs of the decoding task: when network fluctuation is large, the capacity of the network layer buffer is increased to reduce data loss, and when decoder resources are scarce, the share of the B frame buffer is reduced to prioritize the storage space of I frames and P frames. These hardware resource control measures yield a dynamically optimized hardware control parameter sequence, which guides the decoding chip in adapting to different conditions. The hardware control parameter sequence is applied to the decoding of the H.265/HEVC coded stream and matched with a suitable decoding strategy, so that decoding remains smooth while computing resource utilization is optimized. Decoding strategy matching for the H.265/HEVC coded stream is based mainly on the decoding priorities and frame skipping strategy of I frames, P frames and B frames: I frames have the highest priority, because they are key frames that must be decoded completely to preserve the basic picture information of the video; P frames take the next priority; and B frames may be skipped when resources are constrained to reduce computation cost.
When system resources are scarce or network conditions are poor, some B frames are skipped through an intelligent frame skipping strategy, reducing the decoding burden and improving playback smoothness; when resources are sufficient, all frames are decoded completely to improve picture quality. By combining decoding priority with the frame skipping strategy, the playback effect is optimized across different decoding environments, ensuring stable streaming media playback. After the decoding execution scheme is determined, parallelism analysis is performed on it to optimize the execution order of decoding tasks and reduce processing delay. The core of the parallelism analysis is to compute the dependency relationships and execution timing among decoding tasks in order to construct a task scheduling sequence, which comprises the execution order and time window of each decoding task. During decoding, I frames must be decoded first, while P frames and B frames can be processed in parallel, so the decoding order must be arranged sensibly during task scheduling to make full use of computing resources. For example, a double-buffering mechanism is adopted so that the data of the next frame is preloaded while the current frame is being decoded, reducing waiting time and improving decoding efficiency. Meanwhile, thread parallelism is optimized through task grouping; for example, on a multi-core processor, I frame decoding is assigned a dedicated high-priority thread, while P frame and B frame decoding proceed in parallel on multiple threads to maximize hardware resource utilization.
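The intelligent frame skipping decision can be sketched as a simple load-gated filter. The threshold value and the representation of frames as type strings are assumptions for illustration:

```python
def select_frames(frame_types, cpu_load, skip_b_threshold=0.8):
    """Keep every frame under normal load; drop B frames once the
    load crosses the threshold. I and P frames are always decoded."""
    if cpu_load < skip_b_threshold:
        return list(frame_types)
    return [f for f in frame_types if f in ("I", "P")]
```

Under heavy load only the I and P frames of a group of pictures survive; under light load the sequence passes through unchanged.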
By optimizing the dependency relationship and execution time sequence of the decoding task, an efficient task scheduling sequence is generated, so that the overall decoding speed is improved. And constructing a resource allocation model based on the task scheduling sequence, and obtaining an optimal resource allocation scheme by calculating dynamic balance points of the CPU occupancy rate, the memory utilization rate and the GPU utilization rate. In the calculation process, an adaptive load balancing algorithm is adopted to reasonably distribute decoding tasks among different calculation units. For example, when the CPU load is high, the partial decoding task is transferred to the GPU for processing, and when the GPU load is too heavy, the CPU is used for performing the complementary computation, so that the overload of the single computing resource is avoided. The memory allocation also needs to be dynamically adjusted to ensure that the decoding cache does not suffer from data loss or decoding failure due to insufficient memory. By calculating the dynamic balance point of the resource, the execution efficiency of the decoding task is ensured to be maximized, and meanwhile, the degradation of decoding performance caused by insufficient resource or overload is avoided. After the optimization of the hardware control parameter sequence, the decoding execution scheme and the resource allocation scheme is completed, the parameters are integrated and mapped to generate the streaming media decoding control instruction. 
The streaming media decoding control instructions comprise a temperature control instruction, a decoding control instruction and a resource control instruction. The temperature control instruction dynamically adjusts the power consumption mode of the decoding chip to keep the chip operating within a reasonable temperature range; the decoding control instruction adjusts the decoding strategy, for example by optimizing frame priorities or enabling the frame skipping mechanism to adapt to different decoding environments; and the resource control instruction dynamically allocates computing resources to ensure that decoding tasks are executed efficiently. By sending these control instructions to the decoding system, precise control over streaming media playback is realized, so that playback maintains the best possible effect under different network environments and hardware conditions, effectively improving decoding efficiency and user experience.
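As a minimal sketch, the three instruction groups could be modeled as one structured command whose fields are filled from the monitored states. The field names, thresholds and the event-to-instruction mapping below are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class DecodeControlCommand:
    temperature: dict = field(default_factory=dict)  # power-mode control
    decoding: dict = field(default_factory=dict)     # frame-skip / priority
    resources: dict = field(default_factory=dict)    # thread / memory shares

def build_command(chip_temp_c, cpu_load, bandwidth_drop):
    """Map a few monitored states onto the three instruction groups."""
    cmd = DecodeControlCommand()
    if chip_temp_c > 85:
        cmd.temperature["power_mode"] = "low"
    if cpu_load > 0.8 or bandwidth_drop:
        cmd.decoding["skip_b_frames"] = True
    cmd.resources["decode_threads"] = 2 if cpu_load > 0.8 else 4
    return cmd
```

An overheated, heavily loaded chip thus receives a low-power mode, B-frame skipping, and a reduced thread count in one command.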
According to the embodiment of the invention, the decoding prediction accuracy for different types of video frames is significantly improved by constructing a dual-branch neural network structure combining temporal feature extraction and spatial feature extraction. The hierarchical design of the multi-level streaming media cache system realizes cooperative optimization of the network layer, the decoding layer and the display layer, reducing playback start-up delay. The adaptive decoding control strategy based on an event trigger mechanism reduces system resource occupation and the playback stall rate. The introduction of the attention fusion module improves the accuracy of inter-frame correlation prediction, makes decoding resource allocation more reasonable, and effectively avoids overheating of the decoding chip. Through optimization training with a multi-task loss function, the decoding control instructions can simultaneously account for temperature control, performance optimization and resource allocation, significantly improving overall system stability. Based on the dynamic scheduling mechanism of event priorities, the system can respond to various abnormal events in time, ensuring continuous and stable playback of ultra-high-definition video streams.
In a specific embodiment, the process of executing step S1 may specifically include the following steps:
periodically sampling the core temperature and the surface temperature of a decoding chip in the network high-definition player to obtain a real-time temperature data stream;
performing sliding window detection on the network bandwidth data to obtain a network bandwidth jitter data stream, wherein the network bandwidth jitter data stream comprises a bandwidth change rate and a packet loss rate;
Dynamically monitoring the code rate of an input video stream to obtain a video code rate fluctuation data stream;
Acquiring the state of system resources, and obtaining a system resource state data stream by monitoring the CPU occupancy rate, the memory utilization rate and the GPU utilization rate;
Performing time stamp alignment on the real-time temperature data stream, the network bandwidth jitter data stream, the video code rate fluctuation data stream and the system resource state data stream to obtain a multidimensional state data matrix;
And carrying out annular buffer storage on the multidimensional state data matrix to obtain a streaming media dynamic characteristic data set.
Specifically, an efficient multi-threaded data acquisition mechanism is established to ensure that all data streams can be acquired in real time, and a complete multi-dimensional state data matrix is formed through timestamp alignment. In terms of decoding chip temperature monitoring, the core temperature T_core represents the temperature inside the decoding chip, while the surface temperature T_surf represents the temperature of the surface of the decoding chip package. The trends of the two are used to judge the heat dissipation state and workload of the decoding chip. Assuming that the sampling interval is Δt, at any time t the core temperature and the surface temperature are expressed as:

T_core(t+Δt) = T_core(t) + α·P(t) − β·(T_core(t) − T_surf(t))

T_surf(t+Δt) = T_surf(t) + γ·(T_core(t) − T_surf(t)) − δ·Q·(T_surf(t) − T_env)

wherein P(t) is the power consumption of the decoding chip, Q is the heat dissipation capacity of the heat sink, T_env represents the ambient temperature, and α, β, γ, δ respectively represent the impact coefficients of power consumption converted into temperature change, internal chip conduction, package heat transfer and environmental heat dissipation. These data are sampled periodically to form the real-time temperature data stream. Meanwhile, network bandwidth jitter detection is an important link in ensuring stable streaming media playback. Setting the sliding window length to W, the bandwidth change rate J_B within the window is calculated as:

J_B = (1/(W−1)) · Σ_{i=2..W} |B_i − B_{i−1}| / B_{i−1}

wherein B_i represents the network bandwidth at the i-th sampling. Another important indicator of the bandwidth jitter data stream is the packet loss rate L, calculated as:

L = N_lost / N_total

wherein N_lost represents the number of data packets lost within the window and N_total represents the total number of packets within the window. These data reflect changes in network conditions and play an important role in adaptive code rate adjustment and cache management during decoding. For example, in practical applications, if the bandwidth change rate J_B stays above the threshold 0.05 for a sustained period while the packet loss rate L exceeds 0.02, significant network fluctuation has occurred and the video code rate should be reduced appropriately to prevent stuttering. For code rate fluctuation monitoring of the input video stream, the code rate R within a time window T_w is calculated as:

R = (Σ_i D_i) / T_w

wherein D_i represents the size of the video data received at time i, and T_w represents the packet transmission time window. This data stream is used to evaluate video transmission stability in real time. For example, if R increases suddenly in a short time, the video stream has entered a high-rate segment and the decoding system needs to reserve more buffer space; if R drops quickly, the buffer pre-storage amount can be reduced appropriately to save memory resources. The system resource state is collected at the same time. The CPU occupancy U_cpu, memory utilization U_mem and GPU utilization U_gpu are obtained through operating system interfaces and recorded periodically. The CPU occupancy is calculated from the task execution time t_busy and the idle time t_idle:

U_cpu = t_busy / (t_busy + t_idle)

Memory utilization and GPU utilization are calculated as similar proportions. The real-time temperature data stream, the network bandwidth jitter data stream, the video code rate fluctuation data stream and the system resource state data stream are then aligned by timestamp to form the multi-dimensional state data matrix. Setting a consistent data sampling interval Δt, the data matrix at time t is represented as:

M(t) = [T_core(t), T_surf(t), J_B(t), L(t), R(t), U_cpu(t), U_mem(t), U_gpu(t)]

This matrix contains the decoding chip temperature, network bandwidth jitter, video code rate variation and system resource state information. To manage this data efficiently, a ring buffer storage mechanism is adopted to ensure that storage does not overflow during long-term operation as the amount of data grows. Assuming that the maximum capacity of the ring buffer is C, when new data M(t) enters the buffer, it is stored at:

index = t mod C

When the buffer is full, the new data overwrites the oldest data, so that the status information of the most recent C time steps is always kept.
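The "index = t mod C" storage rule can be sketched directly as a small class (a minimal illustration; the class and field names are assumptions):

```python
class RingBuffer:
    """Fixed-capacity ring buffer: slot = t mod C; once full,
    the oldest entry is overwritten by each new sample."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.t = 0  # total samples seen so far

    def push(self, sample):
        self.slots[self.t % self.capacity] = sample
        self.t += 1

    def latest(self):
        """Return the most recent min(t, C) samples in time order."""
        n = min(self.t, self.capacity)
        start = self.t - n
        return [self.slots[(start + i) % self.capacity] for i in range(n)]
```

Pushing ten state matrices into a capacity-4 buffer leaves exactly the last four, in arrival order.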
In a specific embodiment, the process of executing step S2 may specifically include the following steps:
carrying out data preprocessing on the streaming media dynamic feature data set to obtain a standardized feature data set, wherein the standardized feature data set comprises a temperature feature vector, a network feature vector and a code rate feature vector;
Classifying the standardized feature data set according to I frame, P frame and B frame types in the H.265/HEVC coding standard, and calculating entropy coding complexity and motion vector distribution of each frame type to obtain a frame type feature matrix;
Constructing a double-branch neural network structure based on a frame type feature matrix, wherein the double-branch neural network structure comprises a time sequence feature extraction branch and a space feature extraction branch, the time sequence feature extraction branch is composed of three layers of bidirectional LSTM, the number of LSTM units of each layer is 256, the space feature extraction branch adopts a ResNet structure and comprises four residual block groups, and each residual block group comprises three layers of convolution layers and jump connection;
An attention fusion module is added at two branch output ends of the double-branch neural network structure, the attention fusion module comprises a channel attention sub-module and a space attention sub-module, and a fusion feature map is obtained by calculating channel weight and space weight of the feature map;
Inputting the fusion feature map into a decoding prediction head network, wherein the decoding prediction head network comprises three parallel branches, the inter-frame correlation, the decoding resource requirement and the cache allocation strategy are respectively predicted, each branch comprises two full-connection layers, the number of neurons of a first layer is 512, and the number of neurons of a second layer is matched with the dimension of a prediction target;
And optimizing and training the decoding prediction head network by adopting a multi-task loss function, wherein the multi-task loss function comprises inter-frame correlation loss, resource prediction loss and buffer optimization loss, and the network parameters are iteratively updated by an Adam optimizer to obtain an inter-frame correlation prediction model.
Specifically, the streaming media dynamic feature data set is standardized to eliminate scale differences between different data sources, so that the streaming media dynamic feature data set can be efficiently processed by the neural network. In the normalization process, the data needs to be converted into a temperature feature vector, a network feature vector and a code rate feature vector, wherein the temperature feature vectorBy core temperature of the decoding chipAnd surface temperatureComposition, network feature vectorDithering by bandwidthAnd packet loss rateComposition, code rate feature vectorInstantaneous code rate from video streamAnd frame type code rate distributionComposition is prepared. In order to make the numerical ranges of different features consistent, a mean normalization method is adopted:
Wherein, Representing the original data of the image of the object,Represents the average value of the values,Representing standard deviation. All features are normalized to the distribution of zero mean and unit variance, so that training stability is improved. The standardized feature data set is classified according to the H.265/HEVC coding standard to distinguish different characteristics of I frames, P frames and B frames, and entropy coding complexity and motion vector distribution of each frame type are calculated, so that a frame type feature matrix is constructed. Entropy coding complexityReflecting the amount of encoded information for the pixel blocks in the frame. The calculation formula is as follows:
Wherein, Is the first in the frameThe probability distribution of the individual coding units,Representing the number of coding units. The entropy coding complexity of I frames is typically high, which requires the storage of complete picture information, while the entropy coding complexity of P and B frames is relatively low due to the prediction based on reference frames. Motion vector distributionIs an important index for measuring the complexity of inter-frame prediction, and the calculation mode is as follows:
Wherein, AndRespectively represent the firstThe horizontal and vertical components of the individual motion vectors,Is the total number of motion vectors in the frame. For B frames, since it relies on the previous and subsequent frames for prediction, there is a more complex distribution of motion vectors, whereas the motion vector of P frames relies mainly on the previous frame, whose distribution of motion vectors is more single. The frame type feature matrix is formed by calculating the entropy coding complexity and motion vector distribution for each frame type. The method comprises the steps of constructing a double-branch neural network structure based on a frame type feature matrix, wherein the structure consists of a time sequence feature extraction branch and a space feature extraction branch, the time sequence feature extraction branch adopts three layers of bidirectional LSTM, and each layer contains 256 LSTM units. The bidirectional LSTM captures forward and backward dependency relationships between frames at the same time, and the calculation formula is as follows:
Wherein, Is the hidden state of the current time step,Is an input feature that is used to determine the input,AndIs a matrix of weights that are to be used,Is a bias term. The design of the bidirectional LSTM fully utilizes the time correlation between video frames and improves the perception capability of the model on dynamic change. The spatial feature extraction branch adopts ResNet structure and comprises four residual block groups, each residual block group is formed by three layers of convolution layers and jump connection, and the calculation formula is as follows:
Wherein, Representing the characteristics of the input and,Representing the weight matrix of the convolutional layer,Representing a nonlinear transformation. Residual connection can effectively relieve gradient disappearance problem of deep network, improves training stability. Through the combination of the double-branch structure, the time sequence characteristics and the space characteristics are extracted at the same time, and more comprehensive information is provided for decoding optimization. And adding an attention fusion module at the output end of the double-branch neural network structure so as to improve the expression capability of the characteristics. The attention fusion module consists of a channel attention sub-module and a space attention sub-module, and the calculation mode of the channel attention sub-module is as follows:
Wherein, Representative channelIs used for the weight of the (c),Is a channel attention weight matrix. The spatial attention sub-module is calculated in a similar manner:
Wherein, Representing spatial positionIs used for the weight of the (c),Is a spatial attention weight matrix. And the channel weight and the space weight of the feature map are calculated, so that the expression capability of key features is enhanced, and the accuracy of decoding prediction is improved. The fused feature map is input into a decoding prediction header network comprising three parallel branches for predicting inter-frame correlation, decoding resource requirements, and buffer allocation policies, respectively. Each branch consists of two fully connected layers, wherein the first layer contains 512 neurons and is used for carrying out characteristic transformation, and the calculation formula is as follows:
$z = \phi(W_1 f + b_1)$, where $W_1$ is a weight matrix, $b_1$ is a bias term, and $\phi$ is a nonlinear activation function. The neuron count of the second layer matches the prediction target dimension to ensure the output meets the decoding optimization requirement. The decoding prediction head network is trained with a multi-task loss function that combines the inter-frame correlation loss, resource prediction loss, and buffer optimization loss:
$L = \lambda_1 L_{\mathrm{corr}} + \lambda_2 L_{\mathrm{res}} + \lambda_3 L_{\mathrm{buf}}$, where $L_{\mathrm{corr}}$ is the inter-frame correlation loss, $L_{\mathrm{res}}$ is the resource prediction loss, $L_{\mathrm{buf}}$ is the buffer optimization loss, and $\lambda_1, \lambda_2, \lambda_3$ are loss weight coefficients. Optimization uses the Adam optimizer with update rule:
$\theta_{t+1} = \theta_t - \eta \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$, where $\theta$ represents the model parameters, $\eta$ the learning rate, $\hat{m}_t$ and $\hat{v}_t$ the first- and second-order moment estimates, and $\epsilon$ a numerical stability term. After iterative training, the inter-frame correlation prediction model is obtained; it is used to optimize the decoding process of the streaming media player and to improve the fluency and stability of video playback.
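As a hedged sketch, the standard Adam update above can be traced for a single scalar parameter; the learning rate and moment coefficients here are the usual textbook defaults, not values taken from this document:

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (illustrative sketch)."""
    m = b1 * m + (1 - b1) * grad          # first-order moment estimate
    v = b2 * v + (1 - b2) * grad * grad   # second-order moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = 0.5, 0.0, 0.0
theta, m, v = adam_step(theta, grad=2.0, m=m, v=v, t=1)
```

After one step with gradient 2.0, the bias-corrected moments cancel the warm-up decay, so the parameter moves by roughly one learning-rate unit.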
The method further comprises: performing denoising diffusion probabilistic modeling on the three parallel branch outputs of the decoding prediction head network; constructing a Markov diffusion chain to obtain a diffusion probability transition matrix; gradually injecting noise into the prediction results based on the transition matrix through a forward diffusion process of T time steps to obtain a noisy prediction sequence; constructing a reverse denoising process over the noisy sequence, in which a noise prediction network with a U-Net structure estimates the noise at each time step; feeding the noise estimates into a variational inference network and optimizing the parameters against the evidence lower bound to obtain posterior distribution parameters; probabilistically resampling the output of the decoding prediction head network based on the posterior distribution parameters to generate a diversified prediction result set; applying ensemble learning over the result set and obtaining the final prediction output by voting; computing the multi-task loss from the prediction output and the ground-truth labels by weighted combination of the inter-frame correlation loss, the resource prediction loss, and the buffer optimization loss to obtain an overall loss; and iteratively training the network parameters with the Adam optimizer based on the overall loss to obtain the inter-frame correlation prediction model.
In a specific embodiment, the process of executing step S3 may specifically include the following steps:
performing hierarchical mapping on the inter-frame correlation, decoding resource requirements and a buffer allocation strategy output by the inter-frame correlation prediction model to obtain multi-level buffer configuration parameters, wherein the multi-level buffer configuration parameters comprise network layer buffer size, decoding layer frame type buffer proportion and display layer buffer capacity;
Performing buffer space allocation on the network layer based on the multi-level buffer configuration parameters to obtain a network layer buffer structure, wherein the size of the network layer buffer structure is dynamically adjusted between 2 times and 4 times of video code rate;
Performing buffer division on the decoding layer based on the multi-level buffer configuration parameters to obtain a decoding layer buffer structure, wherein the decoding layer buffer structure comprises an I frame buffer region with a capacity of a first target value, a P frame buffer region with a capacity of a second target value and a B frame buffer region with a capacity of a third target value;
Performing cache configuration on the display layer based on the multi-level cache configuration parameters to obtain a display layer cache structure, wherein the display layer cache structure comprises two buffer areas with the size 1.5 times that of pixel data of the current display resolution;
Carrying out data transmission channel configuration on the network layer buffer structure, the decoding layer buffer structure and the display layer buffer structure to obtain an interlayer data transmission strategy;
and combining the network layer buffer structure, the decoding layer buffer structure, the display layer buffer structure and the interlayer data transmission strategy to obtain the multi-stage streaming media buffer system.
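The assembled configuration of the steps above can be sketched in code. The function and field names are illustrative assumptions; only the 2x–4x network clamp, the frame-type ratio split, and the double 1.5x display buffer come from the text:

```python
def build_cache_config(bitrate_bps, alpha, t_comp, width, height,
                       bytes_per_pixel, total_dec_bytes, ratios):
    """Sketch of the multi-level cache sizing described above.

    bitrate_bps: current video bitrate; the network buffer is clamped
    to between 2x and 4x this value. ratios: (r_i, r_p, r_b) frame-type
    buffer proportions for the decoding layer.
    """
    # Network layer: dynamic size, clamped to [2R, 4R]
    net = min(max(alpha * bitrate_bps * t_comp, 2 * bitrate_bps),
              4 * bitrate_bps)
    # Decoding layer: split total capacity by frame-type ratio
    r_i, r_p, r_b = ratios
    s = r_i + r_p + r_b
    dec = {"I": total_dec_bytes * r_i / s,
           "P": total_dec_bytes * r_p / s,
           "B": total_dec_bytes * r_b / s}
    # Display layer: two buffers, each 1.5x the frame's pixel data
    disp = 2 * 1.5 * width * height * bytes_per_pixel
    return {"network": net, "decoding": dec, "display": disp}

cfg = build_cache_config(8_000_000, alpha=1.0, t_comp=3.0,
                         width=1920, height=1080, bytes_per_pixel=4,
                         total_dec_bytes=64 * 2**20, ratios=(5, 3, 2))
```

With an 8 Mbps stream and a 3-second compensation window, the network buffer lands at 24 Mb, safely inside the 2x–4x band.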
Specifically, the inter-frame correlation $\rho$, the decoding resource requirement $D$, and the cache allocation policy $S$ output by the model are analyzed. The inter-frame correlation $\rho$ reflects the redundancy between adjacent frames: a higher value means adjacent frames carry more similar information, so less computational resource is needed for prediction compensation during decoding, whereas a lower value requires a larger buffer and more complex decoding computation. The decoding resource requirement $D$ measures the load on the CPU, GPU, and memory, and the cache allocation policy $S$ instructs the cache system how to distribute resources between the different levels. From these inputs, the multi-level cache configuration parameters are calculated, including the network layer cache size $B_{net}$, the decoding layer frame-type buffer ratio, and the display layer buffer capacity $B_{disp}$. In network layer buffer allocation, the size of the network layer buffer structure $B_{net}$ is dynamically adjusted according to the current video bitrate $R$ to ensure continuity of data transmission. The buffer size is set in the range of 2 to 4 times the video bitrate:
$B_{net} = \min(\max(\alpha \cdot R \cdot T_{net},\ 2R),\ 4R)$, where $\alpha$ is a dynamic adjustment factor and $T_{net}$ is the network buffering compensation time. When the network is relatively stable, $\alpha$ takes a smaller value to reduce cache occupancy; when network jitter is greater, $\alpha$ takes a larger value to increase data availability and thereby reduce the likelihood of stalling. In decoding layer cache partitioning, the decoding layer cache structure consists of an I-frame buffer $B_I$, a P-frame buffer $B_P$, and a B-frame buffer $B_B$, whose capacities are allocated according to the frame-type buffer ratio $(r_I, r_P, r_B)$:
$B_I : B_P : B_B = r_I : r_P : r_B$, where $B_I$ mainly guarantees complete storage of the I-frame data, $B_P$ stores the P-frame prediction data, and $B_B$ can be relatively small because B frames depend on the preceding and following frames. The ratio setting is influenced by the inter-frame correlation $\rho$: when $\rho$ is higher, the I-frame and P-frame buffers are reduced and the B-frame buffer is increased; when $\rho$ is lower, the I-frame and P-frame buffer ratios are increased to enhance decoding stability. In the display layer buffer configuration, the size of the display layer buffer depends on the current display resolution $W \times H$ and the storage requirement of each pixel. To prevent tearing or display delay, two buffers are used, each at least 1.5 times the pixel data of the current display resolution:
$B_{disp} = 2 \times 1.5 \times W \times H \times N_{byte}$, where $N_{byte}$ represents the number of bytes stored per pixel; this ensures buffer overflow does not occur during buffer switching and affect the display. After the buffer allocation of each layer is completed, the inter-layer data transmission channels are configured to optimize transfer efficiency between the layers, yielding the inter-layer data transmission strategy. Let the transmission rate from the network layer to the decoding layer be $V_{nd}$ and from the decoding layer to the display layer be $V_{dd}$; then the following should hold:
$V_{nd} \geq V_{dd}$, ensuring no transmission bottleneck arises as data flows through the pipeline. If $V_{nd}$ falls below $V_{dd}$, the decoding layer cannot obtain enough data and playback stalls, so the data transmission strategy must give priority to keeping $V_{nd}$ stable. The network layer buffer structure, decoding layer buffer structure, display layer buffer structure, and inter-layer data transmission strategy are integrated to form the multi-level streaming media cache system. The system adaptively adjusts its cache allocation strategy under different network environments and decoding conditions to optimize streaming playback stability. When network bandwidth is low and jitter is large, the system automatically increases the network layer buffer size $B_{net}$; when decoding resources are limited, the decoding layer partitioning preferentially guarantees I-frame storage and adjusts the P-frame and B-frame buffer ratios to reduce the computational burden. At the display layer, the double-buffer mechanism ensures smooth frame switching and improves playback fluency. Through this multi-level cache optimization, streaming playback maintains high quality under varying conditions while reducing decoding resource consumption and improving playback stability.
The method further comprises: performing resource optimization configuration on the multi-level streaming media cache system to obtain an optimized multi-level streaming media cache system. Resource association modeling is performed on the network layer buffer structure and the decoding layer buffer structure, and a resource utility matrix is obtained by calculating the mapping between buffer space and decoding performance. Based on the resource utility matrix, a multi-objective optimization function is constructed with network bandwidth utilization, decoding layer buffer utilization, and display layer buffer switching frequency as optimization targets, yielding resource configuration constraints. Lagrangian decomposition is applied to these constraints, and a resource allocation objective function is obtained by introducing network cost coefficients and buffer cost coefficients. An iterative optimization algorithm based on this objective computes the optimal allocation proportion for each buffer layer, producing a frame buffer allocation strategy. The I-frame, P-frame, and B-frame buffer regions are then dynamically configured according to this strategy, and a buffer optimization scheme is obtained by adjusting the buffer space ratio of each frame type. The scheme is evaluated by calculating cache hit rate, access delay, and resource utilization to obtain optimization effect indexes, and the buffer allocation weights are updated based on these indexes, yielding the optimized multi-level streaming media cache system.
In a specific embodiment, the performing step performs cache division on the decoding layer based on the multi-level cache configuration parameter, and the process of obtaining the cache structure of the decoding layer may specifically include the following steps:
Analyzing the decoding layer frame type buffer proportion and the coding parameter in the multi-level buffer configuration parameter to obtain a frame grouping weight value;
Performing buffer capacity calculation according to the frame grouping weight value and the total memory capacity of the system to obtain a frame type buffer allocation scheme, and performing partition management on the system memory based on the frame type buffer allocation scheme to obtain a buffer physical address table;
establishing a multi-level page table structure for a physical address table of a cache region to obtain a cache access control table;
and constructing a buffer scheduling scheme based on the buffer access control table, and creating a decoding layer buffer structure based on the buffer physical address table, the buffer access control table and the buffer scheduling scheme, wherein the decoding layer buffer structure comprises an I frame buffer with the capacity of a first target value, a P frame buffer with the capacity of a second target value and a B frame buffer with the capacity of a third target value.
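The contiguous, block-aligned partitioning in the steps above can be sketched minimally; the block size and field names are assumptions, not the patent's actual layout:

```python
def build_address_table(capacities, block_size):
    """Partition a contiguous memory region into I/P/B cache regions.

    capacities: dict of frame type -> requested bytes; each region is
    rounded up to a whole number of fixed-size blocks so addresses stay
    aligned, and regions are laid out back to back.
    """
    table, base = {}, 0
    for frame_type in ("I", "P", "B"):
        n_blocks = -(-capacities[frame_type] // block_size)  # ceil division
        table[frame_type] = {"base": base, "blocks": n_blocks}
        base += n_blocks * block_size
    return table, base  # base is now the total aligned footprint

table, total = build_address_table({"I": 10_000, "P": 6_000, "B": 3_000},
                                   block_size=4096)
```

Each region's base address is a multiple of the block size, which is what lets a page-table-style index point at fixed-size blocks.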
Specifically, the coding structure of the video stream is analyzed, and the buffer requirements of the different frame types (I, P, and B frames) are calculated in combination with the output of the inter-frame correlation prediction model. Let the buffer weights of the I, P, and B frames be $w_I$, $w_P$, and $w_B$; then, according to the H.265/HEVC coding standard, the frame grouping weight values are calculated as:
$w_k = \dfrac{S_k \cdot T_k}{\sum_{j \in \{I,P,B\}} S_j \cdot T_j},\ k \in \{I,P,B\}$, where $S_I, S_P, S_B$ represent the data sizes of the I, P, and B frames and $T_I, T_P, T_B$ the average decoding time of the corresponding frames. An I frame must store complete picture information, so its data size is larger; P and B frames rely on other frames for prediction and are relatively smaller, but a B frame performs bidirectional prediction from the preceding and following frames, so its buffer requirement is dynamically adjusted according to the specific decoding strategy. The frame grouping weight values determine the priority of each frame type in the decoding layer cache and provide the basis for subsequent buffer allocation. After the weights are calculated, the buffer capacity is computed from the total system memory capacity and the specific frame-type buffer allocation scheme is determined. Let the total capacity of the decoding layer cache be $C_{total}$; the buffer capacity of each frame type is:
$C_k = w_k \cdot C_{total},\ k \in \{I,P,B\}$, where $C_I, C_P, C_B$ represent the buffer capacities of the I, P, and B frames, respectively. This allocation dynamically adjusts the buffer ratio according to the decoding requirements of each frame type, guaranteeing that high-priority frames (such as I frames) obtain sufficient storage while the buffer occupancy of lower-priority frames (such as B frames) is appropriately reduced when resources are limited. After the frame-type buffer allocation scheme is determined, the system memory is partitioned to generate the buffer physical address table. Assume the minimum storage unit of the memory partition is $U$; the physical address table is mapped as:
$A_k = \lceil C_k / U \rceil$ contiguous blocks starting at the base index of region $k$, where $A_k$ denotes the physical address indices of the buffer and each index corresponds to a fixed-size memory block. To improve memory access efficiency, contiguous address allocation is used to align the physical addresses of the I-, P-, and B-frame buffer blocks so that the required data can be read quickly during decoding. After the physical address table is established, a multi-level page table structure is built to obtain the cache access control table. The page table structure consists of a global page table $G$ and local page tables, where the global page table manages the distribution of the whole cache space and the local page tables index the cache regions of the different frame types. The mapping is:
$G = \{L_I, L_P, L_B\}$, where $L_I$, $L_P$, and $L_B$ are the local page tables of the I, P, and B frames, each storing indexes that point to physical addresses so the decoder can quickly locate the corresponding cache region. Based on the cache access control table, a cache scheduling scheme is constructed to optimize data read and storage policies. Let the cache scheduling priority $P_s$ be influenced by the inter-frame correlation $\rho$ and the buffer occupancy $O$; the scheduling priority is calculated as:
$P_s = a \cdot \rho + b \cdot (1 - O)$, where $a$ and $b$ are adjustment coefficients and the buffer occupancy $O$ is computed as:
$O = C_{used} / C_{alloc}$, the ratio of occupied to allocated buffer capacity. When the buffer occupancy $O$ is too high, the buffer strategy is adjusted, for example by increasing the I-frame buffer space to improve decoding stability, or by reducing the B-frame buffer to release memory resources. Combining the cache access control table and the scheduling scheme forms the complete decoding layer cache structure, which, after allocation and scheduling optimization, includes an I-frame buffer of capacity $C_I$, a P-frame buffer of capacity $C_P$, and a B-frame buffer of capacity $C_B$. This structure ensures that the different frame types are stored and accessed according to the preset strategy, improving decoding efficiency and stability.
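A minimal sketch of the scheduling-priority rule described above; the coefficient values are illustrative assumptions:

```python
def schedule_priority(rho, used, allocated, a=0.6, b=0.4):
    """Cache scheduling priority: higher inter-frame correlation and
    lower buffer occupancy both raise the priority (coefficients a, b
    are illustrative, not values from the source)."""
    occupancy = used / allocated
    return a * rho + b * (1.0 - occupancy), occupancy

p, occ = schedule_priority(rho=0.8, used=30, allocated=100)
```

A lightly loaded buffer holding highly correlated frames scores near the top of the scheduling queue.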
In a specific embodiment, the process of performing step S4 may specifically include the following steps:
Carrying out distributed sampling on the state of a caching layer in a multi-level streaming media caching system to obtain multi-layer caching state data;
Performing multidimensional threshold analysis on the multi-layer cache state data, and setting a hierarchical trigger threshold according to video coding characteristics and hardware resource limitation to obtain event trigger conditions;
Constructing an adaptive priority scheduling mechanism based on event triggering conditions, and carrying out dynamic priority assignment according to the influence degree of the event on decoding quality to obtain a multi-level event priority table;
rearranging the multi-level event priority table according to the time sequence, and sequencing according to the time stamp information and the priority weight of the event to obtain an event sequence with weight;
Inputting the weighted event sequence into a strategy mapping network, and carrying out response strategy matching on various events according to history decoding control experience to obtain an event processing strategy group;
and carrying out space-time correlation analysis based on the weighted event sequence and the event processing strategy group, and obtaining a decoded event sequence by considering the time sequence dependency relationship and the space coupling effect of the event.
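The distributed state sampling in the first step above can be sketched as a periodic probe loop; the probe interface and field names are assumptions:

```python
def sample_states(read_state, interval, n_samples):
    """Collect a time series of per-layer cache occupancies; read_state
    stands in for the real network/decoding/display layer probes."""
    series = []
    for k in range(n_samples):
        t = k * interval
        net, dec, disp = read_state(t)
        series.append({"t": t, "net": net, "dec": dec, "disp": disp})
    return series

# Toy probe: all three occupancies drift upward over time
series = sample_states(lambda t: (0.5 + 0.01 * t,) * 3,
                       interval=2.0, n_samples=3)
```

The resulting time series is what the multidimensional threshold analysis in the next step consumes.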
Specifically, an efficient data acquisition system is constructed to monitor the cache layer states in real time. The caching system comprises the network layer cache, the decoding layer cache, and the display layer cache, and the state of each layer is sampled separately to obtain the complete cache usage. Let the sampling interval of the buffer status be $\Delta t$; the cache state at time $t$ is expressed as:
$S(t) = [u_{net}(t),\ u_{dec}(t),\ u_{disp}(t)]$, where $S(t)$ is the multi-layer cache state data, $u_{net}$ the network layer cache utilization, $u_{dec}$ the decoding layer buffer occupancy, and $u_{disp}$ the display layer buffer occupancy. Through distributed sampling, these data are recorded at different time points to build a cache-state time series. With the cache state data in hand, multidimensional threshold analysis determines whether the system is in an abnormal condition, with hierarchical trigger thresholds set according to video coding characteristics and hardware resource limits. Let the trigger thresholds of the buffer layers be $\theta_{net}$, $\theta_{dec}$, $\theta_{disp}$; an event is triggered whenever any cache state exceeds its threshold:
$E(t) = 1$ if $u_k(t) > \theta_k$ for any layer $k$, and $E(t) = 0$ otherwise, where $E(t)$ is the event trigger signal; a value of 1 indicates the current cache state exceeds a threshold and the system must take corresponding regulation measures. The threshold setting depends on the video coding standard and hardware resources: for example, in the H.265 coding format the decoding layer buffer occupancy of high-bitrate video is usually high, so $\theta_{dec}$ needs to be increased appropriately, whereas in a low-bandwidth network environment $\theta_{net}$ needs to be reduced to prevent overload of buffered data. Based on the event trigger conditions, an adaptive priority scheduling mechanism dynamically adjusts the priorities of different events. The priority value $P_e$ is influenced by the event type $T_e$ and the decoding-quality impact factor $Q_e$; the priority is calculated as:
$P_e = \gamma \cdot W(T_e) + Q_e$, where $\gamma$ is an adjustment coefficient and $W(T_e)$ is the weight of the event type — a cache overflow event is weighted higher and a short-lived network fluctuation lower — while $Q_e$ reflects the change in decoding quality: when the frame loss rate is high, $Q_e$ rises and the priority of the related event increases. This ensures key events receive a timely response and improves the decoding stability of the system. After the event priorities are determined, the events are rearranged in time order and sorted by timestamp and priority weight to obtain the weighted event sequence. Assume each event $e_i$ in the event list has timestamp $t_i$ and priority $P_i$; the ordering rule is:
events are sorted primarily by priority $P_i$ in descending order and, among equal priorities, by timestamp $t_i$, so that higher-priority events come first and critical events are processed without delay. After sorting, the weighted event sequence is fed into the policy mapping network to match response policies to each event type using historical decoding control experience. Let the policy mapping function be $M$; the event handling policy group is calculated as:
$G = M(E_w, H)$, where $G$ is the event handling policy group, $E_w$ the weighted event sequence, and $H$ the historical decoding control experience. The policy mapping network may be a deep learning model or a rule-based matcher: for example, a neural network learns the mapping between historical events and decoding control actions, or predefined rules are applied, such as lowering the video bitrate to keep playback smooth when network jitter exceeds a threshold. Given the policy group, spatio-temporal correlation analysis jointly considers the temporal dependencies and spatial coupling effects of the events and finally generates the decoding event sequence. Let the spatio-temporal correlation function be $\Phi$; the decoding event sequence $D$ is:
$D = \Phi(E_w, G, \kappa)$, where $\kappa$ represents the spatial coupling factor — for example, a decoding layer buffer overflow causes display layer frame delay, so spatial correlation analysis must be introduced to optimize the processing order of events. The decoding event sequence $D$ is sorted by priority and time order and contains events of different types such as cache state anomalies, network jitter, and decoding resource overload. The sequence is used to adjust streaming decoding parameters in real time, improving playback stability, reducing stutter risk, and ensuring the system provides the best playback experience under different network and decoding loads. For example, when the system detects that the decoding layer buffer is about to overflow, the frame-dropping strategy is adjusted to reduce the decoding workload; when network jitter is severe, the bitrate is dynamically reduced to improve fluency.
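The threshold trigger and weighted ordering described above can be sketched together; the layer names, thresholds, and priority values are all illustrative assumptions:

```python
def trigger_events(state, thresholds):
    """Emit an event for every cache layer whose occupancy exceeds its
    hierarchical threshold."""
    return [layer for layer, occ in state.items()
            if occ > thresholds[layer]]

def order_events(events):
    """Weighted ordering: higher priority first; earlier timestamp first
    among equal priorities, so critical events are handled before stale
    ones."""
    return sorted(events, key=lambda e: (-e["priority"], e["timestamp"]))

fired = trigger_events({"net": 0.95, "dec": 0.40, "disp": 0.70},
                       {"net": 0.90, "dec": 0.85, "disp": 0.80})
queue = order_events([
    {"name": "net_jitter", "priority": 2, "timestamp": 1.0},
    {"name": "buf_overflow", "priority": 5, "timestamp": 2.0},
    {"name": "frame_drop", "priority": 5, "timestamp": 0.5},
])
```

Only the network layer exceeds its threshold here, and the two priority-5 events are ordered by timestamp ahead of the jitter event.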
In a specific embodiment, the process of performing step S5 may specifically include the following steps:
classifying and analyzing the events in the decoding event sequence, and extracting control parameters according to the characteristics of the temperature event, the performance event, the network event, the cache event and the code rate event to obtain a decoding regulation target matrix;
performing hardware resource control based on the decoding regulation target matrix, and dynamically configuring the frequency and voltage of a decoding chip, the number of decoding threads and the cache allocation proportion to obtain a hardware control parameter sequence;
Performing decoding strategy matching on the H.265/HEVC coded stream according to the hardware control parameter sequence, and obtaining a decoding execution scheme based on decoding priorities and frame skipping strategies of the I frame, the P frame and the B frame;
Carrying out parallelism analysis on the decoding execution scheme, and obtaining a task scheduling sequence by calculating the dependency relationship and the execution time sequence of each decoding task, wherein the task scheduling sequence comprises the execution sequence and the time window of the decoding task;
Constructing a resource allocation model based on the task scheduling sequence, and obtaining a resource allocation scheme by calculating dynamic balance points of CPU occupancy rate, memory utilization rate and GPU utilization rate;
And carrying out integrated mapping on the hardware control parameter sequence, the decoding execution scheme and the resource allocation scheme to obtain a stream media decoding control instruction, wherein the stream media decoding control instruction comprises a temperature control instruction, a decoding control instruction and a resource control instruction.
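The five-way event classification in the first step above can be sketched as a simple metric-to-category lookup; the metric names are hypothetical stand-ins for the quantities named in the text:

```python
EVENT_CATEGORIES = {
    "temperature": {"core_temp", "surface_temp"},
    "performance": {"cpu_load", "gpu_load", "mem_usage"},
    "network": {"bandwidth_jitter", "packet_loss"},
    "cache": {"cache_occupancy", "cache_overflow"},
    "bitrate": {"instant_bitrate", "bitrate_fluctuation"},
}

def classify_events(events):
    """Bucket decoding events into the five control categories; unknown
    metrics are silently ignored (a real system might log them)."""
    buckets = {cat: [] for cat in EVENT_CATEGORIES}
    for ev in events:
        for cat, metrics in EVENT_CATEGORIES.items():
            if ev["metric"] in metrics:
                buckets[cat].append(ev)
                break
    return buckets

buckets = classify_events([
    {"metric": "core_temp", "value": 82.0},
    {"metric": "packet_loss", "value": 0.03},
    {"metric": "instant_bitrate", "value": 12e6},
])
```

Each bucket then feeds one row group of the decoding regulation target matrix.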
Specifically, key features are extracted from the decoding event sequence and classified into temperature events, performance events, network events, cache events, and bitrate events. Let the decoding event sequence be $D$; it is split into five sub-event sets:
$D = \{E_T, E_P, E_N, E_C, E_R\}$, where $E_T$ represents the temperature events, involving the decoding chip core temperature $T_{core}$ and surface temperature $T_{surf}$; $E_P$ the performance events, involving the CPU occupancy $U_{cpu}$, GPU load $U_{gpu}$, and memory usage $U_{mem}$; $E_N$ the network events, involving the bandwidth jitter $J$ and packet loss rate $L_p$; $E_C$ the cache events, involving the cache occupancy $O$ and buffer overflow rate $F_o$; and $E_R$ the bitrate events, involving the instantaneous bitrate $R$ of the video stream and the bitrate fluctuation $\Delta R$. By analyzing these events and extracting the respective control parameters, the decoding regulation target matrix is constructed:
$M_{ctrl} = [m_1;\ m_2;\ m_3]$, where each row represents a particular decoding control dimension: the first row $m_1 = (T_{core}, T_{surf}, U_{cpu}, U_{gpu}, U_{mem})$ focuses on hardware temperature and performance, the second row $m_2 = (J, L_p, O, F_o)$ on network and cache state, and the third row $m_3 = (R, \Delta R)$ on bitrate fluctuation. Based on this matrix, the operating parameters of the decoding chip are adjusted so the decoding task executes in an optimized hardware resource environment. Let the chip frequency be $f$, the voltage $v$, the number of decoding threads $n$, and the buffer allocation ratio $r$; the hardware control parameter sequence is calculated as:
$f = f_0 \cdot (1 - \alpha_f \cdot \max(0,\ T_{core} - T_{ref}))$, with analogous rules for $v$, $n$, and $r$, where $f_0, v_0, n_0, r_0$ represent the default chip frequency, voltage, decoding thread count, and buffer allocation ratio, $T_{ref}$ and its counterparts are the set reference thresholds, and $\alpha_f$ and the like are adjustment coefficients. When the temperature $T_{core}$ exceeds the reference value $T_{ref}$, the chip frequency $f$ is lowered to reduce heat generation; when the CPU occupancy $U_{cpu}$ rises above its reference, the voltage $v$ is tuned up appropriately to improve performance; when the GPU load $U_{gpu}$ is too high, the decoding thread count $n$ is reduced to relieve GPU pressure; and the buffer allocation ratio $r$ is dynamically adjusted according to the buffer occupancy $O$ to ensure the cache does not overflow. After the hardware control parameter sequence is obtained, the decoding strategy for the H.265/HEVC coded stream is matched, and a decoding execution scheme is formulated according to the decoding priorities and frame-skipping strategies of the I, P, and B frames. Let the priorities of the I, P, and B frames be $p_I$, $p_P$, $p_B$; the decoding priority satisfies:
$p_I > p_P > p_B$, so reference frames are decoded ahead of discardable ones. The frame-skip policy is influenced by the decoding resource condition $U$; setting the frame-skip rate $\sigma$, the calculation formula is:
$\sigma = \max\!\left(0,\ \dfrac{U - U_{th}}{1 - U_{th}}\right)$: when $U$ exceeds the threshold $U_{th}$, the system preferentially skips B frames to reduce the computational load. After the decoding execution scheme is determined, parallelism analysis optimizes the execution order of the decoding tasks. The task scheduling sequence consists of the execution order of the decoding tasks and their time windows; the parallelism is calculated as:
$\pi = \sum_i T_i / T_{win}$, the ratio of total task execution time to the length of the scheduling window. When $\pi$ exceeds a certain value, the decoding tasks must be reassigned across threads to keep the computational load balanced. Based on the task scheduling sequence, a resource allocation model computes the dynamic balance points of the CPU, GPU, and memory. Let the CPU utilization be $U_{cpu}$, the GPU utilization $U_{gpu}$, and the memory utilization $U_{mem}$; the resource allocation scheme is calculated as:
$\tilde{U}_k = U_k / U_k^{max},\ k \in \{cpu, gpu, mem\}$: if any normalized value exceeds 1, the decoding load must be reduced, for example by adjusting the thread count $n$ or the frame-skip rate $\sigma$. The hardware control parameter sequence, decoding execution scheme, and resource allocation scheme are then integrated and mapped into the complete streaming media decoding control instructions. Through these control instructions, the system adaptively adjusts decoding parameters, improves decoding efficiency, reduces stuttering, and ensures fluent and stable video playback under different network conditions and computational loads.
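The thermal and load back-off behavior described above can be sketched as follows; every threshold, coefficient, and default value here is an assumption for illustration, not a figure from this document:

```python
def adjust_hardware(core_temp, gpu_load, f0=1_500, n0=8,
                    t_ref=75.0, alpha=0.02, gpu_ref=0.9):
    """Thermal/load back-off sketch: scale the chip frequency down
    past the temperature reference and shed decoding threads past the
    GPU-load reference (all constants are illustrative assumptions)."""
    # Linear frequency derating above t_ref, floored at half speed
    freq = f0 * max(0.5, 1.0 - alpha * max(0.0, core_temp - t_ref))
    # Halve the decoding threads when the GPU is over its reference load
    threads = n0 if gpu_load <= gpu_ref else max(1, n0 // 2)
    return freq, threads

freq, threads = adjust_hardware(core_temp=85.0, gpu_load=0.95)
```

At 10 degrees over the reference the frequency derates by 20 percent, and the overloaded GPU sheds half its decoding threads.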
The method for decoding the streaming media of the network high-definition player in the embodiment of the present invention is described above, and the following describes the device for decoding the streaming media of the network high-definition player in the embodiment of the present invention, please refer to fig. 2, and one embodiment of the device for decoding the streaming media of the network high-definition player in the embodiment of the present invention includes:
The parallel acquisition module is used for carrying out multithread parallel acquisition and storage on real-time temperature data, network bandwidth jitter data and video code rate fluctuation data of a decoding chip in the network high-definition player to obtain a streaming media dynamic characteristic data set;
The classification training module is used for carrying out classification training on the data in the stream media dynamic characteristic data set according to the inter-frame prediction difficulty to obtain an inter-frame correlation prediction model;
The construction module is used for constructing a multi-level streaming media cache system based on the inter-frame correlation prediction model;
the real-time monitoring module is used for monitoring the cache state, the network state and the chip state in the multi-level streaming media cache system in real time to obtain a decoding event sequence;
And the real-time adjustment module is used for adjusting the decoding parameters in real time according to the decoding event sequence to obtain a streaming media decoding control instruction.
By constructing a dual-branch neural network structure that combines temporal and spatial feature extraction, the decoding prediction accuracy for different types of video frames is significantly improved. The hierarchical design of the multi-level streaming media cache system realizes coordinated optimization of the network layer, decoding layer, and display layer and reduces playback startup delay. The adaptive decoding control strategy with an event-trigger mechanism lowers system resource occupation and the playback stall rate. The attention fusion module improves the accuracy of inter-frame correlation prediction, makes decoding resource allocation more reasonable, and effectively avoids overheating of the decoding chip. Training with the multi-task loss function ensures the decoding control instructions simultaneously account for temperature control, performance optimization, and resource allocation, significantly improving overall system stability. The dynamic scheduling mechanism based on event priority responds to various abnormal events in time and guarantees continuous, stable playback of ultra-high-definition video streams.
An embodiment of the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above method. It is understood that the computer readable storage medium in this embodiment may be a volatile readable storage medium or a nonvolatile readable storage medium.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may include the steps of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided by the present invention may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), and direct Rambus dynamic RAM (DRDRAM).
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied, in essence or in the part contributing to the prior art, in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media capable of storing program code.
While the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that the foregoing embodiments may be modified or equivalents may be substituted for some of the features thereof, and that the modifications or substitutions do not depart from the spirit and scope of the embodiments of the invention.
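As a minimal sketch of the multithreaded acquisition and ring-buffer ("annular buffer") storage described in the embodiments: a fixed-capacity, lock-protected ring buffer that overwrites the oldest timestamp-aligned sample when full. The capacity, the field names, and the `RingBufferStore` class are assumptions for illustration only, not the patented implementation.

```python
import threading
from collections import deque

class RingBufferStore:
    """Fixed-capacity ring buffer for timestamp-aligned state samples.

    The patent only specifies that the multidimensional state data matrix
    is held in a ring buffer shared by multiple acquisition threads; the
    concrete capacity and sample fields here are illustrative.
    """

    def __init__(self, capacity=1024):
        # A deque with maxlen discards the oldest sample when full,
        # which is exactly the ring-buffer overwrite behaviour.
        self._buf = deque(maxlen=capacity)
        self._lock = threading.Lock()  # acquisition threads write concurrently

    def append(self, timestamp, temperature, bandwidth_jitter, bitrate):
        with self._lock:
            self._buf.append((timestamp, temperature, bandwidth_jitter, bitrate))

    def snapshot(self):
        with self._lock:
            return list(self._buf)

store = RingBufferStore(capacity=3)
for ts in range(5):  # write 5 samples into a 3-slot ring
    store.append(ts, 55.0 + ts, 0.02, 40_000)
print([row[0] for row in store.snapshot()])  # [2, 3, 4] (oldest two overwritten)
```

Using `deque(maxlen=...)` keeps the sketch short; a production implementation on a set-top decoder would more likely use a preallocated array with head/tail indices to avoid allocation in the sampling threads.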

Claims (9)

1. A method for decoding streaming media of a network high-definition player, the method comprising:
The method comprises the steps of performing multithreading parallel acquisition and storage on real-time temperature data, network bandwidth jitter data and video code rate fluctuation data of a decoding chip in a network high-definition player to obtain a streaming media dynamic characteristic data set;
classifying and training the data in the streaming media dynamic characteristic data set according to the inter-frame prediction difficulty to obtain an inter-frame correlation prediction model;
constructing a multi-level streaming media cache system based on the inter-frame correlation prediction model;
monitoring the buffer status, the network status and the chip status in the multi-level streaming media buffer system in real time to obtain a decoding event sequence;
and adjusting decoding parameters in real time according to the decoding event sequence to obtain a streaming media decoding control instruction.
2. The method for decoding streaming media of a network high-definition player according to claim 1, wherein the steps of performing multi-thread parallel acquisition and storage on real-time temperature data, network bandwidth jitter data and video code rate fluctuation data of a decoding chip in the network high-definition player to obtain a streaming media dynamic characteristic data set include:
periodically sampling the core temperature and the surface temperature of a decoding chip in the network high-definition player to obtain a real-time temperature data stream;
performing sliding window detection on network bandwidth data to obtain a network bandwidth jitter data stream, wherein the network bandwidth jitter data stream comprises a bandwidth change rate and a packet loss rate;
Dynamically monitoring the code rate of an input video stream to obtain a video code rate fluctuation data stream;
Acquiring the state of system resources, and obtaining a system resource state data stream by monitoring the CPU occupancy rate, the memory utilization rate and the GPU utilization rate;
Performing time stamp alignment on the real-time temperature data stream, the network bandwidth jitter data stream, the video code rate fluctuation data stream and the system resource state data stream to obtain a multidimensional state data matrix;
and performing ring-buffer storage on the multidimensional state data matrix to obtain a streaming media dynamic characteristic data set.
3. The method for decoding streaming media of a network high-definition player according to claim 1, wherein the step of classifying and training the data in the streaming media dynamic feature data set according to the inter-frame prediction difficulty to obtain the inter-frame correlation prediction model comprises the following steps:
Carrying out data preprocessing on the streaming media dynamic characteristic data set to obtain a standardized characteristic data set, wherein the standardized characteristic data set comprises a temperature characteristic vector, a network characteristic vector and a code rate characteristic vector;
classifying the standardized feature data set according to I frame, P frame and B frame types in an H.265/HEVC coding standard, and calculating entropy coding complexity and motion vector distribution of each frame type to obtain a frame type feature matrix;
Constructing a dual-branch neural network structure based on the frame type feature matrix, wherein the dual-branch neural network structure comprises a time-sequence feature extraction branch and a spatial feature extraction branch, the time-sequence feature extraction branch consists of three bidirectional LSTM layers with 256 LSTM units per layer, and the spatial feature extraction branch adopts a ResNet structure comprising four residual block groups, each residual block group containing three convolutional layers and a skip connection;
Adding an attention fusion module at the two branch output ends of the dual-branch neural network structure, wherein the attention fusion module comprises a channel attention sub-module and a spatial attention sub-module, and a fusion feature map is obtained by calculating channel weights and spatial weights of the feature map;
Inputting the fusion feature map into a decoding prediction head network, wherein the decoding prediction head network comprises three parallel branches that respectively predict the inter-frame correlation, the decoding resource requirement and the cache allocation strategy, each branch comprising two fully connected layers, the first layer having 512 neurons and the second layer matching the predicted target dimension;
and carrying out optimization training on the decoding prediction head network by adopting a multi-task loss function, wherein the multi-task loss function comprises inter-frame correlation loss, resource prediction loss and buffer optimization loss, and the network parameters are iteratively updated by an Adam optimizer to obtain an inter-frame correlation prediction model.
4. The method for decoding streaming media of a network high-definition player according to claim 1, wherein said constructing a multi-level streaming media buffer system based on said inter-frame correlation prediction model comprises:
performing hierarchical mapping on the inter-frame correlation, the decoding resource requirement and the buffer allocation strategy output by the inter-frame correlation prediction model to obtain multi-level buffer configuration parameters, wherein the multi-level buffer configuration parameters comprise a network layer buffer size, a decoding layer frame type buffer proportion and a display layer buffer capacity;
Performing buffer space allocation on the network layer based on the multi-level buffer configuration parameters to obtain a network layer buffer structure, wherein the size of the network layer buffer structure is dynamically adjusted between 2 and 4 times the video code rate;
Performing buffer division on a decoding layer based on the multi-level buffer configuration parameters to obtain a decoding layer buffer structure, wherein the decoding layer buffer structure comprises an I frame buffer area with a capacity of a first target value, a P frame buffer area with a capacity of a second target value and a B frame buffer area with a capacity of a third target value;
Performing cache configuration on the display layer based on the multi-level buffer configuration parameters to obtain a display layer cache structure, wherein the display layer cache structure comprises two buffers, each sized at 1.5 times the pixel data of the current display resolution;
Carrying out data transmission channel configuration on the network layer buffer structure, the decoding layer buffer structure and the display layer buffer structure to obtain an interlayer data transmission strategy;
and combining the network layer buffer structure, the decoding layer buffer structure, the display layer buffer structure and the interlayer data transmission strategy to obtain a multi-level streaming media buffer system.
5. The method for decoding streaming media of a network high-definition player according to claim 4, wherein the performing buffer division on the decoding layer based on the multi-level buffer configuration parameter to obtain a decoding layer buffer structure includes:
Analyzing the decoding layer frame type buffer ratio and the coding parameters in the multi-level buffer configuration parameters to obtain frame grouping weight values;
Performing buffer capacity calculation according to the frame grouping weight value and the total memory capacity of the system to obtain a frame type buffer allocation scheme, and performing partition management on the system memory based on the frame type buffer allocation scheme to obtain a buffer physical address table;
Establishing a multi-level page table structure for the buffer physical address table to obtain a buffer access control table;
And constructing a buffer scheduling scheme based on the buffer access control table, and creating a decoding layer buffer structure based on the buffer physical address table, the buffer access control table and the buffer scheduling scheme, wherein the decoding layer buffer structure comprises an I frame buffer with a capacity of a first target value, a P frame buffer with a capacity of a second target value and a B frame buffer with a capacity of a third target value.
6. The method for decoding streaming media of a network high definition player according to claim 1, wherein the real-time monitoring of the buffer status, the network status and the chip status in the multi-level streaming media buffer system to obtain the decoding event sequence includes:
performing distributed sampling on the buffer layer state in the multi-level streaming media buffer system to obtain multi-level buffer state data;
Performing multidimensional threshold analysis on the multi-level buffer state data, and setting hierarchical trigger thresholds according to video coding characteristics and hardware resource limitations to obtain event trigger conditions;
Constructing a self-adaptive priority scheduling mechanism based on the event triggering condition, and carrying out dynamic priority assignment according to the influence degree of the event on the decoding quality to obtain a multi-level event priority table;
rearranging the multi-level event priority table according to time sequence, and sorting according to time stamp information and priority weights of the events to obtain an event sequence with weights;
inputting the weighted event sequence into a strategy mapping network, and performing response strategy matching on various events according to historical decoding control experience to obtain an event processing strategy group;
and carrying out space-time correlation analysis based on the weighted event sequence and the event processing strategy group, and obtaining a decoded event sequence by considering the time sequence dependency relationship and the space coupling effect of the event.
7. The method for decoding streaming media of the network high-definition player according to claim 1, wherein the real-time adjustment of the decoding parameters according to the decoding event sequence to obtain the streaming media decoding control instruction comprises:
Classifying and analyzing the events in the decoding event sequence, and extracting control parameters according to the characteristics of temperature events, performance events, network events, cache events and code rate events to obtain a decoding regulation target matrix;
Performing hardware resource control based on the decoding regulation target matrix, and dynamically configuring the frequency and voltage of a decoding chip, the number of decoding threads and the cache allocation proportion to obtain a hardware control parameter sequence;
performing decoding strategy matching on the H.265/HEVC coded stream according to the hardware control parameter sequence, and obtaining a decoding execution scheme based on decoding priorities and frame skipping strategies of I frames, P frames and B frames;
Carrying out parallelism analysis on the decoding execution scheme, and obtaining a task scheduling sequence by calculating the dependency relationship and the execution time sequence of each decoding task, wherein the task scheduling sequence comprises the execution sequence and the time window of the decoding task;
constructing a resource allocation model based on the task scheduling sequence, and obtaining a resource allocation scheme by calculating dynamic balance points of CPU occupancy rate, memory utilization rate and GPU utilization rate;
And integrating and mapping the hardware control parameter sequence, the decoding execution scheme and the resource allocation scheme to obtain a stream media decoding control instruction, wherein the stream media decoding control instruction comprises a temperature control instruction, a decoding control instruction and a resource control instruction.
8. A network high-definition player streaming media decoding device for performing the network high-definition player streaming media decoding method according to any one of claims 1 to 7, the network high-definition player streaming media decoding device comprising:
The parallel acquisition module is used for carrying out multithreaded parallel acquisition and storage of real-time temperature data, network bandwidth jitter data and video code rate fluctuation data of a decoding chip in the network high-definition player to obtain a streaming media dynamic characteristic data set;
The classification training module is used for carrying out classification training on the data in the streaming media dynamic characteristic data set according to the inter-frame prediction difficulty to obtain an inter-frame correlation prediction model;
the construction module is used for constructing a multi-level streaming media cache system based on the inter-frame correlation prediction model;
the real-time monitoring module is used for monitoring the cache state, the network state and the chip state in the multi-level streaming media cache system in real time to obtain a decoding event sequence;
and the real-time adjustment module is used for adjusting the decoding parameters in real time according to the decoding event sequence to obtain a streaming media decoding control instruction.
9. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to perform the network high-definition player streaming media decoding method of any one of claims 1 to 7.
CN202510273088.2A 2025-03-10 2025-03-10 Network high-definition player streaming media decoding method, device and storage medium Active CN119815042B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510273088.2A CN119815042B (en) 2025-03-10 2025-03-10 Network high-definition player streaming media decoding method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510273088.2A CN119815042B (en) 2025-03-10 2025-03-10 Network high-definition player streaming media decoding method, device and storage medium

Publications (2)

Publication Number Publication Date
CN119815042A true CN119815042A (en) 2025-04-11
CN119815042B CN119815042B (en) 2025-07-25

Family

ID=95268872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510273088.2A Active CN119815042B (en) 2025-03-10 2025-03-10 Network high-definition player streaming media decoding method, device and storage medium

Country Status (1)

Country Link
CN (1) CN119815042B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130230099A1 (en) * 2004-07-30 2013-09-05 Euclid Discoveries, Llc Standards-compliant model-based video encoding and decoding
US20130156105A1 (en) * 2011-12-16 2013-06-20 Apple Inc. High quality seamless playback for video decoder clients
US20170150159A1 (en) * 2015-11-20 2017-05-25 Intel Corporation Method and system of reference frame caching for video coding
US20180063548A1 (en) * 2016-08-30 2018-03-01 Qualcomm Incorporated Intra-coded video frame caching for video telephony sessions
US20230028941A1 (en) * 2022-10-01 2023-01-26 Intel Corporation Rate estimation congestion control for transmitted media

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ARMAN IRANFAR等: "A Machine Learning-Based Strategy for Efficient Resource Management of Video Encoding on Heterogeneous MPSoCs", 2018 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, 4 May 2018 (2018-05-04), pages 1 - 5, XP033435212, DOI: 10.1109/ISCAS.2018.8351785 *
XIAOYE WANG等: "Adaptive Cache Management for Complex Storage Systems Using CNN-LSTM-Based Spatiotemporal Prediction", ARXIV:2411.12161, 19 November 2024 (2024-11-19), pages 1 - 5 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN121078274A (en) * 2025-11-05 2025-12-05 深圳市深智电科技有限公司 Intelligent methods and systems for preventing and optimizing audio and video playback stuttering.
CN121078274B (en) * 2025-11-05 2026-01-30 深圳市深智电科技有限公司 Intelligent methods and systems for preventing and optimizing audio and video playback stuttering.
CN121309558A (en) * 2025-12-08 2026-01-09 深圳市锦锐科技股份有限公司 Methods to improve decoding speed based on internet radio and podcast streams

Also Published As

Publication number Publication date
CN119815042B (en) 2025-07-25

Similar Documents

Publication Publication Date Title
CN119815042B (en) Network high-definition player streaming media decoding method, device and storage medium
EP2962461B1 (en) Systems and methods of encoding multiple video streams for adaptive bitrate streaming
CN119576507A (en) AI-based automatic optimization method and system for big data distributed computing tasks
US9071814B1 (en) Scene detection based on video encoding parameters
CN111787322B (en) Video coding method and device, electronic equipment and computer readable storage medium
AU2022279597B2 (en) Training rate control neural networks through reinforcement learning
CN119835486B (en) Method and device for quick startup and content preloading of network high-definition player
US20220408097A1 (en) Adaptively encoding video frames using content and network analysis
CN111083535A (en) Video data transmission code rate adaptation method, system, device and storage medium
Wu et al. Paas: A preference-aware deep reinforcement learning approach for 360 video streaming
CN120011091B (en) A real-time data stream processing method based on deep learning in cloud platform
CN116185584A (en) A Multi-tenant Database Resource Planning and Scheduling Method Based on Deep Reinforcement Learning
US20250291600A1 (en) Heterogeneous Processor and Related Scheduling Method
CN116886619A (en) A load balancing method and device based on linear regression algorithm
Shahout et al. Fast inference for augmented large language models
Li et al. Concerto: Client-server orchestration for real-time video analytics
Lin et al. DeCa360: Deadline-aware edge caching for two-tier 360° video streaming
Xu Computer vision-enabled inventory management system: A cloud-native solution for retail cost reduction
JP7766189B2 (en) Anomaly detection cloud resource management system that receives external information and includes short-term resource planning
Tian et al. Accelerating ai-generated content collaborative inference via transfer reinforcement learning in dynamic edge networks
CN120769049A (en) Method, device, electronic device and storage medium for determining streaming media bit rate
CN121029371B (en) A method and apparatus for scheduling edge AI computing tasks for cloud-edge collaboration
CN119545113B (en) A QoE Optimization Method for Multicast Video Transmission in Virtual Reality
CN118885681B (en) A method and system based on online joint recommendation and edge-assisted caching
Zhu et al. DualRT: A Qos‐Aware Soft Real‐Time Video Analytics Framework for Dual‐Stage GPU‐CPU Tasks on Edge

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant