Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be understood that in the description of the embodiments of the present invention, "a plurality of" means two or more; "greater than", "less than", "exceeding", and the like are understood as excluding the stated number; and "above", "below", "within", and the like are understood as including the stated number. Where "first", "second", and the like appear, they are used only to distinguish technical features and are not intended to indicate or imply relative importance, to implicitly indicate the number of the indicated technical features, or to implicitly indicate the precedence of the indicated technical features.
At present, multi-view playing has become an important application scenario for video playing devices. For example, in the same game, pictures at different viewing angles can be acquired by video capturing devices at different angles and finally displayed through a Set-Top Box (STB). However, due to network transmission delay or different playing start times of the individual media code streams, the pictures of the respective paths become asynchronous.
Referring to FIG. 1, an embodiment of the present invention provides a picture synchronization method, which is exemplarily applied to a video playing device; the embodiment of the present invention is described by taking an STB as an example. The method specifically includes, but is not limited to, the following steps 101 to 103:
Step 101: acquiring multiple media code streams;
In step 101, each of the multiple media code streams corresponds to a picture of the same content at a different viewing angle, such as a ball game or a concert. Each media code stream includes an audio stream and a video stream. During playing, the main viewing angle plays the audio stream and the video stream simultaneously, while the remaining viewing angles play only the video stream.
Step 102: acquiring frame synchronization information of a first media code stream;
In step 102, the first media code stream is any one of the multiple media code streams.
In an embodiment, the media code stream of the main view may be selected as the first media code stream.
Step 103: and synchronizing the timestamp information of the multi-path media code stream according to the frame synchronization information.
Through steps 101 to 103, the timestamp information of the multiple media code streams is synchronized according to the frame synchronization information of any one of them, so that the pictures of the multiple media code streams are played synchronously, thereby improving the user experience.
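Steps 101 to 103 can be sketched as follows, using simplified stand-in objects; the class and function names here are illustrative assumptions rather than the actual STB implementation, and timestamps are taken in milliseconds.

```python
from dataclasses import dataclass

@dataclass
class MediaStream:
    view: str        # viewing angle, e.g. "main", "left", "right"
    video_pts: int   # video display time stamp, in milliseconds
    audio_pts: int   # audio display time stamp, in milliseconds

def frame_sync_info(streams, first_index=0):
    """Step 102: the frame synchronization information is taken from any
    one stream ("the first media code stream"), here its audio PTS."""
    return streams[first_index].audio_pts

def sync_actions(streams, reference):
    """Step 103: decide, per stream, how to bring its video PTS in line
    with the reference (speed up, slow down, or leave as-is)."""
    actions = {}
    for s in streams:
        if reference > s.video_pts:
            actions[s.view] = "speed-up"    # picture lags the reference
        elif reference < s.video_pts:
            actions[s.view] = "slow-down"   # picture leads the reference
        else:
            actions[s.view] = "keep"
    return actions
```

The speed adjustment itself (the 1.2×/0.8× fast- and slow-play described later) would then be applied per stream according to the returned action.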
In one embodiment, the frame synchronization information may have two forms:
One is an audio Presentation Time Stamp (PTS) of the first media code stream; the other is a natural timestamp of the first media code stream, where the natural timestamp is the natural time at encoding corresponding to each of the multiple media code streams, such as UTC time or Beijing time. For example, if the encoded natural time is 17:00 on April 2, 2020 (Beijing time, UTC+8), the natural timestamp is 1585818000.
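The natural-timestamp example can be reproduced with Python's standard `datetime` module; the UTC+8 offset below corresponds to the Beijing-time reading of the example value.

```python
from datetime import datetime, timezone, timedelta

def natural_timestamp(dt: datetime) -> int:
    """Seconds since the Unix epoch for a timezone-aware datetime."""
    return int(dt.timestamp())

# 17:00 on April 2, 2020, in the UTC+8 (Beijing) time zone
beijing = timezone(timedelta(hours=8))
encoded_time = datetime(2020, 4, 2, 17, 0, 0, tzinfo=beijing)
# natural_timestamp(encoded_time) -> 1585818000
```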
In an embodiment, the synchronizing the timestamp information of the multiple media code streams according to the frame synchronization information in step 103 may specifically be:
synchronizing the video display time stamp of any one of the multiple media code streams according to the audio display time stamp of the first media code stream.
Using the audio display time stamp of the first media code stream as the frame synchronization information suits scenarios in which the display time stamps of all the media code streams share the same time base (time_base). No additional encoding of the multiple media code streams is needed, so the approach is efficient and helps improve universality.
Specifically, the audio display time stamp of the first media code stream may first be synchronized with the video display time stamp of the first media code stream, so that the sound and picture of the first media code stream are synchronized. Then, the video display time stamps of the media code streams other than the first one are synchronized to the audio display time stamp of the first media code stream, so that the picture of the first media code stream is synchronized with the pictures of the other media code streams. In this way, all the pictures of the multiple media code streams are synchronized, and at the same time the sound and pictures of the multiple media code streams are synchronized. Moreover, since human perception is more sensitive to sound than to pictures, using the audio display time stamp as the frame synchronization information helps improve synchronization accuracy. It can be understood that the synchronization order may be adjusted according to actual situations; for example, the video display time stamps of the media code streams other than the first one may be synchronized first, and the video display time stamp of the first media code stream synchronized afterwards.
It can be understood that, besides the audio display time stamp of the first media code stream, the video display time stamp of the first media code stream may also serve as the frame synchronization information.
In an embodiment, synchronizing the video display time stamp of any one of the multiple media code streams according to the audio display time stamp of the first media code stream involves two cases:
one situation is that when the audio display time stamp of the first media code stream is larger than the video display time stamp of any one path of media code stream in the multi-path media code streams, the video playing speed of the path of media code stream is increased until the video display time stamp of the path of media code stream is synchronous with the audio display time stamp of the first media code stream.
For example, fast playing may be performed at 1.2× speed. At normal playing speed, if the frame rate of the video is 30 fps, i.e. 30 frames are played per second, the inter-frame interval is about 33 milliseconds. At 1.2× fast playing, 36 frames are played per second and the inter-frame interval shortens to about 28 milliseconds. It can be understood that the fast-playing speed may be adjusted according to actual situations; the embodiments of the present invention are not limited in this respect.
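The interval arithmetic above (and the slow-play case that follows) can be checked with a one-line helper:

```python
def frame_interval_ms(base_fps: float, speed: float) -> float:
    """Inter-frame interval in milliseconds at a given playback speed."""
    return 1000.0 / (base_fps * speed)

# 30 fps at normal speed -> ~33 ms; at 1.2x -> ~28 ms; at 0.8x -> ~42 ms
```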
In the other case, when the audio display time stamp of the first media code stream is smaller than the video display time stamp of any one of the media code streams, the video playing speed of that media code stream is reduced until its video display time stamp is synchronized with the audio display time stamp of the first media code stream.
For example, slow playing may be performed at 0.8× speed. At normal playing speed, if the frame rate of the video is 30 fps, i.e. 30 frames are played per second, the inter-frame interval is about 33 milliseconds. At 0.8× slow playing, 24 frames are played per second and the inter-frame interval lengthens to about 42 milliseconds. It can be understood that the slow-playing speed may be adjusted according to actual situations; the embodiments of the present invention are not limited in this respect.
Once the video display time stamp of each of the multiple media code streams is synchronized with the audio display time stamp of the first media code stream, picture synchronization of the multiple media code streams is achieved.
In an embodiment, to determine whether the video display time stamp of any one of the media code streams is synchronized with the audio display time stamp of the first media code stream, the difference between the two may first be obtained; when the difference is within a preset range, the two are considered synchronized. Illustratively, the preset range may be 10 ms to 50 ms, 10 ms to 100 ms, or the like, and may be adjusted according to actual situations. Setting the preset range avoids overly frequent picture synchronization.
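The preset-range check can be written as a small predicate; the 50 ms default below is just one of the example tolerances from the text and would be tuned in practice.

```python
def is_synchronized(video_pts_ms: int, audio_pts_ms: int,
                    tolerance_ms: int = 50) -> bool:
    """A stream counts as synchronized when the time-stamp difference
    falls within the preset range."""
    return abs(video_pts_ms - audio_pts_ms) <= tolerance_ms
```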
In an embodiment, when the frame synchronization information is the natural timestamp of the first media code stream, synchronizing the timestamp information of the multiple media code streams according to the frame synchronization information in step 103 may specifically be:
synchronizing a natural time stamp of a second media code stream according to the natural time stamp of the first media code stream;
The second media code stream is any one of the multiple media code streams other than the first media code stream. With the natural time stamp as the frame synchronization information, picture synchronization can be performed even if the display time stamps of the media code streams do not share a common time base, which ensures the accuracy and stability of subsequent picture synchronization. Each media code stream generally goes through audio and video acquisition, encoding, multiplexing, and packaging, and the audio and video display time stamps are usually written at the multiplexing stage. Even when acquisition times are identical, different acquisition devices may differ in processing, so forming the media code streams takes different amounts of time; as a result, the display time stamps of different media code streams carry a certain deviation, and the time bases of the display time stamps of the media code streams become inconsistent. For example, a first camera and a second camera capture at the same moment, but the first camera spends 500 milliseconds in the intermediate encoding process while the second camera spends 1 second; for data captured at the same time, the display time stamps of the two cameras therefore differ by 500 milliseconds.
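The two-camera example can be written out as a toy illustration of why display time stamps lose a common time base while natural timestamps do not: the PTS is written at the multiplexing stage, after each camera's encoding delay, whereas the natural timestamp records the capture instant itself. The numbers are taken from the text.

```python
capture_time = 1585818000          # same capture instant for both cameras (s)

encode_delay_cam1_ms = 500         # first camera's encoding pipeline delay
encode_delay_cam2_ms = 1000        # second camera's encoding pipeline delay

# PTS origins end up shifted by each pipeline's delay ...
pts_skew_ms = encode_delay_cam2_ms - encode_delay_cam1_ms   # 500 ms skew

# ... while the natural timestamps, written at capture, still agree.
natural_ts_cam1 = capture_time
natural_ts_cam2 = capture_time
```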
In an embodiment, synchronizing the natural time stamp of the second media code stream according to the natural time stamp of the first media code stream involves two cases:
In one case, when the natural time stamp of the first media code stream is greater than that of the second media code stream, the video playing speed of the second media code stream is increased until the natural time stamp of the second media code stream is synchronized with that of the first media code stream;
For example, fast playing may be performed at 1.2× speed; the specific principle is explained above and is not repeated here.
In the other case, when the natural time stamp of the first media code stream is smaller than that of the second media code stream, the video playing speed of the second media code stream is reduced until the natural time stamp of the second media code stream is synchronized with that of the first media code stream.
For example, slow playing may be performed at 0.8× speed; the specific principle is explained above and is not repeated here.
When the natural time stamps of the first media code stream and the second media code stream are synchronized, picture synchronization of the multiple media code streams is achieved.
In an embodiment, to determine whether the natural time stamp of the second media code stream is synchronized with that of the first media code stream, the difference between the two may first be obtained; when the difference is within a preset range, the natural time stamp of the second media code stream is considered synchronized with that of the first media code stream. Illustratively, the preset range may be 10 ms to 50 ms, 10 ms to 100 ms, or the like, and may be adjusted according to actual situations. Setting the preset range avoids overly frequent picture synchronization.
In an embodiment, in addition to synchronizing the natural time stamp of the second media code stream according to the natural time stamp of the first media code stream, the video display time stamp of the first media code stream may also be synchronized according to the audio display time stamp of the first media code stream, thereby ensuring sound-picture synchronization for the multiple media code streams.
Referring to FIG. 2, an embodiment of the present invention further provides an encoding method, which is exemplarily applied to a video encoding device, such as a video camera, and specifically includes, but is not limited to, the following steps 201 to 203:
Step 201: generating multiple media code streams;
Step 202: writing frame synchronization information into each of the multiple media code streams;
In step 202, the frame synchronization information is used to synchronize the timestamp information of the multiple media code streams; the frame synchronization information is explained above and is not repeated here.
Step 203: transmitting the multiple media code streams.
In step 203, the video encoding device may send the multiple media code streams directly to the STB, or may first send them to a server, which then forwards them to the STB. When the server is used to forward the multiple media code streams to the STB, the server does not process them.
Through steps 201 to 203, the frame synchronization information used to synchronize the timestamp information of the multiple media code streams is written into each of the multiple media code streams. After receiving the multiple media code streams, the STB can therefore synchronize their timestamp information according to the frame synchronization information of any one of them, so that the pictures of the multiple media code streams are played synchronously and the user experience is improved. Moreover, because the frame synchronization information is stored within the media code streams themselves, frame synchronization does not depend on an external reference clock, which helps improve the efficiency of frame synchronization.
In an embodiment, the frame synchronization information may be the audio display time stamp of the first media code stream or the natural time stamp of the first media code stream; both were explained above and are not repeated here. For example, for a stream in the H.264 format, the UTC time can be written into a Supplemental Enhancement Information (SEI) message during the encoding stage.
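The SEI approach mentioned here can be sketched as below. This is a hedged illustration only: H.264 payload type 5 (user_data_unregistered) carries a 16-byte UUID followed by arbitrary application data, but the placeholder UUID, the big-endian seconds field, and the omission of emulation-prevention bytes are all simplifying assumptions of this sketch, not requirements of the method.

```python
import struct

APP_UUID = bytes(16)  # placeholder UUID identifying this application

def utc_sei_nal(utc_seconds: int) -> bytes:
    """Build a bare SEI NAL unit carrying a UTC timestamp."""
    payload = APP_UUID + struct.pack(">Q", utc_seconds)
    sei = bytes([5])                 # payloadType 5: user_data_unregistered
    size = len(payload)
    while size >= 255:               # payloadSize encoded in 255-byte chunks
        sei += bytes([255])
        size -= 255
    sei += bytes([size]) + payload
    sei += bytes([0x80])             # rbsp_trailing_bits
    return bytes([0x06]) + sei       # NAL header: nal_unit_type 6 (SEI)
```

A decoder-side STB would locate this NAL unit, skip the UUID, and read back the natural timestamp.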
The picture synchronization method and the encoding method provided by the embodiments of the present invention are described below through two practical examples.
Example one
Referring to FIGS. 3 and 4, this example is described with the audio display time stamp of the main-view media code stream as the frame synchronization information, and specifically includes the following steps 401 to 413:
Step 401: cameras at different angles in the same shooting venue perform time calibration through a clock synchronizer; each camera performs audio and video acquisition, encoding, multiplexing, and packaging, and sends the packaged media code stream to a streaming media server;
Step 402: the STB initiates a login request to the service system, and the service system returns a login response to the STB;
Step 403: the STB enters a multi-view playing scene according to Electronic Program Guide (EPG) navigation and selects any one view as the main view;
Step 404: the STB requests the streaming media server to play the multiple media code streams, and the streaming media server responds to the playing request;
Step 405: the STB receives the multiple media code streams from the streaming media server and parses them to obtain the video display time stamp of each media code stream and the audio display time stamp of the main-view media code stream;
Step 406: determining whether the difference between the audio display time stamp and the video display time stamp of the main-view media code stream is outside a preset range; if so, going to step 407; if not, going to step 410;
Step 407: determining whether the audio display time stamp of the main-view media code stream is greater than its video display time stamp; if so, going to step 408; if not, going to step 409;
Step 408: increasing the video playing speed of the main-view media code stream until the difference between its video display time stamp and audio display time stamp is within the preset range, then going to step 410;
Step 409: reducing the video playing speed of the main-view media code stream until the difference between its video display time stamp and audio display time stamp is within the preset range;
Step 410: determining whether the difference between the audio display time stamp of the main-view media code stream and the video display time stamp of each other-view media code stream is outside the preset range; if so, going to step 411; if not, ending the flow;
Step 411: determining whether the audio display time stamp of the main-view media code stream is greater than the video display time stamps of the other-view media code streams; if so, going to step 412; if not, going to step 413;
Step 412: increasing the video playing speed of the other-view media code streams until the difference between their video display time stamps and the audio display time stamp of the main-view media code stream is within the preset range, then ending the flow;
Step 413: reducing the video playing speed of the other-view media code streams until the difference between their video display time stamps and the audio display time stamp of the main-view media code stream is within the preset range.
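The branching of the steps above can be condensed into a single decision helper; the 50 ms tolerance and the 1.2×/0.8× speeds reuse the example values given earlier, and the function name is illustrative.

```python
def playback_speed(reference_pts: int, video_pts: int,
                   tolerance_ms: int = 50) -> float:
    """Pick a playback speed that drives video_pts toward the reference
    (the main view's audio display time stamp), in milliseconds."""
    if abs(reference_pts - video_pts) <= tolerance_ms:
        return 1.0   # within the preset range: no adjustment needed
    if reference_pts > video_pts:
        return 1.2   # picture lags the reference: fast-play
    return 0.8       # picture leads the reference: slow-play
```

The same helper applies both to the main view (audio PTS vs. its own video PTS) and to the other views (main-view audio PTS vs. their video PTS).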
Example two
Referring to FIGS. 3 and 5, this example is described with the UTC timestamp of the main-view media code stream as the frame synchronization information, and specifically includes the following steps 501 to 513:
Step 501: cameras at different angles in the same shooting venue perform time calibration through a clock synchronizer; each camera performs audio and video acquisition, encoding, multiplexing, and packaging, writes the camera's current UTC timestamp at the start of encoding, and sends the packaged media code stream to a streaming media server;
Step 502: the STB initiates a login request to the service system, and the service system returns a login response to the STB;
Step 503: the STB enters a multi-view playing scene according to EPG navigation and selects any one view as the main view;
Step 504: the STB requests the streaming media server to play the multiple media code streams, and the streaming media server responds to the playing request;
Step 505: the STB receives the multiple media code streams from the streaming media server and parses them to obtain the UTC timestamp of each media code stream as well as the audio display time stamp and the video display time stamp of the main-view media code stream;
Step 506: determining whether the difference between the audio display time stamp and the video display time stamp of the main-view media code stream is outside a preset range; if so, going to step 507; if not, going to step 510;
Step 507: determining whether the audio display time stamp of the main-view media code stream is greater than its video display time stamp; if so, going to step 508; if not, going to step 509;
Step 508: increasing the video playing speed of the main-view media code stream until the difference between its video display time stamp and audio display time stamp is within the preset range, then going to step 510;
Step 509: reducing the video playing speed of the main-view media code stream until the difference between its video display time stamp and audio display time stamp is within the preset range;
Step 510: determining whether the difference between the UTC timestamp of the main-view media code stream and the UTC timestamp of each other-view media code stream is outside the preset range; if so, going to step 511; if not, ending the flow;
Step 511: determining whether the UTC timestamp of the main-view media code stream is greater than the UTC timestamps of the other-view media code streams; if so, going to step 512; if not, going to step 513;
Step 512: increasing the video playing speed of the other-view media code streams until the difference between their UTC timestamps and the UTC timestamp of the main-view media code stream is within the preset range, then ending the flow;
Step 513: reducing the video playing speed of the other-view media code streams until the difference between their UTC timestamps and the UTC timestamp of the main-view media code stream is within the preset range.
It should also be appreciated that the various implementations provided by the embodiments of the present invention can be combined arbitrarily to achieve different technical effects.
Fig. 6 shows a video playback device 600 provided by an embodiment of the present invention. The video playback device 600 includes: a memory 601, a processor 602, and a computer program stored on the memory 601 and executable on the processor 602, the computer program being operable to execute the above-mentioned picture synchronization method.
The processor 602 and memory 601 may be connected by a bus or other means.
The memory 601, which is a non-transitory computer readable storage medium, can be used to store non-transitory software programs and non-transitory computer executable programs, such as the picture synchronization method described in the embodiments of the present invention. The processor 602 implements the above-described picture synchronization method by running a non-transitory software program and instructions stored in the memory 601.
The memory 601 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created when the above picture synchronization method is executed. Further, the memory 601 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 601 may optionally include memory located remotely from the processor 602, and such remote memory may be connected to the video playback device 600 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions needed to implement the above-described picture synchronization method are stored in the memory 601 and, when executed by the one or more processors 602, perform the above-described picture synchronization method, e.g., perform the method steps 101 to 103 in fig. 1.
Fig. 7 illustrates a video encoding apparatus 700 according to an embodiment of the present invention. The video encoding apparatus 700 includes: a memory 701, a processor 702 and a computer program stored on the memory 701 and executable on the processor 702, the computer program being operable to perform the above-mentioned encoding method.
The processor 702 and the memory 701 may be connected by a bus or other means.
The memory 701, which is a non-transitory computer-readable storage medium, may be used to store a non-transitory software program and a non-transitory computer-executable program, such as a picture synchronization method or an encoding method described in the embodiments of the present invention. The processor 702 implements the above-described encoding method by running non-transitory software programs and instructions stored in the memory 701.
The memory 701 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created when the above picture synchronization method or encoding method is executed. Further, the memory 701 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 701 may optionally include memory located remotely from the processor 702, and such remote memory may be connected to the video encoding apparatus 700 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Non-transitory software programs and instructions needed to implement the above-described picture synchronization method or encoding method are stored in the memory 701 and, when executed by the one or more processors 702, perform the above-described encoding method, e.g., perform the method steps 201 to 203 in fig. 2.
Embodiments of the present invention further provide a computer-readable storage medium, which stores computer-executable instructions for executing the above-mentioned picture synchronization method or encoding method.
In one embodiment, the computer-readable storage medium stores computer-executable instructions that, when executed by one or more control processors, for example, by one of the processors 602 of the video playback device 600, cause the processor 602 to perform the picture synchronization method described above, for example, performing the method steps 101 to 103 of fig. 1. Alternatively, execution by one of the processors 702 in the video encoding apparatus 700 may cause the processor 702 to perform the encoding method described above, e.g., perform the method steps 201 to 203 in fig. 2.
The above-described embodiments of the apparatus are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may also be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
One of ordinary skill in the art will appreciate that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, or suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, as is well known to those skilled in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
While the preferred embodiments of the present invention have been described in detail, it will be understood by those skilled in the art that the foregoing and various other changes, omissions and deviations in the form and detail thereof may be made without departing from the scope of this invention.