CN108235113B

CN108235113B - Panoramic video rendering and presentation attribute indication method and system

Info

Publication number: CN108235113B
Application number: CN201611155809.7A
Authority: CN
Inventors: 徐异凌; 张文军; 程铭; 胡颖; 孙军; 管云峰; 王延峰
Original assignee: Shanghai Jiao Tong University
Current assignee: Shanghai Jiao Tong University
Priority date: 2016-12-14
Filing date: 2016-12-14
Publication date: 2022-01-04
Anticipated expiration: 2036-12-14
Also published as: CN108235113A; CN112770178A

Abstract

The present invention provides a method and system for indicating the rendering and presentation attributes of panoramic video. The method inserts first indication information, together with second indication information and/or third indication information, for the panoramic video into description information associated with panoramic video rendering and presentation. The first indication information indicates the media content attributes of the panoramic video, including an identifier of the mapping format of the panoramic video and an identifier of whether the panoramic video covers the full 360-degree viewing range. The second indication information indicates the spatial information of the panoramic video, and the third indication information indicates the quality information of the panoramic video. The invention enables viewing-angle-based rendering and presentation of media content. By delivering panoramic video that matches the user's viewing angle, it can satisfy the user's demand for high-quality panoramic video as far as possible when network bandwidth is limited.

Description

Panoramic video rendering and presentation attribute indication method and system

Technical Field

The invention relates to the technical field of digital media, in particular to a method and a system for indicating rendering and presentation properties of a panoramic video.

Background

With the rapid development of digital media technology, immersive media is becoming one of the future directions of media development. Industry and academia have invested heavily in panoramic video production and coding research, but the transmission of panoramic video still faces several unsolved problems.

For panoramic video, the media content the user consumes at any moment is only part of the spatial area of the whole video. The spatial area of the consumed content changes as the user's viewing angle changes.

A prior-art search found that Chinese patent application CN201210365946.9 discloses a system and method for displaying and interacting with panoramic video. It achieves free display of, and interaction with, panoramic video by setting marks on a reference and using at least two marks to control 360-degree rotation of the panoramic video. However, that patent cannot indicate video content. It therefore cannot be used to index media content when panoramic video is rendered, presented and switched based on the viewing angle.

Disclosure of Invention

The invention aims to provide a method and a system for indicating the rendering and presentation properties of a panoramic video. These indicate the properties under different mapping modes and thereby enable viewing-angle-based rendering and presentation of media content.

According to a first aspect of the present invention, there is provided a panoramic video rendering and presentation attribute indication method. The method comprises inserting, into description information associated with panoramic video rendering and presentation, first indication information together with second indication information and/or third indication information for a panoramic video;

the first indication information indicates the media content attribute of the panoramic video, and includes:

- an identifier of the mapping format of the panoramic video;

- an identifier of whether the panoramic video covers the full viewing range;

the second indication information indicates spatial information of the panoramic video;

the third indication information indicates quality information of the panoramic video.

Preferably, the identifier of the mapping format of the panoramic video is used for indicating a mapping manner from the planar frame to the panoramic video when the panoramic video is presented.

Preferably, the identifier of whether the panoramic video covers the whole view angle range is used for indicating whether the data in the panoramic video can be completely mapped into the panoramic video presentation of the whole view angle range.

Preferably, the spatial information indicates a correspondence between the panoramic video media asset and different view angle regions on the client presentation model.

Preferably, for different mapping manners, the spatial information includes the following contents:

- for the longitude-latitude map and binocular stereo longitude-latitude map mappings, the spatial information includes the pitch-angle and azimuth-angle ranges covered by the spatial region;

- for a polyhedron mapping, the spatial information comprises the numbers of the polyhedron faces contained in the panoramic video;

- for a cylinder mapping, the spatial information includes the numbers of the faces contained in the panoramic video and the specific horizontal and vertical coordinate ranges on the side surface.

Preferably, the quality information includes resolution information of the panoramic video.

More preferably, the resolution information includes the numbers of horizontal and vertical pixels of the spatial region. In particular, the resolution of the upper and lower bases of the cylinder is indicated by the base radius.

Further, the method comprises receiving the indication information from a network entity and parsing it for rendering and presenting the panoramic video.

According to a second aspect of the present invention, there is provided a panoramic video rendering and presentation method, comprising:

S1: constructing signaling information at the server side. First indication information, together with second indication information and/or third indication information for the panoramic video, is inserted into the signaling associated with panoramic video rendering and presentation. The panoramic video and its corresponding signaling information are then sent to the client;

S2: after receiving the panoramic video and its corresponding signaling information from the server, the client parses the signaling information containing the first indication information. When rendering and presenting the panoramic video, it uses the second indication information and/or the third indication information to determine the correspondence between the panoramic video media resource and the different viewing-angle regions of the client presentation model, so that it can render and present the video correctly.

According to a third aspect of the present invention, there is provided a panoramic video rendering and presentation system comprising:

a server, which inserts first indication information, together with second indication information and/or third indication information for the panoramic video, into the signaling associated with panoramic video rendering and presentation. The server sends the panoramic video and its corresponding signaling information to the client;

and a client, which parses the signaling information containing the first indication information after receiving the panoramic video and its corresponding signaling information from the server. When rendering and presenting the panoramic video, the client uses the second indication information and/or the third indication information to determine the correspondence between the panoramic video media resource and the different viewing-angle regions of the client presentation model, so that it can render and present the video correctly.

Compared with the prior art, the invention has the following beneficial effects:

The invention provides a complete indication mechanism for the rendering and presentation attributes of panoramic video under different mapping modes. On this basis, it further identifies the spatial and quality information of the transmitted panoramic video, which enables viewing-angle-based rendering and presentation of panoramic video. Delivering panoramic video that matches the viewing angle can satisfy the user's demand for high-quality panoramic video as far as possible when network bandwidth is limited.

The invention indicates the media content attributes, spatial information and/or quality information of the panoramic video. This information can be used to index media content during viewing-angle-based rendering and presentation of panoramic video and during viewing-angle switching.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a simplified flow chart of the indication information for panoramic video according to an embodiment of the present invention;

FIG. 2 is a flow chart for parsing the media content attributes, spatial position and resolution;

FIG. 3 is an exemplary illustration of spatial position indication for the longitude-latitude map mapping;

FIG. 4 is an exemplary diagram of the arrangements of the binocular stereo longitude-latitude map;

FIG. 5 shows the correspondence between face numbers and the faces of a regular hexahedron;

FIG. 6 illustrates different arrangements of the faces of a regular hexahedron;

FIG. 7 is an exemplary illustration of spatial position indication for the cylinder mapping;

FIG. 8 is a flow chart of an application example of the panoramic video attribute indication information according to an embodiment of the present invention.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications can be made by persons skilled in the art without departing from the spirit of the invention. All such variations and modifications fall within the scope of the present invention.

Rendering and presenting the media content within the user's field of view in real time, and switching the field of view in real time, both require the spatial attributes of the media content to be indicated. Because the human eye is not equally sensitive to all areas within the field of view, the quality information of the panoramic video also needs to be indicated. With this information, media content can be rendered, presented and indexed normally during viewing-angle-based switching of the panoramic video.

Thus, the present invention indicates media content attributes, spatial location, and resolution information related to panoramic video rendering and presentation attributes at the stream level. A simplified flow of media content indication information is shown in fig. 1.

In order to indicate the panoramic video rendering and presentation properties, the following indication information needs to be inserted into the description information associated with the panoramic video rendering and presentation:

First indication information: indicates the mapping mode of the panoramic video and whether the content contained in the panoramic video media resource completely covers the panoramic video content. The spatial position is then indicated case by case on the basis of this indication information (the media content attributes).

Specifically, the first indication information includes:

mapping format of the panoramic video (vr_projection_format): indicates the mapping mode of the media content. The spatial information described below is indicated case by case according to this field;

whether the panoramic video covers all viewing angles (is_complete): indicates whether the content contained in the media resource is the complete panoramic video content. A value of 1 means the media resource contains the complete panoramic video content, and a value of 0 means it contains partitioned panoramic video content. The spatial information described below is indicated case by case according to this media resource content attribute.

In some embodiments, the second indication information, the third indication information, or both are further inserted into the description information associated with panoramic video rendering and presentation.

Specifically, the spatial information indicates the correspondence between the panoramic video media resource and different viewing-angle regions on the client presentation model, that is, between the media resource and different viewing angles on the user viewing sphere. The spatial range is defined as a rectangular area on the planar data frame. The spatial position of the media content on the user viewing sphere is uniquely determined by two things: this range on the planar frame, and the correspondence from the planar frame to the user viewing sphere. The planar frame is generated by a mapping method, so this correspondence differs between mapping methods. The spatial position definitions of the present invention are therefore specific to each mapping method.
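For the longitude-latitude map, the idea of a rectangle on the planar frame determining a region on the viewing sphere can be shown with a short Python sketch. It assumes a full frame covering 360° of azimuth and 180° of pitch, with column 0 at azimuth -180° and row 0 at pitch +90°, and both angles linear in pixel position. These conventions, and the names `SphereRange` and `equirect_rect_to_sphere`, are illustrative assumptions and are not defined in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SphereRange:
    """Angular extent of a region on the viewing sphere, in degrees (hypothetical type)."""
    pitch_top: float     # pitch of the upper edge
    yaw_left: float      # azimuth of the left edge
    pitch_height: float  # pitch range covered
    yaw_width: float     # azimuth range covered


def equirect_rect_to_sphere(frame_w: int, frame_h: int,
                            x: int, y: int, w: int, h: int) -> SphereRange:
    """Map a pixel rectangle on a full longitude-latitude frame to angles.

    Assumes column 0 is azimuth -180 deg, row 0 is pitch +90 deg, and that
    both angles vary linearly with pixel position.
    """
    if not (0 <= x and x + w <= frame_w and 0 <= y and y + h <= frame_h):
        raise ValueError("rectangle lies outside the frame")
    deg_per_px_x = 360.0 / frame_w   # azimuth degrees per column
    deg_per_px_y = 180.0 / frame_h   # pitch degrees per row
    return SphereRange(
        pitch_top=90.0 - y * deg_per_px_y,
        yaw_left=-180.0 + x * deg_per_px_x,
        pitch_height=h * deg_per_px_y,
        yaw_width=w * deg_per_px_x,
    )
```

For example, a 960×480 tile at (960, 480) in a 3840×1920 frame maps to a region whose upper edge is at pitch 45°, whose left edge is at azimuth -90°, and which covers 45° of pitch and 90° of azimuth. Under these assumptions, region_pitch_top, region_yaw_left, region_pitch_height and region_yaw_width would carry these values.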

Many panoramic video mapping modes exist, such as the longitude-latitude map, the binocular stereo longitude-latitude map, regular polyhedra and the cylinder. Each calls for a different way of indicating the spatial position of the media content.

For different mapping modes, the spatial information includes the following:

for the longitude-latitude map and binocular stereo longitude-latitude map mappings, the spatial information comprises the pitch-angle and azimuth-angle ranges covered by the spatial region;

for the polyhedron mapping, the spatial information indicates the numbers of the polyhedron faces contained in the media content;

for the cylinder mapping, the spatial information includes the numbers of the faces contained in the media content and the specific horizontal and vertical coordinate ranges on the side surface.

The resolution information includes the numbers of horizontal and vertical pixels of the spatial region. In particular, the resolution of the upper and lower bases of the cylinder is indicated by the base radius.

The second indication information and the third indication information indicate, respectively, the spatial information and the quality information of video content under the different mapping modes. Specifically, in some preferred embodiments:

the second indication information is conveyed by the following fields:

initial view azimuth coordinate (initial_center_yaw): for the longitude-latitude map, binocular stereo longitude-latitude map and cylinder mappings, identifies the initial azimuth angle at which the user views the video;

initial view pitch coordinate (initial_center_pitch): for the longitude-latitude map, binocular stereo longitude-latitude map and cylinder mappings, identifies the initial pitch angle at which the user views the video;

uniform resolution flag (is_uniform_resolution): for complete panoramic video content, indicates whether the different spatial regions of the video content have a uniform resolution. A value of 1 indicates uniform resolution and a value of 0 indicates non-uniform resolution;

number of spatial regions (num_vr_regions): for complete panoramic video content with non-uniform resolution, indicates the number of spatial regions with different resolutions. The resolution of each spatial region is then indicated individually;

spatial region number (vr_region_id): indicates the number of the video content spatial region;

spatial region upper end pitch angle (region_pitch_top): indicates the pitch angle of the upper edge of the current region;

spatial region left azimuth (region_yaw_left): indicates the azimuth angle of the left edge of the current region;

spatial region pitch angle range (region_pitch_height): indicates the range of pitch angles covered by the current region;

spatial region azimuth range (region_yaw_width): indicates the range of azimuth angles covered by the current region;

arrangement (layout): for the binocular stereo longitude-latitude map and polyhedron mappings, indicates the arrangement in which the faces are combined into a complete video frame covering all panoramic viewing angles;

initial face number (initial_surface_id): for a polyhedron, identifies the initial face. The center of the initial face serves as the default initial viewing angle from which the user views the video;

default rotation flag (default_rotation_flag): for the polyhedron mapping, indicates whether each face uses the default rotation angle when spliced. A value of 1 indicates that the default rotation is used and a value of 0 indicates that it is not;

face number (face_id): for a polyhedron or cylinder, a video frame is spliced from several faces. The face number labels each face and marks its spatial position;

face rotation (face_rotations): for the polyhedron mapping, the faces may be rotated appropriately for coding efficiency when they are spliced into a planar video frame. This field indicates the rotation angle;

side maximum pitch angle (side_pitch): for the cylinder mapping, the angular range corresponding to each face is not fixed. The viewing-angle range corresponding to each face of the cylinder is uniquely determined by the maximum pitch angle of the side surface. The determination is as follows:

side width (side_width): for the cylinder mapping, the number of pixels in the horizontal direction of the unfolded side surface;

side height (side_height): for the cylinder mapping, the number of pixels in the vertical direction of the unfolded side surface;

horizontal coordinate of the upper-left vertex of the spatial region (region_horizontal_left) and vertical coordinate of the upper-left vertex of the spatial region (region_vertical_top): for the cylinder mapping, the coordinates of the upper-left vertex of the current spatial region in the unfolded image of the cylinder side surface.
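A renderer might use initial_center_yaw and initial_center_pitch to point its virtual camera when playback starts. The following hypothetical Python helper turns them into a unit view vector. The axis convention (x forward at azimuth 0, y to the left, z up, angles in degrees, azimuth increasing counterclockwise seen from above) is an assumption; the text does not fix one.

```python
import math


def initial_view_direction(initial_center_yaw: float,
                           initial_center_pitch: float) -> tuple[float, float, float]:
    """Convert the initial viewing angle (degrees) into a unit view vector.

    Assumed right-handed axes: x forward, y left, z up.
    """
    yaw = math.radians(initial_center_yaw)
    pitch = math.radians(initial_center_pitch)
    # Standard spherical-to-Cartesian conversion with pitch as elevation.
    return (math.cos(pitch) * math.cos(yaw),
            math.cos(pitch) * math.sin(yaw),
            math.sin(pitch))
```

With initial_center_yaw = 0 and initial_center_pitch = 0, the camera looks straight along the x axis. This corresponds to the center of the frame under the longitude-latitude convention assumed in this sketch.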

The third indication information is conveyed by the following fields:

horizontal resolution (resolution_width): the number of pixels in the horizontal direction of the current spatial region;

vertical resolution (resolution_height): the number of pixels in the vertical direction of the current spatial region;

base radius (undercut_radius): for the cylinder mapping, the upper and lower bases are circular, and their resolution is uniquely determined by their radius;
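The quality fields support simple comparisons between regions. The Python sketch below uses hypothetical helper names. It computes the angular pixel density of a longitude-latitude region from its horizontal and vertical resolution (assumed field names resolution_width and resolution_height) and its angular extent. It also computes the azimuthal pixel density along the rim of a cylinder base, assuming the base is stored as a disc whose radius is undercut_radius pixels.

```python
import math


def pixels_per_degree(resolution_width: int, resolution_height: int,
                      region_yaw_width: float,
                      region_pitch_height: float) -> tuple[float, float]:
    """Horizontal and vertical pixel density (pixels per degree) of a
    longitude-latitude region, from its resolution and angular extent."""
    if region_yaw_width <= 0 or region_pitch_height <= 0:
        raise ValueError("angular ranges must be positive")
    return (resolution_width / region_yaw_width,
            resolution_height / region_pitch_height)


def base_rim_pixels_per_degree(undercut_radius: float) -> float:
    """Azimuthal pixel density along the rim of a circular cylinder base.

    Assumes a disc of radius undercut_radius pixels, whose rim is
    2*pi*r pixels long and spans 360 degrees of azimuth.
    """
    return 2 * math.pi * undercut_radius / 360.0
```

A client could, for instance, prefer the region with the higher density for the area the user is currently looking at.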

The use of some of the above fields in certain embodiments is detailed below.

the mapping format is used to indicate the mapping manner of the media content, and can be defined as follows:

Mapping format (vr_projection_format)	Description
0	Longitude-latitude map
1	Binocular stereo longitude-latitude map
2	Regular hexahedron
3	Cylinder
4	Regular octahedron
5	Regular icosahedron
Others	Reserved
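The mapping-format table can be written as an enumeration. The Python sketch below uses it to decide which kind of spatial information a parser should expect, following the case split described above: pitch/azimuth ranges for the two longitude-latitude maps, face numbers for polyhedra, and face numbers plus side coordinates for the cylinder. The class and function names and the returned labels are hypothetical.

```python
from enum import IntEnum


class VrProjectionFormat(IntEnum):
    """Example vr_projection_format values from the table above."""
    LONGITUDE_LATITUDE = 0
    STEREO_LONGITUDE_LATITUDE = 1
    HEXAHEDRON = 2
    CYLINDER = 3
    OCTAHEDRON = 4
    ICOSAHEDRON = 5


def spatial_info_kind(code: int) -> str:
    """Return a label for the spatial information expected for a format code."""
    try:
        fmt = VrProjectionFormat(code)
    except ValueError:
        return "reserved"  # any value not listed in the table is reserved
    if fmt in (VrProjectionFormat.LONGITUDE_LATITUDE,
               VrProjectionFormat.STEREO_LONGITUDE_LATITUDE):
        return "pitch/azimuth ranges"
    if fmt == VrProjectionFormat.CYLINDER:
        return "face numbers + side coordinates"
    return "face numbers"  # the regular polyhedra
```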

For the longitude-latitude map mapping, a spatial region on the sphere is determined by the pitch angle of its upper edge, the azimuth angle of its left edge, and the ranges of pitch and azimuth it covers (the spatial region pitch angle range and the spatial region azimuth range), as shown in FIG. 3.

For the binocular stereo longitude-latitude map, two splicing modes are provided. An arrangement (layout) value of 0 indicates that the planar frames for the left and right spheres are placed side by side, left to right, in one planar video frame. The left part corresponds to the left sphere and the right part to the right sphere. A value of 1 indicates that they are placed one above the other, with the upper part corresponding to the left sphere and the lower part to the right sphere, as shown in FIG. 4.
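The two binocular layouts can be expressed as a lookup from a pixel in the packed frame to the eye and to the pixel position within that eye's longitude-latitude map. This Python sketch assumes both views are the same size, so each occupies exactly half the frame. The function name is hypothetical.

```python
def locate_stereo_pixel(x: int, y: int, frame_w: int, frame_h: int,
                        layout: int) -> tuple[str, int, int]:
    """Return (eye, x, y) within that eye's view for a packed stereo frame.

    layout 0: left view in the left half, right view in the right half.
    layout 1: left view in the upper half, right view in the lower half.
    """
    if not (0 <= x < frame_w and 0 <= y < frame_h):
        raise ValueError("pixel outside the frame")
    if layout == 0:
        half = frame_w // 2
        return ("left", x, y) if x < half else ("right", x - half, y)
    if layout == 1:
        half = frame_h // 2
        return ("left", x, y) if y < half else ("right", x, y - half)
    raise ValueError(f"unsupported layout value {layout}")
```

For instance, in a 3840×1920 frame with layout 1, pixel (100, 1000) belongs to the right-eye view at position (100, 40).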

For a polyhedron, the faces can be spliced into a planar video frame in many ways. Taking the regular hexahedron as an example:

① First, determine the correspondence between face numbers and faces (see FIG. 5):

Face number (face_id)	Face name
0	Left
1	Front
2	Right
3	Back
4	Top
5	Bottom

② Several alternative arrangements are assumed (see FIG. 6). Example values of the arrangement (layout) field are as follows:

Arrangement (layout)	Arrangement type
0	6×1
1	3×2
2	2×3
Others	Others

Examples of the rotation angle of each face during splicing are as follows.

When the default rotation angle is not used during splicing, the rotation angle of each face is indicated individually by the face rotation field, as shown in the following table:

Face rotation	Rotation angle (counterclockwise)
0	0°
1	-90°
2	+90°
3	+180°
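The following Python sketch shows how a client might locate a cube face in the packed frame and compute where a face pixel lands after the indicated rotation. The actual placements are defined by FIG. 6, so several choices here are assumptions made for illustration: faces are packed row by row in face_id order, "6×1" is read as six columns by one row, and all names are hypothetical. Only the layout values and the rotation codes come from the tables above.

```python
# Hypothetical grid shapes (columns, rows) for the layout values above.
LAYOUT_GRIDS = {0: (6, 1), 1: (3, 2), 2: (2, 3)}
# Face rotation code -> counterclockwise rotation in degrees (table above).
FACE_ROTATION_DEG = {0: 0, 1: -90, 2: 90, 3: 180}


def face_origin(layout: int, face_id: int, face_size: int) -> tuple[int, int]:
    """Top-left pixel of a face, assuming row-by-row packing in face_id order."""
    cols, rows = LAYOUT_GRIDS[layout]
    if not 0 <= face_id < cols * rows:
        raise ValueError("face_id out of range")
    row, col = divmod(face_id, cols)
    return col * face_size, row * face_size


def rotate_pixel(x: int, y: int, n: int, degrees: int) -> tuple[int, int]:
    """Where pixel (x, y) of an n x n face lands after a counterclockwise rotation.

    Image coordinates: x grows rightwards, y grows downwards.
    """
    d = degrees % 360
    if d == 0:
        return x, y
    if d == 90:
        return y, n - 1 - x
    if d == 180:
        return n - 1 - x, n - 1 - y
    if d == 270:  # same as -90 degrees, i.e. 90 degrees clockwise
        return n - 1 - y, x
    raise ValueError("only multiples of 90 degrees are supported")
```

To map a pixel of a rotated face back to the unrotated face, apply the opposite angle, for example `rotate_pixel(x, y, n, -FACE_ROTATION_DEG[code])`.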

This determines the one-to-one correspondence from the planar frame to the panoramic video for the binocular stereo longitude-latitude map and polyhedron mappings.

Specifically, the cylinder mapping requires four values to determine the one-to-one correspondence from pixels in the planar frame to the different viewing angles of the panoramic video: the side maximum pitch angle, the unfolded side width, the unfolded side height and the base radius. Note that the unfolded side width, unfolded side height and base radius differ for planar frames of different resolutions. The correspondence is shown in FIG. 7.
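The text does not give the exact cylinder projection, so the following Python sketch assumes one. The unfolded side spans the full 360° of azimuth, starting at -180° on its left edge. Rows follow a central projection, in which tan(pitch) varies linearly from +tan(side_pitch) at the top edge to -tan(side_pitch) at the bottom edge. Under this assumption, directions with |pitch| greater than side_pitch fall on the bases. The function name is hypothetical.

```python
import math


def cylinder_side_pixel_to_angles(x: int, y: int, side_width: int,
                                  side_height: int,
                                  side_pitch: float) -> tuple[float, float]:
    """Map a pixel on the unfolded cylinder side to (yaw, pitch) in degrees.

    Assumed geometry: full 360-degree side starting at yaw -180 on the left,
    central projection vertically, side_pitch reached at the top/bottom edge.
    """
    if not (0 <= x < side_width and 0 <= y < side_height):
        raise ValueError("pixel outside the unfolded side")
    # Use pixel centres (+0.5) so the mapping is symmetric about the middle.
    yaw = -180.0 + (x + 0.5) * 360.0 / side_width
    v = 1.0 - 2.0 * (y + 0.5) / side_height  # +1 at top edge, -1 at bottom edge
    pitch = math.degrees(math.atan(v * math.tan(math.radians(side_pitch))))
    return yaw, pitch
```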

Based on the above information, and taking MMT (MPEG Media Transport) as an example, the following shows how the spatial information and resolution information of the media content are indicated for each asset in MP_message. This indication information is described in vr_asset_descriptor, as shown in the following table; the meaning of each field is as described above:

The media resource attribute description fields for panoramic video in the table above, and the associated flow, are designed on the basis of the indication information proposed by the present invention.

It should be noted that the present invention uses the above fields as an example to describe the media content identification information; it is not limited to these fields or their sizes. For a better understanding of the meaning of the above fields, reference may be made to the application example shown in FIG. 8.

Using the above indication of panoramic video properties, in some embodiments a method of rendering and presenting a panoramic video comprises:
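The following Python sketch shows one hypothetical way to serialize and parse such per-asset fields with the standard `struct` module. The field widths, byte order and class names are assumptions; the text itself states that neither the fields nor their sizes are fixed. For brevity, the sketch omits any descriptor header, is_uniform_resolution, and the polyhedron and cylinder fields.

```python
import struct
from dataclasses import dataclass

# Hypothetical big-endian layout (not taken from the text or the MMT spec):
# header: vr_projection_format (uint8), is_complete (uint8),
#         initial_center_yaw, initial_center_pitch (int16, degrees),
#         num_vr_regions (uint8)
# region: vr_region_id (uint8), region_pitch_top, region_yaw_left,
#         region_pitch_height, region_yaw_width (int16, degrees),
#         resolution_width, resolution_height (uint16, pixels)
HEADER = struct.Struct(">BBhhB")
REGION = struct.Struct(">BhhhhHH")


@dataclass
class Region:
    vr_region_id: int
    region_pitch_top: int
    region_yaw_left: int
    region_pitch_height: int
    region_yaw_width: int
    resolution_width: int
    resolution_height: int


@dataclass
class VrAssetDescriptor:
    vr_projection_format: int
    is_complete: int
    initial_center_yaw: int
    initial_center_pitch: int
    regions: list[Region]

    def pack(self) -> bytes:
        """Serialize the header followed by one record per region."""
        out = HEADER.pack(self.vr_projection_format, self.is_complete,
                          self.initial_center_yaw, self.initial_center_pitch,
                          len(self.regions))
        for r in self.regions:
            out += REGION.pack(r.vr_region_id, r.region_pitch_top,
                               r.region_yaw_left, r.region_pitch_height,
                               r.region_yaw_width, r.resolution_width,
                               r.resolution_height)
        return out

    @classmethod
    def unpack(cls, data: bytes) -> "VrAssetDescriptor":
        """Parse bytes produced by pack()."""
        fmt, complete, yaw, pitch, n = HEADER.unpack_from(data, 0)
        offset = HEADER.size
        regions = []
        for _ in range(n):
            regions.append(Region(*REGION.unpack_from(data, offset)))
            offset += REGION.size
        return cls(fmt, complete, yaw, pitch, regions)
```

A round trip such as `VrAssetDescriptor.unpack(d.pack()) == d` is a simple check that the packing and parsing layouts agree.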

First, signaling information is constructed at the server side. The media content attributes, spatial position information and quality information (resolution) of each media resource are indicated in the signaling, as shown in FIG. 1, and the media resource and its signaling information are sent to the client;

Second, after receiving the media resource and the signaling describing its attributes from the server, the client parses the media resource attribute information. The parsing flow is shown in FIG. 2;

Third, when the client renders and presents the media resource, it uses the media content attributes, spatial information and resolution information provided by the invention to determine the correspondence between the media resource and the user viewing sphere, so that it can render and present the resource correctly.

Accordingly, in some embodiments, a panoramic video rendering and presentation system comprises:

a server, which inserts first indication information, together with second indication information and/or third indication information for the panoramic video, into the signaling associated with panoramic video rendering and presentation. The server sends the panoramic video and its corresponding signaling information to the client;

and a client, which parses the signaling information containing the first indication information after receiving the panoramic video and its corresponding signaling information from the server. When rendering and presenting the panoramic video, the client uses the second indication information and/or the third indication information to determine the correspondence between the media resource and the user viewing sphere, so that it can render and present the video correctly.

Based on the above, a specific application example is given below.

To illustrate more clearly how the proposed indication mechanism supports client-side rendering and presentation of media content, FIG. 8 shows a specific implementation flow using the indication method of this example:

When the client logs in to the service, the server sends MP_message(), other signaling information and the media resource content. The client then receives MP_message(), the media resource content and other related information;

after receiving the asset and MP_message(), the client parses MP_message(), reads MP_table() and parses the vr_asset_descriptor() in it. The content to be read includes:

① the mapping mode of the video content in the asset;

② whether the video content in the asset completely covers all viewing angles of the panoramic video;

③ the initial viewing angle information of the panoramic video content in the asset, the layout information for mapping from the planar frame to three-dimensional space, the rotation angle information used when splicing the faces, and so on;

④ the resolution of the video content in the asset;

⑤ the correspondence between the different spatial regions of the video content and their positions on the panoramic video.

After parsing this information for each asset, the client can use the indication information to guide rendering and presentation whenever it needs to render and present a particular asset.
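The final step can be illustrated with a Python sketch that, for the user's current viewing direction, picks the asset whose indicated region covers that direction with the most pixels. The selection rule (largest pixel count), the `RegionInfo` structure and the asset identifiers are assumptions. The text only states that the parsed information guides rendering and presentation. Regions are expressed in longitude-latitude terms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegionInfo:
    asset_id: str               # hypothetical asset identifier
    region_pitch_top: float     # degrees
    region_yaw_left: float      # degrees, in [-180, 180)
    region_pitch_height: float  # degrees
    region_yaw_width: float     # degrees
    resolution_width: int       # pixels
    resolution_height: int      # pixels


def covers(r: RegionInfo, yaw: float, pitch: float) -> bool:
    """True if the viewing direction (yaw, pitch) falls inside region r."""
    in_pitch = r.region_pitch_top - r.region_pitch_height <= pitch <= r.region_pitch_top
    # Offset from the left edge modulo 360 lets regions wrap past +180 degrees.
    offset = (yaw - r.region_yaw_left) % 360.0
    return in_pitch and offset <= r.region_yaw_width


def pick_asset(regions: list[RegionInfo], yaw: float, pitch: float) -> Optional[str]:
    """Among regions covering the direction, return the asset with the most pixels."""
    candidates = [r for r in regions if covers(r, yaw, pitch)]
    if not candidates:
        return None
    best = max(candidates, key=lambda r: r.resolution_width * r.resolution_height)
    return best.asset_id
```

Measuring the azimuth offset modulo 360° lets a region that starts near +180° wrap around to negative azimuths, which matters for regions behind the user.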

The above embodiments of the invention illustrate the proposed solutions by way of example for MMT, but these solutions can also be used in other file encapsulation, transmission systems and protocols.

The invention indicates media content attributes, spatial information and/or quality information so that media content can be indexed during rendering and presentation of panoramic video. A suitable spatial region indication can be adopted for each mapping mode. Spatial position indication information is provided both for full-frame video and for spatially partitioned video content. In addition, the resolution of each area of the media content is indicated. This helps the client accurately index the media content that matches the user's viewing angle during rendering and presentation, so that the user's demand for high quality is met as far as possible when network bandwidth is limited.

It should be understood that the above embodiments are some of the embodiments of the present invention directed to video multimedia content, and the present invention is also applicable to the storage and transmission of other multimedia content, such as images and the like. This can be achieved by a person skilled in the art.

The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.

Claims

1. A panoramic video rendering and presentation attribute indication method, characterized by: inserting first indication information, second indication information and third indication information for the panoramic video into description information associated with panoramic video rendering and presentation;

The first indication information indicates the media content attribute of the panoramic video, including:

- the identifier of the mapping format of the panoramic video, which is used to indicate the mapping method of the plane frame to the panoramic video when it is presented;

- the identifier of whether the panoramic video covers the entire viewing angle range, which is used to indicate whether the data in the panoramic video can be completely mapped into the panoramic video presentation of the entire viewing angle range;

The second indication information indicates the spatial information of the panoramic video;

The third indication information indicates the quality information of the panoramic video;

The spatial information indicates the correspondence between panoramic video media resources and different viewing angle regions on the client-side presentation model;

For different mapping methods, the spatial information includes the following:

- For the latitude and longitude map and binocular stereo latitude and longitude map mapping methods, the spatial information includes the range of the spatial area covering the pitch angle and azimuth angle;

- For the polyhedron mapping method, the spatial information includes a number indicating the face of the polyhedron contained in the panoramic video;

- For the cylinder mapping method, the spatial information includes the numbers of the faces included in the panoramic video, and the specific horizontal and vertical coordinate ranges on the sides;

The quality information includes resolution information of the panoramic video;

The resolution information includes the horizontal and vertical pixel numbers of the spatial region, wherein the resolution of the upper and lower bottom surfaces of the cylinder is indicated by the bottom surface radius;

Initial viewing angle azimuth coordinates: For the latitude and longitude map, binocular stereo latitude and longitude map and cylinder mapping, the initial viewing angle azimuth coordinates are used to identify the initial azimuth of the user watching the video;

Initial viewing angle pitch angle coordinates: For the latitude and longitude map, binocular stereo latitude and longitude map and cylinder mapping, the initial viewing angle pitch angle coordinates are used to identify the initial pitch angle at which the user watches the video;

The pitch angle of the upper end of the space area: indicates the pitch angle of the upper edge of the current area;

Azimuth of the left end of the space area: Indicates the azimuth of the left end of the current area;

Spatial area pitch angle range: Indicates the pitch angle range covered by the current area;

Spatial area azimuth range: Indicates the azimuth angle range covered by the current area.

2. The panoramic video rendering and presentation attribute indication method according to claim 1, wherein the method further comprises: receiving the indication information from a network entity, and parsing the indication information for rendering and presenting the panoramic video.

3. A panoramic video rendering and presentation method using the method of claim 1 or 2, characterized by comprising:

S1: constructing signaling information on the server side, inserting first indication information, second indication information and third indication information for the panoramic video into the signaling associated with panoramic video rendering and presentation, and sending the panoramic video and its corresponding signaling information to the client;

S2: after receiving the panoramic video and its corresponding signaling information sent by the server, the client parses the signaling information including the first indication information. It uses the second indication information and the third indication information to determine the correspondence between the panoramic video media resource and different viewing-angle regions on the client-side presentation model, so as to render and present correctly.

4. A panoramic video rendering and presentation system using the method of claim 1 or 2, characterized by comprising:

The server side inserts the first indication information, the second indication information and the third indication information for the panoramic video into the signaling associated with the rendering and presentation of the panoramic video, and sends the panoramic video and its corresponding signaling information to the client;

The client parses the signaling information including the first indication information after receiving the panoramic video and its corresponding signaling information sent by the server. When rendering and presenting the panoramic video, it uses the second indication information and the third indication information to determine the correspondence between the panoramic video media resource and the different viewing-angle regions on the client-side presentation model, so as to render and present correctly.