GB2638298A - Determining a point of a three-dimensional representation of a scene - Google Patents
Determining a point of a three-dimensional representation of a sceneInfo
- Publication number
- GB2638298A GB2638298A GB2409906.1A GB202409906A GB2638298A GB 2638298 A GB2638298 A GB 2638298A GB 202409906 A GB202409906 A GB 202409906A GB 2638298 A GB2638298 A GB 2638298A
- Authority
- GB
- United Kingdom
- Prior art keywords
- point
- points
- underscan
- determining
- size
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/204—Image signal generators using stereoscopic image cameras
- H04N13/25—Image signal generators using stereoscopic image cameras using two or more image sensors with different characteristics other than in their location or field of view, e.g. having different resolutions or colour pickup characteristics; using image signals from one sensor to control the characteristics of another sensor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/006—Mixed reality
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/56—Particle system, point based geometry or rendering
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Graphics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Geometry (AREA)
- Processing Or Creating Images (AREA)
- Image Generation (AREA)
Abstract
A method of determining a point of a three-dimensional representation of a scene comprises identifying 71 a plurality of points of the representation and determining 72 a size of each of the plurality of points. In dependence on the size of each point being beneath a threshold value an underscan point is determined 73 based on one or more of the first plurality of points, the underscan point having a size greater than the threshold value. The method may comprise removing the plurality of points and replacing them with a larger merged point. Attributes of the added underscan point may be based on combining attributes of the plurality of points. The size may relate to a width, height, or an area covered by a point. The method may reduce a file size of a point cloud. Also claimed is a method of rendering a point cloud in which a size of a point is determined based on an underscan factor, an arrangement of angular brackets is determined based on the size and attribute values for each bracket is determined from an attribute of the point.
Description
Determining a point of a three-dimensional representation of a scene Field of the Disclosure The present disclosure relates to methods, systems, and apparatuses for determining (e.g. defining) a point of a three-dimensional representation of a scene, in particular determining a point in a point cloud.
Background to the Disclosure
Three-dimensional representations of environments are used in many contexts, including for the generation of virtual reality videos, in which depth information for a plurality of points of the representation is used to generate different images for a left eye and a right eye of a user. Typically, substantial processing power is required to determine such a three-dimensional representation, and the file size of files associated with these representations is typically large so that substantial amounts of storage are needed to keep the files and substantial amounts of bandwidth are required to transfer the files.
Summary of the Disclosure
According to the present disclosure, there is described a method of: A method of determining a point of a three-dimensional representation of a scene, the method comprising: identifying a first plurality of points of the representation; determining a size of each point of the first plurality of points; and in dependence on the size of each point being beneath a threshold value, determining a point (e.g. an 'underscan' point) based on one or more of the first plurality of points, the underscan point having a size greater than the threshold value.
Preferably, the representation comprises a plurality of points associated with a plurality of different capture devices, and the method comprises identifying a first plurality of points associated with a first capture device.
Preferably, the method comprises identifying a first plurality of adjacent points of the representation.
Preferably, the size of the point is associated with one or more of: a width, height, and/or surface area covered by the point.
Preferably, each point of the first plurality of points is associated with an area of the three-dimensional representation and/or each point is associated with an angular bracket associated with a capture device of the three-dimensional representation. Preferably, the underscan point covers each of said areas and/or each of said angular brackets.
Preferably, the underscan point is associated with an underscan factor, where the underscan factor defines a size of the underscan point. Preferably, the method comprises determining the underscan factor for the underscan point based on a difference between the sizes of each of the identified points and the threshold value.
Preferably, the threshold value is predetermined and/or selected by a user.
Preferably, the threshold value is selected based on a feature of the scene. Preferably, the threshold value is based on a resolution associated with the scene (e.g. an intended resolution on the scene) Preferably, the threshold value is selected based on a desired file size of the representation and/or a selected compression.
Preferably, the threshold value is selected based on a complexity of the scene and/or of a surface associated with the first plurality of points. Preferably, the complexity is defined by a user and/or is determined by a computer device.
Preferably, a size datafield of the underscan point comprises components associated with one or more of: a height; a width; and an underscan factor. Preferably, one or more (or each) component is associated with a number of angular brackets covered by the point.
Preferably, the width and the height are dependent on the underscan factor. Preferably, the underscan factor defines a maximum width and/or height of the underscan point.
Preferably, a value of the size datafield is defined as: size value = underscan*9 + height*3 + width.
Preferably, determining the underscan point comprises one or more of: modifying a point of the first plurality of points; removing one or more of the first plurality of points; and replacing the first plurality of points with the underscan point.
Preferably, the method comprises determining an attribute of the underscan point. Preferably, the method comprises determining the attribute based on the attributes of the identified points.
Preferably, the method comprises determining an attribute of the underscan point. Preferably, the method comprises determining the attribute based on the attributes of the identified points.
Preferably, the method comprises determining a similarity value associated with the first plurality of points, and determining the underscan point in dependence on the similarity value. Preferably, the method comprises determining a similarity value for the first plurality of points; and determining the underscan in dependence on the similarity value exceeding a threshold.
Preferably, the threshold depends on a distance of the point from a viewing zone of the representation.
Preferably, determining the underscan point comprises one or more of: modifying a point of the first plurality of points; removing one or more of the first plurality of points; and replacing the first plurality of points with the underscan point.
Preferably, the similarity value is associated with one or more of: locations of each of the points; attributes of each of the points; capture devices associated with each of the points; and normals associated with each of the points. For example, the method may comprise determining one or more of: a location similarity value; an attribute similarity value; a capture device similarity value; and a normal similarity value.
Preferably, the method comprises determining a similarity of normals associated with each of the first plurality of points, and determining the underscan point in dependence on this similarity of normals exceeding a threshold.
Preferably, the method comprises determining a similarity of attributes associated with each of the first plurality of points, and determining the underscan point in dependence on this similarity of attributes exceeding a threshold.
Preferably, the similarity is associated with one or more of a variance of values associated with the points; and a range and/or spread of values associated with the points.
Preferably, the method comprises determining that each of the first plurality of points lies on a shared plane and/or a shared surface.
Preferably, the threshold depends on a number and/or arrangement of the first plurality of points. Preferably, the threshold depends on a user input.
Preferably, determining the underscan point comprises determining a location for the underscan point.
Preferably, determining the underscan point comprises determining an attribute value for the underscan point.
Preferably, the method comprises determining a location for the underscan point, the location being the same as the location of one of the first plurality of points.
Preferably, determining the underscan point comprises defining a size of the underscan point, the size being dependent on the number and/or arrangement of the first plurality of points.
Preferably, the size indicates a number of angular brackets of the representation that are covered by the point.
Preferably, determining the first plurality of points comprises determining a plurality of contiguous points of the first representation.
Preferably, the method comprises determining a plurality of contiguous points in a first arrangement, the first arrangement being one of a predetermined list of one or more arrangements.
Preferably, the method comprises determining a coverage of the underscan point. Preferably, the method comprises determining the coverage based on the size of the underscan point.
Preferably, the method comprises determining the coverage based on configuration information associated with the three-dimensional representation.
Preferably, the coverage indicates a number of angular brackets covered by the underscan point.
Preferably, the angular brackets are associated with a capture device used to capture the plurality of points.
Preferably, the method comprises: determining a second plurality of points, the second plurality of points comprising one or more points of the first plurality of points; determining a similarity value for the second plurality of points; and in dependence on the similarity value exceeding a threshold, determining an underscan point based on the second plurality of points; Preferably, the method comprises determining the second plurality of points in dependence on the similarity value for the first plurality of points not exceeding the threshold.
Preferably, the method comprises: determining similarity values for at least two pluralities of points; and determining one or more underscan points based on the similarity values.
Preferably, the method comprises: at a first time, determining a first similarity value for a first plurality of points; and at a second time (e.g. subsequent to the first time), determining a second similarity value for a second plurality of points; wherein the second plurality of points comprises a smaller number of points than the first plurality of points.
Preferably, the method comprises: determining a first plurality of points at a first location in the representation; determining a similarity of the first plurality of points; updating the location, preferably by updating (e.g. incrementing) an angle associated with the location; determining a second plurality of points at the second location; determining a similarity of the second plurality of points; and determining one or more underscan points based on the similarity values.
Preferably, the method comprises updating the location so as to determine similarity values for a number of pluralities of points of the representation.
Preferably, the method comprises determining similarity values relating to a plurality of arrangements of points. Preferably, the method comprises determining similarity values in turn for arrangements of decreasing size.
Preferably, the method comprises determining a grid system for the three-dimensional representation, wherein the first plurality of points is located in a first subdivision of the grid system.
Preferably, the grid system is similar (e.g. the same) for each of a plurality of three-dimensional representations, preferably a plurality of three-dimensional representations associated with successive frames of a video.
Preferably, the method comprises, separately for each subdivision, determining at least two pluralities of points; determining similarity values for the at least two pluralities of points; and determining one or more underscan points for said subdivision based on the similarity values.
Preferably, the method comprises determining a plurality of grid systems, wherein each grid system is associated with a different size and/or shape of grid cells; and determining one or more underscan points based on the plurality of grid systems.
Preferably, the similarity value is associated with one or more of: locations of each of the points; distances of each of the points from a capture device associated with the points; and normals associated with each of the points.
Preferably, the method comprises: determining a similarity of normals associated with each of the first plurality of points, and determining the underscan point in dependence on this similarity of normals exceeding a threshold.
Preferably, the method comprises determining that the first plurality of points lie on a shared plane, and determining the underscan point in dependence on said determination that the first plurality of points lie on a shared plane.
Preferably, the method comprises determining a similarity of attributes associated with each of the first plurality of points, and determining the underscan point in dependence on this similarity of attributes exceeding a threshold.
Preferably, the underscan point is determined without consideration of the attributes of the identified points.
Preferably, the similarity is associated with one or more of a variance of values associated with the points; and a range and/or spread of values associated with the points.
Preferably, the threshold depends on a number and/or arrangement of the first plurality of points.
Preferably, determining the underscan point comprises: determining a location for the underscan point, preferably determining the location as being the same as the location of one of the first plurality of points.
Preferably, the method comprises determining an attribute value for the underscan point.
Preferably, the method comprises defining a size of the underscan point, the size being dependent on the number and/or arrangement of the first plurality of points. Preferably, the size indicates a number of angular brackets of the representation that are covered by the underscan point.
Preferably, the method comprises determining the first plurality of points comprises determining a plurality of contiguous points of the first representation. Preferably, the method comprises determining a plurality of contiguous points in a first arrangement.
Preferably, the method comprises determining an underscan factor for the underscan point. Preferably, the method comprises determining the underscan factor based on the sizes of the identified points. Preferably, the method comprises determining the underscan factor so as to obtain an underscan point with a size greater than the threshold value.
Preferably, the method comprises determining sizes relating to a plurality of arrangements of points.
Preferably, the method comprises determining sizes in turn for arrangements of decreasing size.
Preferably, the method comprises: determining, in a first step, one or more points to be captured by one or more capture devices associated with the representation; and determining, in the second step, attribute values for said one or more points.
Preferably, the method comprises determining the first plurality of points from the one or more points.
Preferably, the first step comprises: determining a distance and a normal for one or more potential points; determining the one or more points to be captured from said potential points based on the determined distances and normals. Preferably, the potential points are associated with a plurality of capture devices.
Preferably, the method comprises determining the attribute values in the second step in dependence on one or more determined underscan points, preferably comprising updating an angle associated with the of the capturing attribute values based on the size of said underscan points.
Preferably, the second step comprises capturing attribute values for a plurality of points at a plurality of respective capture angles. Preferably, the second step comprises incrementing a capture angle so as to capture the plurality of attribute values. Preferably, the second step comprises incrementing the angle based on an underscan factor associated with a point for which an attribute is being captured.
Preferably, incrementing the capture angle comprises incrementing the capture angle so as to capture points in a plurality of adjacent brackets. Preferably, the method comprises: identifying an underscan point; determining an attribute value for the underscan point; and incrementing the capture angle based on an underscan factor associated with the underscan point. Preferably, incrementing the capture angle comprises skipping one or more angular brackets associated with the capture process; and determining a further attribute value at the incremented capture angle.
According to another aspect of the present disclosure, there is described a method of determining an attribute of a point of a three-dimensional representation of a scene, the method comprising: identifying a point; identifying an underscan factor associated with the point; determining a size of the point based on the underscan factor; determining, based on the size, an arrangement of angular brackets covered by the point; and determining attribute values for each of the angular brackets based on an attribute value of the point.
Preferably, the method comprises determining the size of the point from a predetermined association between the underscan factor and the size.
Preferably, the method comprises rendering a two-dimensional image from the three-dimensional representation based on the determined attribute values.
Preferably, the three-dimensional representation is associated with a viewing zone, the viewing zone comprising a subset of the scene and/or the viewing zone enabling a user to move through a subset of the scene. Preferably, the user is able to move within the viewing zone with six degrees of freedom (6DoF). Preferably, the viewing zone has a volume of less than 50% of the volume of the scene, less than 20% of the volume of the scene, and/or less than 10% of the volume of the scene. Preferably, the viewing zone has, or is associated with, a volume, preferably a real-world volume, of less than five cubic metres (5m3), less than one cubic metre (1m3), less than one-tenth of a cubic metre (0.1m3) and/or less than one-hundredth of a cubic metre (0.01m3).
Preferably, the three-dimensional representation comprises a point cloud.
Preferably, the method comprises storing the three-dimensional representation and/or outputting the three-dimensional representation. Preferably, the method comprises outputting the three-dimensional representation to a further computer device.
Preferably, the method comprises generating an image and/or a video based on the three-dimensional representation.
Preferably, the method comprises forming one or more two-dimensional representations of the scene based on the three-dimensional representation. Preferably, the method comprises comprising forming a two-dimensional representation for each eye of a viewer.
Preferably, the point is associated with one or more of: a location; an attribute; a transparency; a colour; and a size.
Preferably, the point is associated with an attribute for a right eye and an attribute for a left eye.
Preferably, the scene comprises one or more of: an extended reality (XR) scene; a virtual reality (VR) scene; an augmented reality (AR) scene; and a mixed reality (MR) scene.
Preferably, the method comprises forming a bitstream that includes the underscan point.
According to another aspect of the present disclosure, there is described a system for carrying out the aforesaid method, the system comprising one or more of a processor; a communication interface; and a display.
According to another aspect of the present disclosure, there is described an apparatus for determining a point of a three-dimensional representation of a scene, the apparatus comprising: means for (e.g. a processor for) identifying a first plurality of points of the representation; means for (e.g. a processor for) determining a size of each point of the first plurality of points; and means for (e.g. a processor for) determining, in dependence on the size of each point being beneath a threshold value, an underscan point based on one or more of the first plurality of points, the underscan point having a size greater than the threshold value.
According to another aspect of the present disclosure, there is described an apparatus for determining an attribute of a point of a three-dimensional representation of a scene, the appratus comprising: means for (e.g. a processor for) identifying a point; means for (e.g. a processor for) identifying an underscan factor associated with the point; means for (e.g. a processor for) determining a size of the point based on the underscan factor; means for (e.g. a processor for) determining, based on the size, an arrangement of angular brackets covered by the point; and means for (e.g. a processor for) determining attribute values for each of the angular brackets based on an attribute value of the point.
According to another aspect of the present disclosure, there is described a bitstream comprising one or more underscan points determined using the aforesaid method.
According to another aspect of the present disclosure, there is described a bitstream comprising an underscan point, the underscan point comprising an underscan factor that indicates one or more of: a size of the underscan point; arrangement; and number of angular brackets covered by the underscan point. Preferably, the underscan point is determined using the aforesaid method.
Preferably, the bistream comprises one or more flags indicating one or more of: whether underscan points are present in the representation; an interpretation of a size value of a point; a relationship between an underscan factor and a size of a point; and a process for converting a size value of a point into an actual size of the point.
According to another aspect of the present disclosure, there is described an apparatus (e.g. an encoder) for forming and/or encoding the aforesaid bitstream.
According to another aspect of the present disclosure, there is described an apparatus (e.g. a decoder) for receiving and/or decoding the aforesaid bitstream.
Any feature in one aspect of the disclosure may be applied to other aspects of the invention, in any appropriate combination. In particular, method aspects may be applied to apparatus aspects, and vice versa.
Furthermore, features implemented in hardware may be implemented in software, and vice versa. Any reference to software and hardware features herein should be construed accordingly.
Any apparatus feature as described herein may also be provided as a method feature, and vice versa. As used herein, means plus function features may be expressed alternatively in terms of their corresponding structure, such as a suitably programmed processor and associated memory.
It should also be appreciated that particular combinations of the various features described and defined in any aspects of the disclosure can be implemented and/or supplied and/or used independently.
The disclosure also provides a computer program and a computer program product comprising software code adapted, when executed on a data processing apparatus, to perform any of the methods described herein, including any or all of their component steps.
The disclosure also provides a computer program and a computer program product comprising software code which, when executed on a data processing apparatus, comprises any of the apparatus features described herein.
The disclosure also provides a computer program and a computer program product having an operating system which supports a computer program for carrying out any of the methods described herein and/or for embodying any of the apparatus features described herein.
The disclosure also provides a computer readable medium having stored thereon the computer program as aforesaid.
The disclosure also provides a signal carrying the computer program as aforesaid, and a method of transmitting such a signal.
The disclosure extends to methods and/or apparatus substantially as herein described with reference to the accompanying drawings.
The disclosure will now be described, by way of example, with reference to the accompanying drawings. Description of the Drawings Figure 1 shows a system for generating a sequence of images.
Figure 2 shows a computer device on which components of the system of Figure 1 may be implemented.
Figure 3 shows a method of determining a three-dimensional representation of a scene.
Figures 4a and 4b show method of determining a point based on a plurality of sub-points.
Figure 5 shows a scene comprising a viewing zone.
Figures 6a and 6b show arrangements of capture devices for determining points of the three-dimensional representation.
Figure 7 shows a point that can be captured by a plurality of capture devices. Figures 8a and 8b show grids formed by the different capture devices.
Figure 9 describes a method of determining a location of a point of the three-dimensional representation.
Figure 10 shows a method of determining an angle of a point from a capture device used to capture the point.
Figures 11a and 11b show methods of determining an aggregate point of the three-dimensional representation.
Figures 12a, 12b, 13a, and 13b illustrate the replacement of a plurality of points of the three-dimensional representation with ana aggregate point.
Figures 14a -14e, 15a -15e, and 16a-16d show detailed methods of determining aggregate points. Figure 17 shows different sizes of points that may be captured by a capture device.
Figures 18 and 19 describe methods of determining underscan points that are associated with a size value. Figure 20 shows a bitstream.
Description of the Preferred Embodiments
Referring to Figure 1, there is shown a system for generating a sequence of images. This system can be used to generate, and then display, a representation of an environment, which may comprise a VR environment (or an XR environment).
The system comprises an image generator 11, an encoder 12, a transmitter 13, a network 14, a receiver 15, a decoder 16 and a display device 17.
These components may each be implemented on separate apparatuses. Equally, various combinations of these components may be implemented on a shared apparatus; for example, the image generator 11, the encoder 12, and the transmitter 13 may all be part of a single image data generation device. Similarly, the receiver 15, the decoder 16, and the display device 17 may all be a part of a single image rendering device.
Typically, the system comprises at least one encoding computer device (e.g. a server of a content provider) and at least one rendering computer device (e.g. a VR headset).
Referring to Figure 2, each of the components, and in particular the image generator 11, the encoder 12, the transmitter 13, the receiver 15, the decoder 16 and the display device 17 is typically Implemented on a computer device 20, where, as described above, a plurality of these components may be implemented on a shared computer device.
Each computer device comprises one or more of: a processor 21 for executing instructions (e.g. so as to perform one or more of the steps of the various methods described below), a communication interface 22 for facilitating communication between computer devices (e.g. an ethernet interface, a Bluetooth® interface, or a universal serial bus (UBS) interface, a memory 23 and/or storage 24 for storing information and instructions (e.g. a random access memory (RAM), a read only memory (ROM), a hard drive disk (HDD) a solid state drive (SSD), and/or a flash memory, and a user interface 25 (e.g. a display, a mouse, and/or a keyboard) for enabling a user to interact with the computer device. These components may be coupled to one another by a bus 25 of the computer device.
The computer device 20 may comprise further (or fewer) components. In particular, the computer device (e.g. the display device 17) may comprise one or more sensors, such as an accelerometer, a GPS sensor, or a light sensor. These sensors typically enable the computer device to identify an environmental condition and/or an action of wearer of the display device.
Turning back to Figure 1, the image generator 11 is configured to generate a sequence of image data (e.g. a sequence of image frames) to enable the display device 17 to use this image data to display a plurality of images. The image data may comprise one or more digital objects and the image data may be generated or encoded in any format. For example, the image data may comprise point cloud data, where each point has a 3D position and one or more attributes. These attributes may, for example, include, a surface colour, a transparency value, an object size and a surface normal direction. Each attribute may have a value chosen from a continuous range or may have a value chosen from a discrete set.
The image data enables the later rendering of images. This image data may enable a direct rendering (e.g. the image data may directly represent an image). Equally, the image data may require further processing in order to enable rendering. For example, the image data may comprise three-dimensional point cloud data, where rendering a two-dimensional image using this data requires processing based on a viewpoint of this two-dimensional image.
The image data may comprise depth map data, where one or more pixels or objects in the image is associated with a depth that is specified by the depth map data. The depth map data may be provided as a depth map layer, separate from an image layer. In some contexts, such as MPEG Immersive Video (MIV), the image layer may instead be described as a texture layer. Similarly, in some contexts, the depth map layer may instead be described as a geometry layer.
The image data may include a predicted display window location. The predicted display window location may indicate a potion of an image that is likely to be displayed by the display device 17. The predicted display window location may be based on a viewing position (such as a virtual position and/or orientation of the user in a 3D environment) of the user, where this viewing position may be obtained from the display device. The predicted display window location may be defined using one or more coordinates. For example, the predicted display window location may be defined using the coordinates of a corner or center of a predicted display window, and may be defined using a size of the predicted display window. The predicted display window location may be encoded as part of metadata included with the frame.
The image data for each image (e.g. each frame) may include further information, which may be provided as a part of an image, e.g. as part of the point cloud data, or as separate layers. In particular, the image data may include audio information or haptic feedback information indicating audio or haptics which can accompany displayed visual data. An audio layer or haptic layer may accompany each image, and may be omitted for images where no accompanying audio or haptics are required.
Similarly, the image data may comprise interactivity information, where the image data may contain or indicate elements with which a user can interact. The interactivity information may, for example, define a behaviour of an element, where a user is able to interact with the element based on this behaviour. The behaviour typically defines a change in an element that occurs as a result of a user interaction where this change may comprise a change in the attributes of the element or in the rendering of the element. As an example, where an image contains a target element, the target element may be arranged to disappear when a user interacts with this element, or to provide feedback indicating that the user has interacted with the target. This interactivity data may be provided as part of, or separately to, the image data.
The image data may indicate, or may be combinable with, a state of the virtual environment, a position of a user, or a viewing direction of the user. Here, the position and viewing direction may be physical properties of the user in the real-world, or position and viewing direction may also be purely virtual, for example being controlled using a handheld controller. The image generator 11 may, for example, obtain information from the display device 17 that indicates the position, viewing direction, or motion of the user. Equally, the image generator may generate image data such that it can later be combined with this position, viewing direction, or motion, where the image generator may generate a full scene which is only partially viewed by a user depending on the position of that user.
In some cases, the generated image may be independent of user position and viewing direction. This type of image generation typically requires significant computer resources such as a powerful GPU, and may be implemented in a cloud service, or on a local but powerful computer. For example, a cloud service (such as a Cloud Rendering Service (CRN)) may reduce the cost per-user and thereby make the image frame generation more accessible to a wider range of users. Here "rendering" refers at least to an initial stage of rendering to generate an image. Further rendering may occur at the display device 17 based on the generated image to produce a final image which is displayed.
The image generator 11 may, for example, comprise a rendering engine for initially rendering a virtual environment such as a game or a virtual meeting room.
The encoder 12 is configured to encode frames to be transmitted to the display device 17. The encoder may be implemented using executable software or may be implemented on specific hardware such as an ASIC. In some embodiments, the image generator 11 may transmit raw, unencoded, data through the network 14. However, such transmission typically leads to a high file size and requires a high bandwidth so that it is typically desirable to encode the data prior to the transmission.
The encoder 12 may encode the image data in a lossless manner or may encode the data a lossy manner. The encoder may apply inter-frame or intra-frame compression based on a currently-encoded frame and optionally one or more previously encoded frames. The encoder may be a multi-layer encoder, such as a low complexity enhancement video codec (LCEVC) enabled encoder.
Where the generated frames comprise depth map data, the encoder 12 may perform layered encoding on each instance of image data (e.g. each frame) to generate an encoded frame comprising a base depth map layer and an enhancement depth map layer. Encoding a depth map in this way may improve compression. In some applications, such as HDR video, depth maps are desirably highly detailed with a bit depth of up to twelve or fourteen bits, which is a significant increase in the data to be transmitted. As a result, providing ways to improve compression of the depth map can make more realistic depth map-based displays viable when performing rendering or transmission of rendered data in real-time. Furthermore, this type of layered encoding makes it easy to drop (and then pick back up) one or more of the layers, which provides flexibility and tools for bandwidth management.
Layered encoding is also helpful as the final decoder/user device (such as a user display device) can choose whether to process these extra layers. For example, in a non-layered approach, the best the end device (i.e. the receiver, decoder or display device associated with a user that will view the images) can do is determine that it does not have enough resources for a given quality (be it resolution, frame rate, inclusion of depth map) and then signal to the controller/renderer/encoder that it does not have enough resources. The controller then will send future images at a lower quality. In that alternative scenario, the end device still unfortunately has to process the higher quality data until the lower quality data arrives, if it can process the received images at all.
In some of the described embodiments, this situation is improved upon because when/if the end device determines for example that it does not have the processing capabilities to handle the highest level of quality, then it can drop and/or choose not to process certain layers. The end device may also signal to the controller that it needs a lower level of quality, but in the meantime the end device can only process the number of layers that it can handle. Therefore, the end device can react to conditions much more quickly.
In some cases, depth map data may be embedded in image data. In this case, the base depth map layer may be a base image layer with embedded depth map data, and the enhancement depth map layer may be an enhancement image layer with embedded depth map data.
Alternatively, when the generated images comprise a depth map layer separate from an image layer and multi-layer encoding is applied, the encoded depth map layers may be separate from the encoded image layers. This has the advantage that the encoded depth map layers can be dropped under some conditions while still retaining image layers that can be displayed (albeit with a lower level of realism). For example, the encoded depth map layers can be dropped by a transmitter or encoder when available communication resources are reduced, or can be dropped by an end device which lacks the processing resources to handle the highest level of quality.
Similarly, if some images comprise an audio base layer, a haptic feedback base layer, an audio enhancement layer or a haptic feedback enhancement layer, these can be processed or dropped flexibly.
Again similarly, if some images comprise an interactivity data base layer or an interactivity enhancement layer these can be processed or dropped flexibly. For example, certain interactions may only be possible where a threshold bandwidth is available, where complex interactions (e.g. those enabling a conversation with a digital object) may be disabled before less complex interactions (e.g. changing a pixel colour) are disabled.
Additionally or alternatively, where the image data comprises point cloud data, the encoder may apply a point cloud data encoding technique such as described in European patent application EP21386059.6, which is incorporated herein by reference. Such a point cloud encoder may act as a base encoder for a layered encoding technique such as LCEVC or VC-6. Notably LCEVC and VC-6 techniques encode and decode a layered signal, but are agnostic about the content type of data encoded in the signal. For example, the signal can include textures, video frames, geometry or depth data, meshes, point clouds, rendering attributes or physics engine attributes.
The transmitter 13 may be any known type of transmitter for wired or wireless communications, including an Ethernet transmitter or a Bluetooth transmitter.
The transmitter 13 may be configured to make decisions about how to transmit the image data, and/or may provide feedback to the encoder 12 or the image generator 11. For example, the transmitter may determine available communication resources (e.g. bandwidth) for transmitting image data, and may drop one or more layers from an encoded frame, or indicate to the image generator and/or encoder that image data should be generated and encoded with fewer layers, when insufficient bandwidth is available for transmission of all generated data. As specific examples, the transmitter may be configured to drop a depth map layer, an LCEVC enhancement layer, or a VC-6 enhancement layer from a frame when insufficient communication resources are available.
The network 14 provides a channel for communication between the transmitter 13 and the receiver 15, and may be any known type of network such as a WAN or LAN or a wireless Wi-Fi or Bluetooth network. The network may further be a composite of several networks of different types. Many users only have access to a network with a bandwidth of 30MBps which can lead to latency jitter when streaming. The required bandwidth and the observed latency can be reduced by means of tactics such as forward-looking rendering and last-millisecond reprojection, which are enabled by improved compression.
The receiver 15 may be any known type of receiver for wired or wireless communications, including an Ethernet transmitter or a Bluetooth transmitter.
The decoder 16 is configured to receive and decode an encoded frame. The decoder may be implemented using executable software or may be implemented on specific hardware such as an ASIC.
The display device 17 may for example be a television screen or a VR headset. The timing of the display may be linked to a configured frame rate, such that the display device may wait before displaying the image. The display device may be configured to perform warping, that is, to obtain a final display window location, adjust a warpable image to obtain a final image corresponding to a final viewing direction of the user, and display the final image.
In this regard, the image data is typically arranged to provide a warpable image for which a portion of the image that is displayed at the display device 17 is dependent on a position or orientation of a viewer. The warpable image may then be rendered before a most up to date viewing direction of the user is known. The warpable image may be transmitted to the display device, or the warpable image may be transmitted to a rendering node which is near to the display device, and the display device or rendering node may perform time warping to generate a displayed image portion based on the warpable image and the most up to date viewing direction of the user.
As mentioned above, a single device may provide a plurality of the described components. For example, a first rendering node may comprise the image generator 11, encoder 12 and transmitter 13. Additional similar rendering nodes may be included in the system, and may work together to generate the sequence of frames.
In one case, multiple rendering nodes may each provide separate image data to an image data assembling node; for example, each rendering node may provide a part of a sequence of frames to a frame assembling node.
For example, the receiver 15, decoder 16 or display device 17 may be configured to assemble parts of image data from multiple sources to generate a sequence of images for display on the display device.
Alternatively, the image data assembling node may be separate from the receiver 15, decoder 16 and display device 17.
Additionally or alternatively, multiple rendering nodes may be chained. In other words, successive rendering nodes may add to a sequence of image data as it passes from rendering node to rendering node, and eventually a complete sequence of image data is then provided to the receiver 15. Furthermore, each rendering node may obtain components of a render from multiple upstream rendering nodes and/or distribute components of a render to multiple downstream rendering nodes.
A chain of rendering nodes may be useful for performing different rendering tasks that require different quantities of processing resources, or different frame rates. For example, a company may provide distributed processing in the form of a centralised hub which has abundant processing resources but is distant from users, and peripheral locations which have more scarce processing resources but are closer to users. Expensive but fairly static rendering features such as background lighting or environmental impact on sound may be generated at the central hub (for example using ray tracing), while features that require fewer resources but faster responses or higher frame rates may be generated closer to the user. In other words, the more responsive a rendering feature needs to be, the lower latency it needs between the rendering node which generates the feature and the user display and, in a chain of rendering nodes, the node which generates each rendering feature can be chosen based on a required maximum latency of that feature. On the other hand, if it is expensive to generate a rendering feature, then it may be preferable to generate the feature less frequency and with a higher maximum latency. For example, a static, high-quality background feature may be generated early in the chain of rendering nodes and a dynamic, but potentially lower-quality, foreground feature may be generated later in the chain of rendering nodes, closer to the user device. Here, environmental impact on sound means, for example, a set of surfaces may be constructed where each surface has different sound reflection and absorption properties depending upon material and shape. The frame rates may be matched by creating multiple frames with features generated at the lower frame rate, and combining them with the frames with features generated at the higher frame rate. In a non-limiting embodiment, a preliminary rendering generates volumetric object data including motion vectors at a first (lowest) frame rate, then produces 2D rendered frames plus depth information for a specific user at a second (higher) frame rate, then transmits video plus depth data to the user device, which produces final frames for display via space warping (depth-based reprojections) at a third (highest) frame rate. One or more of these steps may be performed in combination with the other described embodiments. The viewing position of the user may change as additional rendering tasks are performed at different rendering nodes in the chain. Each or any rendering node may obtain an updated viewing position before performing its respective rendering task.
Additionally, the system may simultaneously generate multiple sequences of image data for different respective users or different respective display devices. For example, in the context of a VR or AR experience, each user or display device may view a different 3D environment, or may view different parts of a same 3D environment. When using a chain of rendering nodes, each node may serve multiple users or just one user.
For example, a starting rendering node (e.g. at a centralised hub) may serve a large group of users. For example, the group of users may be viewing nearby parts of a same 3D environment. In this case, the starting node may render a wide zone of view ("field of view") which is relevant for all users in the large group.
The starting node may send this wide field of view to a first middle rendering node which renders additional aspects of the 3D environment. These additional aspects may for example be aspects which require less processing power to render, or may be aspects which are specific to individual users of the group. Additionally, the middle rendering node may render features in a smaller field of view than the starting node -this smaller field of view may be relevant to each user rather than the group of users. The first middle rendering node may additionally only serve a smaller number of users (e.g. half of the large group of users), with the remaining users being served by a second middle rendering node which also receives the wide field of view from the starting node.
The middle rendering node(s) may then send sequences of second partially or fully rendered frames to an end device for each user. The end device may perform further processes such as warping or focal distance adjustments, optionally using depth map data.
Preferably, each rendering node encodes the partially or fully rendered frames before transmitting them on to a next rendering node or to the receiver 15. This means that the required communication resources can be reduced when the rendering nodes are separated by one or more networks, or more generally are implemented in a distributed system such as a cloud.
However, each rendering node in a chain is encoding a different partially or fully rendered frame, with different data. Therefore, it may be advantageous for different rendering nodes to use different rendering formats and/or encoding formats. For example, the output from a first rendering node may be point cloud data which logically describes a 3D scene. This point cloud data can be encoded using the techniques of EP21386059.6. A second rendering node may then operate on the point cloud data to generate image data that is more readily displayed by a generic display device, without requiring the display device to model the 3D environment. This image data may be encoded using video coding techniques.
The chaining of rendering nodes may be extended to arbitrary tree structures, where a rendering node obtains partially rendered frames from more than one preceding rendering node, and generates further partially or fully rendered frames based on the multiple obtained sequences of partially rendered frames.
For example, a content rendering network (CRN) comprising numerous rendering nodes may be used to serve a volumetric event to a large number of same-time users, such as users participating in a shared virtual environment. Rendering the same event for each user is far more expensive in terms of computation time and power consumption than rendering the volumetric effect once and performing the rendering equivalent of multicasting the volumetric effect for multiple users. For example, each user may have a second rendering node (such as a VR headset), and the network may comprise a central first rendering node. The first rendering node may render the volumetric event, and distribute partially rendered frames depicting the volumetric event to the different second rendering nodes. The second rendering node for each user may then integrate the partially rendered frames depicting the volumetric event into a view of the virtual environment which is currently being shown to each user, based on parameters such as the user's virtual position.
The receiver 15, decoder 16 and display device 17 may be consolidated into a single device, or may be separated into two or more devices. For example, some VR headset systems comprise a base unit and a headset unit which communicate with each other. The receiver 15 and decoder 16 may be incorporated into such a base unit.
In some embodiments, the network 14 may be omitted. For example, a home display system may comprise a base unit configured as an image source, and a portable display unit comprising the display device 17.
In the event that the decoder 16 or the display device 17 does not or cannot handle one or more layers, the receiver 15 or another transmitter associated with the decoder or display device may send a corresponding layer drop indication back through the network 14. The layer drop indication may be received by each rendering node. A rendering node which generates partially or fully rendered frames for that specific decoder or display device may cease generating the dropped layer. On the other hand, a rendering node which generates partially or fully rendered frames for multiple end devices may disregard a layer drop indication received from one end device (as the dropped layer is still needed for other devices).
Alternatively, rendering nodes which serve multiple end devices may record received layer drop indications, and may cease generating the dropped layer only when all end devices served by the rendering node indicate that the layer is to be dropped.
In preferred examples, the encoders or decoders are part of a tier-based hierarchical coding scheme or format. Hierarchical coding enables frames to be communicated with higher resolution and/or higher frame rate than is possible in single-tier coding schemes. In hierarchical coding, one or more enhancement layers is communicated with base data, where the enhancement layers can be used to up-sample the base data at the decoder, for example providing up-sampling in a spatial or temporal dimension. When combined with equivalent down-sampling of the original frames and generation of the enhancement layer at an encoder, hierarchical coding can overall provide lossless compression of data, with higher resolution and/or higher frame rate for a given transmission bit rate. Examples of a tier-based hierarchical coding scheme include LCEVC: MPEG-5 Part 2 LCEVC ("Low Complexity Enhancement Video Coding") and VC-6: SMPTE VC-6 ST-2117, the former being described in PCT/GB2020/050695, published as WO 2020/188273, (and the associated standard document) and the latter being described in PCT/GB2018/053552, published as WO 2019/111010, (and the associated standard document), all of which are incorporated by reference herein.
However, the concepts illustrated herein need not be limited to these specific hierarchical coding schemes.
A further example is described in W02018/046940, which is incorporated by reference herein. In this example, a set of residuals are encoded relative to the residuals stored in a temporal buffer.
LCEVC (Low-Complexity Enhancement Video Coding) is a standardised coding method set out in standard specification documents including the Text of ISO/IEC 23094-2 Ed 1 Low Complexity Enhancement Video Coding published in November 2021, which is incorporated by reference herein.
The system describes above is suitable for generating and presenting a representation of a scene, where this scene displays media content to a user. The scene typically comprises an environment, where the user is able to move (e.g. to move their head or to turn their head) to look around the environment and/or to move around the environment. For example, the scene may be a scene of a room in a building, where the user is able to move around the room (e.g. by moving in the real-world and/or by providing an input to a user interface) in order to inspect various parts of the room. Typically, the scene is a XR (e.g. a VR) scene, where the user is able to move about the scene in three degrees of freedom (3DoF) or six degrees of freedom (6DoF) so as to experience the scene.
As has been described with reference to Figure 1, the image generator 11 may be arranged to determine point cloud data, where each point of the point cloud has a 3D position and one or more attributes. More generally, the image generator (or another component) is arranged to determine a three-dimensional representation of a scene, where this three-dimensional representation is thereafter used to generate two-dimensional images that are presented to a user at the display device 17.
While the points are typically points of a point cloud, more generally the disclosure extends to any point that is associated with a location and a value. Therefore, the points may, more generally, be considered to be data (or datapoints), which data is associated with a location and a value, and the 'points' may comprise polygons, planes (regular or irregular), Gaussian splats, etc. Referring to Figure 3, there is described a method of determining (an attribute for) a point of such a three-dimensional representation. The method comprises determining the attribute using a capture device, such as a camera or a scanner. The scene may comprise a real scene, in which attribute values are captured using a camera, or a virtual scene (e.g. a three-dimensional model of a scene), in which attribute values are captured using a virtual scanner.
Where this disclosure describes 'determining a point' it will be understood that this generally refers to determining a point that has a location and an attribute value, where determining the point comprises determining the attribute value and/or storing a pointthat comprises at least an attribute value and a location value (these values may be indirect values, e.g. where the location is identified relative to another point). Once a plurality of points have been captured, these points can be stored as a three-dimensional representation (e.g. a point cloud) so as to enable the reconstruction of the three-dimensional scene based no this representation.
Typically, the scene comprises a simulated scene that exists only on a computer. Such a scene may, for example, be generated using software such as the Maya software produced by Autodesk®. The attributes determined using the methods described herein may then depend on virtual objects located within the scene as well as a virtual lighting arrangement used in the scene.
In a first step 11, a computer device initiates a capture process for a capture device, the capture process being initiated with an initial azimuth angle (e.g. of 0°) and an initial elevation angle (e.g. of 0°).
In a second step 12, the computer device causes a point to be captured using the capture device at the current azimuth angle and current elevation angle. Capturing a point typically comprises assigning an attribute value to the point, which attribute value may, for example, be a color of the point and/or a transparency value of the point. Typically, the point has one or more color values associated with each of a left eye and a right eye of a viewer. Capturing the point may also comprise determining a normal value associated with the point, e.g. a normal of a surface on which the point lies. Typically, capturing the point further comprises determining a location of the point, e.g. by determining a distance of the point from the camera.
In practice, determining the point may comprise sending a 'ray' from the capture device and then stepping through a computer model to determine which surface of the computer model is impacted by the ray. The color, transparency, and normal of this surface are then recorded alongside the distance of the surface from the capture device.
In a third step, 13, the computer device determines whether a point has been captured for the capture device at each azimuth of a range of azimuths and in a fourth step 14, if points have not been captured at each azimuth, then the azimuth angle is incremented and the method returns to the second step 12 and another point is captured. The azimuth angle may, for example, be incremented by between 0.01° and 1° and/or by between 0.025° and 0.1°. Typically, the range of azimuth angles is selected to be 360° (i.e. so that the capture device captures points surrounding the entirety of the capture device), but it will be appreciated that other ranges are possible.
Once a point has been captured for each azimuth, in a fifth step 15, the computer device determines whether a point has been captured for the capture device at each elevation of a range of elevations and in a sixth step 16, if points have not been captured at each elevation, then the azimuth angle is reset to the initial value, elevation angle is incremented and the method returns to the second step 12 and another point is captured. The elevation angles may, for example, be incremented by between 0.01° and 1° and/or by between 0.025° and 0.1°. Typically, the range of elevation angles is selected to be 360° (i.e. so that the capture device captures points surrounding the entirety of the capture device), but it will be appreciated that other ranges are possible.
In a seventh step 17, once points have been captured for each azimuth angle and each elevation angle, the scanning process ends.
This method enables a capture device to capture points at a range of elevation and azimuth angles. This point data is typically stored in a matrix. The point data may then be used to provide a representation of the scene to a user, e.g. the three-dimensional representation formed by the point data may be processed to produce two-dimensional images for each eye of a user, with these images then being shown to a user via the display device 17 to provide a virtual reality experience to the viewer. By using the captured data, a video can be provided to a viewer that enables the viewer to move their head to look around the scene (while remaining at the location of the capture device).
It will be appreciated that the capture pattern (or scanning pattern) described with reference to Figure 3 is purely exemplary and that numerous capture patterns are possible. In general, the capture process for each capture device comprises capturing one or more points at one or more azimuth angles and/or one or more elevation angles.
The 'points' captured by the capture device are typically associated with a size, such as a height, a width, or a depth. That is, the points typically relate to two-dimensional planes/pixels and/or three-dimensional voxels. In this regard, there is necessarily some space between the locations of adjacent points (since if the points had no width, then an infinite number of points would be required to capture points at each angle).
The size provides points that depict a non-negligible area of the three-dimensional space so that a plurality of points can be fit together to provide a depiction of the scene to a viewer.
The width and height of each point is typically dependent on the distance of that point from the capture device, where more distant points have a larger width/height. The width and height of each point is typically determined so that when each point is displayed, there is no space between adjacent points (indeed, there may be some overlap between points to ensure that no gaps appear between points). This height/width of each point can be determined at the time of capturing the points, or can be determined or defined after the capture of the points.
Typically, the points comprise a size value, which is stored as a part of the point data. For example, the points may be stored with a width value and/or a height value. Typically, the minimum width and the minimum height of a point are set by the angle increment of the azimuth angle and the elevation angle respectively. The size may be then specified in terms of this angle increment and/or in terms of this minimum width/minimum height (e.g. as being a multiple of the angle increment). In some embodiments, the size value is stored as an index, which index relates to a known list of sizes (e.g. if the size may be any of 1x1, 2x1, 1x2, 2x2, pixels this may be specified by using 3 bits and a list that relates each combination of bits to a size) The size may be stored based on an underscan value. In this regard, where an object is very near to the viewing zone it may be captured using an unnecessarily dense arrangement of points. Therefore, certain surfaces or areas of the representation may be associated with an underscan value, which underscan value defines a reduction in the number of points captured as compared to a representation without underscan. The size of the points may be defined so as to indicate this underscan value. In an exemplary embodiment, the underscan value is an integer value between 0 and 3 and the size is stored as a combination of point dimensions (e.g. a width in the range [0,2]) and a height in the range ([0,2]) and an underscan factor (e.g. an underscan factor in the range [0,3]).
In some embodiments, the width and the height are dependent on the underscan factor. For example, when the underscan factor exceeds a threshold value, the possible height and width values may be limited. In a specific example, when the underscan factor is 3, the width and the height may be limited to the range [0,1].
The size may then be defined as size = underscan.9 + height*3 + width. Such a method provides efficient storage and indication of width, height, and underscan values.
As shown in Figure 4a, typically, for each capture step (e.g. each azimuth angle and/or each elevation angle), a plurality of sub-points SP1, SP2, SP3, SP4, SP5 is determined. For example, where the azimuth angle increment is 0.1° then for an azimuth angle of 0°, sub-points may be determined at azimuth angles of -0.05°, -0.025°, 0, 0.025°, and 0.05° (and similar sub-points may be determined for a plurality of elevation angles). Attribute values of these sub-points may then be combined to obtain an attribute value for the point. For example, a maximum attribute value of the sub-points may be used as the value for the point, an average attribute value of the sub-points may be used as the value for the point, and/or a weighted average of the sub-points may be used as the value for the point. It will be appreciated that numerous other methods for combining the attribute values of the sub-points are possible.
By determining the attribute of a point based on the attributes of sub-points, the accuracy of the capture process can be increased. While it would be possible to simply reduce the increment of the angle steps to provide a higher resolution scene, by considering sub-points but only storing attributes for points, a balance can be struck between accuracy and file size (since storing every sub-point would lead to a substantial increase in the amount of data that needs storing) With the example of Figure 4a, for each point of the three-dimensional representation that is captured by a capture device, this capture device may obtain attributes associated with each of the sub-points SP1, SP2, SP3, SP4, SP5, combine these attributes to obtain a point attribute, and then store a point with a distance that is an average (e.g. a weighted average) of the distances of the sub-points from the capture device, at the nominal angle of the point, with the point attribute.
As shown in Figure 4b, where a plurality of sub-points SP1, SP2, SP3, SP4, SP5 are considered, these points may have different distances from the location of the capture device. In some embodiments, the attributes of the sub-points may be combined in dependence on this distance, e.g. so that sub-points nearer to the capture device have higher weightings.
However, the possibility of sub-points with substantially different distances raises a potential problem. Typically, in order to determine a distance for a point, the distances for the sub-points are averaged. But where the sub-points have substantially different distances and/or are related to different surfaces in the scene, this may result in the point having a distance that does not correspond to any actual surface in the scene. Therefore, the point may seem to hang in space (e.g. to hang between the front and rear surfaces shown in Figure 4b.
Similarly, where the attribute values of the sub-points greatly differ, e.g. if the sub-points SP1 and SP2 are white in colour and the sub-points SP3 and SP4 are black in colour, then the attribute value of the point may be substantially different to the attribute value of other points in the scene. In an example, if the scene were composed of black and white objects, the point may appear as a grey point hanging in space between these objects.
In some embodiments, the computer device is arranged to aggregate sub-points so as not to create any floating points. For example, the computer device may determine whether the sub-points are spatially coherent by employing a clustering algorithm (e.g. a k-means clustering algorithm). Where the sub-points are spatially coherent (e.g. where a difference in the distance of the sub-points is below a threshold value), these distances may be averaged to obtain a distance for the point. Where the sub-points are not spatially coherent, the sub-points may be processed to ensure that the distance of any point places it upon a surface; for example, in the system of Figure 4b, sub-points SP1, SP2, and SP3 may be grouped into a first point and sub-points SP4 and SP5 may be grouped into a second point. Since each sub-point is associated with the same capture device and capture angle (all of these sub-points being associated with a capture step that has a particular azimuth angle and elevation angle), these points may be located at the same angle with respect to a capture device. Therefore, to ensure that each sub-point affects the representation considered, the first point (made up of sub-points SP1, SP2, and SP3) may have a smaller distance value than the second point (made up of sub-points SP4 and SP5) and the first point may be assigned a nonzero transparency value so that the second point can be seen through the first point.
By capturing points at a plurality of azimuth angles and elevation angles, e.g. using the method described with reference to Figure 3, it is possible to provide a three-dimensional representation of the scene that can later be used to enable a viewer to view the scene from a plurality of angles. More specifically, given the three-dimensional points captured by the capture device, a computer device is able to render a two-dimensional representation (e.g. a two-dimensional image) of the scene for each eye of a viewer so as to provide a representation with an impression of depth. The computer device may render a series of two-dimensional representations to enable the viewer to look around the scene, where the two-dimensional representations are rendered based on an orientation of the viewer's head. In this way, the determined representation is useable to provide, for example, a virtual reality (VR), mixed reality (MR), augmented reality (AR), and/or extended reality (XR) experience to the viewer.
To enable such a display, the display device 17 is typically a virtual reality headset, that comprises a plurality of sensors to track a head movement of the user. By tracking this head movement, the display device is able to update the images being displayed to the viewer as the viewer moves their head to look about the scene. Typically, this involves the display device sensing the sensor data to an external computer device (e.g. a computer connected to the display device via a wire). The external computer device may comprise powerful graphical processing units (GPUs) and/or computer processing units (CPUs) so that the external computer device is able to rapidly render appropriate two-dimensional images for the viewer based on the three-dimensional images and the sensor data.
In some embodiments, the external computer device may comprise a server device, where the display device 17 may be connected to this server device wirelessly. This enables the two-dimensional images to be streamed from the server to the display device so as to enable the display of high-quality images without the need for a viewer to purchase expensive computer equipment. In other words, operations that require large amounts of computing power, such as the rendering of two-dimensional images based on the three-dimensional representation, may be performed by the server, so that the display device is only required to perform relatively simple operations. This enables the experience to be provided to a wide range of viewers.
In some embodiments, a first two-dimensional image is provided to the display device 17 (and/or a connected device) and this first image is 'warped' in order to provide an image for viewing at the display device. The warping of the image comprises processing the image based on the sensor data in order to provide an image that matches a current viewpoint of the viewer. By performing the warping at the display device or another local device, the lag between a head movement of the user and an updating of the two-dimensional representation of the scene can be reduced.
One issue with the above-described method of capturing a three-dimensional representation is that it only enables a viewer to make rotational movements. That is, since the points are captured using a single capture device at a single capture location, there is no possibility of enabling translational movements of a viewer through a scene. This inability to move translationally can induce motion sickness within a viewer, can reduce a degree of immersion of the viewer, and can reduce the viewer's enjoyment of the scene.
Therefore, it is desirable to enable translational movements through the scene. To enable such movements, the three-dimensional representation of the scene may be captured using a plurality of capture devices placed at different locations (or the same capture device placed at different locations). A viewer is then able to move around the scene translationally (e.g. by moving between these locations).
More generally, by capturing points for every possible surface that might be viewed by a viewer, a three-dimensional representation of a scene may be captured that allows a suitable two-dimensional representation of this scene to be rendered regardless of a location of a viewer (e.g. regardless of where a user is standing within a virtual room).
This need to capture points for every possible surface (so as to enable movement about a scene) greatly increases the amount of data that needs to be stored to form the three-dimensional representation.
Therefore, as has been described in the application WO 2016/061640 A1, which is hereby incorporated by reference, the three-dimensional representation may be associated with a viewing zone, or a zone of viewpoints (ZVP), where the three-dimensional representation is arranged to enable a user to move about the viewing zone so as to view the scene.
Figure 5 illustrates such a viewing zone 1 and illustrates how the use of a viewing zone limits the amount of image data that needs to be stored to provide a three-dimensional representation of the scene. With the scene shown in this figure, and the viewing zone 1 shown in this figure, it is not necessary to determine attribute data for the occluded surface 2 since this occluded surface cannot be viewed from any point in the viewing zone. Therefore, by enabling the user to only move within the viewing zone (as opposed to around the whole scene) the amount of data needed to depict the scene is greatly reduced.
While Figure 5 shows a two-dimensional viewing zone, it will be appreciated that in practice the viewing zone 1 is typically a three-dimensional zone or volume.
The viewing zone 1 may, for example, comprise a rectangular volume, or a rectangular parallelepiped, and the viewing zone may have a height of at least 30 cm, a depth of at least 30 cm, and/or a width of at least 30 cm, where these dimensions enable a user to move their head while remaining in the viewing zone. This is merely an exemplary arrangement of the viewing zone; it will be appreciated that viewing zones of various shapes and sizes may be used (e.g. spherical viewing zones). That being said, it is preferable that the viewing zone is limited so as to cover only a part of the volume of the scene, e.g. no more than 50% of the scene no more than 25% of the scene, and/or no more than 10% of the scene. In this regard, if the viewing zone is the same size as the scene, then the three-dimensional representation will simply be a standard representation for virtual reality (that enables a user to move freely about the scene) -and so the use of the viewing zone will not provide any reduction in file size.
The viewing zone 1 enables movement of a viewer around (a portion of) the scene. For example, where the scene is a room, the base representation may enable a user to walk around the room so as to view the room from different angles. In particular, the viewing zone enables a user to move through the scene with six degrees-of-freedom (6DoF) movement through the scene, where this aids in the provision of an immersive experience.
In some embodiments, the viewing zone 1 may be four-dimensional, where a three-dimensional location of the viewing zone changes over time -and in such embodiments the size and location of the occluded surface 2 may also change over time. More generally, it will be appreciated that viewing zones may be formed in any size or shape, with different sizes and shapes being suitable for different scenes.
The volume of the viewing zone 1 is typically selected so that a user is able to move to a degree sufficient to avoid motion sickness and to provide an immersive sensation, while still only enabling a limited amount of movement (where this leads to a smaller file size as compared to an implementation where a user is able to fully move about the scene). Typically, the viewing zone is arranged to enable a user to move their head while they are sitting or standing, but not to freely roam around a room.
The viewing zone 1 may have a (e.g. real-world) volume of less than five cubic metres (5m3), less than one cubic metre (1 m3), less than one-tenth of a cubic metre (0.1 m3) and/or less than one-hundredth of a cubic metre (0.01m3).
The viewing zone 1 may also have a minimum size, e.g. the viewing zone may have a volume of at least 1% of the volume of the scene, at least 5% of the volume of the scene, and/or at least than 10% of the volume of the scene. Similarly, the viewing zone may have a volume of at least one-thousandth of a cubic metre (0.01 m3); at least one-hundredth of a cubic metre (0.01 m3); and/or at least one cubic metre (1 m3).
The 'size' of the viewing zone 1 typically relates to a size in the real world, where if the viewing zone has a length of one metre this means that a user is able to move one metre in the real world while staying within the viewing zone. The size of the viewing zone in the scene may be greater than, equal to, or less than the size of the viewing zone in the real world. For example, the viewing zone may scale a real-world distance so that moving one metre in the real world moves the user less than (or more than) one metre in the scene.
This enables the scene to provide different perceptions to the user (e.g. to make the user feel larger or smaller than they are in real life). Similarly, the viewing zone may scale a real-world angle so that rotating one degree in the real world rotates the user less than (or more than) one degree in the scene.
Therefore, a viewing zone with a volume of one cubic metre typically connotes a viewing zone in which the user is able to move about a one cubic metre volume in the real world while remaining in the viewing zone.
And this may cause the user to move about a volume that is more than, or less than, one metre in the scene Referring to Figure 6a, in order to capture points for each surface and location that is visible from the viewing zone 1, a plurality of capture devices C1, C2, ..., C9 may be used (e.g. a plurality of virtual scanners and/or a plurality of cameras). Each capture device is typically arranged to perform a capture process, e.g. as described with reference to Figure 3, in which the capture device captures points at a plurality of azimuth angles and elevation angles. By locating the capture devices appropriately, e.g. by locating a capture device at each corner of the viewing zone, it can be ensured that most (or all) points of a scene are captured.
Typically, a first capture device C1 is located at a centrepoint of the viewing zone 1. In various embodiments, one or more capture devices C2, C3, C4, 05 may be located at the centre of faces of the viewing zone; and/or one or more capture devices C6, C7, C8, C9 may be located at edges of and/or corners of the viewing zone.
Figure 6a shows a two-dimensional view (e.g. a plan view) of a rectangular viewing zone. It will be appreciated that within this viewing zone each capture device may be located on a shared plane. Equally, the various capture devices may be located on different planes. Referring, for example, to Figure 6b, there is shown a three-dimensional view of a cuboid viewing zone, where there is a capture device located: at the centre of the viewing zone; at the centre of each face of the viewing zone; and at each corner of the viewing zone.
With this arrangement, many locations in the scene (e.g. specific surfaces) will be captured by a plurality of capture devices so that there will be overlapping points relating to different capture devices. This is shown in Figure 7, which shows a first point P1 being captured by each of a first capture device C1, a sixth capture device C6, and a seventh capture device C7. Each capture device captures this point at a different angle and distance and may be considered to capture a different 'version' of the point.
Typically, only a single version of the point is stored, where this version may be the highest quality version of the point and/or may be the version of the point associated with the nearest and/or least angled capture device.
In this regard, the highest 'quality' version of the point is captured by the capture device with the smallest distance and smallest angle to the point (e.g. the smallest solid angle). In this regard, as described with reference to Figures 4a and 4b, capturing a point for a given azimuth angle and elevation angle typically comprises capturing a plurality of sub-points at varying sub-point azimuth and elevation angles spread around the point azimuth and elevation angles. Due to the different spreads of sub-points, each capture device will capture a different version of the point (that has a different attribute) even when the points are at the same location. Capture devices that are close to the point and less angled with respect to the point typically have a smaller spread of sub-points and so typically obtain a version of a point that is sharper than a version of that point captured by more distant capture devices.
In some embodiments, a quality value of a version of the point is determined based on the spread of sub-points associated with this version (e.g. based on the perimeter formed by these sub-points and/or based on a surface area or volume bounded by these sub-points). The version of the point that is stored may depend on the respective quality values of possible versions of the points.
Regarding the 'versions' of the points, it will be appreciated that two 'points' in approximately the same location captured by each capture device may not have exactly the same location in the three-dimensional representation. More specifically, since each capture device typically projects a 'ray at a given angle, the rays of differing capture devices may contact the surface at different locations for each capture device. Two points may be considered to be two 'versions' of a single point when they are within a certain proximity, e.g. a threshold proximity. For example, where the first capture device C1 captures a first point and a second point at subsequent azimuth angles, and the sixth capture device C6 captures a further point that is in between the locations of the first point and the second point, this further point may be considered to be a 'version' of one of the first point and the second point.
This difference in the points captured by different capture devices is illustrated by Figures 8a and 8b, which show the separate captured grids that are formed by two different capture devices. As shown by these figures, each capture device will capture a slightly different 'version' of a point at a given location and these captured points will have different sizes. Each capture step is associated with a particular range of angles (e.g. a nominal capture angle of 1° might encompass angles from 0.9° to 1 1°), and therefore capture devices that are far from a point to be captured represent a wider region at the capture distance than capture devices closer to that point to be captured. As shown in Figure 8a, the capture device C1 would capture the points P1 and P2 in separate brackets, whereas for the capture device C2 these points are in the same bracket. Therefore, the capture device C2 might determine a single point that encompasses both points P1 and P2, whereas the capture device C1 would determine separate points for these two points.
Considering then a situation in which points P1 and P2 are captured separately, and capture device C1 is used to capture point P1 while capture device C2 being used to capture point P2, it should be apparent that the 'sizes' of these captured points, and the locations in space that are encompassed by the captured points will be based on different grids. For example, the width of the captured point P2 captured by the capture device C2 will be larger than the width of the captured point 101 captured by the capture device C1. The capture process may be determined based on the existence of these different grids, and on the different bracket widths that occur at different distances from a capture device.
Figure 8a shows an exaggerated difference between grids for the sake of illustration. Figure 8b shows a more realistic embodiment in which the three-dimensional representation comprises a plurality of points associated with different capture devices, where these points lie on different grids associated with these different capture devices.
In order to store the points of the three-dimensional representation, the points may be stored as a string of bits, where a first portion of the string indicates a location of the point (e.g. using x, y, z coordinates) and a second portion of the string locates an attribute of the point. In various embodiments, further portions of the string may be used to indicate, for example, a transparency of the point, a size of the point, and/or a shape of the point.
A computer device that processes the three-dimensional representation after the generation of this representation is then able to determine the location and attribute of each point so as to recreate the scene.
This location and attribute may then be used to render a two-dimensional representation of the scene that can be displayed to a viewer wearing the display device 17. Specifically, the locations and attributes of the points of the three-dimensional representation can be used to render a two-dimensional image for each of the left eye of the viewer and the right eye of the viewer so as to provide an immersive extended reality (XR) experience to the viewer.
The present disclosure considers an efficient method of storing the locations of the points (e.g. at an encoder) and of determining the locations of the points (e.g. at a decoder).
As has been described with reference to Figures 5a and 5b, the points of the three-dimensional representation are determined using a set of capture devices placed at locations about the viewing zone, where these capture devices are arranged to capture points at a series of azimuth angles and elevation angles. Typically, each of the capture devices is arranged to use the same capture process (e.g. the same series of azimuth angles and elevation angles), though it will be appreciated that different series of capture angles are possible. For example, there may be a plurality of possible series of capture angles, where different capture devices use different capture angles.
In general, the present disclosure considers a method in which points are stored based on a capture device identifier and an indication of a distance of the point from the capture device associated with this capture device identifier. Typically, the point is also associated with an angular indicator, which indicates an azimuth angle and/or an elevation angle of the point relative to the identified capture device.
It will be appreciated that the storage of the distance and the angle may take many forms. For example, the distance and the angle of each point may be converted into a universal coordinate system, where each capture device has a different location in this universal coordinate system. In particular, each point may be stored with reference to a centre of this universal coordinate system, which centre may be co-located with a central capture device. Where a point is determined based on a distance and an angle from a capture device of a known location in this universal coordinate system, the coordinates of the point in this universal coordinate system can be determined trivially -and the location of the point may then be stored either relative to the capture device or as a coordinate in the universal coordinate system.
The capture device identifier may comprise a location of a capture device (e.g. a location in a co-ordinate system of the three-dimensional representation). Equally, the capture device identifier may comprise an index of a capture device. Similarly, the indication of the azimuth angle and the elevation angle for a point may comprise an angle with reference to a zero-angle of a co-ordinate system of the three-dimensional representation. Equally, the azimuth angle and/or the elevation angle may be indicated using an angle index.
In some embodiments, the three-dimensional representation is associated with configuration information, which configuration information comprises one or more of a set of capture device indexes; locations associated with the capture devices and/or the capture device indexes; a spacing of capture devices (e.g. so that locations of the capture devices can be determined from a location of a first capture device and the spacing); angles associated with a capture process for the capture devices; an azimuth angle increment and/or an elevation angle increment associated with the capture process; and a set of angle indexes (e.g. to match an angle index to an angle).
With this configuration information, it is possible to determine a location of each capture device from an index of that capture device and/or to determine a capture angle from a known capture process. Therefore, given two numbers: a capture device index and an angle index (that is associated with a combination of a specific azimuth angle and a specific elevation angle), a location of a capture device and a direction of a point from this capture device can be determined. By also signalling a distance of the point from the signalled capture device, a precise location of the point in the three-dimensional space can be signalled efficiently.
Typically, the point is associated with each of: a camera index, a distance, an first angular index (e.g. a first azimuth), and a second angle (e.g. a second elevation) This method of indicating a location of a point enables point locations to be identified using a much smaller number of bits than if each point location is identified using x, y, z coordinates.
Referring to Figure 9, there is shown a method of determining a location of a point. This method is carried out by a computer device, e.g. the image generator 11 and/or the decoder 15.
In a first step 21, the computer device identifies an indicator of a capture device used to capture the point. Typically, this comprises identifying a portion of a string of bits associated with a capture device index.
In a second step 22, the computer device identifies an indicator of an angle of the point from the capture device. Typically, this comprises identifying an angle index, e.g. an azimuth index and/or an elevation index and/or a combined azimuth/elevation index, which index(es) identifies a step of the capture process during which the point was captured.
In a third step 23, based on the identifiers, the computer device determines the location of the capture device and the angle of the point from the capture device.
The capture device identifier is typically a capture device index, which is related to a capture device location based on configuration information that has been sent before, or along with, the point data. For example, the configuration information may specify: - Location of first capture device is (0,0,0).
- Step between capture devices is (0,0,1) along the grid, then across the grid, then up the grid.
- The grid is (10,10,10).
With this information, a capture device with an index of 1 can be determined to be located at (0,0,0); a capture device with an index of 5 can be determined to be located at (0,0,4); a capture device with an index of 12 can be determined to be located at (0,1,0), and so on.
Equally, the configuration information may specify a list of camera indexes and locations associated with these indexes, where this enables the use of a wide range of setups of capture devices.
Typically, the three-dimensional representation is associated with a frame of video. The configuration information may be constant over the frames of the video so that the configuration information needs to be signalled only once for an entire video. Therefore, the configuration information may be transmitted alongside a three-dimensional representation of a first frame of the video, with this same information being used for any subsequent frames (e.g. until updated configuration information is sent) The angle identifier may similarly be related to an angle by a location and an increment that are signalled in a configuration file. For example, the configuration information may specify: - An azimuth increment and an elevation increment are each 1°.
- There are 359 increments for each angle type.
With this information: a capture angle with an index of 1 can be determined to be at an azimuth angle of 0° and an elevation angle of 0'; a capture angle with an index of 10 can be determined to be at an azimuth angle of 10° and an elevation angle of 0'; a capture angle with an index of 360 can be determined to be at an azimuth angle of 0° and an elevation angle of 1°, and a capture angle with an index of 370 can be determined to be at an azimuth angle of 9° and an elevation angle of 1'; etc. In a fourth step 24, based on the determined location of the capture device and the determined angle, a location of the point is determined. Typically, this comprises determining the location of the point based on the location of the capture device, the capture angle, and a distance of the point from the capture device (where this distance is specified in the point data for the point).
Determining the location of the point typically comprises determining the location of the point relative to a centrepoint of the three-dimensional representation, This location of the point may then be converted into a desired coordinate system and/or the point may be processed based on its location (e.g. to stitch together adjacent points).
The angular identifier typically comprises a first angular identifier and a second angular identifier, where the first identifier provides the azimuthal angle of the point and the second identifier provides the elevation angle of the point.
Referring to Figure 10, each angular identifier may be provided as an index of a segment of the three-dimensional representation, where, for example, an index of 0 may identify the point as being in a first angular bracket 101 and an index of 1 may identify the point as being in a second angular bracket 102.
In this regard, the capture devices are arranged to perform a capture process, e.g. as described with reference to Figure 3, with a non-infinite angular resolution. Given this non-infinite resolution, each point is not a one-dimensional point located at a precise angle. Instead, each point is a point for a particular area of space, with the size of this area being dependent on the angular resolution as well as the distance of the point from the capture device. In other words, each capture angle determines a point for an angular range (with the range being dependent on the angular resolution). That is, if the capture process leads to points being captured at angles of 10°, 11°, and 12° then this can equally be considered to relate to points being captured at a first range of 9.5°-10.5°, a second range of 10.5°-11.5°, and a third range of 11.5°-12.5°.
This is shown in Figure 10, which shows a series of angular brackets, with the size of these angular brackets at a given distance being dependent on the angular resolution. The angular identifier(s) typically comprise a reference to such an angular bracket. Consider, for example, a cube placed with the capture device C1 at the centre of this cube. By dividing this cube into x segments at regular azimuth angles and y segments at regular elevation angles, it is possible to identify any angular range of the representation by reference to an x segment and a y segment (and then the space bracketed by this angular range will depend on both the angular resolution (e.g. the angle between adjacent brackets) and the distance of the point from the capture device).
Typically, each capture device has the same capture pattern so that the angular bracketing of each device is the same (albeit centred differently at the location of the relevant capture device). For example, in an embodiment with 1000 equal angular brackets, the angle for each bracket may be 360/1000.
In some embodiments, different capture devices are associated with different capture patterns, where this may be signalled in configuration information relating to the three-dimensional representation.
In some embodiments, each capture device is arranged to capture a point for a plurality of angular brackets, where each bracket is associated with a different angle. The angular spread of each bracket (that is, the angle between a first, e.g. left, angular boundary of the bracket and a second, e.g. right, angular boundary of the bracket) may be the same; equally, this angular spread may vary. In particular, the angular spread may vary so as to be smaller for points which are directly in front of (or behind, or to a side of) the capture device. For example, the embodiment shown in Figure 7 shows an angular bracketing system that is based on a cube. With this system, a cube is placed such that a capture device is located at the centre of the cube and the cube is then split into 1000 sections of equal size (it will be appreciated thatthe use of 1000 sections is exemplary and any number of sections may be used) Each of these sections is then associated with an angular index. With this arrangement, the angular spread of each section (or bracket) varies, as has been described above.
Figure 10 shows a two-dimensional square, where each angular bracket of the square is referenced by an index number (between 1 and 100). In a three-dimensional implementation, an angular bracket of a cube could be indicated with two separate numbers (with a first azimuthal indicator that identifies a 'column' of the cube and a second elevational indicator that identifies a 'row' of the cube). Equally, a singular indicator may be provided that indicates a specific bracket of the cube. Therefore, for a cube that is divided into 1000 elevational sections and 1000 azimuthal sections, the bracket may be indicated with two separate indicators that are each between 0 and 999 or with a single indicator that is between 0 and 999999.
It will be appreciated that the use of a cube to define the brackets is exemplary and that other bracketing systems are possible. For example, a spherical bracketing system may be used (where this leads to curve angular brackets). Equally, a lookup table may be provided that relates angular indexes to angles, where this enables irregularly spaced brackets to be used.
Typically, determining the location of the point comprises determining the location of the point so as to be at the centre of the angular bracket identified by the angular identifier(s).
Point aggregation In order to reduce the file size of the three-dimensional representation (and the bandwidth required to transmit the three-dimensional representation) it is desirable to reduce the number of points within the three-dimensional representation. Therefore, referring to Figure 11a, there is described a method of aggregating points in the representation. The method of Figure 11a is typically performed by a computer device (e.g. of the image generator 11 or the display device 17).
In a first step 41, the computer device identifies a plurality of points of the representation; in a second step 42, the computer device determines a similarity between the points; and in a third step 43, the computer device determines (e.g. defines) an aggregate point based on the identified points and the determined similarity of the points (e.g. based on whether a similarity value of the identified points exceeds a similarity threshold).
This 'aggregation' process may involve the aggregation of the identified points into the aggregate point. The identified points may then be removed from the representation. Equally, the aggregate point may be formed by modifying one of the identified points; the remainder of the identified points may then be removed from the representation. The 'aggregation' of points described here may equally be termed a 'combination', a 'joining', or a 'replacement' of the identified points. Therefore, in general, the aggregation involves the creation and/or modification of one or more points and, optionally, the removal of one or more points from the representation.
Identifying a similarity between the points typically comprises identifying one or more of: - A similarity of locations. In particular, the computer device may identify a plurality of points that are located in a similar area of the three-dimensional representation and/or on a shared plane or surface of the three-dimensional representation. This may involve comparing the distances between each of a plurality of points to a threshold distance and/or fitting a plane to a plurality of points and determining a distance (e.g. a maximum distance) between the points and the plane. The similarity of locations may involve comparing a distance of the points to a capture device and/or the angles of the points to a capture device, where this comparison of angles may then comprise determining adjacent points in the representation.
- A similarity of capture devices used to capture the points. In particular, the computer device may identify a plurality of points that have been captured by a single capture device and/or by a plurality of similar capture device (e.g. capture devices located in a similar area of the viewing zone and/or adjacent capture devices). Identifying the points may then comprise identifying a plurality of adjacent points that have been captured by a single capture device.
- A similarity of direction (e.g. a similarity of normals). In particular, each point is typically associated with a normal value that identifies a direction of a surface on which the point lies. The computer device may identify a similarity between the normals of the identified points where this may enable the computer device to determine whether the points lie on a shared surface or on different surfaces. The threshold for similarity may require each point to have the same normal (e.g. to lie on the same flat surface). Equally, some difference between normals may be allowed, e.g. so that it is possible to aggregate points that lie on a curved surface.
A similarity of attributes. In particular, the computer device may identify attributes for each of the plurality of points and determine a value that indicates a similarity of the attributes. This may involve determining an average attribute value, a variance of attribute values, a range or spread of attribute values, and a maximum distance of the attribute values from an average attribute value. This determined value may then be compared to a threshold value to determine whether or not the points are similar.
Typically, the computer device identifies, and compares, attribute values relating to each eye of the user. In this regard, each point typically comprises an attribute value (e.g. a colour value) for a left eye of a user and also an attribute value (e.g. a colour value) for a right eye of a user; the determination of similarity may then comprise determining a similarity between the left eye values of the plurality of points and also determining a similarity between the right eye values of the plurality of points. Typically, the determination of the aggregate point is dependent on there being a similarity of the left eye values and the right eye values. Determining the similarity may involve determining that each of the left eye values and the right eye values of the point are within a similarity threshold (so that the right eye attribute values need to be similar and the left eye attribute values also need to be similar); equally, determining the similarity may involve determining that a combined similarity of the left eye values and the right eye values is within (or exceeds) a similarity threshold (e.g. so that if the left eye attribute values are very similar then a significant degree of variance of the right eye attribute values is allowed, and vice versa).
Each of the one or more comparisons of similarity may comprise obtaining a similarity value that is associated with a specific type of similarity and comparing this value to a threshold value. For example, determining the similarity of the points may comprise one or more of (or each of): determining a (e.g. left eye and right eye) attribute similarity value of the points and comparing this attribute similarity value to an attribute similarity threshold; determining a normal similarity value of the points and comparing this normal similarity value to an normal similarity threshold; and determining a location similarity value of the points and comparing this location similarity value to an location similarity threshold;.
For each type of threshold, the similarity that is required for the aggregate point to be determined may depend on an area of the representation in which the identified points are located and/or may depend on a distance of the identified points from the viewing zone. In this regard, viewers are more able to identify differences in an object when they are close to this object. So small variations in the colour or texture of a nearby object will be more noticeable than similar variations in the colour or texture of a distant object. Therefore, a similarity threshold (relating, e.g. to a location similarity threshold, an attribute similarity threshold, or any other similarity threshold) that must be met for aggregation of identified points may depend on the distance of the points from the viewing zone.
Similarly, the similarity thresholds may depend on a feature of the identified points, e.g. an alignment of the points. For example, the attribute similarity threshold for points may depend on a location similarity value of the locations of the points (and vice versa). Therefore, if the identified points lie on an entirely flat plane, the required similarity of the attributes may be lower than if there is some distance between one or more of the identified points and a plane that is associated with these identified points.
The similarity thresholds may depend on a user input. For example, a viewer may be able to identify that they are partially sighted, are colorblind, and/or are unable to distinguish between different shades of red.
In such a situation, the threshold of similarity may be lowered for one or more of the identified groups of points. A partially sighted viewer may be less able to distinguish small differences in colour, and so the threshold of similarity required for aggregation may be universally lowered for such a viewer. For a viewer that has a particular difficulty distinguishing between shades of red, the threshold of similarity required may be defined so as to be lower for red points than for other points.
The similarity thresholds may depend on a type, in particular a size or an arrangement, of the identified points and/or on a number of the identified points. For example, the similarity value(s) required to aggregate nine points into a 3x3 aggregate point may differ from the similarity value(s) required to aggregate four points into a 2x2 aggregate point. Equally, the similarity value(s) required to aggregate six points into a 3x2 aggregate point may differ from the similarity value(s) required to aggregate six points into a 2x3 aggregate point.
Determining the aggregate point may comprise one or more of: Determining an attribute value for the aggregate point based on the attribute values of the identified points (e.g. as an average of the attribute values of the identified points; a maximum of the attribute values of the identified points; a minimum of the attribute values of the identified points; or as the attribute value of one of the identified points). Determining the attribute value may comprise determining each of a left eye attribute and a right eye attribute value for the aggregate point, with these left eye and right eye attribute values being determined from, respectively, the left eye attribute values and the right eye attribute values of the identified points.
- Determining a location of the aggregate point based on the locations of the identified points (e.g. as an average of the locations of the identified points; or as the value of one of the identified points).
Typically, the location of the aggregate point is defined in a similar way as the location of the identified points. In particular, this may involve the location of the aggregate point being defined based on a capture device, a distance of the aggregate point from that capture device, and an angle of the aggregate point from the capture device. To ensure that the aggregate point is located at a suitable angle from the capture device (bearing in mind that the capture device is arranged to capture points at angular increments and that the angle may be signalled as an angle index), the location of the aggregate point is typically determined to be the location of one of the identified points.
Determining a size of the aggregate point based on a quantity/number of the identified points. Typically, each point is associated with a size, where this size may indicate a number of angular brackets covered by that point or where the size may relate to a default size (e.g. where each point begins with a size of '1'). The size of the aggregate point may be dependent on the number of the identified points and/or the sizes of these identified points. The size may also indicate a coverage of the aggregate point (e.g. the size may indicate that the aggregate point is a 3x2 point or a 2x3 point).
Referring then to Figure 11b, there is described a method of determining an aggregate point based on a plurality of identified points. The method is typically performed by a computer device (e.g. the image generator 11 or the display device 17).
In a first step 41, the computer device identifies a location of at least one point of the plurality of points (e.g. a first point of the plurality of points). Identifying the location may comprise identifying a capture device used to capture the first point, identifying a distance of the first point from this capture device, and identifying an angle of the first point from this capture device.
In a second step 42, the computer device identifies a size of the plurality of points. This may comprise identifying a number of the points, where the size of the points (when taken together) may be defined with reference to this number. Identifying the size of the plurality of points may comprise identifying an arrangement of the plurality of points.
In a third step 43, the computer device determines the aggregate point based on the identified location and size.
Referring now to Figures 12a and 12b, a practical implementation of the methods of Figures 11a and llb is illustrated. Figure 12a shows a three-dimensional representation that comprises a first point P1 which lies on a first surface S1 and second, third, fourth, fifth, sixth, and seventh points P2, P3, P4, P5, P6, P7 which lie on a second surface S2. Each of these points is captured by a first capture device C1.
As can be seen from Figure 12a, the first point P1 and the second P2 are substantially spaced from each other and also have substantially different colour values (since these points lie on surfaces with different colours). In contrast, the second, third, fourth, fifth, sixth, and seventh points P2, P3, P4, P5, P6, P7 are located on the same surface at similar locations in space, and these points each have similar colour values.
Figure 12b shows a representation of this surface from a head-on view and shows positions of the second, third, fourth, fifth, sixth, and seventh points P2, P3, P4, P5, P6, P7. The second point P2 may be considered adjacent to points the third point P3 and the fifth point P5 and the second, third, fourth, fifth, sixth, and seventh points P2, P3, P4, P5, P6, P7 can be considered to be an arrangement of adjacent points. In this regard, as has been described with reference to Figure 9 and as can be seen in Figure 12b, each of the points typically lies within an angular bracket; so that two points being adjacent may involve these points being in adjacent angular brackets (e.g. the second point P2 is in an angular bracket that is adjacent to an angular bracket of each of the third point P3 and the fifth point P5) and a larger number of points being adjacent may involve these points covering a contiguous arrangement of adjacent brackets (e.g. where the second point P2, the third point P3, and the fourth point P4 form a contiguous row of adjacent points).
The aggregation of the points may depend on the identified points being in adjacent boundaries and/or being in a contiguous arrangement. Equally, the aggregation of the points may depend on the points lying in a shared row, column, or plane.
In some embodiments, the computer device is arranged to identify specific arrangements of points and to determine the similarity of these points in order to determine aggregations of a specific shape and/or size.
For example, the computer device may be arranged to consider points for a limited number of arrangements or configurations (e.g.: points that are in a 1x2 row or a 1x3 row; points that are in a 2x1 column or a 3x1 column; points that are in a 2x2 square or a 3x3 square; and points that are in a 2x3 rectangle or a 3x2 rectangle). It will be appreciated that various shapes and sizes of aggregate points may be possible. By considering a limited number of arrangements, it is possible to signal the size of an aggregate point so as to identify that the aggregate point covers one of these limited arrangements (e.g. so that a computer device parsing the three-dimensional representation is able to identify an aggregate point, and a size of an aggregate point, with knowledge that it will be one of these arrangements).
In this regard, typically, each point of the representation is stored with a location (e.g. coordinates or a location that is defined with reference to a capture device), one or more attribute values, and a size. The size may relate to a number of angular boundaries that is encompassed by the point. Therefore, a point that has a size of 1x1 may relate to a point that covers a single angular boundary, a point that has a size of 2x1 may cover two angular boundaries in a row etc. Typically, the size is stored via an index so that, for example, a size value of 0 (which may be signalled by a binary value of 000) may signal a point with a coverage of 1x1 angular brackets, a size value of 1 (which may be signalled by a binary value of 001) may signal a point with a coverage of 2x1 angular brackets, a size value of 2 (which may be signalled by a binary value of 010) may signal a point with a coverage of 1x2 angular brackets, and so on.
Determining the aggregate point may comprise defining a point to have a size greater than 1. For example, referring to Figure 12b, if each of the points have similar attribute values, then these points may be replaced with a single attribute point that has a size of 3x2 (e.g. a size index of 100). Determining the aggregate point may then comprise modifying a size of the second point P2 and removing the third, fourth, fifth, sixth, and seventh points P3, P4, P5, P6, P7 from the representation.
The relationship between the size of the point and the location of the space covered by the aggregate point may be predetermined and/or this relationship may already be known by a computer device that is arranged to parse the three-point representation (e.g. it may be a part of a definition of a file type). Equally, this relationship may be defined in configuration information that may be transmitted alongside, or as part of, the three-dimensional representation.
In this regard, if the third point P3 has a size of 2x2, this could feasibly relate to the third point P3 covering the angular brackets of each of: the second point P2, the third point P3, the fifth point P5, and the sixth point P6. Equally, this could feasibly relate to the third point P3 covering the angular brackets of each of the third point P3, the fourth point P4, the sixth point P6, and the seventh point P7.
So that a device parsing the three-dimensional representation is able to determine the space covered by a point, the coverage denoted by a given size of a point is typically defined for the representation, e.g. using configuration information. This coverage may, for example, indicate that an aggregate point covers angular brackets moving clockwise in an azimuthal angle and downwards in an elevational angle (so that if the third point P3 has a size of 2x2, this signals that the third point P3 covers the angular brackets of each of: the second point P2, the third point P3, the fifth point P5, and the sixth point P6). The coverage may depend on the size of the point; for example, for square aggregate points (e.g. points with a 3x3 size), the nominal location of the point may indicate a central bracket that is covered by the point.
Referring then to Figures 13a and 13b, using the methods described above, the second, third, fourth, fifth, sixth, and seventh points P2, P3, P4, P5, P6, P7 may be replaced with a first aggregate point AP1, which first aggregate point AP1 has the location of the second point P2, a size of 3x2, and an attribute value that is determined based on the attribute values of the second, third, fourth, fifth, sixth, and seventh points P2, P3, P4, P5, P6, P7 (e.g. this attribute value may be an average of the attribute values of these points). Such a replacement leads to a substantial decrease in the file size of the three-dimensional representation (and so enables more efficient storage, transmission, etc. of this representation).
The determination of the first aggregate point AP1 may comprise modifying the original second point P2, in particular modifying the size of this original second point P2. In some embodiments, the attribute value of the first aggregate point AP1 is set as the attribute value of one of the identified points, e.g. of the identified point used to determine the location of the first aggregate point AP1 (e.g. so that the only modification made to the original second point P2 to determine the first aggregate point AP1 is to modify the size of the original second point P2) Equally, the method may comprise modifying the attribute value of P2, e.g. as described above.
It will be appreciated that the points typically have a plurality of attribute values (e.g. a left eye colour and a right eye colour) and that references to evaluating, modifying, or defining 'the attribute value' of a point should be understood to encompass embodiments that involve evaluating, modifying, or defining a plurality of, or each of, the attribute values of this point.
One benefit of the above-described method of aggregation is that the aggregate points may be parsed in a similar manner to the original points. The only requirement to implement the aggregate points is the addition of a size value to the point data since, typically, the locations and attributes of the aggregate points are defined in the same way -and using the same datafields -as the original points.
Referring to Figures 14a -14e, there is described a method for determining one or more aggregate points for a three-dimensional representation. This method is typically performed by a computer device as a post-processing step, after the three-dimensional representation has been generated (e.g. after each point of the three-dimensional representation has been captured). This method of Figures 14a -14e may then be used to aggregate a number of points within this three-dimensional representation (e.g. to replace one or more sets of points with one or more aggregate points) so as to reduce the number of points in the three-dimensional representation.
Referring to Figure 14a, in a first step 51, the computer device identifies a plurality of points associated with a given location in the representation.
In a second step 52, the computer device determines a similarity value for these points (e.g. where the similarity value is associated with one or more of: a similarity of locations, a similarity of attributes, a similarity of normals).
In a third step 53, the computer device compares the determined similarity value to a similarity threshold (which may, e.g., depend on the distance of the identified points from the viewing zone).
In a fourth step 54, if the similarity value exceeds a similarity threshold, the computer device determines an aggregate point based on the identified points.
In a fifth step 55, the computer device considers a next location in the representation, and the method then returns to the first step 51.
This method may be performed so as to move through each potentially aggregable plurality of points of the representation and to determine, for each of these identified pluralities of points, whether this plurality of points has a sufficient similarity value to aggregate these points. For example, the fifth step 55 of considering a next location may comprise incrementing an angular identifier so as to move incrementally through the angular brackets for a computer device and, at each stage, to compare the points in a number of adjacent angular brackets. In this way, the computer device essentially forms a window around a plurality of points and then slides this window through the three-dimensional representation so as to evaluate a moving window of points that passes about the entirety of the representation.
Typically, the method of Figure 14a is performed separately for each capture device so that, for each capture device, each set of potentially aggregable points is evaluated to determine a similarity of these points and to aggregate the points appropriately. In some embodiments, points captured by separate capture device may be considered (e.g. points captured by adjacent capture devices may be considered for aggregation based on a similarity in location, attribute, and/or normals). Such a method of considering each capture device separately enables the aggregation of points without affecting the data structure of the points (e.g. the aggregated point can be signalled -and parsed -in the same way as a 'normal' point with a size of 1x1).
In some embodiments, the identification of the plurality of points involves identifying a specific number and/or arrangement (or configuration) of points to determine whether these points can be aggregated. Typically, the method of Figure 14 involves first evaluating an arrangement of (adjacent and/or contiguous) points that corresponds to a largest possible aggregation and then evaluating increasingly small arrangements of points. For example, the computer device may first assess the similarity of each 3x3 set of adjacent points, and then the similarity of each 3x2 set of the remaining points, and then the similarity of each 2x3 set, and then each 2x2 set, and so on. In this way, it can be ensured that the largest possible aggregations are used to ensure the greatest possible reduction in file size. By assessing the 3x3 sets first, it can be ensured that nine similar points are aggregated into a single 3x3 aggregate point instead of being aggregated into three 3x1 aggregate points so as to minimise the number of points in the final representation.
More generally, the computer device may be arranged to evaluate pluralities of points (and determine whether to aggregate these pluralities of points) based on a predetermined order of arrangements (e.g. as may be input by a user or determined based on a feature of the three-dimensional representation).
This method is illustrated by Figures 14b -14e, which shows a 5x5 grid of points captured by a first capture device: Figure 14b shows the consideration of a first identified plurality of points 201, which plurality of points lie in adjacent angular brackets (more specifically, which points lie in a 3x3 contiguous square of angular brackets). The first identified plurality of points are not similar points (e.g. they have different colours and so a attribute similarity value determined for these points does not meet an attribute similarity threshold) and therefore no aggregate point is determined for these points.
Figure 14c shows the updating of a location, e.g. the sliding of a moving window, and the consideration of a second plurality of points 202. The second plurality of points satisfies the similarity threshold(s) and so this second plurality of points is aggregated (e.g. combined) into a second aggregate point AP2 that has a location that is the location of a central point of the second plurality of points.
- The second aggregate point AP2 replaces the second plurality of points 202 so that it is no longer possible for the computer device to consider a 3x3 square of points. The next largest combination of points that is possible with the updated representation is a 3x2 rectangle. Therefore, Figure 14d shows the consideration of a third plurality of points 203 that comprises such a rectangle. This third plurality of points also satisfies the similarity threshold(s) and so the third plurality of points is aggregated into a third aggregate point AP3 that has a location that is the location of an uppermost, lowest azimuth angle (e.g. leftmost) point.
- The third aggregate point AP3 replaces the third plurality of points 203, so that it is no longer possible to consider a 3x2 rectangle of points. Therefore, in Figure 14e shows the consideration of a 2x2 square of points. This points in this 2x2 square are not similar and so no aggregate points is determined for this identified plurality of points.
- The computer device may then consider further pluralities of points (e.g. 3x1, 1x3, 2x1 and 1x2 arrangements of points). Typically, the computer device returns to the starting location and then performs one or more sweeps of the points of the representation until each possible aggregation of points has been considered.
Typically, the plurality of points identified in the first step 51 comprises a plurality of original points (original points being points that were present in the initial three-dimensional representation prior to the onset of the aggregation process) and/or points with a size of 1. Therefore, the computer device typically will not consider for aggregation any plurality of points that includes an aggregate point (or a point with a size greater than 1). This enables the imposition of a maximum size of the aggregate points.
It will be appreciated that the specific shapes and sizes considered here (e.g. with a 3x3 square being the largest size of identified points that is evaluated for aggregation) are purely exemplary and that various implementations are possible with various evaluated pluralities of points, including irregular or non-contiguous pluralities of points.
The iterative process described with reference to Figures 14a -14e is capable of obtaining a three-dimensional representation that efficiently represents a scene. Typically, the three-dimensional representation is associated with a video, e.g. a VR video, and so the three-dimensional representation may relate to a single frame of the video. The video may be composed of a plurality of frames, with each frame relating to a different three-dimensional representation.
By iterating separately over each three-dimensional representation using the method of Figures 14a -14e it is possible to obtain efficient representations for each frame of the video. However, this method of point aggregation can result in similar points being sorted into different aggregations in each frame, which may prevent the use of temporal encoding methods.
In this regard, the aggregate point AP2 may represent an object that is moving through the scene. This point could then be defined in a subsequent three-dimensional representation with reference to the initial three-dimensional representation (e.g. the point AP2 may be associated with a motion vector that indicates a movement of the point and so it may be possible to define a similar point in the subsequent three-dimensional representation based on the point AP2 of the initial three-dimensional representation and the movement vector).
However, if each representation is considered separately, then an equivalent AP2 point may not be formed in the subsequent three-dimensional representation. For example, if the first plurality of points 201 becomes similar in the subsequent representation this may cause this first plurality of points to be aggregated, and this would preventing the formation of the aggregate point AP2 in the subsequent representation. And the aggregation of the entirety of the subsequent representation might change based on a change to only a few points (since the new aggregation will have a knock-on effect). In other words, with the iterative process described with reference to Figures 14a -14e, one change in aggregation that occurs near the beginning of the aggregation process can cause wholesale changes in the aggregation of the three-dimensional representation. So, for example, one moving object that causes a change in the aggregation of points between a firstthree-dimensional representation and a second three-dimensional representation may result in a number of stationary points that are exactly the same in these two representations being aggregated entirely differently (hindering the efficient temporal encoding of these two representations).
Therefore, referring to Figure 15a and Figures 15b -15e, which show a 6x6 grid of points, there is envisaged a grid-based method of processing the representation that can be used to ensure isolated changes in a first area of the scene do not cause wholesale changes in the processing of a three-dimensional representation. While this grid-based system is useable for the aggregation process, it will be appreciated that more generally the grid system may be used as the basis for any processing procedure.
In a first step 61, a grid system of a three-dimensional representation is determined. In this regard, referring to Figures 15b -15e, the three-dimensional representation may be divided into a plurality of (typically, equally sized) grids 210, 220, where each grid contains a plurality of points and/or relates to a plurality of angular brackets. The maximum size of an aggregate point may be equal to a size of a grid.
Typically, the method comprises determining a grid system such that each of a plurality of three-dimensional representations (e.g. relating to frames of a video) has a similar grid system. This may comprise determining the grid system such that a first grid square 210 in a first three-dimensional representation is located at the same space in the scene as a first grid square in a second three-dimensional representation. A grid system is typically determined for each of the capture devices, where the grid system may be based on the angular brackets of that capture device. Therefore, for each of the three-dimensional representations (e.g. for each of a plurality of three-dimensional representations associated with a certain viewing zone), a grid system may be determined for each of the capture devices such that the grid squares of each grid system cover the same angular brackets for each three-dimensional representation.
These grid squares can then be processed, and in particular points within the grid squares may be aggregated, separately. If an object is moving in the first grid square 210, then the aggregation process in the first grid square will likely differ in successive three-dimensional representations (relating to successive frames of a video), but if there are no moving objects in the second square then the aggregation process in the second grid square will likely be the same in these successive frames (this aggregation process will not be affected by the object moving in the first grid square).
Referring again to Figure 15a, for each grid square the aggregation process mirrors that described with reference to Figure 14a. That is, for each grid square, the method involves identifying a plurality of points in a second step 62, determining a similarity value forthese points in a third step 63, comparing the similarity value to a threshold value in a fourth step 64, if this similarity value exceeds the threshold value then determining an aggregate point in a fifth step 65, and then considering a next location in the grid square.
Once, in a seventh step 67, the computer device determines that all possible pluralities of points in the grid square have been identified, then in an eighth step 68, a next grid square is considered and this process is repeated.
This grid-based process is illustrated by Figures 15b -15c, which show how the first grid square 210 is processed to: - First, in Figure 15b, the computer device evaluates a first identified plurality of points 211 in the first grid square 210. Since this plurality of points does not satisfy a similarity threshold, these points are not aggregated.
- It is not possible to identify another 3x3 grid in the first grid squire 210 and so, referring to Figure 15c, a second plurality of points 212 in the first grid square 210 is identified, the second plurality of points having a size of 3x2. This second plurality of points 212 does meet the similarity threshold and so these points are replaced by the aggregate point AP3.
- Referring to Figure 15d, the next largest plurality of points that can be evaluated is a 3x1 row that contains a third plurality of points 213. This row again meets the similarity threshold and so this third plurality of points is replaced by the aggregate point AP4.
Since no more pluralities of points are possible in the first grid square 210, the computer device then evaluates a first plurality of points 221 in the second grid square 220.
As described above, the grid squares typically cover the same angular brackets in a plurality of three-dimensional representations that are associated with the same video and/or the same viewing zone. Equally, in some embodiments, the grid squares may be associated with a motion vector so that the grid system may differ (in a calculable way) between the three-dimensional representations. Such embodiments can enable the efficient encoding of three-dimensional representations that are associated with consistent movements.
While the above method has referred to 'grid squares', it will be appreciated that the grid system may be associated with subdivisions of any shape or size (these subdivisions being consistent through a plurality of three-dimensional representations).
In some embodiments, the method may comprise determining a plurality of grid systems of different sizes, and determining pluralities of points for aggregation based on these different grid systems. For example, after evaluating the three-dimensional representation using a first grid system that comprises an arrangement of 3x3 grid cells, the computer device may evaluate the three-dimensional representation using a second grid system that comprises an arrangement of 3x2 grid cells. The second grid system may be determined independently from the first grid system.
The method may then comprise determining one or more aggregate points based on the first grid system and then determining one or more aggregate points based on the second grid system. Furthermore, the method may comprise determining one or more aggregate points of a first size based on the first grid system and then determining one or more aggregate points of a second size based on the second grid system, where the first size and the second size may depend on (e.g. be equal to) a size of the cells of each grid system.
To give a practical example, the first grid system may comprise 3x3 grid cells and the computer device may use this grid system to determine one or more 3x3 aggregate points before using a second grid system with 3x2 grid cells to determine one or more 3x2 aggregate points. In such a way, aggregate points of various sizes can be maintained through a plurality of three-dimensional representations. The method may comprise determining a plurality of grid system for each possible size of aggregate points and then determining one or more aggregate points using the respective grid system. The grid systems are typically arranged to have a shared origin point, but equally the grid systems may be offset from each other.
Such an embodiment is shown in Figures 16a -16d, in which: - First, in Figure 16a, the computer device considers a grid system of 3x3 cells. The computer device evaluates each cell to determine whether the points in that cell can be aggregated into an aggregate point. For example, a first plurality of points 211 in a first grid square 210 is considered (this is similar to the consideration of the first plurality of points in Figure 15b). Thereafter, a second plurality of points 221 in the second grid square 220 is considered, and so on. Since no grid cell comprises a plurality of points that satisfy a similarity threshold, no aggregated points are generated in the first step.
- Subsequently, in Figure 16b, the computer device considers a grid system of 3x2 cells. The computer device again evaluates each cell to determine whether the points in that cell can be aggregated into an aggregate point. In this example, a first plurality of points 231 in a first 3x2 cell 230 can be combined into the third aggregate point AP3 (similarly, again, to the combination of points into the third aggregate point that has been described with reference to Figure 15c). Furthermore, a second plurality of points 241 in a second cell 240 can be combined into a fifth aggregate point AP5 - Here a difference can be seen between the method of Figures 15b -15e and the method of Figures 16a -16d. With the method of Figures 15b -The, where each possible size of aggregate point is considered within each grid cell, there is formed a fourth aggregate point AP4 of size 3x1 with this aggregate point being within a first 3x3 grid cell 210. With the method of Figures 16a -16d, there is formed a fifth aggregate point AP5 of size 3x2 within a second 3x2 grid cell.
- Subsequently to Figure 16b, in Figure 16c, the computer device considers a grid system of 2x3 cells. The computer device again evaluates each cell to determine whether the points in that cell can be aggregated into an aggregate point. In this example, a first plurality of points 260 in a first 2x3 cell 261 can be combined into a sixth aggregate point AP6.
- Subsequently, in Figure 16d, the computer device considers a grid system of 2x2 cells. The computer device again evaluates each cell to determine whether the points in that cell can be aggregated into an aggregate point. In this example, a first plurality of points 260 in a first 2x3 cell 261 can be combined into a sixth aggregate point AP6.
The method of Figures 16a -16d can typically determine larger aggregate points than the method of Figures 15b -15e at the cost of an increased risk of failing to maintain similar aggregate points across a plurality of three-dimensional representations. It will be appreciated that the methods of Figures 15b -15e and Figures 16a -16d may each be used in, and may each be particularly applicable to, different situations.
Underscan As described above, the process of capturing the points of the three-dimensional representation typically involves using one or more capture devices, where each capture device is able to capture points at a plurality of different angles (e.g. different azimuthal angles and different elevational angles). With such a method, the size of each point is dependent on the distance of that point from the capture device.
Typically, the capture devices are associated with a viewing zone 1, so in practice this leads to surfaces that are close to the viewing zone being associated with smaller points than surfaces that are further from the viewing zone.
This is generally a positive consequence of a capture process based on angles -this leads to a higher resolution of points being captured for surfaces close to the viewing zone 1, where this is beneficial because a user is better able to perceive small differences near to the viewing zone.
This is illustrated in Figure 17, which shows a situation with a first capture device C1 capturing a point at a certain angle. At this angle, a third surface S3 and a fourth surface S4 are present. As shown in this figure, for this single capture angle, the capture device would be able capture a wide section of the distant fourth surface S4 (if the third surface was not in the way) or the capture device would be able to capture a comparatively narrow section of the nearby third surface S3.
A potential issue with this method of capturing points is that for surfaces very close to the viewing zone 1 this method of capture can lead to a very high number of small points being captured in order to cover the surfaces. In some situations, these points may be so small that any differences between adjacent points cannot be resolved by a human eye or by a rendering device.
In this regard, the three-dimensional representation is typically used to render a two-dimensional image that can be presented to the user via the viewing device. This two-dimensional image has a plurality of pixels, where each pixel has a colour. The colours of the pixels are determined based on the points of the three-dimensional representation, where each point and each pixel relates to an object or surface in the scene. For very close surfaces, there may be a plurality of points that need to be resolved into a single pixel value. Therefore, a subset of these points may need to be discounted during the rendering process. Equally, each point could be evaluated/combined in order to determine a corresponding pixel, but such a method can be inefficient not least because a viewer is unlikely to be able to perceive a minor change in a value of a single pixel.
In general, where the points are very small, then it may be impossible for a viewer to discern these points, so that a number of points may provide redundant information. Therefore it is possible to remove some of these points without substantially affecting the quality of the representation (and of a scene rendered based on the representation) Therefore, the present disclosure considers methods by which a plurality of small points of the three-dimensional representation -that is points that are associated with a small volume of the representation -may be identified and these points can be replaced by a single, larger, point. As described below, these methods may be carried out as a post-processing step, where a captured three-dimensional representation is processed to remove small points, or as a pre-processing step, where the small points are not even captured.
Referring to Figure 18, there is described a method of determining an 'underscan' point. This method is typically performed by a computer device, e.g. a device forming a three-dimensional representation (e.g. the image generator 11). The term 'underscan' is used to identify the determined point; this term functions as an identifier and is not intended to limit the scope of the underscan point in any way. Typically, the underscan point is associated with a size value that is greater than one so that the underscan point has a greater size than a typical point of the scene.
In a first step 71, the computer device identifies a plurality of points. Typically, but not necessarily, this comprises identifying a plurality of points captured by a single capture device. Equally, the points may be associated with a plurality of capture devices.
In a second step 72, the computer device determines that a size of each of the points is below a threshold value.
In a third step 73, the computer device determined an 'underscan' point based on the identified points. The underscan point comprises a point that has a size value greater than each of the identified points. The underscan point may be specifically associated with (e.g. may comprise) an underscan factor, which identifies a size of the underscan point. This underscan factor, and more generally the size of the underscan point, may be defined by the computer device that is determining the underscan point (e.g. the computer device may define a point with a location, an attribute, and a size).
Typically, each point of the three-dimensional representation is associated with an angular bracket that is associated with the capture device used to capture that point. As described above with reference to Figure 17, the dimensions of a point within this angular bracket dependend on the distance of this point from the capture device (where points that are close to the capture device cover a smaller section of a scene than points further from the capture device). More generally, each point is associated with a location in the three-dimensional representation (where this may be an absolute location, e.g. in x, y, z coordinates or a relative location, e.g. relative to a capture device).
The size of the point may be associated with an amount of the scene that is covered by the point. For example, the size of the point may be associated with one or more of: a width of the point; a height of the point; a perimeter of the point; an area of the point; and/or a volume of the point. Typically, the point comprises a two-dimensional surface (e.g. a quad) or a three-dimensional volume, where the size of the point is associated with the size of the surface or the volume.
The 'size' of the point may be signalled with reference to a number of angular brackets that are covered by the point. In this regard, each point of the representation typically comprises a 'size' field that indicates a size of this point. For example, a point with a size of '1' may occupy a single angular bracket, a point with a size of '2' may occupy two angular brackets, etc. The size may be defined in a plurality of dimensions, e.g. a point with a size of [2, 2] may cover two angular brackets in an azimuthal direction and two angular brackets in an elevational direction.
The computer device may be arranged to determine an actual size of the point based on a signalled size of the point, where the 'actual' size of the point (e.g. the amount of space of the scene covered by the point) is dependent on the signalled size of the point (e.g. the number of angular brackets covered by the point) and also the distance of the point from the capture device used to capture the point.
Typically, the threshold value is a minimum (e.g. actual) size value, for example this minimum size value may be associated with a dimension (e.g. a width) of 1mm. Therefore, a plurality of points with widths smaller than this minimum size value may be combined into a single underscan point. A minimum point size of 1mm may be suitable for a resolution of 3024x2018. More generally it will be appreciated that various minimum point sizes may be considered (and these minimum sizes may be selected based on an expected viewing resolution and/or based on other factors).
The threshold value is typically a predetermined threshold value that may, for example, be selected by a user or may be determined based on a feature of the scene or the representation. For example, the threshold value may be selected based on a maximum resolution of the scene (where this maximum resolution may be dependent on the display devices 17 that may be used to view the scene) The threshold value may also be determined based on a size (e.g. a desired size or a maximum size) of the three-dimensional representation, where the use of a larger threshold size typically leads to a smaller file size.
Furthermore, the threshold value may be determined based on a complexity of the scene or a complexity of a surface associated with the points, where higher threshold values may be used for more complex scenes or surfaces. This complexity may be determined by evaluating a distribution of attribute values of the surface and/or scene.
The threshold value is typically associated with a size of a point (e.g. a width and/or height of an angular bracket at the location of the point, where this angular bracket is associated with a capture device). The width and height of the angular bracket at the location of the point will depend on the distance of the point from the capture device (where these dimensions increase as the distance increases) so that points that are closer to the capture device are typically smaller than points that are further from the capture device. Equally, the threshold value may be associated with an absolute size of the point, which absolute size is associated with a size of the point in a two-dimensional image that might be rendered based on the three-dimensional representation. In this regard, a point that is close to a first side of the viewing zone 1 may cover a plurality of pixels when a user is located close to this first side of the viewing zone, but may cover less than a single pixel when a user is located at an opposite side of the viewing zone. Typically, the threshold value is associated with a maximum size of the point (e.g. the size of the point as seen by a user at the first side of the viewing zone). In practice, the point will typically be captured by a capture device located at this first side (since this capture device would have a higher quality factor for the point than a capture device located at the opposite side of the viewing zone) so that no further processing is required to determine the maximum size of the point. But this might not be the case in all implementations. Therefore, the method may comprise determining a maximum size (or maximum area) that might be covered by the point (as seen from a user in the viewing zone), where the second step 72 involves determining that this maximum size is less than the threshold value.
The method of Figure 18 may also comprise determining that the points lie on a shared surface. For example, the method may comprise determining one or more similarity values associated with one or more of the distances of the points from a capture device and/or a viewing zone; and the normals of the points and determining the underscan point based on this similarity value exceeding a threshold. In some embodiments, the computer device is arranged to determine the underscan point based on a determination that each of the identified points lies on a shared plane.
In some embodiments, the computer device is arranged to determine the underscan point by determining an attribute similarity value based on the attribute values of the identified points. Typically, however, the determination of the underscan point occurs without any consideration of the attribute values (e.g. the determination of the underscan point may occur before these attribute values are determined).
This method of determining an underscan point shares certain steps with the determination of an aggregate point (and any of the steps described with reference to the determination of an aggregate point -e.g_ the steps relating to the determination of, and consideration of, similarity values -may similarly be applied to the determination of the underscan point and vice versa). The primary difference is that the determination of the aggregate point is dependent on a similarity of the attribute values of a plurality of identified points whereas the determination of the aggregate point is dependent on size values of a plurality of identified points being beneath a threshold value. In particular, as with the determination of the aggregate point, the determination of the underscan point may be determined based on a similarity of capture devices, locations, and/or normals.
As with the aggregate point, determining the underscan point may involve first determining a first plurality of points with a first size, determining whether these points may be replaced by an underscan point, and (if they cannot) determining a second plurality of points with a second size, the second plurality of points being a subset of the first plurality of points. In practice, this may involve determining whether a 4x4 arrangement of points may be replaced with an underscan point and then, if this is not possible, determining whether a 3x3 arrangement of these points may be replaced with an underscan point, etc. The method of determining underscan points may be implemented following the capture of the points of the three-dimensional representation. For example, e.g. similarly to the method described with reference to Figures 14a -14e in relation to the capture of an aggregate point, the computer device may be arranged to step through a three-dimensional representation, to identify pluralities of adjacent points in this representation, to determine whether the sizes of these identified points are beneath a threshold value, and to determine an underscan point based on these sizes being beneath the threshold value.
As with the aggregate point, determining the underscan point may comprise removing one or more of the identified points from the three-dimensional representation. Further, determining the underscan point may comprise combining (e.g. the attribute values of) the identified points to determine an underscan point that represents the identified plurality of points.
Determining the underscan point typically comprises defining a size value for the underscan point, which size value may be a mutiplier factor that is associated with the underscan point. This size and/or multiplier factor is is typically greater than 1 (where a size of '1' may be a minimum size of a point). Typically, the underscan point can have one of a predetermined number of size. For example, the underscan point may be able to have a size of 2x2, 3x3, or 4x4. These sizes may each be associated with underscan factor and/or multiplier factor, e.g. with an underscan factor of 1, 2, or 3 (where, e.g. an underscan factor of 0 is associated with a point of size 1x1). These underscan factors may, for example, provide a local reduction in the number of points of the scene of roughly 2x, 6x, and 24x as compared to implementations without underscan points.
The underscan factor (e.g. a multiplier factor) may be stored alongside another size value of the point, so that the point may have an initial size (e.g. that indicates a height/width of the point) and the multiplier factor may indicate that the point should cover an area of the scene that is a multiple of this initial size.
The size value is typically stored as a part of the point data. For example, the points may be stored with a width value and/or a height value, where these values may indicate a number of angular brackets that are covered by the point. The size value may include an underscan component, which underscan component comprises an underscan factor as described above. Equally, the underscan factor may be signalled separately to the size value; for example, the point may comprise a size value field (that indicates a height or width of the point and also a separate underscan factor field (which underscan factor field may be considered in combination with the size value field by a computer device processing the point).
In embodiments where the underscan factor is combined with the size value, the width and height of the point may be dependent on the underscan factor. For example, when the underscan factor exceeds a threshold value, the possible height and width values may be limited. In a specific example, when the underscan factor is 3, the width and the height may be limited to the range [0,1]. The size may then be defined as size = underscan*9 + height*3 + width. Such a method provides efficient storage and indication of width, height, and underscan values and provides an embodiment in which underscan points and aggregate points can each be signalled efficiently. For example: a point with a size value of 0 can be determined to have an underscan factor of 0 and width and height values of 0 (which may indicate that the point covers a 1x1 arrangement of angular brackets); a point with a size value of 1 can be determined to have an underscan factor of 0 and width and height values of 1 and 0 respectively (which may indicate that the point covers a 2x1 arrangement of angular brackets); a point with a size value of 8 can be determined to have an underscan factor of 0 and width and height values of 1 and 2 respectively (which may indicate that the point covers a 2x3 arrangement of angular brackets); a point with a size value of 10 can be determined to have an underscan factor of 1, a width value of 1 and a height value of 0 (which may indicate that the point covers a 3x2 arrangement of angular brackets); and a point with a size value of 27 can be determined to have an underscan factor of 3 and width and height values of 0 (which may indicate that the point covers a 4x4 arrangement of angular brackets). With this method, the minimum possible size value is 0 and maximum possible size value is 31 (e.g. an underscan value of 9 and width and height values of 1), so that the size can be efficiently encoded using 5 bits (where 25= 32, so that the use of 5 bits enables the defining of values between 0 and 31).
The underscan factor for the underscan point is typically determined based on the sizes of the identified points and more specifically on the difference between the size of these points and the threshold size. The underscan factor may be selected as a factor (e.g. a smallest factor) that results in an underscan point that is above the threshold size. For example, if the threshold size is associated with a minimum width/height of 1mm, then a plurality of identified points with widths/heights of 0.6mm may be replaced with an underscan point with an underscan factor of 2 (where this results in an underscan point that has a width and height of 1.2mm). Similarly, a plurality of points with widths/heights of 0.4mm may be replaced with an underscan point with an underscan factor of 2, and a plurality of points with widths/heights of 0.3mm may be replaced with an underscan point with an underscan factor of 3.
In some embodiments, the process for capturing points comprises a two-phase process where a first phase of the process comprises a detection phase and a second phase comprises a capture (or a rendering) phase.
Referring to Figure 19, the method of determining the underscan point may comprise, in a first step 81, identifying a plurality of points during a detection phase and in a second step 82 marking one or more of the points that have a size less than a threshold size.
The detection step may comprise identifying a surface that is impacted by a scanning ray emitted by each capture device at each angle and then determining points to capture based on these scanning rays. This detection step may, for example, comprise identifying a plurality of versions of the same point (e.g. where each version is associated with a different capture device) and then determining which version to capture based on quality factors associated with each of the versions. In this regard, many surfaces are visible to a plurality of different capture devices so that these surfaces could be captured using any of these capture devices. Typically, the capture device that is used for each surface is determined based on a quality factor associated with this capture device, where the quality factor depends on the distance and angle of the point from the capture device (e.g. so that a first capture device that is close to a surface and is perpendicular to this surface has a higher quality factor than a second capture device that is further away from and/or at an angle to the surface).
The detection step may also involve detecting a size of the point. This enables the computer device to determine points that have a size less than a threshold size. Groups of points that have a size less than a threshold size may then be identified and, in a third step 83, an underscan point may be determined that replaces this group of points. The determination of the underscan point may also be based on the normals and/or distances of the points in the group of points.
In particular, the computer device may be arranged to identify a plurality of adjacent points that are marked as having a size less than a threshold size, to (optionally) identify that these points have similar normals), and then to replace these points with a single point that has a size greater than one (e.g. by associating this single point with an underscan factor). For example, a single point of the plurality of points may be associated with the underscan factor and the remaining points may be removed from the representation.
With the method of Figure 18, this definition of the underscan point may occur before the capturing of any point attributes, so this may be considered to be processing a plurality of potential points and then defining a potential point that has a certain underscan factor.
In a fourth step 84, that is associated with a capture phase, the computer device determines an attribute value for the underscan point. During the detection phase, the computer device identifies a plurality of potential points to be captured by the various capture devices so as to fully capture a scene where this may comprise determining the points based on quality factors of the points and the sizes of the points. Then, during the capture phase, the capture devices determine the attributes (e.g. colours, transparencies, movement vectors, etc.) of these points. This two-phase process ensures that attributes are only captured for points that will be present in the three-dimensional representation.
During this capture phase, the capture device determines one or more attribute values for the underscan point, where these attribute values may be determined based on the locations of the underscan point and/or on the locations of the one or more of the identified points (that are replaced by the underscan point).
Since the underscan point covers a plurality of angular brackets, the capture phase may involve (for a given capture device), capturing a point in a first angular bracket (at a first elevation and azimuth), increasing the elevation angle and/or the azimuth angle by an incrementthat is greater than a minimum angular increment, and then capturing a point in a second angular bracket (that is not adjacent the first angular bracket).
The underscan point may then be defined as having a location, one or more attribute values, and a size that is greater than one.
Considering then the method of rendering a two-dimensional image based on this three-dimensional representation, in order to render this image, a computer device (e.g. the same device or another device) is arranged to determine attribute values of the points of the three-dimensional representation. As part of this determining, the computer device may determine a size of one or more points, and specifically the computer device may determine an underscan factor of an underscan point, where the two-dimensional image may then be rendered based on the determined underscan factor so that the underscan point covers a suitable area of the two-dimensional image.
Formation of a bitstream The processing of a three-dimensional representation, and the rendering of two-dimensional images based on this three-dimensional representation results in a three-dimensional representation and/or a two-dimensional image that may then be stored and/or transmitted by a computer device. In particular, the three-dimensional representation and/or the two-dimensional image may be encoded in a bitstream that is transmitted to another device and/or may be used to render one or more two-dimensional images that can be encoded in a bitstream that is transmitted to another device. This bitstream can then be decoded by this other device (e.g. the display device 17) in order to extract the three-dimensional representation(s) and/or the two-dimensional image(s) from the bitstream.
The present disclosure envisages a bitstream that contains or is determined based on a three-dimensional representation that comprises one or more 'new' or 'underscan' points, which underscan points are associated with (e.g. determined using) the processes described above. The underscan points are typically associated with a size value that is greater than one and/or are associated with an underscan factor that indicates that these points have a greater size than (at least some) other points of the representation.
Figure 20 shows a schematic of such a bitstream. Specifically, Figure 20 shows a bitstream that comprises a plurality of bits Bit-a to Bit-d. These bits signal one or more points of a three-dimensional representation and/or that defines one or more immersive two-dimensional images.
This bitstream can be decoded by a decoding device and this may allow the three-dimensional representation and/or the two-dimensional images to be extracted from the bitstream.
In some embodiments, the bitstream comprises one or more flags that indicate features of the bitstream and/or of the three-dimensional representations or two-dimensional images signalled by the bitstream. For example, the bitstream may comprise one or more flags that indicate: whether underscan points are present in the representation; an interpretation of a size value of a point (e.g. whether this size value accounts for an underscan factor); a relationship between an underscan factor and a size of a point; and a process for converting a size value of a point into an actual size of the point.
Alternatives and modifications It will be understood that the present invention has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.
The representation is typically arranged to provide an extended reality (XR) experience (e.g. a representation that is useable to render a XR video). The term extended reality (XR) covers each of virtual reality (VR), augmented reality (AR), and mixed reality (MR) and it will be appreciated that the disclosures herein are applicable to any of these technologies.
The representation may be encoded into, and/or transmitted using, a bitstream, which bitstream typically comprises point data for one or more points of the three-dimensional representation. The point data may be compressed or encoded to form the bitstream. The bitstream may then be transmitted between devices before being decoded at a receiving device so that this receiving device can determine the point data and reform the three-dimensional representation (or form one or more two-dimensional images based on this three-dimensional representation) In particular, the encoder 13 may be arranged to encode (e_g one or more points of) the three-dimensional representation in order to form the bitstream and the decoder 14 may be arranged to decode the bitstream to generate the one or more two-dimensional images.
In some embodiments, the scene comprises a stationary scene, where the scene itself does not change but a viewer is able to move through the scene to view different parts of the scene. Alternatively, in some embodiments the scene comprises a video and/or a moving (e.g. non-static) scene. That is, in some embodiments the scene comprises a stationary scene, such as a building, where a viewer is able to move through this scene, e.g. to view different rooms of the building, but where the scene itself does not change. In some embodiments, the scene comprises a moving scene, where elements of the scene vary in time even when the viewer remains stationary. It will be appreciated that typically the scene comprises both stationary and moving elements where, for example, non-stationary elements move in front of a stationary background.
Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.
Claims (25)
- 1. A method of determining a point of a three-dimensional representation of a scene, the method comprising: identifying a first plurality of points of the representation; determining a size of each point of the first plurality of points; and in dependence on the size of each point being beneath a threshold value, determining an underscan point based on one or more of the first plurality of points, the underscan point having a size greater than the threshold value.
- 2. The method of any preceding claim, wherein the representation comprises a plurality of points associated with a plurality of different capture devices, and wherein the method comprises identifying a first plurality of points associated with a first capture device.
- 3. The method of any preceding claim, comprising identifying a first plurality of adjacent points of the representation.
- 4. The method of any preceding claim, wherein the size is associated with one or more of a width, height, and/or surface area covered by the point.
- 5. The method of any preceding claim, wherein each point of the first plurality of points is associated with an area of the three-dimensional representation, preferably an angular bracket associated with a capture device of the three-dimensional representation, and wherein the underscan point covers each of said areas.
- 6. The method of any preceding claim, wherein the underscan point is associated with an underscan factor, wherein the underscan factor defines a size of the underscan point, preferably wherein the method comprises determining the underscan factor for the underscan point based on a difference between the sizes of each of the identified points and the threshold value.
- 7. The method of any preceding claim, wherein the threshold value is one or more of predetermined and/or selected by a user; selected based on a feature of the scene, preferably based on a resolution associated with the scene (e.g. an intended resolution on the scene); selected based on a desired file size of the representation and/or a selected compression; and selected based on a complexity of the scene and/or of a surface associated with the first plurality of points.
- 8. The method of any preceding claim, wherein a size datafield of the underscan point comprises components associated with one or more of: a height; a width; and an underscan factor, preferably wherein each component is associated with a number of angular brackets covered by the point, preferably wherein: the width and the height are dependent on the underscan factor, more preferably wherein the underscan factor defines a maximum width and/or height of the underscan point; and/or a value of the size datafield is defined as: size value = underscan * 9 + height * 3 + width.
- 9. The method of any preceding claim, wherein determining the underscan point comprises one or more of: modifying a point of the first plurality of points; removing one or more of the first plurality of points; and replacing the first plurality of points with the underscan point.
- 10. The method of any preceding claim, comprising determining an attribute of the underscan point, preferably comprising determining the attribute based on the attributes of the identified points.
- 11. The method of any preceding claim, comprising determining a similarity value associated with the first plurality of points, and determining the underscan point in dependence on the similarity value; preferably, wherein the similarity value is associated with one or more of: locations of each of the points; distances of each of the points from a capture device associated with the points; and normals associated with each of the points.
- 12. The method of claim 11, comprising: determining a similarity of normals associated with each of the first plurality of points, and determining the underscan point in dependence on this similarity of normals exceeding a threshold; and/or determining that the first plurality of points lie on a shared plane, and determining the underscan point in dependence on said determination that the first plurality of points lie on a shared plane; and/or determining a similarity of attributes associated with each of the first plurality of points, and determining the underscan point in dependence on this similarity of attributes exceeding a threshold.
- 13. The method of claim 11 or 12, wherein the underscan point is determined without consideration of the attributes of the identified points.
- 14. The method of any of claims 11 to 13, wherein the similarity is associated with one or more of: a variance of values associated with the points; and a range and/or spread of values associated with the points.
- 15. The method of any preceding claim, wherein determining the underscan point comprises: determining a location for the underscan point, preferably determining the location as being the same as the location of one of the first plurality of points; and/or determining an attribute value for the underscan point; and/or defining a size of the underscan point, the size being dependent on the number and/or arrangement of the first plurality of points; preferably, wherein the size indicates a number of angular brackets of the representation that are covered by the underscan point.
- 16. The method of any preceding claim, wherein determining the first plurality of points comprises determining a plurality of contiguous points of the representation; preferably, wherein the method comprises determining a plurality of contiguous points in a first arrangement; more preferably, wherein the method comprises determining sizes relating to a plurality of arrangements of points, yet more preferably wherein the method comprises determining sizes in turn for arrangements of decreasing size.
- 17. The method of any preceding claim, comprising: determining, in a first step, one or more points to be captured by one or more capture devices associated with the representation; and determining, in a second step, attribute values for said one or more points; preferably, comprising determining the first plurality of points from the one or more points.
- 18. The method of claim 17, wherein the first step comprises: determining a distance and a normal for one or more potential points; and determining the one or more points to be captured from said potential points based on the determined distances and normals; preferably, wherein the potential points are associated with a plurality of capture devices.
- 19. The method of claim 17 or 18, comprising determining the attribute values in the second step in dependence on one or more determined underscan points, preferably comprising updating an angle associated with the capturing of attribute values based on the size of said underscan points.
- 20. The method of any of claims 17 to 19, wherein the second step comprises capturing attribute values for a plurality of points at a plurality of respective capture angles, preferably comprising incrementing a capture angle so as to capture the plurality of attribute values, more preferably comprising incrementing the angle based on an underscan factor associated with a point for which an attribute is being captured; preferably, wherein incrementing the capture angle comprises incrementing the capture angle so as to capture points in a plurality of adjacent brackets, preferably wherein the method comprises: identifying an underscan point; determining an attribute value for the underscan point; and incrementing the capture angle based on an underscan factor associated with the underscan point, preferably wherein incrementing the capture angle comprises skipping one or more angular brackets associated with the capture process; and determining a further attribute value at the incremented capture angle.
- 21. A method of determining an attribute of a point of a three-dimensional representation of a scene, the method comprising: identifying a point; identifying an underscan factor associated with the point; determining a size of the point based on the underscan factor; determining, based on the size, an arrangement of angular brackets covered by the point; and determining attribute values for each of the angular brackets based on an attribute value of the point.
- 22. The method of claim 21, comprising: determining the size of the point from a predetermined association between the underscan factor and the size; and/or rendering a two-dimensional image from the three-dimensional representation based on the determined attribute values.
- 23. An apparatus for determining a point of a three-dimensional representation of a scene, the apparatus comprising: means for identifying a first plurality of points of the representation; means for determining a size of each point of the first plurality of points; and means for determining, in dependence on the size of each point being beneath a threshold value, an underscan point based on one or more of the first plurality of points, the underscan point having a size greater than the threshold value.
- 24. A bitstream comprising one or more underscan points determined using the method of any preceding claim.
- 25. A bitstream comprising an underscan point, the underscan point comprising an underscan factor that indicates one or more of: a size of the underscan point; an arrangement of angular brackets covered by the underscan point; and a number of angular brackets covered by the underscan point, preferably wherein: the underscan point is determined using the method of any of claims 1 to 19; and/or the bitstream comprises one or more flags indicating one or more of: whether underscan points are present in the representation; an interpretation of a size value of a point; a relationship between an underscan factor and a size of a point; and a process for converting a size value of a point into an actual size of the point.
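To make the size datafield of claim 8 and the angular-bracket expansion of claim 21 concrete, the following Python sketch works through the arithmetic. The base-3 decoding and the bracket-filling convention are assumptions inferred from the stated formula (size value = underscan * 9 + height * 3 + width, which implies base-3 width and height components) and are not a normative part of the claims.

```python
# Worked sketch of the size datafield (claim 8) and the angular-bracket
# expansion (claim 21). Decoding conventions are assumptions.

def encode_size_value(underscan: int, height: int, width: int) -> int:
    # size value = underscan * 9 + height * 3 + width (claim 8)
    assert 0 <= width < 3 and 0 <= height < 3
    return underscan * 9 + height * 3 + width

def decode_size_value(size_value: int) -> tuple[int, int, int]:
    # Invert the base-3 packing above.
    width = size_value % 3
    height = (size_value // 3) % 3
    underscan = size_value // 9
    return underscan, height, width

def expand_point(size_value: int, attribute: int) -> dict[tuple[int, int], int]:
    """Assign the point's attribute to every angular bracket it covers
    (claim 21), assuming width/height count extra brackets beyond one."""
    _, height, width = decode_size_value(size_value)
    return {(dx, dy): attribute
            for dx in range(width + 1)
            for dy in range(height + 1)}

# Example: underscan factor 1 with a 2x2 bracket footprint.
v = encode_size_value(underscan=1, height=1, width=1)  # 9 + 3 + 1 = 13
assert decode_size_value(v) == (1, 1, 1)
assert len(expand_point(v, attribute=0xFF0000)) == 4   # four brackets filled
```

Under these assumptions, a renderer would call expand_point once per decoded point and write the returned attribute values into the corresponding angular brackets before rendering a two-dimensional image (claim 22).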
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB2409906.1A GB2638298A (en) | 2024-07-08 | 2024-07-08 | Determining a point of a three-dimensional representation of a scene |
| PCT/GB2025/051495 WO2026013385A1 (en) | 2024-07-08 | 2025-07-08 | Determining a point of a three-dimensional representation of a scene |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB2409906.1A GB2638298A (en) | 2024-07-08 | 2024-07-08 | Determining a point of a three-dimensional representation of a scene |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| GB202409906D0 (en) | 2024-08-21 |
| GB2638298A true GB2638298A (en) | 2025-08-20 |
Family
ID=92301848
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| GB2409906.1A Pending GB2638298A (en) | 2024-07-08 | 2024-07-08 | Determining a point of a three-dimensional representation of a scene |
Country Status (2)
| Country | Link |
|---|---|
| GB (1) | GB2638298A (en) |
| WO (1) | WO2026013385A1 (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20210183144A1 (en) * | 2019-12-13 | 2021-06-17 | Sony Corporation | Reducing volumetric data while retaining visual fidelity |
| US11625892B1 (en) * | 2019-07-22 | 2023-04-11 | Scale AI, Inc. | Visualization techniques for data labeling |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP3143774A4 (en) * | 2014-05-13 | 2018-04-25 | PCP VR Inc. | Method, system and apparatus for generation and playback of virtual reality multimedia |
| BE1022580A9 (en) | 2014-10-22 | 2016-10-06 | Parallaxter | Method of obtaining immersive videos with interactive parallax and method of viewing immersive videos with interactive parallax |
| GB2553556B (en) | 2016-09-08 | 2022-06-29 | V Nova Int Ltd | Data processing apparatuses, methods, computer programs and computer-readable media |
| US10909725B2 (en) * | 2017-09-18 | 2021-02-02 | Apple Inc. | Point cloud compression |
| US11632560B2 (en) | 2017-12-06 | 2023-04-18 | V-Nova International Limited | Methods and apparatuses for encoding and decoding a bytestream |
| GB2618720B (en) | 2019-03-20 | 2024-03-13 | V Nova Int Ltd | Low complexity enhancement video coding |
| WO2021002633A2 (en) * | 2019-07-04 | 2021-01-07 | 엘지전자 주식회사 | Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method |
- 2024-07-08: GB GB2409906.1A patent/GB2638298A/en active Pending
- 2025-07-08: WO PCT/GB2025/051495 patent/WO2026013385A1/en active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| WO2026013385A1 (en) | 2026-01-15 |
| GB202409906D0 (en) | 2024-08-21 |
Similar Documents
| Publication | Title |
|---|---|
| TWI786157B (en) | Apparatus and method for generating a tiled three-dimensional image representation of a scene |
| KR20190038664A (en) | Splitting content-based streams of video data |
| US12081719B2 (en) | Method and apparatus for coding and decoding volumetric video with view-driven specularity |
| EP3564905A1 (en) | Conversion of a volumetric object in a 3d scene into a simpler representation model |
| US20250184467A1 (en) | Image signal representing a scene |
| CN114930812B (en) | Method and apparatus for decoding 3D video |
| US12142013B2 (en) | Haptic atlas coding and decoding format |
| WO2022259632A1 (en) | Information processing device and information processing method |
| GB2638298A (en) | Determining a point of a three-dimensional representation of a scene |
| GB2640911A (en) | Determining a point of a three-dimensional representation of a scene |
| WO2026013386A1 (en) | Processing a three-dimensional representation of a scene |
| GB2637364A (en) | Determining a point of a three-dimensional representation of a scene |
| GB2640002A (en) | Updating a depth buffer |
| GB2642454A (en) | Rendering a two-dimensional image |
| WO2026018021A1 (en) | Processing a three-dimensional representation of a scene |
| GB2640277A (en) | Determining a location of a point |
| GB2640349A (en) | Processing a three-dimensional representation of a scene |
| WO2025233632A1 (en) | Bitstream |
| GB2637804A (en) | Determining a point of a three-dimensional representation of a scene |
| GB2640278A (en) | Processing a three-dimensional representation of a scene |
| WO2026018019A1 (en) | Processing a three-dimensional representation of a scene |
| GB2637367A (en) | Processing a point of a three-dimensional representation |
| GB2642548A (en) | Processing a point of a three-dimensional representation |
| GB2642453A (en) | Processing a three-dimensional representation of a scene |
| WO2025248250A1 (en) | Processing a point of a three-dimensional representation |