WO2025099438A1 - Split-frame coding
- Publication number: WO2025099438A1
- Application: PCT/GB2024/052832
- Authority: WIPO (PCT)
- Prior art keywords: image, data, processed image, region, obtained image
- Legal status: Pending
Classifications
- H04N13/106—Processing image signals
- H04N13/161—Encoding, multiplexing or demultiplexing different image signal components
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
- H04N21/2343—Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/816—Monomedia components thereof involving special video data, e.g. 3D video
- H04N21/8451—Structuring of content, e.g. decomposing content into time segments, using Advanced Video Coding [AVC]
- H04N21/8453—Structuring of content by locking or enabling a set of features, e.g. optional functionalities in an executable program
- G06T19/006—Mixed reality
- G06T3/60—Rotation of whole images or parts thereof
Definitions
- the present disclosure relates to split-frame coding. More particularly, but not exclusively, the present disclosure relates to split-frame coding that uses image processing to enable given encoder and/or decoder technology to be leveraged for additional use cases.
- In horizontal split-frame encoding, an input image is split in half horizontally into two horizontal stripes. Each stripe spans the full width of the image but only half of its height. Such stripes may be referred to as “strips”, “parts”, “regions”, or the like. Split-frame encoding may also be referred to as “spatial split” encoding.
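As a minimal sketch of the split described above (not taken from the publication; the function name and the use of NumPy are illustrative assumptions), splitting a frame into two horizontal stripes might look like this:

```python
import numpy as np

def horizontal_split(frame: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a (height, width, channels) frame into upper and lower stripes.

    Each stripe spans the full width of the frame but only half of its
    height, as in horizontal split-frame encoding.
    """
    h = frame.shape[0]
    upper = frame[: h // 2]   # top stripe: rows 0 .. h/2 - 1
    lower = frame[h // 2 :]   # bottom stripe: rows h/2 .. h - 1
    return upper, lower

# A 1080x1920 RGB frame splits into two 540x1920 stripes.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
upper, lower = horizontal_split(frame)
assert upper.shape == (540, 1920, 3) and lower.shape == (540, 1920, 3)
```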
- SDK: Video Codec Software Development Kit
- API: Application Programming Interface
- Horizontal stripes of the split image can be encoded in parallel by multiple NVENCs of a Graphics Processing Unit (GPU). Encoding in parallel can significantly decrease encoding times compared to full-frame encoding.
- GPU: Graphics Processing Unit
- horizontal split-frame encoding can reduce image quality compared to encoding the input image as a whole. For example, the split might be noticeable in a reconstructed version of the input image, particularly at low bit rates. Compression efficiency may also be reduced where horizontal stripes are encoded separately and independently of each other.
- Figure 1 shows a schematic block diagram of an example of an image processing system;
- Figures 2A and 2B show a schematic block diagram of another example of an image processing system;
- Figure 3 shows a schematic block diagram of part of another example of an image processing system;
- Figure 4 shows a schematic block diagram of another part of the example image processing system shown in Figure 3;
- Figure 5 shows a schematic block diagram of several parts of the example image processing system shown in Figures 3 and 4;
- Figure 6 shows a schematic block diagram of part of another example of an image processing system;
- Figure 7 shows a schematic block diagram of several parts of the example image processing system shown in Figure 6;
- Figure 8 shows a schematic block diagram of part of another example of an image processing system;
- Figure 9 shows a schematic block diagram of several parts of the example image processing system shown in Figure 8;
- Figure 10 shows a schematic block diagram of part of another example of an image processing system;
- Figure 11 shows a schematic block diagram of several parts of the example image processing system shown in Figure 10; and
- Figure 12 shows a schematic block diagram of an example of an apparatus.
- the signal processing system 100 is used to process signals. Examples of types of signal include, but are not limited to, video signals, image signals, audio signals, volumetric signals such as those used in medical, scientific or holographic imaging, or other multidimensional signals.
- the signal processing system 100 includes a first apparatus 102 and a second apparatus 104.
- the first apparatus 102 and second apparatus 104 may have a client-server relationship, with the first apparatus 102 performing the functions of a server device and the second apparatus 104 performing the functions of a client device.
- the signal processing system 100 may include at least one additional apparatus (not shown).
- the first apparatus 102 and/or second apparatus 104 may comprise one or more components. The one or more components may be implemented in hardware and/or software.
- the one or more components may be co-located or may be located remotely from each other in the signal processing system 100.
- Examples of types of apparatus include, but are not limited to, computerised devices, handheld or laptop computers, tablets, mobile devices, games consoles, smart televisions, set-top boxes, extended reality (XR) headsets (including augmented reality (AR) and/or virtual reality (VR) headsets) etc.
- XR: extended reality
- AR: augmented reality
- VR: virtual reality
- the first apparatus 102 is communicatively coupled to the second apparatus 104 via a data communications network 106.
- Examples of the data communications network 106 include, but are not limited to, the Internet, a Local Area Network (LAN) and a Wide Area Network (WAN).
- the first and/or second apparatus 102, 104 may have a wired and/or wireless connection to the data communications network 106.
- the first apparatus 102 comprises an encoder 108.
- the encoder 108 is configured to encode data comprised in and/or derived based on the signal, which is referred to hereinafter as “signal data”.
- For example, where the signal is a video signal, the encoder 108 is configured to encode video data.
- Video data comprises a sequence of multiple images or frames.
- the encoder 108 may perform one or more further functions in addition to encoding signal data.
- the encoder 108 may be embodied in various different ways.
- the encoder 108 may be embodied in hardware and/or software.
- the encoder 108 may encode metadata associated with the signal.
- the first apparatus 102 may use one or more than one encoder 108.
- the first apparatus 102 comprises the encoder 108
- the first apparatus 102 is separate from the encoder 108.
- the first apparatus 102 is communicatively coupled to the encoder 108.
- the first apparatus 102 may be embodied as one or more software functions and/or hardware modules.
- the second apparatus 104 comprises a decoder 110.
- the decoder 110 is configured to decode signal data.
- the decoder 110 may perform one or more further functions in addition to decoding signal data.
- the decoder 110 may be embodied in various different ways.
- the decoder 110 may be embodied in hardware and/or software.
- the decoder 110 may decode metadata associated with the signal.
- the second apparatus 104 may use one or more than one decoder 110.
- the second apparatus 104 comprises the decoder 110
- the second apparatus 104 is separate from the decoder 110.
- the second apparatus 104 is communicatively coupled to the decoder 110.
- the second apparatus 104 may be embodied as one or more software functions and/or hardware modules.
- the encoder 108 encodes signal data and transmits the encoded signal data to the decoder 110 via the data communications network 106.
- the decoder 110 decodes the received, encoded signal data and generates decoded signal data.
- the decoder 110 may output the decoded signal data, or data derived using the decoded signal data. For example, the decoder 110 may output such data for display on one or more display devices associated with the second apparatus 104.
- the encoder 108 transmits to the decoder 110 a representation of a signal at a given level of quality and information the decoder 110 can use to reconstruct a representation of the signal at one or more higher levels of quality.
- Such information may be referred to as “reconstruction data”.
- “reconstruction” of a representation involves obtaining a representation that is not an exact replica of an original representation. The extent to which the representation is the same as the original representation may depend on various factors including, but not limited to, quantisation levels.
- a representation of a signal at a given level of quality may be considered to be a rendition, version or depiction of data comprised in the signal at the given level of quality.
- the reconstruction data is included in the signal data that is encoded by the encoder 108 and transmitted to the decoder 110.
- the reconstruction data may be in the form of metadata.
- the reconstruction data is encoded and transmitted separately from the signal data.
- the information the decoder 110 uses to reconstruct the representation of the signal at the one or more higher levels of quality may comprise residual data, as described in more detail below. Residual data is an example of reconstruction data.
- the information the decoder 110 uses to reconstruct the representation of the signal at the one or more higher levels of quality may also comprise configuration data relating to processing of the residual data.
- the configuration data may indicate how the residual data has been processed by the encoder 108 and/or how the residual data is to be processed by the decoder 110.
- the configuration data may be signalled to the decoder 110, for example in the form of metadata.
- the signal processing system 200 includes a first apparatus 202 and a second apparatus 204.
- the first apparatus 202 comprises an encoder and the second apparatus 204 comprises a decoder.
- the encoder is not comprised in the first apparatus 202 and/or the decoder is not comprised in the second apparatus 204.
- items are shown on two logical levels. The two levels are separated by a dashed line. Items on the first, highest level relate to data at a first level of quality. Items on the second, lowest level relate to data at a second level of quality.
- the first level of quality is higher than the second level of quality.
- the first and second levels of quality relate to a tiered hierarchy having multiple levels of quality.
- the tiered hierarchy comprises more than two levels of quality.
- the first apparatus 202 and the second apparatus 204 may include more than two different levels. There may be one or more other levels above and/or below those depicted in Figures 2A and 2B. As described herein, in certain cases, the levels of quality may correspond to different spatial resolutions.
- the first apparatus 202 obtains a first representation of an image at the first level of quality 206.
- a representation of a given image is a representation of data comprised in the image.
- the image may be a given frame of a video.
- the first representation of the image at the first level of quality 206 will be referred to as “input data” hereinafter as, in this example, it is data provided as an input to the encoder in the first apparatus 202.
- the first apparatus 202 may receive the input data 206.
- the first apparatus 202 may receive the input data 206 from at least one other apparatus.
- the first apparatus 202 may be configured to receive successive portions of input data 206, e.g. successive frames of a video, and to perform the operations described herein to each successive frame.
- a video may comprise frames F1, F2, ..., FT and the first apparatus 202 may process each of these in turn.
- the first apparatus 202 derives data 212 based on the input data 206.
- the data 212 based on the input data 206 is a representation 212 of the image at the second, lower level of quality.
- the data 212 is derived by performing a downsampling operation on the input data 206 and will therefore be referred to as “downsampled data” hereinafter.
- the data 212 is derived by performing an operation other than a downsampling operation on the input data 206, or the data 212 is the same as the input data 206 (i.e. the input data 206 is not processed, e.g. downsampled).
- the downsampled data 212 is processed to generate processed data 213 at the second level of quality.
- the downsampled data 212 is not processed at the second level of quality.
- the first apparatus 202 may generate data at the second level of quality, where the data at the second level of quality comprises the downsampled data 212 or the processed data 213.
- generating the processed data 213 involves the downsampled data 212 being encoded. Such encoding may occur within the first apparatus 202, or the first apparatus 202 may output the downsampled data 212 to an external encoder. Encoding the downsampled data 212 produces an encoded image at the second level of quality.
- the first apparatus 202 may output the encoded image, for example for transmission to the second apparatus 204.
- a series of encoded images, e.g. forming an encoded video, output for transmission to the second apparatus 204 may be referred to as a “base” stream or base layer.
- the encoded image may be produced by an encoder that is separate from the first apparatus 202.
- the encoded image may be part of an H.264 encoded video, or otherwise.
- Generating the processed data 213 may, for example, comprise generating successive frames of video as output by a separate encoder such as an H.264 video encoder.
- An intermediate set of data used in generating the processed data 213 may comprise the output of such an encoder, as opposed to any intermediate data generated internally by the separate encoder.
- Generating the processed data 213 at the second level of quality may further involve decoding the encoded image at the second level of quality.
- the decoding operation may be performed to emulate a decoding operation at the second apparatus 204, as will become apparent below.
- Decoding the encoded image produces a decoded image at the second level of quality.
- the first apparatus 202 decodes the encoded image at the second level of quality to produce the decoded image at the second level of quality.
- the first apparatus 202 receives the decoded image at the second level of quality, for example from an encoder and/or decoder that is separate from the first apparatus 202.
- the encoded image may be decoded using an H.264 decoder.
- the decoding by a separate decoder may comprise inputting encoded video, such as an encoded data stream configured for transmission to a remote decoder, into a separate black-box decoder implemented together with the first apparatus 202 to generate successive decoded frames of video.
- Processed data 213 may thus comprise a frame of video data that is generated via a complex non-linear encoding and decoding process, where the encoding and decoding process may involve modelling spatiotemporal correlations as per a particular encoding standard such as H.264.
- this complexity is effectively hidden from the first apparatus 202.
- generating the processed data 213 at the second level of quality further involves obtaining correction data based on a comparison between the downsampled data 212 and the decoded image obtained by the first apparatus 202, for example based on the difference between the downsampled data 212 and the decoded image.
- the correction data can be used to correct for errors introduced in encoding and decoding the downsampled data 212.
- the first apparatus 202 outputs the correction data, for example for transmission to the second apparatus 204, as well as the encoded signal. This allows the recipient to correct for the errors introduced in encoding and decoding the downsampled data 212.
- This correction data may also be referred to as a “first enhancement” stream or first enhancement layer.
- as the correction data may be based on the difference between the downsampled data 212 and the decoded image, it may be seen as a form of residual data (e.g. that is different from the other set of residual data described later below).
- generating the processed data 213 at the second level of quality further involves correcting the decoded image using the correction data.
- the correction data as output for transmission may be placed into a form suitable for combination with the decoded image, and then added to the decoded image. This may be performed on a frame-by-frame basis.
- in other examples, the first apparatus 202 uses the downsampled data 212 differently. For example, in certain cases just the encoded-then-decoded data may be used, and in other cases encoding and decoding may be replaced by other processing.
- generating the processed data 213 involves performing one or more operations other than the encoding, decoding, obtaining and correcting acts described above.
- the first apparatus 202 obtains data 214 based on the data at the second level of quality.
- the data at the second level of quality may comprise the processed data 213, or the downsampled data 212 where the downsampled data 212 is not processed at the lower level.
- the processed data 213 may comprise a reconstructed video stream (e.g. from an encoding-decoding operation) that is corrected using correction data.
- the data 214 is a second representation of the image at the first level of quality, the first representation of the image at the first level of quality being the input data 206.
- the second representation at the first level of quality may be considered to be a preliminary or predicted representation of the image at the first level of quality.
- the first apparatus 202 derives the data 214 by performing an upsampling operation on the data at the second level of quality.
- the data 214 will be referred to hereinafter as “upsampled data”.
- one or more other operations could be used to derive the data 214, for example where data 212 is not derived by downsampling the input data 206.
- the input data 206 and the upsampled data 214 are used to obtain residual data 216.
- the residual data 216 is associated with the image.
- the residual data 216 may be in the form of a set of residual elements, which may be referred to as a “residual frame” or a “residual image”.
- a residual element in the set of residual elements 216 may be associated with a respective image element in the input data 206.
- An example of an image element is a pixel.
- a given residual element is obtained by subtracting a value of an image element in the upsampled data 214 from a value of a corresponding image element in the input data 206.
- the residual data 216 is useable in combination with the upsampled data 214 to reconstruct the input data 206.
- the residual data 216 may also be referred to as “reconstruction data” or “enhancement data”.
- the residual data 216 may form part of a “second enhancement” stream or a second enhancement layer.
- the first apparatus 202 obtains configuration data relating to processing of the residual data 216.
- the configuration data indicates how the residual data 216 has been processed and/or generated by the first apparatus 202 and/or how the residual data 216 is to be processed by the second apparatus 204.
- the configuration data may comprise a set of configuration parameters.
- the configuration data may be useable to control how the second apparatus 204 processes data and/or reconstructs the input data 206 using the residual data 216.
- the configuration data may relate to one or more characteristics of the residual data 216.
- the configuration data may relate to one or more characteristics of the input data 206. Different configuration data may result in different processing being performed on and/or using the residual data 216.
- the configuration data is therefore useable to reconstruct the input data 206 using the residual data 216.
- configuration data may also relate to the correction data described herein.
- the first apparatus 202 transmits to the second apparatus 204 data based on the downsampled data 212, data based on the residual data 216, and the configuration data (or data based on the configuration data), to enable the second apparatus 204 to reconstruct the input data 206.
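The encoder-side flow of Figures 2A and 2B can be sketched as follows. This is an illustrative reconstruction, not the publication's implementation: the base codec is stood in for by a lossy round-trip stub (the scheme treats it as a black box), and all function and variable names are assumptions.

```python
import numpy as np

def downsample(x):
    # 2x2 average pooling as a simple downsampling operation.
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def upsample(x):
    # Nearest-neighbour upsampling as a simple upsampling operation.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def base_codec_round_trip(x):
    # Stand-in for encoding then decoding with a black-box base codec
    # (e.g. H.264); coarse quantisation models the loss it introduces.
    return np.round(x / 8.0) * 8.0

input_data = np.random.rand(8, 8) * 255          # input data 206
downsampled = downsample(input_data)             # downsampled data 212
decoded_base = base_codec_round_trip(downsampled)

# "First enhancement": correction data for base codec errors.
correction = downsampled - decoded_base
processed = decoded_base + correction            # processed data 213

# "Second enhancement": residual data 216 at the first level of quality.
upsampled = upsample(processed)                  # upsampled data 214
residuals = input_data - upsampled               # residual data 216

# Transmitted: base stream, correction data, residual data (plus config data).
```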
- the second apparatus 204 receives data 220 based on (e.g. derived from) the downsampled data 212.
- the second apparatus 204 also receives data based on the residual data 216.
- the second apparatus 204 may receive a “base” stream (data 220), a “first enhancement stream” (any correction data) and a “second enhancement stream” (residual data 216).
- the second apparatus 204 also receives the configuration data relating to processing of the residual data 216.
- the data 220 based on the downsampled data 212 may be the downsampled data 212 itself, the processed data 213, or data derived from the downsampled data 212 or the processed data 213.
- the data based on the residual data 216 may be the residual data 216 itself, or data derived from the residual data 216.
- the received data 220 comprises the processed data 213, which may comprise the encoded image at the second level of quality and/or the correction data.
- the second apparatus 204 processes the received data 220 to generate processed data 222.
- Such processing by the second apparatus 204 may comprise decoding an encoded image (e.g. that forms part of a “base” encoded video stream) to produce a decoded image at the second level of quality.
- the processing by the second apparatus 204 comprises correcting the decoded image using obtained correction data.
- the processed data 222 may comprise a frame of corrected data at the second level of quality.
- the encoded image at the second level of quality is decoded by a decoder that is separate from the second apparatus 204. The encoded image at the second level of quality may be decoded using an H.264 decoder.
- the received data 220 comprises the downsampled data 212 and does not comprise the processed data 213. In some such examples, the second apparatus 204 does not process the received data 220 to generate processed data 222.
- the second apparatus 204 uses data at the second level of quality to derive the upsampled data 214.
- the data at the second level of quality may comprise the processed data 222, or the received data 220 where the second apparatus 204 does not process the received data 220 at the second level of quality.
- the upsampled data 214 is a preliminary representation of the image at the first level of quality.
- the upsampled data 214 may be derived by performing an upsampling operation on the data at the second level of quality.
- the second apparatus 204 obtains the residual data 216.
- the residual data 216 is useable with the upsampled data 214 to reconstruct the input data 206.
- the residual data 216 is indicative of a comparison between the input data 206 and the upsampled data 214.
- the second apparatus 204 also obtains the configuration data related to processing of the residual data 216.
- the configuration data is useable by the second apparatus 204 to reconstruct the input data 206.
- the configuration data may indicate a characteristic or property relating to the residual data 216 that affects how the residual data 216 is to be used and/or processed, or whether the residual data 216 is to be used at all.
- the configuration data comprises the residual data 216.
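On this reading, decoder-side reconstruction mirrors the encoder sketch given above; the snippet below continues that sketch's variables and is runnable when appended to it.

```python
# Decoder side: decode the "base" stream (here, what the black-box base
# decoder would output), then apply the two enhancement layers in turn.
decoded = base_codec_round_trip(downsampled)   # decoded image, second level
corrected = decoded + correction               # apply correction data
preliminary = upsample(corrected)              # upsampled data 214
reconstruction = preliminary + residuals       # apply residual data 216

# With the enhancement data carried losslessly, as in this sketch, the
# input data 206 is recovered exactly.
assert np.allclose(reconstruction, input_data)
```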
- one consideration in signal processing systems such as these is the amount of information that is generated, stored, transmitted and/or processed.
- the more information that is used the greater the amount of resources that may be involved in handling such information. Examples of such resources include transmission resources, storage resources and processing resources.
- Some signal processing techniques allow a relatively small amount of information to be used. This may reduce the amount of data transmitted via the data communications network 106. The savings may be particularly relevant where the data relates to high quality video data, where the amount of information transmitted can be especially high.
- the term “image processing” is used herein generally to mean any type of processing operation performed on any type of image.
- processing operations include, but are not limited to, rotation, stacking, de-rotation, de-stacking, scaling, de-scaling, transformation, and de-transformation.
- images include, but are not limited to, photographs, computer-generated images, frames from a video signal, and so on. Referring to Figure 3, an image 302 is obtained.
- the obtained image 302 may be referred to herein as a “frame”, an “obtained image”, a “source image”, an “input image” or the like.
- the image 302 may be obtained in various ways. For example, the obtained image 302 may be received, may be generated, may be retrieved from storage, or otherwise.
- the obtained image 302 comprises XR content.
- the XR content comprises VR content.
- the XR content comprises AR content.
- Latency is particularly, but not exclusively, relevant to XR content. As such, latency reduction is particularly, but not exclusively, effective for XR content. By way of an additional example, latency reduction can also be especially effective in the context of videoconferencing.
- the obtained image 302 is input to an encoder 304.
- the encoder 304 is a horizontal split-frame encoder.
- the encoder 304 may also be referred to as a “horizontal striping encoder”.
- the horizontal split-frame encoder 304 may have an API via which the horizontal split-frame encoder 304 can be instructed to encode the obtained image 302.
- the horizontal split-frame encoder 304 obtains and splits the obtained image 302 in half horizontally and encodes the resulting half-images. Splitting the obtained image 302 in this way may be referred to as splitting the obtained image 302 “in two”, “two ways” or the like.
- the split-frame encoder 304 obtains and splits the obtained image 302 in half horizontally.
- the split-frame encoder 304 may horizontally split the obtained image 302 in other proportions in other examples.
- references to halves of images should be understood accordingly.
- the split-frame encoder 304 may split the obtained image 302 into an upper region (which may also be referred to as a “portion”) corresponding to X% of the height of the obtained image 302 and into a lower region corresponding to (100 - X)% of the height of the obtained image 302.
- for the half-and-half split described above, X = 50.
- the encoder 304 splits the obtained image 302 and encodes the split images
- splitting may be performed outside of the encoder 304 in other examples.
- the obtained image 302 may be input to a splitter (not shown), the splitter may split the obtained image 302, and the encoder 304 may encode the output of the splitter.
- the splitting provides an upper region 306 of the obtained image 302 and a lower region 308 of the obtained image 302.
- the upper and lower regions 306, 308 may be referred to as “partial images”, “split images”, “partial frames” or the like.
- the upper region 306 may be referred to as a “top region” and the lower region 308 may be referred to as a “bottom region”.
- the horizontal split-frame encoder 304 is also operable to perform other types of encoding, in addition to horizontal split-frame encoding.
- the horizontal split-frame encoder 304 may be operable to perform vertical split-frame encoding, encoding that does not split the obtained image 302, and so on.
- Vertical split-frame encoding may also be referred to as vertical striping.
- the encoder 304 comprises a vertical split-frame encoder.
- the vertical split-frame encoder is configured to split a stereoscopic image along a dividing line between a left region and a right region of the stereoscopic image.
- the vertical split-frame encoder is configured to encode the left and right regions of the stereoscopic image separately.
- the horizontal split-frame encoder 304 splits the obtained image 302 in half horizontally such that the upper and lower regions 306, 308 are upper and lower halves of the obtained image 302.
- the upper and lower halves 306, 308 each have width w and each have height h/2.
- the horizontal split-frame encoder 304 may split the obtained image 302 in different horizontal proportions in other examples.
- the horizontal split-frame encoder 304 encodes the upper and lower halves 306, 308. This provides an encoded upper half 310 of the obtained image 302 and an encoded lower half 312 of the obtained image 302 respectively.
- the horizontal split-frame encoder 304 may use one or more of various different codecs to encode the upper and lower halves 306, 308.
- An example codec is the Low Complexity Enhancement Video Coding (LCEVC) standard.
- LCEVC is described in WO2020/188273 (PCT/GB2020/050695) and WO2019/111010 (PCT/GB2018/053552), the entire contents of which are incorporated herein by reference.
- the horizontal split-frame encoder 304 is operable to perform parallel encoding of the upper and lower halves 306, 308. In other words, in this example, the horizontal split-frame encoder 304 does not need to wait for encoding of one of the upper and lower halves 306, 308 to complete, or even start, before encoding the other of the upper and lower halves 306, 308.
- Encoding the upper and lower halves 306, 308 in parallel can increase encoding speed compared to encoding the obtained image 302 as a whole.
- the encoding speed may be doubled. This is especially, but not exclusively, effective in low-latency applications.
- An example of such an application is XR.
- XR content resolution may be increased. Although this may use some of the additional latency headroom gained by parallel encoding, the XR content resolution increase may be restricted such that latency performance does not drop below an acceptable threshold.
- the horizontal split-frame encoder 304 is operable to encode the upper and lower halves 306, 308 independently. In other words, in such examples, the encoding of the upper half 306 is independent of the encoding of the lower half 308 and vice versa.
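A sketch of such parallel, independent encoding using a thread pool; `encode_stripe` is a hypothetical stand-in for submitting one stripe to its own hardware encoder session (e.g. one NVENC per stripe), and a real session would release the GIL while the hardware works:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def encode_stripe(stripe: np.ndarray) -> bytes:
    # Hypothetical stand-in for a per-stripe hardware encoder session.
    return stripe.tobytes()  # placeholder "encoding"

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
upper, lower = frame[:540], frame[540:]

# Each half is encoded independently of the other, so both can be in
# flight at once; neither waits for the other to start or finish.
with ThreadPoolExecutor(max_workers=2) as pool:
    encoded_upper, encoded_lower = pool.map(encode_stripe, [upper, lower])
```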
- although the split-frame encoding described above can decrease latency, it may result in a lower-quality encoding compared to encoding the obtained image 302 as a whole and/or the split may be noticeable in the final output image (for example, at low bitrates).
- a decoder 402 obtains the encoded upper and lower halves 310, 312.
- the decoder 402 may obtain the encoded upper and lower halves 310, 312 by receiving the encoded upper and lower halves 310, 312 from the horizontal split-frame encoder 304.
- the decoder 402 decodes the encoded upper and lower halves 310, 312. This provides decoded upper and lower halves 404, 406 respectively. In this example, the decoder 402 decodes the encoded upper and lower halves 310, 312 in parallel. Decoding the encoded upper and lower halves 310, 312 in parallel can increase decoding speed compared to decoding the obtained image 302 as a whole. For example, the decoding speed may be doubled. This is especially, but not exclusively, effective in low-latency applications.
- the decoded upper and lower halves 404, 406 are provided to a player 408.
- the player 408 may perform image reconstruction processing and/or may cause images to be displayed.
- Image reconstruction processing may comprise stitching the decoded upper and lower halves 404, 406 together. This provides a stitched-together image having an upper half comprising the decoded upper half 404 and a lower half comprising the decoded lower half 406. As such, post-decoding image reconstruction processing may be performed outside of the decoder 402. In some examples, however, at least some post-decoding image reconstruction processing is performed by the decoder 402.
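The stitching itself amounts to a vertical concatenation of the two decoded halves; a minimal sketch, with shapes assumed for illustration:

```python
import numpy as np

decoded_upper = np.zeros((540, 1920, 3), dtype=np.uint8)
decoded_lower = np.zeros((540, 1920, 3), dtype=np.uint8)

# Place the decoded upper half above the decoded lower half.
stitched = np.vstack([decoded_upper, decoded_lower])
assert stitched.shape == (1080, 1920, 3)
```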
- the decoder 402 may or may not be aware that horizontal split-frame encoding has been performed.
- the decoder 402 may simply receive encoded images and decode such encoded images as if they had not been subject to horizontal split-frame encoding. The decoder 402 may then output such decoded images as if they had not been subject to horizontal split-frame encoding. The decoded images may then be recombined post-decoding, for example by the player 408. In particular, the player 408 may be aware that horizontal split-frame encoding has been performed, but the decoder 402 may not be aware of such.
- the decoder 402 may obtain (for example, receive) an indication of the size of an encoded image that is to be decoded.
- the size may be indicated in terms of width and height.
- the decoded image may be combined with another decoded image and the size of the combined image that is ultimately displayed may be different from the size of the encoded image.
- the size of the encoded image that the decoder 402 decodes is different from the size of the (combined) image that is ultimately displayed.
- the size of the image that is ultimately displayed may be indicated to the player 408. Again, the size may be indicated in terms of width and height. Referring to Figure 5, an image processing system 500 comprising the split-frame encoder 304, the decoder 402 and the player 408 is shown.
- a bitstream 502 is communicated between the split-frame encoder 304 and the decoder 402.
- the image processing system 500 may comprise one or more components (not shown) between the split-frame encoder 304 and the decoder 402, such as a transmission module and a reception module.
- the bitstream 502 comprises data output by the horizontal split-frame encoder 304 and/or data derived based on such output data.
- the bitstream 502 may be referred to as an “encoded bitstream” accordingly.
- the bitstream 502 comprises the encoded upper and lower halves 310, 312.
- the decoder 402 obtains and decodes the bitstream 502 and/or data derived based on the bitstream 502, and outputs to the player 408.
- the obtained image 602 has a left region 604 and a right region 606.
- the left region 604 may be a leftmost region of the obtained image 602 and/or the right region 606 may be a rightmost region of the obtained image 602.
- the obtained image 602 may comprise one or more further regions.
- the one or more further regions comprise one or more intermediate regions, with the one or more intermediate regions being between the left and right regions 604, 606 of the obtained image 602.
- the left region 604 is a left half of the obtained image 602 and the right region 606 is a right half of the obtained image 602.
- the left region 604 may be referred to as the “left half” 604 and the right region 606 may be referred to as the “right half” 606 accordingly.
- in some other examples, one or each of the left and right regions 604, 606 is not a half of the obtained image 602. Additionally, in some other examples, the left and right regions 604, 606 are not the same size (for example, height and/or width) as each other.
- the obtained image 602 comprises a stereoscopic image.
- the left region 604 corresponds to a left view (for example, a left-eye view) of a scene and the right region 606 corresponds to a right view (for example, a right-eye view) of the scene.
- the obtained image 602 is not limited to being a stereoscopic image.
- one of the left and right regions 604, 606 comprises a view of a scene and the other of the left and right regions 604, 606 comprises a corresponding depth map.
- one of the left and right regions 604, 606 comprises an image in one or more frequencies (for example, corresponding to visible light) and the other of the left and right regions 604, 606 comprises a corresponding image in one or more other frequencies (for example, infrared).
- one of the left and right regions 604, 606 comprises a video feed of one participant of a videoconference and the other of the left and right regions 604, 606 comprises a video feed of another participant of the videoconference.
- one of the left and right regions 604, 606 comprises a view of a sporting event from a first angle
- the other of the left and right regions 604, 606 comprises a view of the sporting event from a second, different angle.
- the obtained image 602 could, in principle, be provided to the horizontal split-frame encoder 304 to split the obtained image 602 into upper and lower halves and to encode the upper and lower halves accordingly. However, such splitting would split the obtained image 602 into one split image comprising the upper halves of the left and right halves of the obtained image 602 and another split image comprising the lower halves of the left and right halves of the obtained image 602.
- the horizontal split between the upper and lower halves of each of the left and right halves of the obtained image 602 may be noticeable, especially at low bitrates.
- splitting the obtained image 602 horizontally means that the player 408 would still need both the (decoded) upper and lower splits before either the left or right half could be displayed.
- this matters in applications where the left and right halves are to be used separately. An example of such an application is stereoscopic XR content being displayed on a multi-screen XR headset.
- vertical slice encoding could be used.
- the obtained image 602 would be divided into two vertical slices (which may also be referred to as “columns”) in which each slice has, for example, half the width of the obtained image 602 and has the same height as the obtained image 602.
- a stereoscopic image could thereby be sliced into one half comprising the left half of the stereoscopic image and into another half comprising the right half of the stereoscopic image.
- Encoding and/or decoding may be parallelised, thereby improving encoding and/or decoding speed.
- vertical slice encoding relies on availability of an encoder that is operable to perform vertical slice encoding.
- examples described herein facilitate integration with an existing horizontal split-frame encoder.
- the obtained image 602 is subject to image processing prior to being provided to a horizontal split-frame encoder 304.
- the result of such image processing is a processed image 608.
- the processed image 608 is obtained.
- Such processing may be considered to be “pre-processing” in that the obtained image 602 is subject to initial processing before being subject to additional processing by the horizontal split-frame encoder 304.
- Such pre-processing enables the horizontal split-frame encoder 304 to be leveraged to encode an obtained image 602 more effectively than the obtained image 602 would be encoded without the pre-processing.
- the processed image 608 comprises an upper region 610 and a lower region 612.
- the upper region 610 of the processed image 608 corresponds to one of the left and right regions 604, 606 of the obtained image 602.
- the lower region 612 of the processed image 608 corresponds to the other of the left and right regions 604, 606 of the obtained image 602.
- Such correspondence may take various different forms, as will be described in more detail below.
- such correspondence may comprise a rotational correspondence, a stacking correspondence, a transformational correspondence or otherwise.
- the upper and lower regions 610, 612 of the processed image 608 may correspond to upper and lower halves of the processed image 608, may comprise one or more intermediate regions, may be of unequal sizes to each other, and so on. In this specific example, however, the upper and lower regions 610, 612 of the processed image 608 are upper and lower halves of the processed image 608.
- the processed image 608 is encoded more effectively by the horizontal split-frame encoder 304 than the obtained image 602 would be encoded by the horizontal split-frame encoder 304 (i.e. without the above-described image processing being performed).
- the obtained image 602 could have been provided directly to the horizontal split-frame encoder 304 for encoding. However, this would have resulted in a potentially noticeable visual artefact (related to the split) in the resulting image(s). Additionally, although the left and right regions 604, 606 of the obtained image 602 could have been encoded and decoded in parallel, a complete reconstruction of neither the left nor the right region 604, 606 of the obtained image 602 would have been available for display before the other.
- one of the left and right halves 604, 606 of the obtained image 602 becomes one of the upper and lower halves 610, 612 of the processed image 608 and the other of the left and right halves 604, 606 of the obtained image 602 becomes the other of the upper and lower halves 610, 612 of the processed image 608.
- the horizontal split-frame encoder 304 splits the processed image 608 into upper and lower halves 306, 308.
- an indication is provided to the horizontal split-frame encoder 304 that the obtained image 602 has been subject to image processing.
- the horizontal split-frame encoder 304 may use such an indication when encoding the processed image 608 and/or may convey such an indication downstream.
- an instruction 614 is provided to the horizontal split-frame encoder 304 to perform horizontal split-frame encoding in relation to the processed image 608.
- the horizontal split-frame encoder 304 may be operable to perform horizontal split-frame encoding and may be operable to perform one or more other types of encoding.
- the horizontal split-frame encoder 304 defaults to not performing horizontal split-frame encoding and only performs horizontal split-frame encoding when instructed to do so.
- the instruction 614 may be in the form of a flag. For example, a flag value of “0” may indicate that horizontal split-frame encoding is not to be performed and a flag value of “1” may indicate that horizontal split-frame encoding is to be performed.
- the instruction 614 may be provided to the horizontal split-frame encoder 304 with the processed image 608 or otherwise.
- the instruction 614 may be provided to the horizontal split-frame encoder 304 to switch the horizontal split-frame encoder 304 from a horizontal split-frame encoding mode to another encoding mode, and a further instruction 614 may subsequently be provided to the horizontal split-frame encoder 304 to switch the horizontal split-frame encoder 304 back into the horizontal split-frame encoding mode. This may be particularly effective where, for example, a video with a significant number of frames is to be encoded in a given manner.
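A hypothetical shape for the instruction 614: a one-bit flag on an encoder configuration that can be toggled between frames or runs of frames. The class and field names below are illustrative assumptions, not part of any real encoder API.

```python
from dataclasses import dataclass

@dataclass
class EncoderConfig:
    # Instruction 614 as a flag: 1 = perform horizontal split-frame
    # encoding, 0 = use another encoding mode (e.g. whole-frame).
    split_frame: int = 0

config = EncoderConfig(split_frame=1)   # switch split-frame encoding on
# ... encode a run of frames in split-frame mode ...
config.split_frame = 0                  # switch to another mode
# ... later, a further instruction switches split-frame mode back on ...
config.split_frame = 1
```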
- Referring to Figure 7, an image processing system 700 is shown.
- the horizontal split-frame encoder 304 is operable to output a bitstream 502 comprising both the encoded upper half 310 and the encoded lower half 312.
- a bitstream 502 may comprise a header, and a payload comprising the encoded upper and lower halves 310, 312.
- the horizontal split-frame encoder 304 is operable to output a first bitstream comprising the encoded upper half 310 and a second bitstream comprising the encoded lower half 312.
- the first bitstream may comprise a header, and a payload comprising the encoded upper half 310.
- the second bitstream may comprise a header, and a payload comprising the encoded lower half 312.
- the encoded upper half 310 can be output in the first bitstream and obtained by the decoder 402 such that the decoder 402 can start decoding the first bitstream.
- the second bitstream can be output, and then obtained and decoded by the decoder 402 when available. This leverages parallelisation of the horizontal split-frame encoder 304 when the horizontal split-frame encoder 304 performs encoding in parallel.
- the bitstream 502 may comprise zero, one, or more than one image processing indicator 712.
- each bitstream may comprise zero, one, or more than one image processing indicator 712.
- one of the first and second bitstreams may comprise one or more processing indicators 712 which relate to both the first and second bitstreams, and the other of the first and second bitstreams may not comprise any image processing indicators 712 accordingly.
- the image processing indicator 712 may be signalled in a transport level container.
- transport level container examples include, but are not limited to, MPEG-4 Part 14 (MP4) and MPEG transport stream (MPEG-TS, MTS, TS).
- the bitstream 502 may comprise more than one image processing indicator 712. Where the bitstream 502 comprises more than one image processing indicator 712, at least one of the image processing indicators 712 may be provided to the decoder 402 and/or at least one of the image processing indicators 712 may be provided to the player 408.
- the image processing indicator 712 may indicate whether or not split-frame encoding has been performed.
- the image processing indicator 712 may indicate whether or not image processing has been performed.
- the image processing indicator 712 may comprise an image processing type indicator, indicative of which type of image processing has been performed.
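One hypothetical layout for an image processing indicator 712, as it might be signalled in a transport-level container; the field names and values are assumptions for illustration, not a defined syntax.

```python
from dataclasses import dataclass
from enum import Enum

class ProcessingType(Enum):
    NONE = 0       # no pre-processing performed
    ROTATION = 1   # obtained image was rotated before encoding
    STACKING = 2   # obtained image was stacked before encoding

@dataclass
class ImageProcessingIndicator:
    split_frame: bool              # whether split-frame encoding was used
    processing: ProcessingType     # which type of image processing, if any
    rotation_degrees: int = 0      # e.g. 90 for a 90° anticlockwise rotation

indicator = ImageProcessingIndicator(True, ProcessingType.ROTATION, 90)
```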
- an image processing indicator 712 is not provided in the bitstream 502.
- the bitstream 502 is not impacted by the image processing described herein (for example rotation and/or stacking).
- the bitstream 502 may indicate whether or not split-frame encoding has been performed irrespective of whether or not image processing has been performed.
- the decoder 402 may decode the bitstream 502 regardless of any image processing (for example rotation and/or stacking) that has been performed.
- the decoder 402 may produce decoded rotated and/or stacked images, which can then be de-rotated and/or de-stacked, for example by the player 408. In such examples, the decoder 402 is not impacted by the above-described image processing (for example rotation and/or stacking).
- the image processing indicator 712 may be used for decoding, for image reconstruction processing and/or for display.
- the image processing indicator 712 may indicate that split-frame encoding with rotation has been used.
- the decoder 402 can use the image processing indicator 712 to determine that split images are to be decoded and to provide decoded split images accordingly.
- the image processing indicator 712 (and/or another image processing indicator 712) can then be used by the player 408 to de-rotate the decoded split images for display.
- the decoder 402 does not need to know whether a to-be-decoded image is a split image or otherwise; the decoder 402 simply decodes the image without knowing its origin and/or nature.
- a portrait video may be recorded and encoded in 16:9 horizontal mode.
- a 90° anticlockwise rotation can be signalled in a container with the encoded landscape video.
- a player can then rotate the decoded landscape video into portrait mode for display.
- Some players ignore the rotation, however, and thus play the decoded video in landscape mode rather than in portrait mode.
- the decoder simply decodes the encoded video.
- the decoder 402 receives the bitstream 502 and decodes the encoded upper and lower regions 310, 312 to obtain decoded upper and lower regions 404, 406 of the processed image 608.
- the decoded upper and lower regions 404, 406 of the processed image 608 are decoded versions of encoded versions of the upper and lower regions 310, 312 of the processed image 608 respectively.
- the upper region 610 of the processed image 608 corresponds to one of the left and right regions 604, 606 of the obtained image 602.
- the lower region 612 of the processed image 608 corresponds to the other of the left and right regions 604, 606 of the obtained image 602.
- the encoded upper and lower regions 310, 312 were generated by the horizontal split-frame encoder 304.
- the decoded upper and lower regions 404, 406 are provided for image reconstruction processing and/or for display.
- Such image reconstruction processing may include de-rotation and/or de-stacking.
- the prefix “de-” is used herein to indicate a reversal, undoing or the like.
- “de-rotation” is used herein to mean reversing an already-performed rotation
- “de-stacking” is used herein to mean reversing an already-performed stacking.
- image reconstruction processing comprises de-rotating the decoded upper and lower regions 404, 406 of the processed image 608.
- de-rotating is performed in response to receiving an indication that the obtained image 602 was subject to rotation.
- image reconstruction processing comprises de-stacking the decoded upper and lower regions 404, 406 of the processed image 608.
- the de-stacking is performed in response to receiving an indication that the obtained image 602 was subject to stacking.
- the decoder 402 does not need to perform image reconstruction processing prior to the decoded images being displayed.
- left-eye and right-eye views may be decoded separately and provided to separate respective left and right screens of an XR headset, without needing to be de-rotated and/or de-stacked first.
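Where de-rotation is wanted at the player (for the rotation case detailed with Figure 8 below), it can be applied to the recombined frame or to each decoded half on its own. A sketch under the Figure 8 geometry (90° anticlockwise pre-rotation, so the upper half holds the right view), with illustrative function names:

```python
import numpy as np

def de_rotate_and_recombine(decoded_upper, decoded_lower):
    # Stack the decoded halves back into the rotated frame, then rotate
    # 90° clockwise (k=-1) to reverse the 90° anticlockwise pre-rotation.
    rotated = np.vstack([decoded_upper, decoded_lower])
    return np.rot90(rotated, k=-1)

def per_eye_views(decoded_upper, decoded_lower):
    # For a multi-screen XR headset, each half can instead be de-rotated
    # separately and sent to its own screen.
    right_eye = np.rot90(decoded_upper, k=-1)  # upper half held the right view
    left_eye = np.rot90(decoded_lower, k=-1)   # lower half held the left view
    return left_eye, right_eye
```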
- the obtained image 602 has a left half 604 and a right half 606.
- the obtained image 602 is subject to image processing prior to encoding.
- processing comprises rotating 802 the obtained image 602, and the processed image 608 is a rotated image.
- the upper region 610 of the processed image 608 is a rotated version of one of the left and right regions 604, 606 of the obtained image 602.
- the lower region 612 of the processed image 608 is a rotated version of the other one of the left and right regions 604, 606 of the obtained image 602.
- the obtained image 602 is rotated 90° anticlockwise, which corresponds to a clockwise rotation of 270°. More generally, in examples, the rotation described herein is at least 90°. This differs from other types of image processing that might only cause a nominal amount of rotation.
- the left half 604 of the obtained image 602 becomes the lower half of the rotated image 608 and the right half 606 of the obtained image 602 becomes the upper half of the rotated image 608.
- the lower half 612 of the rotated image 608 is, therefore, a 90°-anticlockwise-rotated version of the left half 604 of the obtained image 602, and the upper half 610 of the rotated image 608 is, similarly, a 90°-anticlockwise-rotated version of the right half 606 of the obtained image 602.
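A sketch of this rotation step, using NumPy as an illustrative choice: `np.rot90` with k=1 performs a 90° anticlockwise rotation, carrying the right half 606 to the top of the rotated image and the left half 604 to the bottom.

```python
import numpy as np

h, w = 1080, 3840                      # e.g. a side-by-side stereo frame
obtained = np.zeros((h, w, 3), dtype=np.uint8)
obtained[:, w // 2 :] = 255            # mark the right half 606

rotated = np.rot90(obtained, k=1)      # 90° anticlockwise (= 270° clockwise)
assert rotated.shape == (w, h, 3)      # width h, height w, as in the text

# The right half of the obtained image is now the upper half ...
assert (rotated[: w // 2] == 255).all()
# ... and the left half is now the lower half.
assert (rotated[w // 2 :] == 0).all()
```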
- the rotated image 608 is provided to the horizontal split-frame encoder 304 and is encoded in a corresponding manner to that described above with reference to Figures 3 and 6.
- an indication is provided to the horizontal split-frame encoder 304 that the obtained image 602 has been subject to rotation.
- the indication may indicate one or more rotation attributes. Examples of rotation attributes include, but are not limited to, an amount of rotation and a direction of rotation.
- the left and right halves 604, 606 of the obtained image 602 each has width w/2 and height h.
- the upper and lower halves 610, 612 of the rotated image 608 each has width h and height w/2.
- the rotated image 608 has width h and height w.
- an image processing system 900 is shown.
- a bitstream 502 comprises data output by the horizontal split-frame encoder 304 and/or data derived based on such output data.
- the bitstream 502 comprises the encoded upper and lower halves 310, 312.
- the bitstream 502 comprises one or more image processing indicators 712.
- the one or more image processing indicators 712 indicate that the obtained image 602 was subject to rotation.
- the image processing indicator(s) 712 may indicate a direction of rotation.
- the image processing indicator(s) 712 may indicate an amount of rotation. In this specific example, the image processing indicator(s) 712 indicate the 90° anticlockwise rotation.
- the obtained image 602 has a left half 604 and a right half 606.
- the obtained image 602 is subject to image processing prior to encoding.
- processing comprises stacking 1002 the obtained image 602, and the processed image 608 is a stacked image.
- the upper region 610 of the processed image 608 is one of the left and right regions 604, 606 of the obtained image 602.
- the lower region 612 of the processed image 608 is the other one of the left and right regions 604, 606 of the obtained image 602.
- the image 602 is subject to stacking, resulting in a stacked image 608.
- “stacking” is generally used herein to mean the process by which first and second data that are side-by-side are rearranged such that one is on top of the other instead of being side-by-side.
- Stacking may comprise, or may be referred to as, “transporting”, “moving” or the like.
- the upper half 610 of the stacked image 608 comprises the left half 604 of the obtained image 602
- the lower half 612 of the stacked image 608 comprises the right half 606 of the obtained image 602.
- the upper half 610 of the stacked image 608 may comprise the right half 606 of the obtained image 602
- the lower half 612 of the stacked image 608 may comprise the left half 604 of the obtained image 602.
- Stacking may be performed in various different ways.
- the obtained image 602 may initially be split vertically such that there is a first image corresponding to the left half 604 of the obtained image 602 and a second image corresponding to the right half 606 of the obtained image 602.
- the first and second images may then be recombined, with one on top of the other, to provide the stacked image 608.
- the obtained image 602 may be read in a predetermined manner (for example, from top left to bottom right), with the read values being written into the appropriate locations of the stacked image 608.
- the top row of pixels of the obtained image 602 may be read from the leftmost pixel to a pixel immediately to the left of the centre of the obtained image 602, and such pixels may be written to the top row of pixels of the upper half 610 of the stacked image 608.
- the top row of pixels of the obtained image 602 may then continue to be read from a pixel immediately to the right of the centre of the obtained image 602 to the rightmost pixel of the obtained image 602, and such pixels may be written to the top row of pixels of the lower half 612 of the stacked image 608.
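- as a non-limiting sketch, both stacking approaches described above can be written as follows, assuming the obtained image 602 is a NumPy array of shape (h, w, channels) with even width; the function names are illustrative.

```python
import numpy as np

def stack_split_recombine(obtained: np.ndarray) -> np.ndarray:
    """Split the image vertically into left and right halves, then recombine
    them with the left half on top (matching the example above)."""
    w = obtained.shape[1]
    left, right = obtained[:, : w // 2], obtained[:, w // 2 :]
    return np.vstack([left, right])

def stack_rowwise(obtained: np.ndarray) -> np.ndarray:
    """Read each source row left-to-right and write the values into the
    corresponding rows of the upper and lower halves of the stacked image."""
    h, w = obtained.shape[:2]
    stacked = np.empty((2 * h, w // 2) + obtained.shape[2:], obtained.dtype)
    for row in range(h):
        stacked[row] = obtained[row, : w // 2]      # left pixels -> upper half
        stacked[h + row] = obtained[row, w // 2 :]  # right pixels -> lower half
    return stacked

image = np.arange(4 * 8 * 3).reshape(4, 8, 3)
assert np.array_equal(stack_split_recombine(image), stack_rowwise(image))
assert stack_split_recombine(image).shape == (8, 4, 3)  # width w/2, height 2h
```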
- the left and right halves 604, 606 of the obtained image 602 are not subject to rotation.
- the stacked image 608 is provided to the horizontal split-frame encoder 304 and is encoded in a corresponding manner to that described above with reference to Figures 3, 6 and 8.
- an indication is provided to the horizontal split-frame encoder 304 that the obtained image 602 has been subject to stacking.
- the indication may indicate one or more stacking attributes. Examples of stacking attributes include, but are not limited to, a correspondence between regions of the obtained image 602 and corresponding regions of the stacked image 608.
- the left and right halves 604, 606 of the obtained image 602 each have width w/2 and height h.
- the upper and lower halves 610, 612 of the stacked image 608 each have width w/2 and height h.
- the stacked image 608 has width w/2 and height 2h.
- a bitstream 502 comprises data output by the horizontal split-frame encoder 304 and/or data derived based on such output data.
- the bitstream 502 comprises the encoded upper and lower halves 310, 312.
- the bitstream 502 comprises one or more image processing indicators 712.
- the one or more image processing indicators 712 indicate that the obtained image 602 was subject to stacking.
- the image processing indicator 712 may indicate a correspondence between the left and right halves 604, 606 of the obtained image 602 and the upper and lower halves 610, 612 of the stacked image 608.
- the image processing indicator 712 may be a one-bit indicator (which may also be referred to as a “flag”). A value of “0” indicates that the left and right halves 604, 606 of the obtained image 602 correspond to the upper and lower halves 610, 612 of the stacked image 608 respectively; a value of “1” indicates that the left and right halves 604, 606 correspond to the lower and upper halves 612, 610 respectively.
- such correspondence may not be signalled.
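- as a hedged sketch only, such a one-bit correspondence flag could drive de-stacking at the receiving side as follows; the function name is illustrative and the flag semantics are those given above.

```python
import numpy as np

# Flag semantics as described above:
#   0 -> left/right halves of the obtained image correspond to the
#        upper/lower halves of the stacked image respectively.
#   1 -> left/right halves correspond to the lower/upper halves respectively.

def unstack(stacked: np.ndarray, flag: int) -> np.ndarray:
    """Invert the stacking using the signalled one-bit indicator."""
    half_h = stacked.shape[0] // 2
    upper, lower = stacked[:half_h], stacked[half_h:]
    left, right = (upper, lower) if flag == 0 else (lower, upper)
    return np.hstack([left, right])  # restore the side-by-side layout

stacked = np.arange(8 * 4 * 3).reshape(8, 4, 3)
assert unstack(stacked, flag=0).shape == (4, 8, 3)
```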
- referring to Figure 12, there is shown a schematic block diagram of an example of an apparatus 1200.
- the apparatus 1200 comprises an encoder. In another example, the apparatus 1200 comprises a decoder. In other examples, the apparatus 1200 comprises neither an encoder nor a decoder but is configured to communicate with an encoder and/or a decoder.
- Examples of apparatus 1200 include, but are not limited to, a mobile computer, a personal computer system, a wireless device, base station, phone device, desktop computer, laptop, notebook, netbook computer, mainframe computer system, handheld computer, workstation, network computer, application server, storage device, a consumer electronics device such as a camera, camcorder, mobile device, video game console, handheld video game device, an XR headset, or in general any type of computing or electronic device.
- the apparatus 1200 comprises one or more processors 1201 configured to process information and/or instructions.
- the one or more processors 1201 may comprise a central processing unit (CPU).
- the one or more processors 1201 are coupled with a bus 1202. Operations performed by the one or more processors 1201 may be carried out by hardware and/or software.
- the one or more processors 1201 may comprise multiple co-located processors or multiple disparately located processors.
- the apparatus 1200 comprises computer-useable volatile memory 1203 configured to store information and/or instructions for the one or more processors 1201.
- the computer-useable volatile memory 1203 is coupled with the bus 1202.
- the computer-useable volatile memory 1203 may comprise random access memory (RAM).
- the apparatus 1200 comprises computer-useable non-volatile memory 1204 configured to store information and/or instructions for the one or more processors 1201.
- the computer-useable non-volatile memory 1204 is coupled with the bus 1202.
- the computer-useable non-volatile memory 1204 may comprise read-only memory (ROM).
- the apparatus 1200 comprises one or more data-storage units 1205 configured to store information and/or instructions.
- the one or more data-storage units 1205 are coupled with the bus 1202.
- the one or more data-storage units 1205 may for example comprise a magnetic or optical disk and disk drive or a solid-state drive (SSD).
- the apparatus 1200 comprises one or more input/output (I/O) devices 1206 configured to communicate information to and/or from the one or more processors 1201.
- the one or more I/O devices 1206 are coupled with the bus 1202.
- the one or more I/O devices 1206 may comprise at least one network interface.
- the at least one network interface may enable the apparatus 1200 to communicate via one or more data communications networks. Examples of data communications networks include, but are not limited to, the Internet and a Local Area Network (LAN).
- the one or more I/O devices 1206 may enable a user to provide input to the apparatus 1200 via one or more input devices (not shown).
- the one or more input devices may include for example a remote control, one or more physical buttons etc.
- the one or more I/O devices 1206 may enable information to be provided to a user via one or more output devices (not shown).
- the one or more output devices may for example include a display screen.
- an operating system 1207, data signal processing module 1208, one or more further modules 1209, and data 1210 are shown as residing in one, or a combination, of the computer-usable volatile memory 1203, computer-usable non-volatile memory 1204 and the one or more data-storage units 1205.
- the data signal processing module 1208 may be implemented by way of computer program code stored in memory locations within the computer-usable non-volatile memory 1204, computer-readable storage media within the one or more data-storage units 1205 and/or other tangible computer-readable storage media.
- tangible computer-readable storage media include, but are not limited to, an optical medium (e.g., CD-ROM, DVD-ROM or Blu-ray), flash memory card, floppy or hard disk or any other medium capable of storing computer-readable instructions such as firmware or microcode in at least one ROM or RAM or Programmable ROM (PROM) chips or as an Application Specific Integrated Circuit (ASIC).
- the apparatus 1200 may therefore comprise a data signal processing module 1208 which can be executed by the one or more processors 1201.
- the data signal processing module 1208 can be configured to include instructions to implement at least some of the operations described herein.
- the one or more processors 1201 launch, run, execute, interpret or otherwise perform the instructions in the signal processing module 1208.
- examples described herein with reference to the drawings comprise computer processes performed in processing systems or processors.
- examples described herein also extend to computer programs, for example computer programs on or in a carrier, adapted for putting the examples into practice.
- the carrier may be any entity or device capable of carrying the program.
- apparatus 1200 may comprise more, fewer and/or different components from those depicted in Figure 12.
- the apparatus 1200 may be located in a single location or may be distributed in multiple locations. Such locations may be local or remote.
- the techniques described herein may be implemented in software or hardware, or may be implemented using a combination of software and hardware. They may include configuring an apparatus to carry out and/or support any or all of the techniques described herein.
- an obtained image is split in half horizontally. In other examples, an obtained image is split into more than two horizontal stripes.
- an image having left and right regions is processed to obtain a processed image having upper and lower regions, where the upper region corresponds to one of the left and right regions, and where the lower region corresponds to the other of the left and right regions.
- the processed image is provided to a horizontal split-frame encoder.
- an obtained image having first and second regions is processed to obtain a processed image.
- the processed image has third and fourth regions, where the third region corresponds to one of the first and second regions, and where the fourth region corresponds to the other of the first and second regions.
- the processed image is provided to a split-frame encoder.
- the first and second regions may comprise left and right regions and the third and fourth regions may comprise upper and lower regions.
- the first and second regions may comprise upper and lower regions and the third and fourth regions may comprise left and right regions.
- an obtained image is subject to image processing in the form of either rotation or stacking.
- both rotation and stacking may be used.
- an obtained image may be split into left and right halves.
- the left and right halves may be rotated.
- the rotated left and right halves may then be stacked one on top of the other. This may provide more flexibility than being restricted to only one of rotation and stacking.
- the left and right halves may be rotated by different amounts and/or in different directions.
- the left half may be rotated by 90° anticlockwise and the right half may be rotated by 90° clockwise.
- the left and right halves may be rotated by the same amount and in the same direction as each other (for example, 90° anticlockwise), but the left half may be stacked on top of the right half after rotation (instead of the right half being stacked on top of the left half after such a rotation).
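- a combined rotate-then-stack operation might be sketched as follows; this is illustrative only, assuming NumPy arrays, and the parameter names (left_k, right_k, left_on_top) are assumptions rather than anything defined in this disclosure.

```python
import numpy as np

def rotate_and_stack(obtained: np.ndarray,
                     left_k: int = 1, right_k: int = 1,
                     left_on_top: bool = False) -> np.ndarray:
    """Split into left/right halves, rotate each by k quarter-turns
    (positive k is anticlockwise, negative k is clockwise), then stack."""
    w = obtained.shape[1]
    left = np.rot90(obtained[:, : w // 2], k=left_k)
    right = np.rot90(obtained[:, w // 2 :], k=right_k)
    top, bottom = (left, right) if left_on_top else (right, left)
    return np.vstack([top, bottom])

image = np.arange(4 * 8 * 3).reshape(4, 8, 3)
# Left half rotated 90 degrees anticlockwise, right half 90 degrees clockwise,
# with the left half stacked on top of the right half.
mixed = rotate_and_stack(image, left_k=1, right_k=-1, left_on_top=True)
assert mixed.shape == (8, 4, 3)
```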
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
Abstract
An obtained image 602 has a left region 604 and a right region 606. The obtained image 602 is processed to obtain a processed image 608. The processed image 608 has an upper region 610 and a lower region 612. The upper region 610 of the processed image 608 corresponds to one of the left and right regions 604, 606 of the obtained image 602. The lower region 612 of the processed image 608 corresponds to the other of the left and right regions 604, 606 of the obtained image 602. The processed image 608 is provided to a horizontal split-frame encoder 304. The obtained image 602 may be a stereoscopic image.
Description
SPLIT-FRAME CODING
Technical Field
The present disclosure relates to split-frame coding. More particularly, but not exclusively, the present disclosure relates to split-frame coding that uses image processing to enable given encoder and/or decoder technology to be leveraged for additional use cases.
Background
In horizontal split-frame encoding, an input image is split in half horizontally into two horizontal stripes. Each stripe spans the full width of the image but only half of the height of the image. Such stripes may be referred to as “strips”, “parts”, “regions”, or the like. Split-frame encoding may also be referred to as “spatial split” encoding.
The Video Codec Software Development Kit (SDK) from NVIDIA provides an Application Programming Interface (API) that allows horizontal split-frame encoding using encoders known as NVENCs.
Horizontal stripes of the split image can be encoded in parallel by multiple NVENCs of a Graphics Processing Unit (GPU). Encoding in parallel can significantly decrease encoding times compared to full-frame encoding.
However, horizontal split-frame encoding can reduce image quality compared to encoding the input image as a whole. For example, the split might be noticeable in a reconstructed version of the input image, particularly at low bit rates. Compression efficiency may also be reduced where horizontal stripes are encoded separately and independently of each other.
Summary
Various aspects of the present disclosure are set out in the appended claims.
Further features and advantages will become apparent from the following description of preferred embodiments, given by way of example only, which is made with reference to the accompanying drawings.
Figure 1 shows a schematic block diagram of an example of an image processing system;
Figures 2A and 2B show a schematic block diagram of another example of an image processing system;
Figure 3 shows a schematic block diagram of part of another example of an image processing system;
Figure 4 shows a schematic block diagram of another part of the example image processing system shown in Figure 3;
Figure 5 shows a schematic block diagram of several parts of the example image processing system shown in Figures 3 and 4;
Figure 6 shows a schematic block diagram of part of another example of an image processing system;
Figure 7 shows a schematic block diagram of several parts of the example image processing system shown in Figure 6;
Figure 8 shows a schematic block diagram of part of another example of an image processing system;
Figure 9 shows a schematic block diagram of several parts of the example image processing system shown in Figure 8;
Figure 10 shows a schematic block diagram of part of another example of an image processing system;
Figure 11 shows a schematic block diagram of several parts of the example image processing system shown in Figure 10; and
Figure 12 shows a schematic block diagram of an example of an apparatus.
Referring to Figure 1, there is shown an example of a signal processing system 100. The signal processing system 100 is used to process signals. Examples of types of signal include, but are not limited to, video signals, image signals, audio signals, volumetric signals such as those used in medical, scientific or holographic imaging, or other multidimensional signals.
The signal processing system 100 includes a first apparatus 102 and a second apparatus 104. The first apparatus 102 and second apparatus 104 may have a client-server relationship, with the first apparatus 102 performing the functions of a server device and the second apparatus 104 performing the functions of a client device. The signal processing system 100 may include at least one additional apparatus (not shown). The first apparatus 102 and/or second apparatus 104 may comprise one or more components. The one or more components may be implemented in hardware and/or software. The one or more components may be co-located or may be located remotely from each other in the signal processing system 100. Examples of types of apparatus include, but are not limited to, computerised devices, handheld or laptop computers, tablets, mobile devices, games consoles, smart televisions, set-top boxes, extended reality (XR) headsets (including augmented reality (AR) and/or virtual reality (VR) headsets) etc.
The first apparatus 102 is communicatively coupled to the second apparatus 104 via a data communications network 106. Examples of the data communications network 106 include, but are not limited to, the Internet, a Local Area Network (LAN) and a Wide Area Network (WAN). The first and/or second apparatus 102, 104 may have a wired and/or wireless connection to the data communications network 106.
In this example, the first apparatus 102 comprises an encoder 108. The encoder 108 is configured to encode data comprised in and/or derived based on the signal, which is referred to hereinafter as “signal data”. For example, where the signal is a video signal, the encoder 108 is configured to encode video data. Video data comprises a sequence of multiple images or frames. The encoder 108 may perform one or more further functions in addition to encoding signal data. The encoder 108 may be embodied in various different ways. For example, the encoder 108 may be embodied in hardware and/or software. The encoder 108 may encode metadata associated with the signal. The first apparatus 102 may use one or more than one encoder 108.
Although in this example the first apparatus 102 comprises the encoder 108, in other examples the first apparatus 102 is separate from the encoder 108. In such examples, the first apparatus 102 is communicatively coupled to the encoder 108. The first apparatus 102 may be embodied as one or more software functions and/or hardware modules.
In this example, the second apparatus 104 comprises a decoder 110. The decoder 110 is configured to decode signal data. The decoder 110 may perform one or more further functions in addition to decoding signal data. The decoder 110 may be embodied in various different ways. For example, the decoder 110 may be embodied in hardware and/or software. The decoder 110 may decode metadata associated with the signal. The second apparatus 104 may use one or more than one decoder 110.
Although in this example the second apparatus 104 comprises the decoder 110, in other examples, the second apparatus 104 is separate from the decoder 110. In such examples, the second apparatus 104 is communicatively coupled to the decoder 110. The second apparatus 104 may be embodied as one or more software functions and/or hardware modules.
The encoder 108 encodes signal data and transmits the encoded signal data to the decoder 110 via the data communications network 106. The decoder 110 decodes the received, encoded signal data and generates decoded signal data. The decoder 110 may output the decoded signal data, or data derived using the decoded signal data. For example, the decoder 110 may output such data for display on one or more display devices associated with the second apparatus 104.
In some examples described herein, the encoder 108 transmits to the decoder 110 a representation of a signal at a given level of quality and information the decoder 110 can use to reconstruct a representation of the signal at one or more higher levels of quality. Such information may be referred to as “reconstruction data”. In some examples, “reconstruction” of a representation involves obtaining a representation that is not an exact replica of an original representation. The extent to which the representation is the same as the original representation may depend on various factors including, but not limited to, quantisation levels. A representation of a signal at a given level of quality may be considered to be a rendition, version or depiction of data comprised in the signal at the given level of quality. In some examples, the reconstruction data is included in the signal data that is encoded by the encoder 108 and transmitted to the decoder 110. For example, the reconstruction data may be in the form of metadata. In some examples, the reconstruction data is encoded and transmitted separately from the signal data.
The information the decoder 110 uses to reconstruct the representation of the signal at the one or more higher levels of quality may comprise residual data, as described in more detail below. Residual data is an example of reconstruction data. The information the decoder 110 uses to reconstruct the representation of the signal at the one or more higher levels of quality may also comprise configuration data relating to processing of the residual data. The configuration data may indicate how the residual data has been processed by the encoder 108 and/or how the residual data is to be processed by the decoder 110. The configuration data may be signalled to the decoder 110, for example in the form of metadata.
Referring to Figures 2A and 2B, there is shown schematically an example of a signal processing system 200. The signal processing system 200 includes a first apparatus 202 and a second apparatus 204. In this example, the first apparatus 202 comprises an encoder and the second apparatus 204 comprises a decoder. However, as explained above, in other examples, the encoder is not comprised in the first apparatus 202 and/or the decoder is not comprised in the second apparatus 204. In each of the first apparatus 202 and the second apparatus 204, items are shown on two logical levels. The two levels are separated by a dashed line. Items on the first, highest level relate to data at a first level of quality. Items on the second, lowest level relate to data at a second level of quality. The first level of quality is higher than the second level of quality. The first and second levels of quality relate to a tiered hierarchy having multiple levels of quality. In some examples, the tiered hierarchy comprises more than two levels of quality. In such examples, the first apparatus 202 and the second apparatus 204 may include more than two different levels. There may be one or more other levels above and/or below those depicted in Figures 2A and 2B. As described herein, in certain cases, the levels of quality may correspond to different spatial resolutions.
Referring first to Figure 2A, the first apparatus 202 obtains a first representation of an image at the first level of quality 206. A representation of a given image is a representation of data comprised in the image. The image may be a given frame of a video. The first representation of the image at the first level of quality 206 will be referred to as “input data” hereinafter as, in this example, it is data provided as an input to the encoder in the first apparatus 202. The first apparatus 202 may receive the input data 206. For example, the first apparatus 202 may receive the input data 206 from at least one other apparatus. The first apparatus 202 may be configured to receive successive portions of input data 206, e.g. successive frames of a video, and to perform the operations described herein on each successive frame. For example, a video may comprise frames F1, F2, ..., FT and the first apparatus 202 may process each of these in turn.
The first apparatus 202 derives data 212 based on the input data 206. In this example, the data 212 based on the input data 206 is a representation 212 of the image at the second, lower level of quality. In this example, the data 212 is derived by performing a downsampling operation on the input data 206 and will therefore be referred to as “downsampled data” hereinafter. In other examples, the data 212 is derived by performing an operation other than a downsampling operation on the input data 206, or the data 212 is the same as the input data 206 (i.e. the input data 206 is not processed, e.g. downsampled).
In this example, the downsampled data 212 is processed to generate processed data 213 at the second level of quality. In other examples, the downsampled data 212 is not processed at the second level of quality. As such, the first apparatus 202 may generate data at the second level of quality, where the data at the second level of quality comprises the downsampled data 212 or the processed data 213.
In some examples, generating the processed data 213 involves the downsampled data 212 being encoded. Such encoding may occur within the first apparatus 202, or the first apparatus 202 may output the processed data 213 to an external encoder. Encoding the downsampled data 212 produces an encoded image at the second level of quality. The first apparatus 202 may output the encoded image, for example for transmission to the second apparatus 204. A series of encoded images, e.g. forming an encoded video, as output for transmission to the second apparatus 204 may be referred to as a “base” stream or base layer. As explained above, instead of being produced in the first apparatus 202, the encoded image may be produced by an encoder that is separate from the first apparatus 202. The encoded image may be part of an H.264 encoded video, or otherwise. Generating the processed data 213 may, for example, comprise generating successive frames of video as output by a separate encoder such as an H.264 video encoder. An intermediate set of data for the generation of the processed data 213 may comprise the output of such an encoder, as opposed to any intermediate data generated by the separate encoder.
Generating the processed data 213 at the second level of quality may further involve decoding the encoded image at the second level of quality. The decoding operation may be performed to emulate a decoding operation at the second apparatus 204, as will become apparent below. Decoding the encoded image produces a decoded image at the second level of quality. In some examples, the first apparatus 202 decodes the encoded image at the second level of quality to produce the decoded image at the second level of quality. In other examples, the first apparatus 202 receives the decoded image at the second level of quality, for example from an encoder and/or decoder that is separate from the first apparatus 202. The encoded image may be decoded using an H.264 decoder. The decoding by a separate decoder may comprise inputting encoded video, such as an encoded data stream configured for transmission to a remote decoder, into a separate black-box decoder implemented together with the first apparatus 202 to generate successive decoded frames of video. Processed data 213 may thus comprise a frame of video data that is generated via a complex non-linear encoding and decoding process, where the encoding and decoding process may involve modelling spatiotemporal correlations as per a particular encoding standard such as H.264. However, because the output of any encoder is fed into a corresponding decoder, this complexity is effectively hidden from the first apparatus 202.
In an example, generating the processed data 213 at the second level of quality further involves obtaining correction data based on a comparison between the downsampled data 212 and the decoded image obtained by the first apparatus 202, for example based on the difference between the downsampled data 212 and the decoded image. The correction data can be used to correct for errors introduced in encoding and decoding the downsampled data 212. In some examples, the first apparatus 202 outputs the correction data, for example for transmission to the second apparatus 204, as well as the encoded signal. This allows the recipient to correct for the errors introduced in encoding and decoding the downsampled data 212. This correction data may also be referred to as a “first enhancement” stream or first enhancement layer. As the correction data may be based on the difference between the downsampled data 212 and the decoded image, it may be seen as a form of residual data (e.g. one that is different from the other set of residual data described below).
In some examples, generating the processed data 213 at the second level of quality further involves correcting the decoded image using the correction data. For example, the correction data as output for transmission may be placed into a form suitable for combination with the decoded image, and then added to the decoded image. This may be performed on a frame-by-frame basis. In other examples, rather than correcting the decoded image using the correction data, the first apparatus 202 uses the downsampled data 212. For example, in certain cases, just the encoded then decoded data may be used and in other cases, encoding and decoding may be replaced by other processing.
In some examples, generating the processed data 213 involves performing one or more operations other than the encoding, decoding, obtaining and correcting acts described above.
The first apparatus 202 obtains data 214 based on the data at the second level of quality. As indicated above, the data at the second level of quality may comprise the processed data 213, or the downsampled data 212 where the downsampled data 212 is not processed at the lower level. As described above, in certain cases, the processed data 213 may comprise a reconstructed video stream (e.g. from an encoding-decoding operation) that is corrected using correction data. In the example of Figures 2A and 2B, the data 214 is a second representation of the image at the first level of quality, the first representation of the image at the first level of quality being the input data 206. The second representation at the first level of quality may be considered to be a preliminary or predicted representation of the image at the first level of quality. In this example, the first apparatus 202 derives the data 214 by performing an upsampling operation on the data at the second level of quality. The data 214 will be referred to hereinafter as “upsampled data”. However, in other examples one or more other operations could be used to derive the data 214, for example where data 212 is not derived by downsampling the input data 206.
The input data 206 and the upsampled data 214 are used to obtain residual data 216. The residual data 216 is associated with the image. The residual data 216 may be in the form of a set of residual elements, which may be referred to as a “residual frame” or a “residual image”. A residual element in the set of residual elements 216 may be associated with a respective image element in the input data 206. An example of an image element is a pixel.
In this example, a given residual element is obtained by subtracting a value of an image element in the upsampled data 214 from a value of a corresponding image element in the input data 206. As such, the residual data 216 is useable in combination with the upsampled data 214 to reconstruct the input data 206. The residual data 216 may also be referred to as “reconstruction data” or “enhancement data”. In one case, the residual data 216 may form part of a “second enhancement” stream or a second enhancement layer.
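To make the residual computation concrete, the following is a minimal sketch of the two-level scheme, using 2x2 average pooling and nearest-neighbour upsampling as stand-ins for the downsampling and upsampling operations; the base encode/decode and correction stages described above are omitted for brevity, and all names are illustrative.

```python
import numpy as np

def downsample(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling: a stand-in for the downsampling operation."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling: a stand-in for the upsampling operation."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

input_data = np.random.rand(8, 8)      # representation at the first level of quality
downsampled = downsample(input_data)   # representation at the second level of quality
upsampled = upsample(downsampled)      # preliminary representation (214)
residual = input_data - upsampled      # residual data (216): element-wise difference
# At the receiving side, the input is reconstructed from the preliminary
# representation plus the residuals.
assert np.allclose(upsampled + residual, input_data)
```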
The first apparatus 202 obtains configuration data relating to processing of the residual data 216. The configuration data indicates how the residual data 216 has been processed and/or generated by the first apparatus 202 and/or how the residual data 216 is to be processed by the second apparatus 204. The configuration data may comprise a set of configuration parameters. The configuration data may be useable to control how the second apparatus 204 processes data and/or reconstructs the input data 206 using the residual data 216. The configuration data may relate to one or more characteristics of the residual data 216. The configuration data may relate to one or more characteristics of the input data 206. Different configuration data may result in different processing being performed on and/or using the residual data 216. The configuration data is therefore useable to reconstruct the input data 206 using the residual data 216. As described below, in certain cases, configuration data may also relate to the correction data described herein.
In this example, the first apparatus 202 transmits to the second apparatus 204 data based on the downsampled data 212, data based on the residual data 216, and the configuration data (or data based on the configuration data), to enable the second apparatus 204 to reconstruct the input data 206.
Turning now to Figure 2B, the second apparatus 204 receives data 220 based on (e.g. derived from) the downsampled data 212. The second apparatus 204 also receives data based on the residual data 216. For example, the second apparatus 204 may receive a “base” stream (data 220), a “first enhancement stream” (any correction data) and a “second enhancement stream” (residual data 216). The second apparatus 204 also receives the configuration data relating to processing of the residual data 216. The data 220 based on the downsampled data 212 may be the downsampled data 212 itself, the processed data 213, or data derived from the downsampled data 212 or the processed data 213. The data based on the residual data 216 may be the residual data 216 itself, or data derived from the residual data 216.
In some examples, the received data 220 comprises the processed data 213, which may comprise the encoded image at the second level of quality and/or the correction data. In some examples, for example where the first apparatus 202 has processed the downsampled data 212 to generate the processed data 213, the second apparatus 204 processes the received data 220 to generate processed data 222. Such processing by the second apparatus 204 may comprise decoding an encoded image (e.g. that forms part of a “base” encoded video stream) to produce a decoded image at the second level of quality. In some examples, the processing by the second apparatus 204 comprises correcting the decoded image using obtained correction data. Hence, the processed data 222 may comprise a frame of corrected data at the second level of quality. In some examples, the encoded image at the second level of quality is decoded by a decoder that is separate from the second apparatus 204. The encoded image at the second level of quality may be decoded using an H.264 decoder.
In other examples, the received data 220 comprises the downsampled data 212 and does not comprise the processed data 213. In some such examples, the second apparatus 204 does not process the received data 220 to generate processed data 222.
The second apparatus 204 uses data at the second level of quality to derive the upsampled data 214. As indicated above, the data at the second level of quality may comprise the processed data 222, or the received data 220 where the second apparatus 204 does not process the received data 220 at the second level of quality. The upsampled data 214 is a preliminary representation of the image at the first level of quality. The upsampled data 214 may be derived by performing an upsampling operation on the data at the second level of quality.
The second apparatus 204 obtains the residual data 216. The residual data 216 is useable with the upsampled data 214 to reconstruct the input data 206. The residual data 216 is indicative of a comparison between the input data 206 and the upsampled data 214.
The second apparatus 204 also obtains the configuration data related to processing of the residual data 216. The configuration data is useable by the second apparatus 204 to reconstruct the input data 206. For example, the configuration data may indicate a characteristic or property relating to the residual data 216 that affects how the residual data 216 is to be used and/or processed, or whether the residual data 216 is to be used at all. In some examples, the configuration data comprises the residual data 216.
There are several considerations relating to such processing. One such consideration is the amount of information that is generated, stored, transmitted and/or processed. The more information that is used, the greater the amount of resources that may be involved in handling such information. Examples of such resources include transmission resources, storage resources and processing resources. Some signal processing techniques allow a relatively small amount of information to be used. This may reduce the amount of data transmitted via the data communications network 106. The savings may be particularly relevant where the data relates to high quality video data, where the amount of information transmitted can be especially high.
Other considerations include the ability of the decoder to perform image reconstruction accurately, reliably, efficiently and/or promptly. Performing image reconstruction accurately and reliably may affect the ultimate visual quality of the displayed image and consequently may affect a viewer’s engagement with the image and/or with a video comprising the image. This can be especially relevant to XR. Efficient reconstruction is especially effective for mobile computing devices, which may readily be used in XR applications. Prompt image reconstruction is especially effective in low-latency applications, such as XR.
Various examples will now be described which relate to image processing.
The term “image processing” is used herein generally to mean any type of processing operation performed on any type of image. Examples of processing operations include, but are not limited to, rotation, stacking, de-rotation, de-stacking, scaling, de-scaling, transformation, and de-transformation. Examples of images include, but are not limited to, photographs, computer-generated images, frames from a video signal, and so on.
Referring to Figure 3, an image 302 is obtained. The obtained image 302 may be referred to herein as a “frame”, an “obtained image”, a “source image”, an “input image” or the like.
The image 302 may be obtained in various ways. For example, the obtained image 302 may be received, may be generated, may be retrieved from storage, or otherwise.
In some examples, the obtained image 302 comprises XR content. In some examples, the XR content comprises VR content. In some examples, the XR content comprises AR content. Latency is particularly, but not exclusively, relevant to XR content. As such, latency reduction is particularly, but not exclusively, effective for XR content. By way of an additional example, latency reduction can also be especially effective in the context of videoconferencing.
The obtained image 302 is input to an encoder 304.
In this example, the encoder 304 is a horizontal split-frame encoder. The encoder 304 may also be referred to as a “horizontal striping encoder”. The horizontal split-frame encoder 304 may have an API via which the horizontal split-frame encoder 304 can be instructed to encode the obtained image 302.
In this example, the horizontal split-frame encoder 304 obtains and splits the obtained image 302 in half horizontally and encodes the resulting half-images. Splitting the obtained image 302 in this way may be referred to as splitting the obtained image 302 “in two”, “two ways” or the like.
In this specific example, the split-frame encoder 304 obtains and splits the obtained image 302 in half horizontally. However, the split-frame encoder 304 may horizontally split the obtained image 302 in other proportions in other examples. As such, references to halves of images should be understood accordingly. More specifically, the split-frame encoder 304 may split the obtained image 302 into an upper region (which may also be referred to as a “portion”) corresponding to X% of the height of the obtained image 302 and into a lower region corresponding to (100 - X)% of the height of the obtained image 302. Where the split-frame encoder 304 splits the obtained image 302 in half horizontally, X = 50.
Although, in this example, the encoder 304 splits the obtained image 302 and encodes the split images, splitting may be performed outside of the encoder 304 in other examples. For example, the obtained image 302 may be input to a splitter (not shown), the splitter may split the obtained image 302, and the encoder 304 may encode the output of the splitter.
The splitting provides an upper region 306 of the obtained image 302 and a lower region 308 of the obtained image 302. The upper and lower regions 306, 308 may be referred to as “partial images”, “split images”, “partial frames” or the like. The upper 306 region may be referred to as a “top region” and the lower region 308 may be referred to as a “bottom region”.
In some examples, the horizontal split-frame encoder 304 is also operable to perform other types of encoding, in addition to horizontal split-frame encoding. For example, the horizontal split-frame encoder 304 may be operable to perform vertical split-frame encoding, encoding that does not split the obtained image 302, and so on. Vertical split-frame encoding may also be referred to as vertical striping.
As such, in some examples, the encoder 304 comprises a vertical split-frame encoder. The vertical split-frame encoder is configured to split a stereoscopic image along a dividing line between a left region and a right region of the stereoscopic image. The vertical split-frame encoder is configured to encode the left and right regions of the stereoscopic image separately.
In this example, the horizontal split-frame encoder 304 splits the obtained image 302 in half horizontally such that the upper and lower regions 306, 308 are upper and lower halves of the obtained image 302. In other words, where the obtained image 302 has width w and height h, the upper and lower halves 306, 308 each have width w and each have height h/2. However, as explained above, the horizontal split-frame encoder 304 may split the obtained image 302 in different horizontal proportions in other examples.
The horizontal split-frame encoder 304 encodes the upper and lower halves 306, 308. This provides an encoded upper half 310 of the obtained image 302 and an encoded lower half 312 of the obtained image 302 respectively.
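As an informal sketch of the split itself (not of any particular encoder implementation), splitting at X% of the image height can be written as follows, with X = 50 giving the halves described above; the function name is illustrative.

```python
import numpy as np

def split_horizontally(image: np.ndarray, x_percent: float = 50.0):
    """Split an image into upper and lower regions at X% of its height."""
    split_row = round(image.shape[0] * x_percent / 100.0)
    return image[:split_row], image[split_row:]

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # width w=1920, height h=1080
upper, lower = split_horizontally(frame)
# Each half has width w and height h/2.
assert upper.shape == (540, 1920, 3) and lower.shape == (540, 1920, 3)
```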
The horizontal split-frame encoder 304 may use one or more of various different codecs to encode the upper and lower halves 306, 308. An example codec is the Low Complexity Enhancement Video Coding (LCEVC) coding standard. LCEVC is described in WO2020/188273 (PCT/GB2020/050695) and WO2019/111010 (PCT/GB2018/053552), the entire contents of which are incorporated herein by reference.
In this example, the horizontal split-frame encoder 304 is operable to perform parallel encoding of the upper and lower halves 306, 308. In other words, in this example, the horizontal split-frame encoder 304 does not need to wait for encoding of one of the upper and lower halves 306, 308 to complete, or even start, before encoding the other of the upper and lower halves 306, 308.
Encoding the upper and lower halves 306, 308 in parallel can increase encoding speed compared to encoding the obtained image 302 as a whole. For example, the encoding speed may be doubled. This is especially, but not exclusively, effective in low-latency applications. An example of such an application is XR.
By decreasing encoding time, higher-resolution XR content may be useable while enabling acceptable latency targets and/or constraints to be met. For example, where parallel encoding results in significant additional latency headroom (i.e. in terms of acceptable performance), XR content resolution may be increased. Although this may use some of the additional latency headroom gained by parallel encoding, the XR content resolution increase may be restricted such that latency performance does not drop below an acceptable threshold.
In some examples, the horizontal split-frame encoder 304 is operable to encode the upper and lower halves 306, 308 independently. In other words, in such examples, the encoding of the upper half 306 is independent of the encoding of the lower half 308 and vice versa.
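The parallel, independent encoding of the two halves can be sketched as below. The encode_region function is a placeholder standing in for a real per-stripe encoder session (for example, a hardware encoder); no actual encoder API is implied.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def encode_region(region: np.ndarray) -> bytes:
    """Placeholder 'encode': a real implementation would invoke an encoder."""
    return bytes(np.ascontiguousarray(region))

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
upper, lower = frame[:540], frame[540:]

# Neither encode waits for the other to start or finish: the two stripes
# are submitted together and processed concurrently and independently.
with ThreadPoolExecutor(max_workers=2) as pool:
    encoded_upper, encoded_lower = pool.map(encode_region, [upper, lower])
```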
Although the split-frame encoding described above can decrease latency, it may result in a lower quality encoding compared to encoding the obtained image 302 as a whole and/or the split may be noticeable in the final output image (for example, at low bitrates).
Referring to Figure 4, a decoder 402 obtains the encoded upper and lower halves 310, 312. For example, the decoder 402 may obtain the encoded upper and lower halves 310, 312 by receiving the encoded upper and lower halves 310, 312 from the horizontal split-frame encoder 304.
The decoder 402 decodes the encoded upper and lower halves 310, 312. This provides decoded upper and lower halves 404, 406 respectively.
In this example, the decoder 402 decodes the encoded upper and lower halves 310, 312 in parallel. Decoding the encoded upper and lower halves 310, 312 in parallel can increase decoding speed compared to decoding the obtained image 302 as a whole. For example, the decoding speed may be doubled. This is especially, but not exclusively, effective in low-latency applications.
The decoded upper and lower halves 404, 406 are provided to a player 408. The player 408 may perform image reconstruction processing and/or may cause images to be displayed.
Image reconstruction processing may comprise stitching the decoded upper and lower halves 404, 406 together. This provides a stitched-together image having an upper half comprising the decoded upper half 404 and a lower half comprising the decoded lower half 406. As such, post-decoding image reconstruction processing may be performed outside of the decoder 402. In some examples, however, at least some post-decoding image reconstruction processing is performed by the decoder 402.
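Stitching is straightforward when the decoded halves are available as arrays; a minimal sketch (names illustrative):

```python
import numpy as np

def stitch(decoded_upper: np.ndarray, decoded_lower: np.ndarray) -> np.ndarray:
    """Recombine the decoded upper and lower halves into one full image."""
    return np.vstack([decoded_upper, decoded_lower])

decoded_upper = np.zeros((540, 1920, 3), dtype=np.uint8)
decoded_lower = np.ones((540, 1920, 3), dtype=np.uint8)
assert stitch(decoded_upper, decoded_lower).shape == (1080, 1920, 3)
```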
The decoder 402 may or may not be aware that horizontal split-frame encoding has been performed.
For example, the decoder 402 may simply receive encoded images and decode such encoded images as if they had not been subject to horizontal split-frame encoding. The decoder 402 may then output such decoded images as if they had not been subject to horizontal split-frame encoding. The decoded images may then be recombined post-decoding, for example by the player 408. In particular, the player 408 may be aware that horizontal split-frame encoding has been performed, but the decoder 402 may not be aware of such.
In such examples, the decoder 402 may obtain (for example, receive) an indication of the size of an encoded image that is to be decoded. The size may be indicated in terms of width and height. However, the decoded image may be combined with another decoded image and the size of the combined image that is ultimately displayed may be different from the size of the encoded image. In such examples, the size of the encoded image that the decoder 402 decodes is different from the size of the (combined) image that is ultimately displayed. The size of the image that is ultimately displayed may be indicated to the player 408. Again, the size may be indicated in terms of width and height.
Referring to Figure 5, an image processing system 500 comprising the split-frame encoder 304, the decoder 402 and the player 408 is shown.
A bitstream 502 is communicated between the split-frame encoder 304 and the decoder 402. The image processing system 500 may comprise one or more components (not shown) between the split-frame encoder 304 and the decoder 402, such as a transmission module and a reception module.
In this example, the bitstream 502 comprises data output by the horizontal split-frame encoder 304 and/or data derived based on such output data. The bitstream 502 may be referred to as an “encoded bitstream” accordingly.
In this example, the bitstream 502 comprises the encoded upper and lower halves 310, 312.
In this example, the decoder 402 obtains and decodes the bitstream 502 and/or data derived based on the bitstream 502, and outputs to the player 408.
Examples will now be described which leverage horizontal split-frame encoding for new use cases.
Referring to Figure 6, an image 602 is obtained.
In this example, the obtained image 602 has a left region 604 and a right region 606. The left region 604 may be a leftmost region of the obtained image 602 and/or the right region 606 may be a rightmost region of the obtained image 602.
Although not shown in this example, the obtained image 602 may comprise one or more further regions. In some examples, the one or more further regions comprise one or more intermediate regions, with the one or more intermediate regions being between the left and right regions 604, 606 of the obtained image 602.
In this specific example, the left region 604 is a left half of the obtained image 602 and the right region 606 is a right half of the obtained image 602. In this example, the left region 604 may be referred to as the “left half” 604 and the right region 606 may be referred to as the “right half” 606 accordingly.
In some other examples, one or both of the left and right regions 604, 606 is not a half of the obtained image 602. Additionally, in some other examples, the left and right regions 604, 606 are not the same size (for example, height and/or width) as each other.
In this specific example, the obtained image 602 comprises a stereoscopic image. In particular, in this specific example, the left region 604 corresponds to a left view (for example, a left-eye view) of a scene and the right region 606 corresponds to a right view (for example, a right-eye view) of the scene.
However, the obtained image 602 is not limited to being a stereoscopic image. In another example, one of the left and right regions 604, 606 comprises a view of a scene and the other of the left and right regions 604, 606 comprises a corresponding depth map. In another example, one of the left and right regions 604, 606 comprises an image in one or more frequencies (for example, corresponding to visible light) and the other of the left and right regions 604, 606 comprises a corresponding image in one or more other frequencies (for example, infrared). In another example, one of the left and right regions 604, 606 comprises a video feed of one participant of a videoconference and the other of the left and right regions 604, 606 comprises a video feed of another participant of the videoconference. In another example, one of the left and right regions 604, 606 comprises a view of a sporting event from a first angle, and the other of the left and right regions 604, 606 comprises a view of the sporting event from a second, different angle.
The obtained image 602 could, in principle, be provided to the horizontal split-frame encoder 304 to split the obtained image 602 into upper and lower halves and to encode the upper and lower halves accordingly. However, such splitting would split the obtained image 602 into one split image comprising the upper halves of the left and right halves of the obtained image 602 and another split image comprising the lower halves of the left and right halves of the obtained image 602.
While splitting the obtained image 602 in this manner would still allow encoding parallelisation and provide the associated latency benefits, there would be trade-offs.
For example, as explained above, the horizontal split between the upper and lower halves of each of the left and right halves of the obtained image 602 may be noticeable, especially at low bitrates.
Additionally, in applications in which one of the left and right halves of the obtained image 602 could be displayed before the other of the left and right halves of the obtained image 602, splitting the obtained image 602 horizontally means that the player 408 would still need both the (decoded) upper and lower splits before either the left or right half could be displayed. An example of such an application is stereoscopic XR content being displayed on a multi-screen XR headset.
In principle, vertical slice encoding could be used. In vertical slice encoding, the obtained image 602 would be divided into two vertical slices (which may also be referred to as “columns”) in which each slice has, for example, half the width of the obtained image 602 and has the same height as the obtained image 602. A stereoscopic image could thereby be sliced into one half comprising the left half of the stereoscopic image and into another half comprising the right half of the stereoscopic image. Encoding and/or decoding may be parallelised, thereby improving encoding and/or decoding speed. However, vertical slice encoding relies on availability of an encoder that is operable to perform vertical slice encoding. In contrast, examples described herein facilitate integration with an existing horizontal split-frame encoder.
In accordance with examples, the obtained image 602 is subject to image processing prior to being provided to a horizontal split-frame encoder 304. The result of such image processing is a processed image 608. As such, by processing the obtained image 602, the processed image 608 is obtained.
Such processing may be considered to be “pre-processing” in that the obtained image 602 is subject to initial processing before being subject to additional processing by the horizontal split-frame encoder 304.
Such pre-processing enables the horizontal split-frame encoder 304 to be leveraged to encode the obtained image 602 more effectively than it would be encoded without the pre-processing.
In this example, the processed image 608 comprises an upper region 610 and a lower region 612.
The upper region 610 of the processed image 608 corresponds to one of the left and right regions 604, 606 of the obtained image 602. The lower region 612 of the processed image 608 corresponds to the other of the left and right regions 604, 606 of the obtained image 602.
Such correspondence may take various different forms, as will be described in more detail below. For example, such correspondence may comprise a rotational correspondence, a stacking correspondence, a transformational correspondence or otherwise.
Similar to the left and right regions 604, 606 of the obtained image 602, the upper and lower regions 610, 612 of the processed image 608 may correspond to upper and lower halves of the processed image 608, may comprise one or more intermediate regions, may be of unequal sizes to each other, and so on. In this specific example, however, the upper and lower regions 610, 612 of the processed image 608 are upper and lower halves of the processed image 608.
In this example, the processed image 608 is encoded more effectively by the horizontal split-frame encoder 304 than the obtained image 602 would be encoded by the horizontal split-frame encoder 304 (i.e. without the above-described image processing being performed).
In particular, as explained above, the obtained image 602 could have been provided directly to the horizontal split-frame encoder 304 for encoding. However, this would have resulted in a potentially noticeable visual artefact (related to the split) in the resulting image(s). Additionally, although the left and right regions 604, 606 of the obtained image 602 could have been encoded and decoded in parallel, a complete reconstruction of neither the left nor the right region 604, 606 of the obtained image 602 would have been available for display before the other.
In contrast, in this example, one of the left and right halves 604, 606 of the obtained image 602 becomes one of the upper and lower halves 610, 612 of the processed image 608 and the other of the left and right halves 604, 606 of the obtained image 602 becomes the other of the upper and lower halves 610, 612 of the processed image 608.
In this example, the horizontal split-frame encoder 304 splits the processed image 608 into upper and lower halves 306, 308.
In some examples, an indication is provided to the horizontal split-frame encoder 304 that the obtained image 602 has been subject to image processing. The horizontal split-frame encoder 304 may use such an indication when encoding the processed image 608 and/or may convey such an indication downstream.
In some examples, an instruction 614 is provided to the horizontal split-frame encoder 304 to perform horizontal split-frame encoding in relation to the processed image 608. For example, the horizontal split-frame encoder 304 may be operable to perform horizontal split-frame encoding and may be operable to perform one or more other types of encoding.
In some examples, the horizontal split-frame encoder 304 defaults to not performing horizontal split-frame encoding and only performs horizontal split-frame encoding when instructed to do so.
The instruction 614 may be in the form of a flag. For example, a flag value of “0” may indicate that horizontal split-frame encoding is not to be performed and a flag value of “1” may indicate that horizontal split-frame encoding is to be performed.
The instruction 614 may be provided to the horizontal split-frame encoder 304 with the processed image 608 or otherwise. For example, the instruction 614 may be provided to the horizontal split-frame encoder 304 to switch the horizontal split-frame encoder 304 from a horizontal split-frame encoding mode to another encoding mode, and a further instruction 614 may subsequently be provided to the horizontal split-frame encoder 304 to switch the horizontal split-frame encoder 304 back into the horizontal split-frame encoding mode. This may be particularly effective where, for example, a video with a significant number of frames is to be encoded in a given manner.
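One way such an instruction could be carried alongside each frame, or used as a sticky mode switch, is sketched below; the request structure and field names are purely illustrative and do not reflect any particular encoder API.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class EncodeRequest:
    """Hypothetical per-frame request; field names are illustrative only."""
    frame: Any
    split_frame_flag: int = 0  # instruction 614: 0 = do not split, 1 = split

frames = [object() for _ in range(3)]  # stand-in frames
requests = [EncodeRequest(frame=f, split_frame_flag=1) for f in frames]
# Alternatively, the flag could be sent once as a mode switch and left in
# place for a long run of frames, as described above.
```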
Referring to Figure 7, an image processing system 700 is shown.
In this example, the horizontal split-frame encoder 304 is operable to output a bitstream 502 comprising both the encoded upper half 310 and the encoded lower half 312. Such a bitstream 502 may comprise a header, and a payload comprising the encoded upper and lower halves 310, 312.
In another example, the horizontal split-frame encoder 304 is operable to output a first bitstream comprising the encoded upper half 310 and a second bitstream comprising the encoded lower half 312. The first bitstream may comprise a header, and a payload comprising the encoded upper half 310. The second bitstream may comprise a header, and a payload comprising the encoded lower half 312. For example, where the encoded upper half 310 is available before the encoded lower half 312 is available, the encoded upper half 310 can be output in the first bitstream and obtained by the decoder 402 such that the decoder 402 can start decoding the first bitstream. The second bitstream can be output, and then obtained and decoded by the decoder 402 when
available. This leverages the parallelisation of the horizontal split-frame encoder 304 when the horizontal split-frame encoder 304 performs encoding in parallel.
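As an illustrative sketch only, the two output arrangements might be modelled as follows; bare byte concatenation stands in for a real container format (such as the MP4 or MPEG-TS containers mentioned below), and all names are assumptions.

```python
def pack_single_bitstream(header: bytes, enc_upper: bytes, enc_lower: bytes) -> bytes:
    """One bitstream 502: a header followed by a payload holding both halves."""
    return header + enc_upper + enc_lower

def pack_split_bitstreams(header_1: bytes, header_2: bytes,
                          enc_upper: bytes, enc_lower: bytes) -> tuple[bytes, bytes]:
    """Two bitstreams, each with its own header; the first can be emitted as
    soon as the upper half is encoded, so downstream decoding can begin early."""
    return header_1 + enc_upper, header_2 + enc_lower
```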
In addition, in this example, the bitstream 502 comprises one or more image processing indicators 712. The one or more image processing indicators 712 are indicative of the image processing performed prior to encoding.
Where the horizontal split-frame encoder 304 outputs one bitstream 502 comprising both the encoded upper half 310 and the encoded lower half 312, that bitstream 502 may comprise zero, one, or more than one image processing indicator 712.
Where the horizontal split-frame encoder 304 outputs a first bitstream comprising the encoded upper half 310 and a second bitstream comprising the encoded lower half 312, each bitstream may comprise zero, one, or more than one image processing indicator 712. For example, one of the first and second bitstreams may comprise one or more image processing indicators 712 which relate to both the first and second bitstreams, and the other of the first and second bitstreams may not comprise any image processing indicators 712 accordingly.
The image processing indicator 712 may be signalled in a transport level container. Examples of such containers include, but are not limited to, MPEG-4 Part 14 (MP4) and MPEG transport stream (MPEG-TS, MTS, TS).
The bitstream 502 may comprise more than one image processing indicator 712. Where the bitstream 502 comprises more than one image processing indicator 712, at least one of the image processing indicators 712 may be provided to the decoder 402 and/or at least one of the image processing indicators 712 may be provided to the player 408.
For example, the image processing indicator 712 may indicate whether or not split-frame encoding has been performed.
The image processing indicator 712 may indicate whether or not image processing has been performed.
The image processing indicator 712 may comprise an image processing type indicator, indicative of which type of image processing has been performed.
However, in other examples, an image processing indicator 712 is not provided in the bitstream 502.
In some examples, the bitstream 502 is not impacted by the image processing described herein (for example rotation and/or stacking). For example, the bitstream 502 may indicate whether or not split-frame encoding has been performed irrespective of whether or not image processing has been performed. The decoder 402 may decode the bitstream 502 regardless of any image processing (for example rotation and/or stacking) that has been performed. The decoder 402 may produce decoded rotated and/or stacked images, which can then be de-rotated and/or de-stacked, for example by the player 408. In such examples, the decoder 402 is not impacted by the above-described image processing (for example rotation and/or stacking).
In particular, and as shown in Figure 7, the image processing indicator 712 may be used for decoding, for image reconstruction processing and/or for display. For example, the image processing indicator 712 may indicate that split-frame encoding with rotation has been used. The decoder 402 can use the image processing indicator 712 to determine that split images are to be decoded and to provide decoded split images accordingly. The image processing indicator 712 (and/or another image processing indicator 712) can then be used by the player 408 to de-rotate the decoded split images for display. However, in some examples, the decoder 402 does not need to know whether a to-be-decoded image is a split image or otherwise; the decoder 402 simply decodes the image without knowing its origin and/or nature.
By way of analogy, a video recorded in portrait orientation may be encoded in 16:9 landscape mode. A 90° anticlockwise rotation can be signalled in a container with the encoded landscape video. A player can then rotate the decoded landscape video into portrait mode for display. Some players ignore the rotation, however, and thus play the decoded video in landscape mode. In either case, the decoder simply decodes the encoded video.
In this example, the decoder 402 receives the bitstream 502 and decodes the encoded upper and lower regions 310, 312 to obtain decoded upper and lower regions 404, 406 of the processed image 608. The decoded upper and lower regions 404, 406 of the processed image 608 are decoded versions of encoded versions of the upper and lower regions 310, 312 of the processed image 608 respectively.
The upper region 610 of the processed image 608 corresponds to one of the left and right regions 604, 606 of the obtained image 602. The lower region 612 of the
processed image 608 corresponds to the other of the left and right regions 604, 606 of the obtained image 602. The encoded upper and lower regions 310, 312 were generated by the horizontal split-frame encoder 304.
The decoded upper and lower regions 404, 406 are provided for image reconstruction processing and/or for display.
Such image reconstruction processing may include de-rotation and/or de-stacking. The prefix “de-” is used herein to indicate a reversal, undoing or the like. In particular, the term “de-rotation” is used herein to mean reversing an already-performed rotation, and the term “de-stacking” is used herein to mean reversing an already-performed stacking.
As such, in some examples, image reconstruction processing comprises de-rotating the decoded upper and lower regions 404, 406 of the processed image 608. In some examples, de-rotating is performed in response to receiving an indication that the obtained image 602 was subject to rotation.
Similarly, in some examples, image reconstruction processing comprises de-stacking the decoded upper and lower regions 404, 406 of the processed image 608. In some examples, the de-stacking is performed in response to receiving an indication that the obtained image 602 was subject to stacking.
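A minimal sketch of these two reconstruction steps is given below, assuming NumPy arrays, a single 90° anticlockwise rotation at the encoder side, and the left-on-top stacking correspondence; the function names are illustrative.

```python
import numpy as np

def de_rotate_regions(dec_upper: np.ndarray, dec_lower: np.ndarray) -> np.ndarray:
    """Reassemble the rotated frame from the decoded halves, then reverse the
    90-degree anticlockwise rotation to recover the side-by-side image."""
    rotated = np.concatenate([dec_upper, dec_lower], axis=0)
    return np.rot90(rotated, k=-1)  # i.e. rotate 90 degrees clockwise

def de_stack_regions(dec_upper: np.ndarray, dec_lower: np.ndarray) -> np.ndarray:
    """Reverse the stacking, assuming the upper half was the left half and the
    lower half was the right half of the obtained image."""
    return np.concatenate([dec_upper, dec_lower], axis=1)
```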
In some examples, the decoder 402 does not need to perform image reconstruction processing prior to the decoded images being displayed.
For example, if left and right halves of a stereoscopic image are stacked and provided in separate bitstreams, left-eye and right-eye views may be decoded separately and provided to separate respective left and right screens of an XR headset, without needing to be de-rotated and/or de-stacked first.
An example will now be described in which an obtained image 602 is rotated prior to being encoded by a horizontal split-frame encoder 304.
Referring to Figure 8, an image 602 is obtained.
In this example, the obtained image 602 has a left half 604 and a right half 606.
In this example, the obtained image 602 is subject to image processing prior to encoding. In this example, such processing comprises rotating 802 the obtained image 602, and the processed image 608 is a rotated image.
The upper region 610 of the processed image 608 is a rotated version of one of the left and right regions 604, 606 of the obtained image 602. The lower region 612 of the processed image 608 is a rotated version of the other one of the left and right regions 604, 606 of the obtained image 602.
In this specific example, the obtained image 602 is rotated 90° anticlockwise, which corresponds to a clockwise rotation of 270°. More generally, in examples, the rotation described herein is at least 90°. This differs from other types of image processing that might only cause a nominal amount of rotation.
As such, in this example, the left half 604 of the obtained image 602 becomes the lower half 612 of the rotated image 608 and the right half 606 of the obtained image 602 becomes the upper half 610 of the rotated image 608. The lower half 612 of the rotated image 608 is, therefore, a 90°-anticlockwise-rotated version of the left half 604 of the obtained image 602, and the upper half 610 of the rotated image 608 is, similarly, a 90°-anticlockwise-rotated version of the right half 606 of the obtained image 602.
The rotated image 608 is provided to the horizontal split-frame encoder 304 and is encoded in a corresponding manner to that described above with reference to Figures 3 and 6.
In some examples, an indication is provided to the horizontal split-frame encoder 304 that the obtained image 602 has been subject to rotation. The indication may indicate one or more rotation attributes. Examples of rotation attributes include, but are not limited to, an amount of rotation and a direction of rotation.
In this example, and assuming that the obtained image 602 has a width w and height h, the left and right halves 604, 606 of the obtained image 602 each has width w/2 and height h. As a result of the rotation, the upper and lower halves 610, 612 of the rotated image 608 each has width h and height w/2. As such, the rotated image 608 has width h and height w.
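Assuming NumPy image arrays of shape (height, width, channels), this rotation and the resulting dimensions can be sketched as follows; the 1920×1080 frame size is a hypothetical example, not taken from the text.

```python
import numpy as np

def rotate_for_split_frame(image: np.ndarray) -> np.ndarray:
    """Rotate 90 degrees anticlockwise: the right half of the input becomes
    the upper half of the output and the left half becomes the lower half."""
    return np.rot90(image, k=1)

# Dimension check for a hypothetical frame with w = 1920 and h = 1080:
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)   # shape (h, w, c)
rotated = rotate_for_split_frame(frame)
assert rotated.shape == (1920, 1080, 3)             # width h, height w
# Each of the upper and lower halves of the rotated image is h wide, w/2 tall.
```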
Referring to Figure 9, an image processing system 900 is shown.
A bitstream 502 comprises data output by the horizontal split-frame encoder 304 and/or data derived based on such output data.
In this example, the bitstream 502 comprises the encoded upper and lower halves 310, 312.
In addition, in this example, the bitstream 502 comprises one or more image processing indicators 712. In this example, the one or more image processing indicators 712 indicate that the obtained image 602 was subject to rotation.
The image processing indicator(s) 712 may indicate a direction of rotation. The image processing indicator(s) 712 may indicate an amount of rotation. In this specific example, the image processing indicator(s) 712 indicate the 90° anticlockwise rotation.
An example will now be described in which an obtained image 602 is stacked prior to being encoded by a horizontal split-frame encoder 304.
Referring to Figure 10, an image 602 is obtained.
In this example, the obtained image 602 has a left half 604 and a right half 606.
In this example, the obtained image 602 is subject to image processing prior to encoding. In this example, such processing comprises stacking 1002 the obtained image 602, and the processed image 608 is a stacked image.
In this example, the upper region 610 of the processed image 608 is one of the left and right regions 604, 606 of the obtained image 602. In this example, the lower region 612 of the processed image 608 is the other one of the left and right regions 604, 606 of the obtained image 602.
In this example, the image 602 is subject to stacking, resulting in a stacked image 608. The term “stacking” is generally used herein to mean the process by which first and second data that are side-by-side become arranged one on top of the other instead of side-by-side. Stacking may comprise, or may be referred to as, “transporting”, “moving” or the like. In this specific example, the upper half 610 of the stacked image 608 comprises the left half 604 of the obtained image 602, and the lower half 612 of the stacked image 608 comprises the right half 606 of the obtained image 602. However, in other examples, the upper half 610 of the stacked image 608 may comprise the right half 606 of the obtained image 602, and the lower half 612 of the stacked image 608 may comprise the left half 604 of the obtained image 602.
Stacking may be performed in various different ways.
For example, the obtained image 602 may initially be split vertically such that there is a first image corresponding to the left half 604 of the obtained image 602 and a second image corresponding to the right half 606 of the obtained image 602. The first
and second images may then be recombined, with one on top of the other, to provide the stacked image 608.
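A minimal sketch of this split-then-recombine approach, assuming NumPy arrays and the left-on-top correspondence of this example, might be:

```python
import numpy as np

def stack_halves(image: np.ndarray) -> np.ndarray:
    """Split the image vertically into left and right halves, then recombine
    them with the left half on top, as in the Figure 10 example."""
    w = image.shape[1]
    left, right = image[:, : w // 2], image[:, w // 2 :]
    return np.concatenate([left, right], axis=0)
```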
In another example, the obtained image 602 may be read in a predetermined manner (for example, from top left to bottom right), with the read values being written into the appropriate locations of the stacked image 608. For example, the top row of pixels of the obtained image 602 may be read from the leftmost pixel to a pixel immediately to the left of the centre of the obtained image 602, and such pixels may be written to the top row of pixels of the upper half 610 of the stacked image 608. The top row of pixels of the obtained image 602 may then continue to be read from a pixel immediately to the right of the centre of the obtained image 602 to the rightmost pixel of the obtained image 602, and such pixels may be written to the top row of pixels of the lower half 612 of the stacked image 608.
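The row-by-row read/write approach might be sketched as follows; this is illustrative only, and a practical implementation could use vectorised copies rather than an explicit Python loop.

```python
import numpy as np

def stack_by_rows(image: np.ndarray) -> np.ndarray:
    """Read each row of the obtained image, writing its left part into the
    upper half of the stacked image and its right part into the lower half."""
    h, w = image.shape[:2]
    half = w // 2
    out = np.empty((2 * h, half) + image.shape[2:], dtype=image.dtype)
    for y in range(h):
        out[y] = image[y, :half]       # left part of row y -> upper half
        out[h + y] = image[y, half:]   # right part of row y -> lower half
    return out
```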
In this example, and unlike the rotation-type example described above with reference to Figures 8 and 9, the left and right halves 604, 606 of the obtained image 602 are not subject to rotation.
The stacked image 608 is provided to the horizontal split-frame encoder 304 and is encoded in a corresponding manner to that described above with reference to Figures 3, 6 and 8.
In some examples, an indication is provided to the horizontal split-frame encoder 304 that the obtained image 602 has been subject to stacking. The indication may indicate one or more stacking attributes. Examples of stacking attributes include, but are not limited to, a correspondence between regions of the obtained image 602 and corresponding regions of the stacked image 608.
In this example, and assuming that the obtained image 602 has a width w and height h, the left and right halves 604, 606 of the obtained image 602 each has width w/2 and height h. As a result of the stacking, the upper and lower halves 610, 612 of the stacked image 608 each has width w/2 and height h. As such, the stacked image 608 has width w/2 and height 2h.
Referring to Figure 11, an image processing system 1100 is shown.
In this example, a bitstream 502 comprises data output by the horizontal split-frame encoder 304 and/or data derived based on such output data.
In this example, the bitstream 502 comprises the encoded upper and lower halves 310, 312. In addition, in this example, the bitstream 502 comprises one or more image processing indicators 712. In this example, the one or more image processing indicators 712 indicate that the obtained image 602 was subject to stacking.
The image processing indicator 712 may indicate a correspondence between the left and right halves 604, 606 of the obtained image 602 and the upper and lower halves 610, 612 of the stacked image 608. For example, the image processing indicator 712 may be a one-bit indicator (which may also be referred to as a “flag”) where a value of “0” indicates that the left and right halves 604, 606 of the obtained image 602 correspond to the upper and lower halves 610, 612 of the stacked image 608 respectively and where a value of “1” indicates that the left and right halves 604, 606 of the obtained image 602 correspond to the lower and upper halves 612, 610 of the stacked image 608 respectively. However, in other examples, such correspondence may not be signalled.
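A sketch of how a player might interpret such a one-bit correspondence indicator when de-stacking is given below; the flag semantics follow the text, while the function name and array layout are assumptions.

```python
import numpy as np

def de_stack_with_flag(decoded_stacked: np.ndarray, flag: int) -> np.ndarray:
    """Rebuild the side-by-side image using the one-bit correspondence flag:
    flag 0 -> (upper, lower) were (left, right);
    flag 1 -> (upper, lower) were (right, left)."""
    h = decoded_stacked.shape[0]
    upper, lower = decoded_stacked[: h // 2], decoded_stacked[h // 2 :]
    left, right = (upper, lower) if flag == 0 else (lower, upper)
    return np.concatenate([left, right], axis=1)
```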
Referring to Figure 12, there is shown a schematic block diagram of an example of an apparatus 1200.
In an example, the apparatus 1200 comprises an encoder. In another example, the apparatus 1200 comprises a decoder. In other examples, the apparatus 1200 comprises neither an encoder nor a decoder but is configured to communicate with an encoder and/or a decoder.
Examples of apparatus 1200 include, but are not limited to, a mobile computer, a personal computer system, a wireless device, base station, phone device, desktop computer, laptop, notebook, netbook computer, mainframe computer system, handheld computer, workstation, network computer, application server, storage device, a consumer electronics device such as a camera, camcorder, mobile device, video game console, handheld video game device, an XR headset, or in general any type of computing or electronic device.
In this example, the apparatus 1200 comprises one or more processors 1201 configured to process information and/or instructions. The one or more processors 1201 may comprise a central processing unit (CPU). The one or more processors 1201 are coupled with a bus 1202. Operations performed by the one or more processors 1201
may be carried out by hardware and/or software. The one or more processors 1201 may comprise multiple co-located processors or multiple disparately located processors.
In this example, the apparatus 1200 comprises computer-useable volatile memory 1203 configured to store information and/or instructions for the one or more processors 1201. The computer-useable volatile memory 1203 is coupled with the bus 1202. The computer-useable volatile memory 1203 may comprise random access memory (RAM).
In this example, the apparatus 1200 comprises computer-useable non-volatile memory 1204 configured to store information and/or instructions for the one or more processors 1201. The computer-useable non-volatile memory 1204 is coupled with the bus 1202. The computer-useable non-volatile memory 1204 may comprise read-only memory (ROM).
In this example, the apparatus 1200 comprises one or more data-storage units 1205 configured to store information and/or instructions. The one or more data-storage units 1205 are coupled with the bus 1202. The one or more data-storage units 1205 may for example comprise a magnetic or optical disk and disk drive or a solid-state drive (SSD).
In this example, the apparatus 1200 comprises one or more input/output (I/O) devices 1206 configured to communicate information to and/or from the one or more processors 1201. The one or more I/O devices 1206 are coupled with the bus 1202. The one or more I/O devices 1206 may comprise at least one network interface. The at least one network interface may enable the apparatus 1200 to communicate via one or more data communications networks. Examples of data communications networks include, but are not limited to, the Internet and a Local Area Network (LAN). The one or more I/O devices 1206 may enable a user to provide input to the apparatus 1200 via one or more input devices (not shown). The one or more input devices may include for example a remote control, one or more physical buttons etc. The one or more I/O devices 1206 may enable information to be provided to a user via one or more output devices (not shown). The one or more output devices may for example include a display screen.
Various other entities are depicted for the apparatus 1200. For example, when present, an operating system 1207, image processing module 1208, one or more further
modules 1209, and data 1210 are shown as residing in one, or a combination, of the computer-useable volatile memory 1203, computer-useable non-volatile memory 1204 and the one or more data-storage units 1205. The image processing module 1208 may be implemented by way of computer program code stored in memory locations within the computer-useable non-volatile memory 1204, computer-readable storage media within the one or more data-storage units 1205 and/or other tangible computer-readable storage media. Examples of tangible computer-readable storage media include, but are not limited to, an optical medium (e.g., CD-ROM, DVD-ROM or Blu-ray), flash memory card, floppy or hard disk or any other medium capable of storing computer-readable instructions such as firmware or microcode in at least one ROM or RAM or Programmable ROM (PROM) chips or as an Application Specific Integrated Circuit (ASIC).
The apparatus 1200 may therefore comprise an image processing module 1208 which can be executed by the one or more processors 1201. The image processing module 1208 can be configured to include instructions to implement at least some of the operations described herein. During operation, the one or more processors 1201 launch, run, execute, interpret or otherwise perform the instructions in the image processing module 1208.
Although at least some aspects of the examples described herein with reference to the drawings comprise computer processes performed in processing systems or processors, examples described herein also extend to computer programs, for example computer programs on or in a carrier, adapted for putting the examples into practice. The carrier may be any entity or device capable of carrying the program.
It will be appreciated that the apparatus 1200 may comprise more, fewer and/or different components from those depicted in Figure 12.
The apparatus 1200 may be located in a single location or may be distributed in multiple locations. Such locations may be local or remote.
The techniques described herein may be implemented in software or hardware, or may be implemented using a combination of software and hardware. They may include configuring an apparatus to carry out and/or support any or all of the techniques described herein.
In examples described above, an obtained image is split in half horizontally. In other examples, an obtained image is split into more than two horizontal stripes.
In examples described above, an image having left and right regions is processed to obtain a processed image having upper and lower regions, where the upper region corresponds to one of the left and right regions, and where the lower region corresponds to the other of the left and right regions. The processed image is provided to a horizontal split-frame encoder.
In other, more general, examples, an obtained image having first and second regions is processed to obtain a processed image. The processed image has third and fourth regions, where the third region corresponds to one of the first and second regions, and where the fourth region corresponds to the other of the first and second regions. The processed image is provided to a split-frame encoder. The first and second regions may comprise left and right regions and the third and fourth regions may comprise upper and lower regions. The first and second regions may comprise upper and lower regions and the third and fourth regions may comprise left and right regions.
In examples described above, an obtained image is subject to image processing in the form of either rotation or stacking.
In other examples, both rotation and stacking may be used. For example, an obtained image may be split into left and right halves. The left and right halves may be rotated. The rotated left and right halves may then be stacked one on top of the other. This may provide more flexibility than being restricted to only one of rotation and stacking. For example, the left and right halves may be rotated by different amounts and/or in different directions. For example, the left half may be rotated by 90° anticlockwise and the right half may be rotated by 90° clockwise. By way of another example, the left and right halves may be rotated by the same amount and in the same direction as each other (for example, 90° anticlockwise), but the left half may be stacked on top of the right half after rotation (instead of the right half being stacked on top of the left half after such a rotation).
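A sketch of this combined rotate-and-stack processing, assuming NumPy arrays, might be as follows; note that the chosen rotation amounts must leave the two halves with equal widths for the stacking to be well-formed.

```python
import numpy as np

def rotate_and_stack(image: np.ndarray, k_left: int = 1, k_right: int = 1,
                     left_on_top: bool = False) -> np.ndarray:
    """Split into left and right halves, rotate each half independently (k is
    the number of 90-degree anticlockwise turns; negative k is clockwise),
    then stack one rotated half on top of the other."""
    w = image.shape[1]
    left = np.rot90(image[:, : w // 2], k=k_left)
    right = np.rot90(image[:, w // 2 :], k=k_right)
    top, bottom = (left, right) if left_on_top else (right, left)
    return np.concatenate([top, bottom], axis=0)

# The example from the text: left half rotated 90 degrees anticlockwise,
# right half rotated 90 degrees clockwise:
# mixed = rotate_and_stack(frame, k_left=1, k_right=-1)
```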
By way of a summary, examples have been described above which relate to split-frame XR encoding. The parallelisation and latency-reduction of a horizontal split-frame encoder is leveraged in respect of an image that would ordinarily not be well-suited to encoding by the horizontal split-frame encoder. The image is processed (or pre-processed) to increase its suitability for encoding by the horizontal split-frame encoder. Alternatively, the image could have been encoded by a different type of encoder, such as a vertical slicing encoder, to provide parallelisation and latency-reduction. However, this may not facilitate or enable integration with an existing horizontal split-frame encoder. Processing the image as described above, for example by rotating and/or stacking, enables a horizontal split-frame encoder to be used, potentially without any modification to the horizontal split-frame encoder itself. In examples, such image processing does not negate the latency-reducing gains of parallelisation. High-quality, low-latency image processing can therefore be provided with high compression efficiency.
It is to be understood that any feature described in relation to any one embodiment may be used alone, or in combination with other features described, and may also be used in combination with one or more features of any other of the embodiments, or any combination of any other of the embodiments. Furthermore, equivalents and modifications not described above may also be employed without departing from the scope of the invention, which is defined in the accompanying claims.
Claims
1. An image processing method, the method comprising:
obtaining an image having a left region and a right region;
processing the obtained image to obtain a processed image, wherein the processed image has an upper region and a lower region, wherein the upper region of the processed image corresponds to one of the left and right regions of the obtained image, and wherein the lower region of the processed image corresponds to the other of the left and right regions of the obtained image; and
providing the processed image to a horizontal split-frame encoder.
2. A method according to claim 1, wherein the obtained image comprises a stereoscopic image.
3. A method according to claim 1 or 2, comprising indicating to the horizontal split-frame encoder that the obtained image has been subject to said processing.
4. A method according to any of claims 1 to 3, wherein said processing comprises rotating the obtained image such that:
the upper region of the processed image is a rotated version of the one of the left and right regions of the obtained image; and
the lower region of the processed image is a rotated version of the other one of the left and right regions of the obtained image.
5. A method according to claim 4, wherein said rotating comprises rotating the obtained image by 90 degrees.
6. A method according to claim 4 or 5, comprising indicating to the horizontal split-frame encoder that the obtained image has been subject to rotation.
7. A method according to any of claims 1 to 3, wherein said processing comprises stacking the obtained image such that:
the upper region of the processed image is the one of the left and right regions of the obtained image; and
the lower region of the processed image is the other one of the left and right regions of the obtained image.
8. A method according to claim 7, comprising indicating to the horizontal split-frame encoder that the obtained image has been subject to stacking.
9. A method according to any of claims 1 to 8, comprising providing an instruction to the horizontal split-frame encoder to perform horizontal split-frame encoding.
10. A method according to any of claims 1 to 9, wherein the left region of the obtained image corresponds to a left half of the obtained image and wherein the right region of the obtained image corresponds to a right half of the obtained image.
11. A method according to any of claims 1 to 10, wherein the upper region of the processed image corresponds to an upper half of the processed image and wherein the lower region of the processed image corresponds to a lower half of the processed image.
12. A method according to any of claims 1 to 11, wherein the horizontal split-frame encoder is operable to perform parallel encoding of the upper and lower regions of the processed image.
13. A method according to any of claims 1 to 12, wherein the horizontal split-frame encoder is operable to output a bitstream comprising both an encoded version of the upper region of the processed image and an encoded version of the lower region of the processed image.
14. A method according to any of claims 1 to 12, wherein the horizontal split-frame encoder is operable to output a first bitstream comprising an encoded version of the upper region of the processed image and a second bitstream comprising an encoded version of the lower region of the processed image.
15. A method comprising:
obtaining a decoded version of an encoded version of an upper region of a processed image, the upper region of the processed image corresponding to one of a left and right region of an obtained image, and the encoded version of the upper region of the processed image having been generated by a horizontal split-frame encoder;
obtaining a decoded version of an encoded version of a lower region of the processed image, the lower region of the processed image corresponding to the other of the left and right region of the obtained image, and the encoded version of the lower region of the processed image having been generated by the horizontal split-frame encoder; and
providing the decoded versions of the upper and lower regions of the processed image for image reconstruction processing and/or for display.
16. A method according to claim 15, wherein said image reconstruction processing comprises de-rotating the decoded versions of the upper and lower regions of the processed image.
17. A method according to claim 16, wherein said de-rotating is performed in response to receiving an indication that the obtained image was subject to rotation.
18. A method according to any of claims 15 to 17, wherein said image reconstruction processing comprises de-stacking the decoded versions of the upper and lower regions of the processed image.
19. A method according to claim 18, wherein the de-stacking is performed in response to receiving an indication that the obtained image was subject to stacking.
20. A method according to any of claims 1 to 19, wherein the obtained image comprises extended reality, XR, content.
21. A method according to claim 20, wherein the XR content comprises virtual reality, VR, content and/or augmented reality, AR, content.
22. An apparatus configured to perform a method according to any of claims 1 to 21.
23. A computer program configured to perform a method according to any of claims 1 to 21.
24. A bitstream comprising:
an encoded version of an upper region of a processed image, the upper region of the processed image corresponding to one of a left and right region of an obtained image, and the encoded version of the upper region of the processed image having been generated by a horizontal split-frame encoder; and/or
an encoded version of a lower region of the processed image, the lower region of the processed image corresponding to the other of the left and right region of the obtained image, and the encoded version of the lower region of the processed image having been generated by the horizontal split-frame encoder.
25. A bitstream according to claim 24, wherein the bitstream comprises an image processing indicator that is indicative of image processing performed on the obtained image to obtain the processed image.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB2317128.3 | 2023-11-08 | ||
| GB2317128.3A GB2629881B (en) | 2023-11-08 | 2023-11-08 | Split-frame coding |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025099438A1 (en) | 2025-05-15 |
Family
ID=89164965
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/GB2024/052832 (WO2025099438A1, pending) | Split-frame coding | 2023-11-08 | 2024-11-08 |
Country Status (2)
| Country | Link |
|---|---|
| GB (1) | GB2629881B (en) |
| WO (1) | WO2025099438A1 (en) |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP2418865A3 (en) * | 2010-08-09 | 2014-08-06 | LG Electronics Inc. | 3D viewing device, image display apparatus, and method for operating the same |
| CN103262548B (en) * | 2010-10-28 | 2016-05-11 | Lg电子株式会社 | For receive acceptor device and the method for three-dimensional broadcast singal in mobile environment |
| US9143757B2 (en) * | 2011-04-27 | 2015-09-22 | Electronics And Telecommunications Research Institute | Method and apparatus for transmitting and receiving stereoscopic video |
- 2023-11-08 GB GB2317128.3A patent/GB2629881B/en active Active
- 2024-11-08 WO PCT/GB2024/052832 patent/WO2025099438A1/en active Pending
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10645362B2 (en) * | 2016-04-11 | 2020-05-05 | Gopro, Inc. | Systems, methods and apparatus for compressing video content |
| US20180205934A1 (en) * | 2017-01-13 | 2018-07-19 | Gopro, Inc. | Methods and apparatus for providing a frame packing arrangement for panoramic content |
| WO2019111010A1 (en) | 2017-12-06 | 2019-06-13 | V-Nova International Ltd | Methods and apparatuses for encoding and decoding a bytestream |
| WO2019143551A1 (en) * | 2018-01-16 | 2019-07-25 | Vid Scale, Inc. | Adaptive frame packing for 360-degree video coding |
| WO2020188273A1 (en) | 2019-03-20 | 2020-09-24 | V-Nova International Limited | Low complexity enhancement video coding |
| US11109067B2 (en) * | 2019-06-26 | 2021-08-31 | Gopro, Inc. | Methods and apparatus for maximizing codec bandwidth in video applications |
Non-Patent Citations (2)
| Title |
|---|
| ANONYMOUS: "NVIDIA VIDEO CODEC SDK -ENCODER Programming Guide NVIDIA VIDEO CODEC SDK -ENCODER vNVENCODEAPI_PG-06155-001_v11|ii", 1 November 2022 (2022-11-01), pages 1 - 45, XP055983268, Retrieved from the Internet <URL:https://docs.nvidia.com/video-technologies/video-codec-sdk/pdf/NVENC_VideoEncoder_API_ProgGuide.pdf> [retrieved on 20221121] * |
| FERRARA SIMONE ET AL: "The Next Frontier For MPEG-5 LCEVC: From HDR and Immersive Video to the Metaverse", IEEE MULTIMEDIA, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 29, no. 4, 1 October 2022 (2022-10-01), pages 111 - 122, XP011932056, ISSN: 1070-986X, [retrieved on 20230106], DOI: 10.1109/MMUL.2022.3213879 * |
Also Published As
| Publication number | Publication date |
|---|---|
| GB2629881A (en) | 2024-11-13 |
| GB2629881B (en) | 2025-05-14 |
| GB202317128D0 (en) | 2023-12-20 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN107454468B (en) | Method, apparatus and stream for formatting immersive video | |
| EP3526966B1 (en) | Decoder-centric uv codec for free-viewpoint video streaming | |
| WO2019073117A1 (en) | An apparatus, a method and a computer program for volumetric video | |
| CN116235497A (en) | A method and apparatus for signaling depth of multiplanar image-based volumetric video | |
| US12356006B2 (en) | Method and apparatus for encoding volumetric video represented as a multiplane image | |
| EP3759925A1 (en) | An apparatus, a method and a computer program for volumetric video | |
| US20200413094A1 (en) | Method and apparatus for encoding/decoding image and recording medium for storing bitstream | |
| JP7692408B2 (en) | Method and apparatus for encoding, transmitting, and decoding volumetric video - Patents.com | |
| US12477133B2 (en) | Synchronising frame decoding in a multi-layer video stream | |
| CN109314791B (en) | Method, apparatus, and processor-readable medium for generating streams and synthesizing video for presentation devices | |
| US20250039418A1 (en) | Enhancement decoding implementation and method | |
| CN113228663B (en) | Method for processing configuration data and method for processing image representation | |
| EP3837669B1 (en) | Packing strategy signaling | |
| EP4038880A1 (en) | A method and apparatus for encoding, transmitting and decoding volumetric video | |
| US20150326873A1 (en) | Image frames multiplexing method and system | |
| CN113228665B (en) | Method, device, computer program and computer-readable medium for processing configuration data | |
| WO2025099438A1 (en) | Split-frame coding | |
| TWI908842B (en) | Packing of views for image or video coding | |
| WO2025099448A1 (en) | Striping | |
| CN120770161A (en) | Image processing using residual and difference frames | |
| WO2024134193A1 (en) | Immersive video data processing | |
| GB2617286A (en) | Enhancement decoding implementation and method | |
| JP2023550940A (en) | Decoding the video stream on the client device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24809016; Country of ref document: EP; Kind code of ref document: A1 |