GB2635351A - Striping
- Publication number: GB2635351A
- Application number: GB2317135.8
- Authority: GB (United Kingdom)
- Prior art keywords: image, encoded version, base layer, encoding, enhancement layer
- Legal status: Pending (assumed status; not a legal conclusion)
Classifications (all within H04N 19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals):
- H04N 19/30 — using hierarchical techniques, e.g. scalability
- H04N 19/33 — using hierarchical techniques, e.g. scalability, in the spatial domain
- H04N 19/34 — scalability techniques involving progressive bit-plane based encoding of the enhancement layer, e.g. fine granular scalability [FGS]
- H04N 19/436 — implementation details or hardware specially adapted for video compression or decompression, using parallelised computational arrangements
- H04N 19/65 — using error resilience
- H04N 19/67 — using error resilience involving unequal error protection [UEP], i.e. providing protection according to the importance of the data
Landscapes: Engineering & Computer Science; Multimedia; Signal Processing; Computing Systems; Theoretical Computer Science; Compression or Coding Systems of TV Signals
Abstract
An encoded version, 610, of an image base layer, 606, is provided to a transmission module, 314, as one or more stripes. The encoded version, 610, of the image base layer, 606, has been generated by a first encoding module, 614. An encoded version, 612, of an image enhancement layer, 608, is provided to the transmission module, 314, as one or more further stripes. The encoded version, 612, of the image enhancement layer, 608, has been generated by a second encoding module, 616. The transmission module may be operable to start transmission of the encoded version of the base layer before the second encoding module starts encoding the image enhancement layer. The transmission module may use different levels of transmission protection (e.g. forward error correction, FEC) for the encoded versions of the base layer and enhancement layer. The image enhancement layer may comprise a Low Complexity Enhancement Video Coding (LCEVC) enhancement layer.
Description
STRIPING
Technical Field
The present disclosure relates to striping. In particular, but not exclusively, the present disclosure relates to low-latency striping.
Background
An image or video can be encoded as one or more stripes. For example, a video can be split in two horizontally, and each of the upper and lower halves of the video can be encoded as separate video stripes.
The encoded video stripes can be provided to a transmission module that can transmit the encoded stripes immediately and independently to a decoder module. The decoder module can then decode encoded stripes immediately and independently.
This enables parallel encoding, transmission, and decoding of a video which, in turn, can reduce video processing latency.
However, video quality may suffer as a consequence of measures used to reduce latency. Additionally, implementing latency-reducing measures can involve significant integration into existing systems, particularly if video quality is to be maintained and/or if video is to be compressed and transmitted efficiently.
Summary
Various aspects of the present disclosure are set out in the appended claims. Further features and advantages will become apparent from the following description of preferred embodiments, given by way of example only, which is made with reference to the accompanying drawings.
Brief Description of the Drawings
Figure 1 shows a schematic block diagram of an example of an image processing system;
Figures 2A and 2B show a schematic block diagram of another example of an image processing system;
Figure 3 shows a schematic block diagram of part of another example of an image processing system;
Figure 4 shows a schematic block diagram of another part of the example image processing system shown in Figure 3;
Figure 5 shows a schematic block diagram of several parts of the example image processing system shown in Figures 3 and 4;
Figure 6 shows a schematic block diagram of part of another example of an image processing system;
Figure 7 shows a schematic block diagram of another part of the example image processing system shown in Figure 6;
Figure 8 shows a schematic block diagram of several parts of the example image processing system shown in Figures 6 and 7;
Figure 9 shows a schematic diagram of an example of a timing diagram;
Figure 10 shows a schematic diagram of another example of a timing diagram;
Figure 11 shows a schematic diagram of another example of a timing diagram;
Figure 12 shows a table of different levels of transmission protection;
Figure 13 shows a schematic diagram of an example of an encoder arrangement output;
Figure 14 shows a schematic diagram of an example of an encoder arrangement output;
Figure 15 shows a schematic diagram of an example of an encoder arrangement output;
Figure 16 shows a schematic diagram of an example of an encoder arrangement output; and
Figure 17 shows a schematic block diagram of an example of an apparatus.
Detailed Description
Referring to Figure 1, there is shown an example of a signal processing system 100. The signal processing system 100 is used to process signals. Examples of types of signal include, but are not limited to, video signals, image signals, audio signals, volumetric signals such as those used in medical, scientific or holographic imaging, or other multidimensional signals.
The signal processing system 100 includes a first apparatus 102 and a second apparatus 104. The first apparatus 102 and second apparatus 104 may have a client-server relationship, with the first apparatus 102 performing the functions of a server device and the second apparatus 104 performing the functions of a client device. The signal processing system 100 may include at least one additional apparatus (not shown).
The first apparatus 102 and/or second apparatus 104 may comprise one or more components. The one or more components may be implemented in hardware and/or software. The one or more components may be co-located or may be located remotely from each other in the signal processing system 100. Examples of types of apparatus include, but are not limited to, computerised devices, handheld or laptop computers, tablets, mobile devices, games consoles, smart televisions, set-top boxes, extended reality (XR) headsets (including augmented reality (AR) and/or virtual reality (VR) headsets) etc. The first apparatus 102 is communicatively coupled to the second apparatus 104 via a data communications network 106. Examples of the data communications network 106 include, but are not limited to, the Internet, a Local Area Network (LAN) and a Wide Area Network (WAN). The first and/or second apparatus 102, 104 may have a wired and/or wireless connection to the data communications network 106.
In this example, the first apparatus 102 comprises an encoder 108. The encoder 108 is configured to encode data comprised in and/or derived based on the signal, which is referred to hereinafter as "signal data". For example, where the signal is a video signal, the encoder 108 is configured to encode video data. Video data comprises a sequence of multiple images or frames. The encoder 108 may perform one or more further functions in addition to encoding signal data. The encoder 108 may be embodied in various different ways. For example, the encoder 108 may be embodied in hardware and/or software. The encoder 108 may encode metadata associated with the signal. The first apparatus 102 may use one or more than one encoder 108.
Although in this example the first apparatus 102 comprises the encoder 108, in other examples the first apparatus 102 is separate from the encoder 108. In such examples, the first apparatus 102 is communicatively coupled to the encoder 108. The first apparatus 102 may be embodied as one or more software functions and/or hardware modules.
In this example, the second apparatus 104 comprises a decoder 110. The decoder 110 is configured to decode signal data. The decoder 110 may perform one or more further functions in addition to decoding signal data. The decoder 110 may be embodied in various different ways. For example, the decoder 110 may be embodied in hardware and/or software. The decoder 110 may decode metadata associated with the signal. The second apparatus 104 may use one or more than one decoder 110.
Although in this example the second apparatus 104 comprises the decoder 110, in other examples, the second apparatus 104 is separate from the decoder 110. In such examples, the second apparatus 104 is communicatively coupled to the decoder 110.
The second apparatus 104 may be embodied as one or more software functions and/or hardware modules.
The encoder 108 encodes signal data and transmits the encoded signal data to the decoder 110 via the data communications network 106. The decoder 110 decodes the received, encoded signal data and generates decoded signal data. The decoder 110 may output the decoded signal data, or data derived using the decoded signal data. For example, the decoder 110 may output such data for display on one or more display devices associated with the second apparatus 104.
In some examples described herein, the encoder 108 transmits to the decoder 110 a representation of a signal at a given level of quality and information the decoder 110 can use to reconstruct a representation of the signal at one or more higher levels of quality. Such information may be referred to as "reconstruction data". In some examples, "reconstruction" of a representation involves obtaining a representation that is not an exact replica of an original representation. The extent to which the representation is the same as the original representation may depend on various factors including, but not limited to, quantisation levels. A representation of a signal at a given level of quality may be considered to be a rendition, version or depiction of data comprised in the signal at the given level of quality. In some examples, the reconstruction data is included in the signal data that is encoded by the encoder 108 and transmitted to the decoder 110. For example, the reconstruction data may be in the form of metadata. In some examples, the reconstruction data is encoded and transmitted separately from the signal data.
The information the decoder 110 uses to reconstruct the representation of the signal at the one or more higher levels of quality may comprise residual data, as described in more detail below. Residual data is an example of reconstruction data. The information the decoder 110 uses to reconstruct the representation of the signal at the one or more higher levels of quality may also comprise configuration data relating to processing of the residual data. The configuration data may indicate how the residual data has been processed by the encoder 108 and/or how the residual data is to be processed by the decoder 110. The configuration data may be signalled to the decoder 110, for example in the form of metadata.
Referring to Figures 2A and 2B, there is shown schematically an example of a signal processing system 200. The signal processing system 200 includes a first apparatus 202 and a second apparatus 204. In this example, the first apparatus 202 comprises an encoder and the second apparatus 204 comprises a decoder. However, as explained above, in other examples, the encoder is not comprised in the first apparatus 202 and/or the decoder is not comprised in the second apparatus 204. In each of the first apparatus 202 and the second apparatus 204, items are shown on two logical levels. The two levels are separated by a dashed line. Items on the first, highest level relate to data at a first level of quality. Items on the second, lowest level relate to data at a second level of quality. The first level of quality is higher than the second level of quality. The first and second levels of quality relate to a tiered hierarchy having multiple levels of quality. In some examples, the tiered hierarchy comprises more than two levels of quality. In such examples, the first apparatus 202 and the second apparatus 204 may include more than two different levels. There may be one or more other levels above and/or below those depicted in Figures 2A and 2B. As described herein, in certain cases, the levels of quality may correspond to different spatial resolutions.
Referring first to Figure 2A, the first apparatus 202 obtains a first representation of an image at the first level of quality 206. A representation of a given image is a representation of data comprised in the image. The image may be a given frame of a video. The first representation of the image at the first level of quality 206 will be referred to as "input data" hereinafter as, in this example, it is data provided as an input to the encoder in the first apparatus 202. The first apparatus 202 may receive the input data 206. For example, the first apparatus 202 may receive the input data 206 from at least one other apparatus. The first apparatus 202 may be configured to receive successive portions of input data 206, e.g. successive frames of a video, and to perform the operations described herein on each successive frame. For example, a video may comprise frames F1, F2, ..., FT and the first apparatus 202 may process each of these in turn.
The first apparatus 202 derives data 212 based on the input data 206. In this example, the data 212 based on the input data 206 is a representation 212 of the image at the second, lower level of quality. In this example, the data 212 is derived by performing a downsampling operation on the input data 206 and will therefore be referred to as "downsampled data" hereinafter. In other examples, the data 212 is derived by performing an operation other than a downsampling operation on the input data 206, or the data 212 is the same as the input data 206 (i.e. the input data 206 is not processed, e.g. downsampled).
In this example, the downsampled data 212 is processed to generate processed data 213 at the second level of quality. In other examples, the downsampled data 212 is not processed at the second level of quality. As such, the first apparatus 202 may generate data at the second level of quality, where the data at the second level of quality comprises the downsampled data 212 or the processed data 213.
In some examples, generating the processed data 213 involves the downsampled data 212 being encoded. Such encoding may occur within the first apparatus 202, or the first apparatus 202 may output the processed data 213 to an external encoder. Encoding the downsampled data 212 produces an encoded image at the second level of quality. The first apparatus 202 may output the encoded image, for example for transmission to the second apparatus 204. A series of encoded images, e.g. forming an encoded video, as output for transmission to the second apparatus 204 may be referred to as a "base" stream. As explained above, instead of being produced in the first apparatus 202, the encoded image may be produced by an encoder that is separate from the first apparatus 202. The encoded image may be part of an H.264 encoded video, or otherwise. Generating the processed data 213 may, for example, comprise generating successive frames of video as output by a separate encoder such as an H.264 video encoder. An intermediate set of data for the generation of the processed data 213 may comprise the output of such an encoder, as opposed to any intermediate data generated by the separate encoder.
Generating the processed data 213 at the second level of quality may further involve decoding the encoded image at the second level of quality. The decoding operation may be performed to emulate a decoding operation at the second apparatus 204, as will become apparent below. Decoding the encoded image produces a decoded image at the second level of quality. In some examples, the first apparatus 202 decodes the encoded image at the second level of quality to produce the decoded image at the second level of quality. In other examples, the first apparatus 202 receives the decoded image at the second level of quality, for example from an encoder and/or decoder that is separate from the first apparatus 202. The encoded image may be decoded using an H.264 decoder. The decoding by a separate decoder may comprise inputting encoded video, such as an encoded data stream configured for transmission to a remote decoder, into a separate black-box decoder implemented together with the first apparatus 202 to generate successive decoded frames of video. Processed data 213 may thus comprise a frame of video data that is generated via a complex non-linear encoding and decoding process, where the encoding and decoding process may involve modelling spatiotemporal correlations as per a particular encoding standard such as H.264. However, because the output of any encoder is fed into a corresponding decoder, this complexity is effectively hidden from the first apparatus 202.
In an example, generating the processed data 213 at the second level of quality further involves obtaining correction data based on a comparison between the downsampled data 212 and the decoded image obtained by the first apparatus 202, for example based on the difference between the downsampled data 212 and the decoded image. The correction data can be used to correct for errors introduced in encoding and decoding the downsampled data 212. In some examples, the first apparatus 202 outputs the correction data, for example for transmission to the second apparatus 204, as well as the encoded signal. This allows the recipient to correct for the errors introduced in encoding and decoding the downsampled data 212. This correction data may also be referred to as a "first enhancement" stream. As the correction data may be based on the difference between the downsampled data 212 and the decoded image it may be seen as a form of residual data (e.g. that is different from the other set of residual data described later below).
In some examples, generating the processed data 213 at the second level of quality further involves correcting the decoded image using the correction data. For example, the correction data as output for transmission may be placed into a form suitable for combination with the decoded image, and then added to the decoded image. This may be performed on a frame-by-frame basis. In other examples, rather than correcting the decoded image using the correction data, the first apparatus 202 uses the downsampled data 212. For example, in certain cases, just the encoded then decoded data may be used and in other cases, encoding and decoding may be replaced by other processing.
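By way of illustration, the correction step can be sketched as follows. This is a minimal Python sketch in which the lossy base encode/decode round trip is modelled by coarse quantisation; the quantisation stand-in, array sizes and function names are illustrative assumptions, not the actual base codec behaviour.

```python
import numpy as np

def lossy_round_trip(frame: np.ndarray, step: float = 0.1) -> np.ndarray:
    # Stand-in for a base encode followed by decode: coarse quantisation.
    return np.round(frame / step) * step

downsampled = np.random.rand(540, 960)    # downsampled data 212
decoded = lossy_round_trip(downsampled)   # decoded image at the second level

# Correction data ("first enhancement" stream): the difference between the
# downsampled source and the decoded image, so the recipient can undo
# errors introduced by encoding and decoding.
correction = downsampled - decoded

# Applying the correction recovers the downsampled data exactly.
assert np.allclose(decoded + correction, downsampled)
```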
In some examples, generating the processed data 213 involves performing one or more operations other than the encoding, decoding, obtaining, and correcting acts described above.
The first apparatus 202 obtains data 214 based on the data at the second level of quality. As indicated above, the data at the second level of quality may comprise the processed data 213, or the downsampled data 212 where the downsampled data 212 is not processed at the lower level. As described above, in certain cases, the processed data 213 may comprise a reconstructed video stream (e.g. from an encoding-decoding operation) that is corrected using correction data. In the example of Figures 2A and 2B, the data 214 is a second representation of the image at the first level of quality, the first representation of the image at the first level of quality being the input data 206. The second representation at the first level of quality may be considered to be a preliminary or predicted representation of the image at the first level of quality. In this example, the first apparatus 202 derives the data 214 by performing an upsampling operation on the data at the second level of quality. The data 214 will be referred to hereinafter as "upsampled data". However, in other examples one or more other operations could be used to derive the data 214, for example where data 212 is not derived by downsampling the input data 206.
The input data 206 and the upsampled data 214 are used to obtain residual data 216. The residual data 216 is associated with the image. The residual data 216 may be in the form of a set of residual elements, which may be referred to as a "residual frame" or a "residual image". A residual element in the set of residual elements 216 may be associated with a respective image element in the input data 206. An example of an image element is a pixel.
In this example, a given residual element is obtained by subtracting a value of an image element in the upsampled data 214 from a value of a corresponding image element in the input data 206. As such, the residual data 216 is useable in combination with the upsampled data 214 to reconstruct the input data 206. The residual data 216 may also be referred to as "reconstruction data" or "enhancement data". In one case, the residual data 216 may form part of a "second enhancement" stream.
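By way of illustration, the downsample/upsample/subtract pipeline that yields the residual data 216 can be sketched as follows. This minimal Python sketch assumes 2x2 average-pool downsampling and nearest-neighbour upsampling; the actual transformations (and any intervening encode/decode and correction stages) are implementation choices.

```python
import numpy as np

def downsample(frame: np.ndarray) -> np.ndarray:
    # 2x2 average pooling; height and width are assumed to be even.
    h, w = frame.shape
    return frame.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(frame: np.ndarray) -> np.ndarray:
    # Nearest-neighbour upsampling back to the original resolution.
    return frame.repeat(2, axis=0).repeat(2, axis=1)

input_data = np.random.rand(1080, 1920)   # input data 206 (one luma plane)
downsampled = downsample(input_data)      # downsampled data 212
upsampled = upsample(downsampled)         # upsampled data 214 (prediction)

# Residual data 216: per-element difference between input and prediction.
residuals = input_data - upsampled

# The residuals are useable with the upsampled data to reconstruct the input.
assert np.allclose(upsampled + residuals, input_data)
```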
The first apparatus 202 obtains configuration data relating to processing of the residual data 216. The configuration data indicates how the residual data 216 has been processed and/or generated by the first apparatus 202 and/or how the residual data 216 is to be processed by the second apparatus 204. The configuration data may comprise a set of configuration parameters. The configuration data may be useable to control how the second apparatus 204 processes data and/or reconstructs the input data 206 using the residual data 216. The configuration data may relate to one or more characteristics of the residual data 216. The configuration data may relate to one or more characteristics of the input data 206. Different configuration data may result in different processing being performed on and/or using the residual data 216. The configuration data is therefore useable to reconstruct the input data 206 using the residual data 216. As described below, in certain cases, configuration data may also relate to the correction data described herein.
In this example, the first apparatus 202 transmits to the second apparatus 204 data based on the downsampled data 212, data based on the residual data 216, and the configuration data (or data based on the configuration data), to enable the second apparatus 204 to reconstruct the input data 206.
Turning now to Figure 2B, the second apparatus 204 receives data 220 based on (e.g. derived from) the downsampled data 212. The second apparatus 204 also receives data based on the residual data 216. For example, the second apparatus 204 may receive a "base" stream (data 220), a "first enhancement stream" (any correction data) and a "second enhancement stream" (residual data 216). The second apparatus 204 also receives the configuration data relating to processing of the residual data 216. The data 220 based on the downsampled data 212 may be the downsampled data 212 itself, the processed data 213, or data derived from the downsampled data 212 or the processed data 213. The data based on the residual data 216 may be the residual data 216 itself, or data derived from the residual data 216.
In some examples, the received data 220 comprises the processed data 213, which may comprise the encoded image at the second level of quality and/or the correction data. In some examples, for example where the first apparatus 202 has processed the downsampled data 212 to generate the processed data 213, the second apparatus 204 processes the received data 220 to generate processed data 222. Such processing by the second apparatus 204 may comprise decoding an encoded image (e.g. that forms part of a "base" encoded video stream) to produce a decoded image at the second level of quality. In some examples, the processing by the second apparatus 204 comprises correcting the decoded image using obtained correction data. Hence, the processed data 222 may comprise a frame of corrected data at the second level of quality. In some examples, the encoded image at the second level of quality is decoded by a decoder that is separate from the second apparatus 204. The encoded image at the second level of quality may be decoded using an H.264 decoder.
In other examples, the received data 220 comprises the downsampled data 212 and does not comprise the processed data 213. In some such examples, the second apparatus 204 does not process the received data 220 to generate processed data 222.
The second apparatus 204 uses data at the second level of quality to derive the upsampled data 214. As indicated above, the data at the second level of quality may comprise the processed data 222, or the received data 220 where the second apparatus 204 does not process the received data 220 at the second level of quality. The upsampled data 214 is a preliminary representation of the image at the first level of quality. The upsampled data 214 may be derived by performing an upsampling operation on the data at the second level of quality.
The second apparatus 204 obtains the residual data 216. The residual data 216 is useable with the upsampled data 214 to reconstruct the input data 206. The residual data 216 is indicative of a comparison between the input data 206 and the upsampled data 214.
The second apparatus 204 also obtains the configuration data related to processing of the residual data 216. The configuration data is useable by the second apparatus 204 to reconstruct the input data 206. For example, the configuration data may indicate a characteristic or property relating to the residual data 216 that affects how the residual data 216 is to be used and/or processed, or whether the residual data 216 is to be used at all. In some examples, the configuration data comprises the residual data 216.
There are several considerations relating to such processing. One such consideration is the amount of information that is generated, stored, transmitted and/or processed. The more information that is used, the greater the amount of resources that may be involved in handling such information. Examples of such resources include transmission resources, storage resources and processing resources. Some signal processing techniques allow a relatively small amount of information to be used. This may reduce the amount of data transmitted via the data communications network 106.
The savings may be particularly relevant where the data relates to high quality video data, where the amount of information transmitted can be especially high.
Other considerations include the ability of the decoder to perform image reconstruction accurately, reliably, and/or efficiently. Performing image reconstruction accurately and reliably may affect the ultimate visual quality of the displayed image and consequently may affect a viewer's engagement with the image and/or with a video comprising the image. This can be especially relevant to XR. Efficient reconstruction is particularly important for mobile computing devices, which may readily be used in XR applications.
As explained above, a video (or, more generally, at least one image) can be encoded as one or more horizontal stripes, for example using a horizontal striping encoder. Transmission and decoding modules can transmit and decode the stripes immediately and independently, with low latency. In this connection, the term "module" is used here in relation to control, output, transmission, encoding and decoding modules to mean any hardware and/or software that controls, outputs, transmits, encodes or decodes. More generally, the term "module" is used herein to mean any hardware and/or software that performs one or more given functions.
An example of how this may be implemented will now be described with reference to Figures 3 to 6.
Referring to Figure 3, an image 302 is obtained. The obtained image 302 may be referred to herein as a "frame", an "obtained image", a "source image", an "input image" or the like.
The image 302 may be comprised in a video, or otherwise.
The image 302 may be obtained in various ways. For example, the obtained image 302 may be received, may be generated, may be retrieved from storage, or otherwise.
In some examples, the obtained image 302 comprises extended reality (XR) content. In some examples, the XR content comprises virtual reality (VR) content. In some examples, the XR content comprises augmented reality (AR) content. Latency is particularly, but not exclusively, relevant in the context of XR content. For example, latency is also relevant in relation to videoconferencing.
The obtained image 302 is input to an encoder 304.
In this example, the encoder 304 is a horizontal striping encoder. The encoder 304 may also be referred to as a "horizontal split-frame encoder". The horizontal striping encoder 304 may have an Application Programming Interface (API) via which the horizontal striping encoder 304 can be instructed to encode the obtained image 302.
In this example, the horizontal striping encoder 304 obtains and splits the obtained image 302 in half horizontally and encodes the resulting half-images. Splitting the obtained image 302 in this way may be referred to as splitting the obtained image 302 "in two", "two ways" or the like.
In this specific example, the horizontal striping encoder 304 obtains and splits the obtained image 302 in half horizontally. However, the horizontal striping encoder 304 may horizontally split the obtained image 302 in other proportions in other examples. As such, references to halves of images should be understood accordingly. More specifically, the horizontal striping encoder 304 may split the obtained image 302 into an upper region (which may also be referred to as a "portion") corresponding to X% of the height of the obtained image 302 and into a lower region corresponding to (100 - X)% of the height of the obtained image 302. Where the horizontal striping encoder 304 splits the obtained image 302 in half horizontally, X = 50.
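By way of illustration, such a proportional horizontal split might be sketched as follows. This is a minimal Python/numpy sketch; the frame dimensions and function names are illustrative assumptions.

```python
import numpy as np

def split_horizontally(frame: np.ndarray, x_percent: float = 50.0):
    # Split a frame into an upper region covering x_percent of its height
    # and a lower region covering the remaining (100 - x_percent) percent.
    split_row = int(frame.shape[0] * x_percent / 100)
    upper = frame[:split_row]    # upper region 306
    lower = frame[split_row:]    # lower region 308
    return upper, lower

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
upper, lower = split_horizontally(frame)    # X = 50: two 540-row halves
assert upper.shape[0] == lower.shape[0] == 540
```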
Another type of encoder is a vertical slicing encoder, which splits an image into two or more vertical slices.
Although, in this example, the encoder 304 splits the obtained image 302 and encodes the split images, splitting may be performed outside of the encoder 304 in other examples. For example, the obtained image 302 may be input to a splitter (not shown), the splitter may split the obtained image 302, and the encoder 304 may encode the output of the splitter.
The splitting provides an upper region 306 of the obtained image 302 and a lower region 308 of the obtained image 302. The upper and lower regions 306, 308 may be referred to as "partial images", "split images", "partial frames" or the like. The upper region 306 may be referred to as a "top region" and the lower region 308 may be referred to as a "bottom region".
In some examples, the horizontal striping encoder 304 is also operable to perform other types of encoding, in addition to horizontal striping encoding. For example, the horizontal striping encoder 304 may be operable to perform vertical slicing encoding, encoding that does not split the obtained image 302, and so on. In this example, the horizontal striping encoder 304 splits the obtained image 302 in half horizontally such that the upper and lower regions 306, 308 are upper and lower halves of the obtained image 302. In other words, where the obtained image 302 has width w and height h, the upper and lower halves 306, 308 each have width w and each have height h/2. However, as explained above, the horizontal striping encoder 304 may split the obtained image 302 in different horizontal proportions in other examples.
The horizontal striping encoder 304 then encodes the upper and lower halves 306, 308. This provides an encoded upper half 310 of the obtained image 302 and an encoded lower half 312 of the obtained image 302 respectively.
The horizontal striping encoder 304 may use one or more of various different codecs to encode the upper and lower halves 306, 308.
In this example, the horizontal striping encoder 304 is operable to perform parallel encoding of the upper and lower halves 306, 308. In other words, in this example, the horizontal striping encoder 304 does not need to wait for encoding of one of the upper and lower halves 306, 308 to complete before encoding the other of the upper and lower halves 306, 308.
Encoding the upper and lower halves 306, 308 in parallel can increase encoding speed compared to encoding the image 302 as a whole. For example, the encoding speed may be doubled. This is especially, but not exclusively, effective in low-latency applications. An example of such an application is XR.
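By way of illustration, parallel encoding of the two halves might be sketched as follows. The per-stripe encoder is a placeholder (zlib compression standing in for a real video codec), and the thread pool is one possible parallelisation mechanism, not a prescribed one.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def encode_stripe(raw: bytes) -> bytes:
    # Placeholder for a real per-stripe video encoder.
    return zlib.compress(raw)

upper_region = bytes(1920 * 540)   # stand-in for the raw upper half 306
lower_region = bytes(1920 * 540)   # stand-in for the raw lower half 308

# Encode both halves concurrently; neither waits for the other to finish.
with ThreadPoolExecutor(max_workers=2) as pool:
    encoded_upper, encoded_lower = pool.map(encode_stripe,
                                            [upper_region, lower_region])
```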
By decreasing encoding time, higher-resolution XR content may be useable while enabling acceptable latency targets to be met. For example, where parallel encoding results in significant additional latency headroom (i.e. in terms of acceptable performance), XR content resolution may be increased. Although this may use some of the additional latency headroom gained by parallel encoding, the XR content resolution increase may be managed such that latency performance does not drop below an acceptable threshold.
Latency can also arise when encoding a particularly complex video, when network congestion arises in a data communications network between an encoder and a decoder, and in the decoder when decoding encoded video.
Latency can, however, also be relevant in other applications. For example, if latency becomes longer than one second in a group discussion hosted via a video conferencing system, interaction in the group discussion can be significantly impaired.
However, a latency of less than 30 milliseconds is generally imperceptible to humans.
In some examples, the horizontal striping encoder 304 is operable to encode the upper and lower halves 306, 308 independently. In other words, in such examples, the encoding of the upper half 306 is independent of the encoding of the lower half 308 and vice versa.
Although the striping encoding described above can decrease latency, it may result in a lower quality encoding compared to encoding the obtained image 302 as a whole, and/or the split may be noticeable in the final output image (for example, at low bitrates). Part of the quality degradation arises because the two halves are encoded separately, so the encoder 304 is unable to utilise intra prediction across the split boundary to the extent that it could when encoding the obtained image 302 as a whole. This can decrease compression efficiency. If there is a fixed bitrate (e.g. set by the network bandwidth), the image quality may be lower.
In this example, the horizontal striping encoder 304 outputs the encoded upper and lower halves 310, 312 to a transmission module 314.
In this example, the transmission module 314 has an API via which the encoded upper and lower halves 310, 312 are provided for transmission by the transmission module 314.
In some examples, the transmission module 314 comprises an ultra-low latency transmission module.
In effect, the transmission module 314 receives the encoded upper and lower halves 310, 312 as stripes and transmits the encoded upper and lower halves 310, 312 accordingly. In particular, the transmission module 314 may transmit the encoded upper and lower halves 310, 312 immediately and independently.
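By way of illustration, immediate and independent transmission of stripes might be sketched as follows. The send function is a placeholder for the actual packetisation and network layer, and the per-stripe threading model is an assumption.

```python
import threading

def send_stripe(stripe_id: int, payload: bytes) -> None:
    # Placeholder for the actual network send (e.g. packetisation + UDP).
    print(f"sent stripe {stripe_id}: {len(payload)} bytes")

def transmit_independently(stripes: dict) -> None:
    # Each stripe is handed to its own sender thread as soon as it is
    # available, so no stripe waits on any other.
    threads = [
        threading.Thread(target=send_stripe, args=(sid, data))
        for sid, data in stripes.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

transmit_independently({0: b"encoded upper half 310",
                        1: b"encoded lower half 312"})
```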
Referring to Figure 4, a decoder module 402 obtains the encoded upper and lower halves 310, 312. For example, the decoder module 402 may obtain the encoded upper and lower halves 310, 312 by receiving the encoded upper and lower halves 310, 312 from the transmission module 314.
The decoder module 402 decodes the encoded upper and lower halves 310, 312. This provides decoded upper and lower halves 404, 406 respectively. In examples, the decoder module 402 can decode the encoded upper and lower halves 310, 312 immediately and independently.
In this example, the decoder module 402 decodes the encoded upper and lower halves 310, 312 in parallel. Decoding the encoded upper and lower halves 310, 312 in parallel can increase decoding speed compared to decoding the obtained image 302 as a whole. For example, the decoding speed may be doubled. This is especially, but not exclusively, effective in low-latency applications such as XR.
In this example, the decoded upper and lower halves 404, 406 are provided to a player 408. The player 408 may perform image reconstruction processing and/or may cause images to be displayed.
Referring to Figure 5, an image processing system 500 comprising the horizontal striping encoder 304, the transmission module 314, the decoder module 402, and the player 408 is shown.
A bitstream 502 is communicated between the transmission module 314 and the decoder module 402.
In this example, the bitstream 502 comprises data output by the horizontal striping encoder 304 and/or data derived based on such output data. The bitstream 502 may be referred to as an "encoded bitstream" accordingly.
In this example, the bitstream 502 comprises the encoded upper and lower halves 310, 312.
In this example, the decoder module 402 obtains and decodes the bitstream 502 and/or data derived based on the bitstream 502.
Examples described herein concern encoders and decoders that employ a tiered hierarchical approach to representing signals, for example images and video signals, as corresponding encoded data. The tiered hierarchical approach employs a base layer and one or more enhancement layers. A Low Complexity Enhancement Video Coding (LCEVC) encoder employs a base encoder, for example implemented in hardware, for generating base layer encoded data that is capable of providing a low resolution rendition of a given video signal, and one or more enhancement encoders, for example implemented using software that is executable using computing hardware, to generate enhancement-layer encoded data that includes residual data that can be used to enhance the low resolution rendition to generate enhanced images of higher quality than the low resolution rendition. LCEVC is described in more detail in WO2020/188273 (PCT/GB2020/050695) and WO2019/111010 (PCT/GB2018/053552), the entire contents of which are incorporated herein by reference.

To provide backward compatibility, the base encoder may be implemented using known encoding hardware conforming to well-established standards, for example H.264, H.265, MPEG-2, MPEG-4, MPEG-5, VP9 or AV1. In such examples, the quality of rendition that is achievable at a given decoder depends upon operation of the one or more enhancement encoders, whose operation can be dynamically controlled within software. The one or more enhancement layers include residual data that is generated by employing, during encoding, a combination of image downsampling and upsampling transformations followed by a subtraction operation. The downsampling and upsampling transformations may be mutually asymmetrical in nature to provide the residuals with particularly preferred entropy characteristics that provide for highly efficient data compression during encoding. Moreover, a quantisation operation may be employed during encoding to control the amount of data needing to be encoded.

Alternatively, in the case of the VC-6 standard, a tiered hierarchy of layers is employed with a base layer and one or more enhancement layers, although no attempt is made for the base layer to be backward compatible with the known encoding standards. During encoding of residuals, run-length encoding (RLE) followed by Huffman encoding is employed. The residuals may be subject to additional transformations, for example transformations such as the Hadamard transform, prior to RLE and Huffman encoding to achieve a greater degree of data compression. Reference is also made to GB2601720 and PCT/GB2023/051730, the entire contents of which are incorporated herein by reference.
For ultra-low latency streaming, latency should be minimised as much as possible. As such, and with general reference again to Figures 2A and 2B and examples that will be described in more detail below, transmission and decoding of a base stream (which may also be referred to herein as a "base layer") may be initiated while an encoding of an enhancement stream (which may also be referred to herein as an "enhancement layer") is being processed. This effectively parallelises the encoding, transmission and decoding process and reduces overall latency. In this way, an enhancement layer may be used in a manner akin to horizontal striping used by existing codecs to reduce latency at the cost of some compression inefficiency since stripes are independently encoded.
In addition, this can be implemented from the perspective of an API of a transmission module. This can minimise integration complexity and enable an enhancement layer, such as an LCEVC enhancement layer, to be used seamlessly in combination with striping in connection with an existing API of an existing transmission module. In particular, existing striping of the base layer may still be performed. The enhancement layer may also even be striped.
As will be described in more detail below, the enhancement layer (for example, an LCEVC enhancement layer) is treated by the API as one or more additional stripes. In particular, the enhancement layer (and the base layer) are treated as one or more stripes output by a horizontal striping encoder. As such, for a transmission module that is already configured to transmit video stripes, nothing changes even though the stripes it is transmitting are not exclusively base layer video stripes. Ultra-low latency transmission modules are already equipped to handle stripes with immediate and independent transmission, and decoding modules are already equipped to decode stripes immediately and independently, to minimise latency. Such transmission and decoding modules can therefore be leveraged for transmission of the enhancement layer.
In more detail, and from a practical standpoint, a video can be encoded with one or more stripes. If more than one stripe is used, for example depending on system configuration, transmission and decoding modules can immediately and independently transmit and decode the stripes. With enhancement, such as LCEVC, the same approach can be designed in an API, with the enhancement being treated as one or more additional stripes even though the enhancement is not a conventional video stripe. For example, if the base layer is not striped, the encoding results in two stripes, namely a base layer stripe and an enhancement stripe. If the base layer is striped, for example with three stripes, and the enhancement layer is not striped, then the encoding will feature four stripes, namely the three base layer stripes and the enhancement stripe. If the enhancement in that example were also striped, for example into two stripes, then there would be five stripes.
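The stripe arithmetic in these configurations is straightforward; a trivial helper, purely for illustration:

```python
def total_stripes(base_stripes: int, enhancement_stripes: int) -> int:
    # The enhancement layer is handled as one or more additional stripes.
    return base_stripes + enhancement_stripes

assert total_stripes(1, 1) == 2   # unstriped base + one enhancement stripe
assert total_stripes(3, 1) == 4   # striped base (3) + one enhancement stripe
assert total_stripes(3, 2) == 5   # striped base (3) + striped enhancement (2)
```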
For the transmission and decoding modules, operation is the same as for the case without enhancement stripes; the enhancement stripe is just handled as one or more additional stripes, in addition to the video stripes.
In addition, and as will be explained in more detail below, the transmission module can also protect the transmission in different ways for different stripes, for example with Forward Error Correction (FEC). For instance, the enhancement layer can be transmitted with limited protection: transmission protection and/or retransmission of lost packets can be limited or even avoided for the enhancement layer, since the base layer alone still provides a viewable image. The base layer stripe, however, can be transmitted with FEC and/or packet retransmission.
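By way of illustration, per-stripe protection selection might be sketched as follows. The specific FEC overheads and retransmission settings are illustrative assumptions, not prescribed values (see also the table of protection levels in Figure 12).

```python
from dataclasses import dataclass

@dataclass
class Protection:
    fec_overhead: float    # fraction of redundant FEC packets added
    retransmit: bool       # whether lost packets are retransmitted

# Illustrative policy: protect base stripes strongly, since they carry the
# minimum viewable image; protect enhancement stripes lightly or not at all.
PROTECTION = {
    "base": Protection(fec_overhead=0.25, retransmit=True),
    "enhancement": Protection(fec_overhead=0.0, retransmit=False),
}

def protection_for(layer: str) -> Protection:
    return PROTECTION[layer]

print(protection_for("base"))
print(protection_for("enhancement"))
```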
The API therefore facilitates backwards compatibility with existing slice-based transmission. Even if an enhancement layer, such as LCEVC, is not a slice of a DCT-based codec, the enhancement layer can be treated as such for practical low-latency delivery purposes.
Referring to Figure 6, an image 602 is obtained.
In this example, the obtained image 602 is provided as input to an encoder arrangement 604. In this example, the encoder arrangement 604 generates a base layer 606 and an enhancement layer 608 based on the obtained image 602. In this example, the encoder arrangement 604 encodes the base layer 606 and encodes the enhancement layer 608 to generate an encoded base layer 610 and an encoded enhancement layer 612 respectively.
The term "encoder arrangement" is used herein to mean any hardware and/or software that encodes. An encoder arrangement may comprise one or more encoders. An encoder may comprise one or more encoding modules.
For example, the encoder arrangement 604 may comprise first and second encoding modules.
In one example, the first encoding module comprises a base encoding module that generates the base layer 606. The base encoding module may be comprised in a base encoder. The second encoding module may comprise an enhancement encoding module that generates the enhancement layer 608. The enhancement encoding module may be comprised in an enhancement encoder.
In another example, the first and second encoding modules are modules of a hierarchical encoder.
In this example, the encoder arrangement 604 comprises a base encoder 614. In this example, the base encoder 614 comprises a base encoding module. In this example, the base encoder 614 generates the encoded base layer 610.
In this example, the encoder arrangement 604 comprises an enhancement encoder 616. In this example, the enhancement encoder 616 comprises an enhancement encoding module. In this example, the enhancement encoder 616 generates the encoded enhancement layer 612. In this example, the enhancement encoder 616 comprises an LCEVC encoder.
In another example, the encoder arrangement 604 comprises a hierarchical encoder. An example of a hierarchical encoder is a VC-6 encoder.
In this example, the encoder arrangement 604 comprises a control module 618. In this example, the control module 618 is configured to direct data between the base encoder 614 and the enhancement encoder 616. In particular, in this example, the control module 618 obtains a decoded version of the encoded base layer 610 and provides the decoded version of the encoded base layer 610 to the enhancement encoder 616.
In this example, the encoder arrangement 604 comprises an output module 620, which may also be referred to as a "striping module", a "transmission module controller", a "transmission module control module", or the like. In this example, the output module 620 is configured to communicate data to the transmission module 314.
In this example, the output module 620 is configured to obtain the encoded base layer 610 from the base encoder 614, to obtain the encoded enhancement layer 612 from the enhancement encoder 616, and to output the encoded base and enhancement layers 610, 612 to the transmission module 314.
However, in other examples, the encoder arrangement 604 does not comprise the output module 620. In some such other examples, the base and enhancement encoders 614, 616 may be able to output data to the transmission module 314 directly, as shown using dotted arrows in Figure 6. In some such other examples, the output module 620 is provided outside of the encoder arrangement 604.
The term "base layer" is used herein to mean a layer to which an enhancement layer can be added. The term "image base layer" is used herein to mean a base layer that encodes one or more images, including video (comprising multiple images). The term "video base layer" is used herein to mean a base layer that encodes one or more videos.
The term "enhancement layer" is used herein to mean a layer that enhances a base layer. The term "image enhancement layer" is used herein to mean an enhancement layer that enhances one or more image base layers. The term "video enhancement layer" is used herein to mean an enhancement layer that enhances one or more video base layers.
The enhancement layer may be encodable and/or decodable independently from encoding and/or decoding of the base layer. This differs from Scalable Video Coding (SVC) and helps with parallelisation.
Images may comprise stereoscopic images. Video may comprise multiple stereoscopic images. A stereoscopic image may comprise a left-eye view of a scene and a right-eye view of the scene.
In this example, the output module 620 of the encoder arrangement 604 outputs the encoded base layer 610 to the transmission module 314 as at least one stripe. In this example, the output module 620 of the encoder arrangement 604 outputs the encoded enhancement layer 612 to the transmission module 314 as at least one stripe.
As such, in this example, the transmission module 314 obtains data for transmission, from the output module 620, in the form of stripes.
The output module 620 may indicate the data as being one or more stripes. The output module 620 may indicate, for a given output stripe, whether the output stripe corresponds to a base layer or to an enhancement layer.
The output module 620 may convert the encoded base layer 610 and/or the encoded enhancement layer 612 prior to outputting to the transmission module 314.
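By way of illustration, the hand-off from the output module 620 to the transmission module 314 might be sketched as follows. The Stripe record and its field names are illustrative assumptions rather than a defined API.

```python
from dataclasses import dataclass

@dataclass
class Stripe:
    index: int
    layer: str       # per-stripe indicator: "base" or "enhancement"
    payload: bytes

def to_stripes(encoded_base: bytes, encoded_enhancement: bytes) -> list:
    # Present the encoded base layer 610 and the encoded enhancement
    # layer 612 to the transmission module as one stripe each.
    return [
        Stripe(index=0, layer="base", payload=encoded_base),
        Stripe(index=1, layer="enhancement", payload=encoded_enhancement),
    ]

for stripe in to_stripes(b"<encoded base>", b"<encoded enhancement>"):
    print(stripe.index, stripe.layer, len(stripe.payload))
```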
Although the stripes provided by the encoder arrangement 604 do not correspond to halves of the obtained image 602, the transmission module 314 may nevertheless transmit the encoded base layer 610 and the encoded enhancement layer 612 as if they were halves of the obtained image 602. In other words, the transmission module 314 treats the encoded base layer 610 and the encoded enhancement layer 612 as if they were stripes that would be received from a horizontal striping encoder.
As such, a transmission module 314 that is configured to transmit stripes from a horizontal striping encoder (e.g. corresponding to halves of the obtained image 602) may not need to be modified to transmit the encoded base and enhancement layers 610, 612, even though the encoded base and enhancement layers 610, 612 are not halves of the obtained image 602.
In particular, the enhancement layer (for example, an LCEVC enhancement layer) can be treated by the transmission module 314 (and any API of the transmission module 314) effectively as one or more additional stripes.
As such, examples may be implemented from a transmission module 314 API perspective to facilitate enhancement integration, for example using LCEVC, in combination with striping.
Additionally, ultra-low latency transmission modules 314 that are already able to handle stripes output by horizontal striping encoders can be leveraged. Such ultra-low latency transmission modules 314 may be able to handle such stripes with immediate and independent transmission.
This also provides backward compatibility in that ultra-low latency transmission modules 314 can additionally be used with codecs that use base layers and enhancement layers.
This further provides latency reduction for codecs that use base layers and enhancement layers, since the latency-reducing properties of such ultra-low latency transmission modules 314, and parallelisable (e.g. independently decodable) hierarchical coding schemes, can be leveraged.
Referring to Figure 7, a decoder module 702 obtains the encoded base and enhancement layers 610, 612. For example, the decoder module 702 may obtain the encoded base and enhancement layers 610, 612 by receiving the encoded base and enhancement layers 610, 612 from the transmission module 314.
The decoder module 702 then decodes the encoded base and enhancement layers 610, 612. This provides decoded base and enhancement layers 704, 706 respectively.
In this example, the decoder module 702 decodes the encoded base and enhancement layers 610, 612 in parallel. Decoding the encoded base and enhancement layers 610, 612 in parallel can increase decoding speed compared to decoding the obtained image 602 as a whole.
The decoder module 702 generates a reconstructed image 708 using the decoded base and enhancement layers 704, 706 and provides the reconstructed image 708 to a player 408. Such reconstruction may be as described above with reference to Figure 2B, or otherwise.
In examples, an elementary stream received by the decoder module 702 comprises information useable by the decoder module 702 to determine that striping has been used. The elementary stream received by the decoder module 702 also comprises information useable by the decoder module 702 to determine placement of the stripes.
System-level signalling and/or configuring may be used to help recover if there is packet loss and/or delay. Such operations can help recover lost stripes.
Referring to Figure 8, an image processing system 800 is shown.
In this example, the transmission module 314 is operable to output a bitstream 502 comprising both the encoded base layer 610 and the encoded enhancement layer 612. For example, the encoded base and enhancement layers 610, 612 may both be available for transmission at the same time as each other. Such a bitstream 502 may comprise a header, and a payload comprising the encoded base and enhancement layers 610, 612.
In another example, the transmission module 314 is operable to output a first bitstream comprising the encoded base layer 610 and a second bitstream comprising the encoded enhancement layer 612. The first bitstream may comprise a header, and a payload comprising the encoded base layer 610. The second bitstream may comprise a header, and a payload comprising the encoded enhancement layer 612. For example, where the encoded base layer 610 is available before the encoded enhancement layer 612 is available, the encoded base layer 610 can be output in the first bitstream and obtained by the decoder module 702 such that the decoder module 702 can start decoding the first bitstream. The second bitstream can be output, and then obtained and decoded by the decoder module 702 when available. This leverages parallelisation capabilities of the transmission module 314.
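By way of illustration, a header-plus-payload bitstream of this kind might be sketched as follows. The length-prefixed header layout and the layer identifiers are illustrative assumptions; the actual syntax is codec- and transport-specific.

```python
import struct

LAYER_BASE, LAYER_ENHANCEMENT = 0, 1

def pack_bitstream(layer_id: int, payload: bytes) -> bytes:
    # Header: a 1-byte layer indicator and a 4-byte payload length,
    # followed by the payload itself.
    return struct.pack(">BI", layer_id, len(payload)) + payload

def unpack_bitstream(stream: bytes):
    layer_id, length = struct.unpack(">BI", stream[:5])
    return layer_id, stream[5:5 + length]

first = pack_bitstream(LAYER_BASE, b"encoded base layer 610")
second = pack_bitstream(LAYER_ENHANCEMENT, b"encoded enhancement layer 612")
assert unpack_bitstream(first) == (LAYER_BASE, b"encoded base layer 610")
assert unpack_bitstream(second) == (LAYER_ENHANCEMENT,
                                    b"encoded enhancement layer 612")
```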
In addition, in this example, the bitstream 502 comprises one or more indicators 802. The one or more indicators 802 are indicative that the bitstream 502 comprises a base layer and/or an enhancement layer.
As explained above, the base layer may be transmitted as one or more stripes.
In such an example, the base layer may be transmitted with one or more base layer indicators. The one or more base layer indicators indicate that the associated one or more stripes correspond to the base layer.
As also explained above, the enhancement layer may be transmitted as one or more further stripes. In such an example, the enhancement layer may be transmitted with one or more enhancement layer indicators. The one or more enhancement layer indicators indicate that the associated one or more further stripes correspond to the enhancement layer.
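Purely by way of illustration, per-stripe layer indicators of this kind might be represented as in the sketch below; the field names and indicator values are assumptions, since this document does not prescribe a concrete syntax for the indicators 802.

```python
# Illustrative sketch of per-stripe layer indicators. The field names
# and indicator values are assumptions made for the sketch.
from dataclasses import dataclass

BASE_LAYER = 0
ENHANCEMENT_LAYER = 1


@dataclass
class Stripe:
    layer_indicator: int  # BASE_LAYER or ENHANCEMENT_LAYER
    index: int            # position of the stripe within its layer
    payload: bytes        # encoded data for this stripe


def tag_stripes(base_stripes, enhancement_stripes):
    # Tag each stripe so a receiver can tell which layer it belongs to.
    tagged = [Stripe(BASE_LAYER, i, s) for i, s in enumerate(base_stripes)]
    tagged += [Stripe(ENHANCEMENT_LAYER, i, s)
               for i, s in enumerate(enhancement_stripes)]
    return tagged
```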
Referring to Figure 9, an example timing diagram 900 is shown.
The example time periods shown in Figure 9 are purely schematic to facilitate understanding. In practice, such time periods may not have regular durations.
In this example, during a first time period, encoding of the base layer 606 starts and completes. Subsequently, encoding of the enhancement layer 608 starts. As such, the base layer 606 and the enhancement layer 608 are encoded independently of each other.
In this example, during a second time period, transmission of the encoded base layer 606 starts and completes. Encoding of the enhancement layer 608 does not complete, and transmission of the encoded enhancement layer 608 does not start, until part way through the second time period. As such, transmission of the encoded base layer 606 can start before encoding of the enhancement layer 608 has completed.
In this example, during a third time period, decoding of the encoded base layer 606 starts and completes. Transmission of the encoded enhancement layer 608 does not complete, and decoding of the encoded enhancement layer 608 does not start, until part way through the third time period. As such, decoding of the encoded base layer 606 can start before transmission of the encoded enhancement layer 608 has completed.
In this example, during a fourth time period, playing of the decoded base layer 606 starts. In particular, in this example, the decoded base layer 606 can be played without the enhancement layer 608 being available. However, the enhancement layer 608 can be used to enhance the quality of the base layer 606 when available. Decoding of the encoded enhancement layer 608 does not complete until part way through the fourth time period. As such, playing of the decoded base layer 606 can start before decoding of the encoded enhancement layer 608 has completed, and the decoded enhancement layer 608 can be used to enhance the base layer 606 when it subsequently becomes available.
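Purely by way of illustration, the pipelining of Figure 9 can be sketched as below: the base layer is played as soon as it has been decoded, and the enhancement layer is applied when it later becomes available. The durations and function bodies are placeholders, not measured values.

```python
# Illustrative sketch of the Figure 9 pipeline: play the decoded base
# layer immediately, then apply the enhancement when available.
# Durations are arbitrary placeholders.
import asyncio


async def decode(layer: str, duration: float) -> str:
    await asyncio.sleep(duration)  # stands in for decode time
    return f"decoded-{layer}"


async def pipeline():
    base_task = asyncio.create_task(decode("base", 0.01))
    enh_task = asyncio.create_task(decode("enhancement", 0.05))

    frame = await base_task
    print(f"playing {frame} (enhancement not yet available)")

    enhancement = await enh_task
    print(f"enhancing playback using {enhancement}")


asyncio.run(pipeline())
```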
Referring to Figure 10, another example timing diagram 1000 is shown.
This example timing diagram 1000 is similar to the example timing diagram 900 shown in Figure 9. However, in the example timing diagram 1000, encoding of the enhancement layer 608 starts even later than in the example timing diagram 900.
In particular, in this example, during a first time period, encoding of the base layer 606 starts and completes. However, encoding of the enhancement layer 608 has not yet started. As such, the base layer 606 and the enhancement layer 608 are encoded independently of each other.
In this example, during a second time period, transmission of the encoded base layer 606 starts and completes. Encoding of the enhancement layer 608 does not start until part way through the second time period. As such, transmission of the encoded base layer 606 can start before encoding of the enhancement layer 608 has started.
In this example, during a third time period, decoding of the encoded base layer 606 starts and completes. Encoding of the enhancement layer 608 does not complete until part way through the third time period. As such, decoding of the encoded base layer 606 can start before encoding of the enhancement layer 608 has completed and before transmission of the encoded enhancement layer 608 has started.
In this example, during a fourth time period, playing of the decoded base layer 606 starts. In particular, in this example, the decoded base layer 606 can be played without the enhancement layer 608 being available. Transmission of the encoded enhancement layer 608 does not complete, and decoding of the encoded enhancement layer 608 does not start, until part way through the fourth time period. As such, playing of the decoded base layer 606 can start before transmission of the encoded enhancement layer 608 has completed and before decoding of the encoded enhancement layer 608 has started. The decoded enhancement layer 608 can therefore be used to enhance the base layer 606 when it subsequently becomes available.
Referring to Figure 11, another example timing diagram 1100 is shown.
This example timing diagram 1100 is similar to the example timing diagram 1000 shown in Figure 10. However, in the example timing diagram 1100, encoding of the enhancement layer 608 starts even later than in the example timing diagram 1000.
In particular, in this example, encoding of the enhancement layer 608 has not started when playing of the decoded base layer 606 starts during the fourth time period.
Referring to Figure 12, an example transmission protection level table 1200 is shown.
The transmission module 314 may be configured to apply different types and/or levels of transmission protection to different stripes.
In particular, the transmission module 314 may be configured to provide a first level of transmission protection to the base layer 606 and to provide a second, different (for example, lower) level of transmission protection to the enhancement layer 608.
More specifically, the transmission module 314 may be configured to add first transmission protection to a base layer 606 configured as one or more stripes and to add second transmission protection to an enhancement layer 608 configured as one or more further stripes, where the first transmission protection is higher than the second transmission protection. The first transmission protection may be referred to as "high" or "higher" transmission protection and the second transmission protection may be referred to as "low" or "lower" transmission protection, with it being understood that the first transmission protection is high relative to the second transmission protection and that the second transmission protection is low relative to the first transmission protection. For example, an LCEVC enhancement layer 608 may be transmitted with no or limited transmission protection, whereas the base layer 606 may be transmitted with FEC. In this example, transmission protection and/or retransmission may be avoided for lost LCEVC enhancement layer 608 data packets.
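Purely by way of illustration, such differentiated protection might be expressed as a simple per-layer mapping, as in the sketch below; the particular settings are assumptions, and no specific FEC scheme is mandated by this document.

```python
# Illustrative sketch of per-layer transmission protection, loosely
# following the table in Figure 12. The settings are assumptions.
PROTECTION = {
    "base": {"fec": True, "retransmit_lost": True},           # higher protection
    "enhancement": {"fec": False, "retransmit_lost": False},  # lower protection
}


def protect(layer: str, packet: bytes) -> dict:
    # Attach the protection settings appropriate to the stripe's layer.
    settings = PROTECTION[layer]
    return {"packet": packet, **settings}
```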
As such, the transmission module 314 can also protect (for example, with FEC) a transmission in different ways for each stripe. For instance, LCEVC can be transmitted with limited protection, since transmission protection and/or retransmission of lost packets may be avoided while still providing acceptable performance.
Referring to Figure 13, an example encoder arrangement output 1300 is shown.
In this example, the base layer 606 is not split into multiple horizontal stripes, but is provided to the transmission module 314 as a (single) stripe. Similarly, the enhancement layer 608 is not split into multiple horizontal stripes, but is provided to the transmission module 314 as a (single) stripe. As such, in this example, the base layer 606 is not in fact striped, but the encoding results in two stripes: a base layer stripe and an LCEVC enhancement stripe.
Referring to Figure 14, another example encoder arrangement output 1400 is shown.
In this example, the base layer 606 is split into three stripes, 606-1, 606-2, 606-3, and is provided to the transmission module 314 as three stripes. The stripes may be horizontal stripes, for example. However, the enhancement layer 608 is not split into multiple horizontal stripes, and is instead provided to the transmission module 314 as a (single) stripe. As such, in this example, the base layer is striped with three stripes, whereas the LCEVC enhancement layer is not striped but is provided as a stripe. The encoding thus features four stripes, namely the three base layer stripes and the LCEVC enhancement stripe.
Referring to Figure 15, another example encoder arrangement output 1500 is shown.
In this example, the base layer 606 is split into three stripes, 606-1, 606-2, 606-3, and is provided to the transmission module 314 as three stripes. In this example, the enhancement layer 608 is split into two stripes and is provided to the transmission module 314 as two stripes. As such, in this example, the base layer is striped with three stripes and the enhancement layer is striped into two stripes. There are therefore five stripes in total.
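Purely by way of illustration, the stripe counts of Figures 13 to 15 can be reproduced with the sketch below. Splitting the encoded layer by byte count is an assumption made to keep the sketch short; real stripes would follow coded picture boundaries.

```python
# Illustrative sketch reproducing the stripe counts of Figures 13 to 15.
# Splitting by byte count is an assumption for brevity.
def split_into_stripes(encoded_layer: bytes, n: int) -> list[bytes]:
    size = -(-len(encoded_layer) // n)  # ceiling division
    return [encoded_layer[i:i + size]
            for i in range(0, len(encoded_layer), size)]


# Figure 13: one base stripe + one enhancement stripe    -> 2 stripes.
# Figure 14: three base stripes + one enhancement stripe -> 4 stripes.
# Figure 15: three base stripes + two enhancement stripes -> 5 stripes.
base_stripes = split_into_stripes(b"encoded-base-layer", 3)
enhancement_stripes = split_into_stripes(b"encoded-enhancement", 2)
print(len(base_stripes) + len(enhancement_stripes))  # 5, as in Figure 15
```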
Referring to Figure 16, another example encoder arrangement output 1600 is shown.
In this example, the base layer 606 is output as one or more stripes, and the enhancement layer 608 is also output as one or more stripes.
In this example, at least one additional layer is output as one or more stripes. In this specific example, the at least one additional layer comprises a depth map layer. As such, an encoded version of a depth map layer may be provided to the transmission module 314 as one or more stripes, where the encoded version of the depth map layer has been obtained from the encoder arrangement 604.
The at least one additional layer may be provided with a different (for example, lower) level of transmission protection than the base layer and/or the enhancement layer.
Referring to Figure 17, there is shown a schematic block diagram of an example of an apparatus 1700.
In an example, the apparatus 1700 comprises an encoder. In another example, the apparatus 1700 comprises a decoder. In other examples, the apparatus 1700 comprises neither an encoder nor a decoder but is configured to communicate with an encoder and/or a decoder.
Examples of the apparatus 1700 include, but are not limited to, a mobile computer, a personal computer system, a wireless device, a base station, a phone device, a desktop computer, a laptop, a notebook, a netbook computer, a mainframe computer system, a handheld computer, a workstation, a network computer, an application server, a storage device, a consumer electronics device such as a camera, a camcorder, a mobile device, a video game console, a handheld video game device, an XR headset or, in general, any type of computing or electronic device.
In this example, the apparatus 1700 comprises one or more processors 1701 configured to process information and/or instructions. The one or more processors 1701 may comprise a central processing unit (CPU). The one or more processors 1701 are coupled with a bus 1702. Operations performed by the one or more processors 1701 may be carried out by hardware and/or software. The one or more processors 1701 may comprise multiple co-located processors or multiple disparately located processors.
In this example, the apparatus 1700 comprises computer-useable volatile memory 1703 configured to store information and/or instructions for the one or more processors 1701. The computer-useable volatile memory 1703 is coupled with the bus 1702. The computer-useable volatile memory 1703 may comprise random access memory (RAM).
In this example, the apparatus 1700 comprises computer-useable non-volatile memory 1704 configured to store information and/or instructions for the one or more processors 1701. The computer-useable non-volatile memory 1704 is coupled with the bus 1702. The computer-useable non-volatile memory 1704 may comprise read-only memory (ROM).
In this example, the apparatus 1700 comprises one or more data-storage units 1705 configured to store information and/or instructions. The one or more data-storage units 1705 are coupled with the bus 1702. The one or more data-storage units 1705 may for example comprise a magnetic or optical disk and disk drive or a solid-state drive (SSD).
In this example, the apparatus 1700 comprises one or more input/output (I/O) devices 1706 configured to communicate information to and/or from the one or more processors 1701. The one or more I/O devices 1706 are coupled with the bus 1702. The one or more I/O devices 1706 may comprise at least one network interface. The at least one network interface may enable the apparatus 1700 to communicate via one or more data communications networks. Examples of data communications networks include, but are not limited to, the Internet and a Local Area Network (LAN). The one or more I/O devices 1706 may enable a user to provide input to the apparatus 1700 via one or more input devices (not shown). The one or more input devices may include, for example, a remote control, one or more physical buttons etc. The one or more I/O devices 1706 may enable information to be provided to a user via one or more output devices (not shown). The one or more output devices may for example include a display screen.
Various other entities are depicted for the apparatus 1700. For example, when present, an operating system 1707, image processing module 1708, one or more further modules 1709, and data 1710 are shown as residing in one, or a combination, of the computer-usable volatile memory 1703, computer-usable non-volatile memory 1704 and the one or more data-storage units 1705. The image processing module 1708 may be implemented by way of computer program code stored in memory locations within the computer-usable non-volatile memory 1704, computer-readable storage media within the one or more data-storage units 1705 and/or other tangible computer-readable storage media. Examples of tangible computer-readable storage media include, but are not limited to, an optical medium (e.g. CD-ROM, DVD-ROM or Blu-ray), a flash memory card, a floppy or hard disk or any other medium capable of storing computer-readable instructions such as firmware or microcode in at least one ROM or RAM or Programmable ROM (PROM) chip, or as an Application Specific Integrated Circuit (ASIC).
The apparatus 1700 may therefore comprise an image processing module 1708 which can be executed by the one or more processors 1701. The image processing module 1708 can be configured to include instructions to implement at least some of the operations described herein. During operation, the one or more processors 1701 launch, run, execute, interpret or otherwise perform the instructions in the image processing module 1708.
Although at least some aspects of the examples described herein with reference to the drawings comprise computer processes performed in processing systems or processors, examples described herein also extend to computer programs, for example computer programs on or in a carrier, adapted for putting the examples into practice. The carrier may be any entity or device capable of carrying the program.
It will be appreciated that the apparatus 1700 may comprise more, fewer and/or different components from those depicted in Figure 17.
The apparatus 1700 may be located in a single location or may be distributed in multiple locations. Such locations may be local or remote.
The techniques described herein may be implemented in software or hardware, or may be implemented using a combination of software and hardware. They may include configuring an apparatus to carry out and/or support any or all of the techniques described herein.
By way of summary, the examples described above leverage the parallelisation and latency-reduction properties of known video coding techniques. However, those properties are applied here to base and enhancement layers, which are not striped in known video coding techniques.
The base and/or enhancement layers may, themselves, be striped for further parallelisation and latency reduction. By presenting the base and enhancement layers as stripes, and as if they were video stripes in accordance with known video coding techniques, existing transmission and decoding modules may be used. This increases implementation and integration efficiency. In particular, by using ultra-low latency transmission modules, particularly low-latency video processing may be achieved. In addition to low latency, the use of base and enhancement layers enables high quality video to be provided (at low latency). Where the base and enhancement layers are encoded, transmitted, and decoded independently, the base layer may be decoded before the enhancement layer has been decoded. The base layer may still be played, even without the enhancement layer, albeit at a lower quality than if the enhancement layer were available. The enhancement layer can be used once available to increase video quality. The base and enhancement layers can be transmitted with different types and/or levels of transmission protection. For example, the base layer may be transmitted with FEC and/or packet retransmission to increase the likelihood of successful decoding, while the enhancement layer may be transmitted with limited or even no FEC and/or packet retransmission. As explained above, the base layer may be displayable even without the enhancement layer. The different levels of transmission protection can impact reliability, compression efficiency and latency.
It is to be understood that any feature described in relation to any one embodiment may be used alone, or in combination with other features described, and may also be used in combination with one or more features of any other of the embodiments, or any combination of any other of the embodiments. Furthermore, equivalents and modifications not described above may also be employed without departing from the scope of the invention, which is defined in the accompanying claims.
Claims (25)
CLAIMS
- 1. A method comprising: providing an encoded version of an image base layer to a transmission module as one or more stripes, the encoded version of the image base layer having been generated by a first encoding module; and providing an encoded version of an image enhancement layer to the transmission module as one or more further stripes, the encoded version of the image enhancement layer having been generated by a second encoding module.
- 2. A method according to claim 1, wherein the image base layer and the image enhancement layer are independently encodable.
- 3. A method according to claim 1 or 2, wherein the first encoding module is operable to start encoding of the image base layer before the second encoding module starts encoding of the image enhancement layer.
- 4. A method according to any of claims 1 to 3, wherein the first encoding module is operable to complete encoding of the image base layer before the second encoding module completes encoding of the image enhancement layer.
- 5. A method according to any of claims 1 to 4, wherein the first encoding module is operable to complete encoding of the image base layer before the second encoding module starts encoding of the image enhancement layer.
- 6. A method according to any of claims 1 to 5, wherein the transmission module is operable to start transmission of the encoded version of the image base layer before the second encoding module starts encoding of the image enhancement layer.
- 7. A method according to any of claims 1 to 6, wherein the transmission module is operable to start transmission of the encoded version of the image base layer before the second encoding module completes encoding of the image enhancement layer.
- 8. A method according to any of claims 1 to 7, wherein the transmission module is operable to complete transmission of the encoded version of the image base layer before the transmission module has started to transmit the encoded version of the image enhancement layer.
- 9. A method according to any of claims 1 to 8, wherein the image enhancement layer comprises a Low Complexity Enhancement Video Coding, LCEVC, enhancement layer.
- 10. A method according to any of claims 1 to 9, wherein the transmission module is operable to use a first level of transmission protection for the encoded version of the image base layer and to use a second, lower level of transmission protection for the encoded version of the image enhancement layer.
- 11. A method according to claim 10, wherein using the first level of transmission protection comprises using forward error correction, FEC.
- 12. A method according to any of claims 1 to 11, wherein: the first encoding module comprises a base encoding module, the second encoding module comprises an enhancement encoding module, a base encoder comprises the base encoding module, and an enhancement encoder comprises the enhancement encoding module; or the first and second encoding modules are modules of a hierarchical encoder.
- 13. A method according to any of claims 1 to 12, wherein the first encoding module is configured to output the encoded version of the image base layer as a single stripe, and wherein the second encoding module is configured to output the encoded version of the image enhancement layer as a single stripe.
- 14. A method according to any of claims 1 to 12, wherein the first encoding module is configured to output the encoded version of the image base layer as a single stripe, and wherein the second encoding module is configured to output the encoded version of the image enhancement layer as multiple stripes.
- 15. A method according to any of claims 1 to 12, wherein the first encoding module is configured to output the encoded version of the image base layer as multiple stripes, and wherein the second encoding module is configured to output the encoded version of the image enhancement layer as a single stripe.
- 16. A method according to any of claims 1 to 12, wherein the first encoding module is configured to output the encoded version of the image base layer as multiple stripes, and wherein the second encoding module is configured to output the encoded version of the image enhancement layer as multiple stripes.
- 17. A method according to any of claims 1 to 16, comprising providing an encoded version of a depth map layer to the transmission module as one or more stripes.
- 18. A method comprising: obtaining a decoded version of an encoded version of an image base layer, the encoded version of the image base layer having been generated by a first encoding module, the encoded version of the image base layer having been output as one or more stripes; and obtaining a decoded version of an encoded version of an image enhancement layer, the encoded version of the image enhancement layer having been generated by a second encoding module, the encoded version of the image enhancement layer having been output as one or more further stripes.
- 19. A method according to claim 18, wherein the decoded version of the encoded version of the image base layer is displayable without having the decoded version of the encoded version of the image enhancement layer.
- 20. A method according to any of claims 1 to 19, wherein the encoded version of the image base layer and the encoded version of the image enhancement layer are independently decodable.
- 21. A method according to any of claims 1 to 20, wherein the image base layer comprises extended reality, XR, content.
- 22. An apparatus configured to perform a method according to any of claims 1 to 21.
- 23. A computer program configured to perform a method according to any of claims 1 to 21.
- 24. A bitstream comprising: an encoded version of an image base layer, the encoded version of the image base layer having been generated by a first encoding module, the encoded version of the image base layer having been output as one or more stripes; and an encoded version of an image enhancement layer, the encoded version of the image enhancement layer having been generated by a second encoding module, the encoded version of the image enhancement layer having been output as one or more further stripes.
- 25. A bitstream according to claim 24, wherein the bitstream comprises one or more indicators, the one or more indicators being indicative that the one or more stripes are to be used as the image base layer and/or that the one or more further stripes are to be used as the image enhancement layer.