GB2379821A - Image compression method for providing a serially compressed sequence

Info

Publication number: GB2379821A
Application number: GB0122526A
Authority: GB
Inventors: Graham Alexander Thomas; Timothy John Borer
Original assignee: British Broadcasting Corp
Current assignee: British Broadcasting Corp
Priority date: 2001-09-18
Filing date: 2001-09-18
Publication date: 2003-03-19
Also published as: WO2003026311A3; GB0122526D0; WO2003026311A2

Abstract

Disclosed is a method for coding video in which a predetermined algorithm is used to predict portions of the image based on information that will already be available to the decoder at the time of decoding the image. By using known corresponding algorithms in both coder and decoder, and only information that will be available to the decoder, the need to transmit detailed information regarding choices made in coding, e.g. motion vectors, is avoided. However, a vector relationship can still be derived for use in the prediction. Prediction can be made based on an assumption that the content of the portion to be coded is similar to the content of an associated image portion, and by determining a vector relating the associated image portion to a reference image (both of which will be available to the decoder), the current image portion can be predicted from a further part of the reference image. The invention can be used in a form of modified pyramid coding.

Description

Video Compression

The present invention relates to compression of video.

Video compression has been considered for many years. As long ago as 1978, Netravali and Robbins (Bell System Technical Journal, Vol. 58, No. 3, March 1979, pages 631-670) proposed a recursive algorithm for estimating frame-to-frame displacement to enable frame prediction.

In common usage today as a video compression technique is MPEG II compression in which motion vectors approximating motion of blocks of a picture are estimated and these are used to predict changes from frame to frame of a moving picture sequence. Using the predictions, only the differences from the predictions need be transmitted and these can be efficiently compressed. These techniques are very well known and will not be described further.

Pursuant to the invention, it has been appreciated that, in compression such as MPEG compression, a large proportion of the bandwidth is occupied with the transmission of motion vectors.

Because the "motion" vectors are predicted using an algorithm and may not necessarily represent true motion, spurious motion vectors will often be generated. Furthermore, we have found that a typical coder may occupy as much as 80% of its channel capacity sending motion information when fed with a still picture. Thus, whilst conventional compression techniques may offer a significant advantage as compared to the sending of uncompressed data, there is room for improvement.

By way of background, it will be noted that video sequences may be distributed in essentially two ways. Firstly, an entire video sequence may be transmitted as an entity, for example as a computer file. In such a case, the entire sequence will be available to an entity attempting to display the sequence, and compression techniques may be developed based on complete knowledge of the sequence and relying on the complete coded sequence being available to a decoder in order to display the video. Such techniques may be highly efficient, but a drawback is that display of the sequence is only possible if the entire sequence is available; they may therefore require large or prohibitive amounts of memory and are not generally applicable to real time transmission of video sequences or to large sequences, as playback cannot occur until the entire file is received. In a second possibility, the video is compressed in such a way that a sequence can be displayed when only a portion is available, and generally it is only necessary to store a few frames of data at a time in order to decode the sequence; MPEG II is in this latter category, which will herein be termed a "serially accessible" video sequence.

The present invention seeks to provide an alternative method of compressing video to provide a serially accessible sequence. The sequence can of course be accessed in a parallel fashion if desired.

In a first aspect, the invention provides a method of compressing a sequence of video images to provide a serially accessible compressed sequence, the method comprising: forming a portion of the serially accessible compressed sequence by compressing a portion of an image based on forming a prediction of said portion in accordance with a predetermined algorithm and information available from a preceding portion of the serially accessible compressed sequence.

In this way, a sequence will be generated so that the prediction used to encode the second image portion (a) is based on information that will be available to a decoder which receives the earlier part of the sequence and (b) is in accordance with a predetermined algorithm. The result of this is that the prediction can be repeated without having to convey additional information about the prediction (for example the motion vectors that must be carried in MPEG compression).

In a broadest aspect, the invention provides a method of coding a video sequence comprising coding the sequence based on predicting and encoding at least a part of a video image based on information that will be available to a decoder at the time of receipt of the information encoding said part.

It is to be noted that the invention is not limited to any particular type of image; for example it may be applied to progressive or interlaced and to field or frame based images, and the term "image" may be construed to include, without limitation, a field of an interlaced image or portion thereof, or complete frame, or portion thereof or a progressive image. Furthermore, although the invention is described in the context of two-dimensional images, it may be applied to three-dimensional images; in such a case the associated image element may be a three-dimensional portion.

A remarkable decrease in the amount of bandwidth required to transmit the information may be realised by not transmitting the information derived from the prediction itself, as this can be determined by repeating the prediction process at the decoder. At least in broader aspects, the invention does not exclude the possibility that some but not all of the information derived from the prediction process may be supplied. In certain cases, transmission of a small amount of information may greatly simplify the processing required at the decoder without consuming large amounts of additional bandwidth. For example, if the algorithm includes multiple possible prediction methods or refinements of those methods or if no prediction is found to be appropriate, this fact can be signalled in a few bits.

Preferably, forming a prediction comprises predicting said portion based on a previously encoded associated image portion having a predetermined relationship to said portion.

In a preferred embodiment, the invention provides a method of compressing a sequence of video frames to produce a serially accessible compressed sequence, the method comprising: receiving a sequence of video images; for at least a first image portion, storing picture information for an earlier part of the compressed sequence; for a second image portion of a current image: determining an associated image portion of a template image, the associated image portion having a predetermined relationship to said second image portion, wherein the associated image portion is derivable from the stored picture information for the earlier part of the compressed sequence; comparing the associated image portion to a reference image, wherein the reference image is derivable from the stored picture information for the earlier part of the compressed sequence; forming a prediction according to a predetermined algorithm of the second image portion based on the results of said comparing and said predetermined relationship; encoding the second image portion based on the prediction; storing encoded picture information for the second image portion as a later part of the compressed sequence; providing sequentially said earlier part and said later part.

In a preferred arrangement, the predetermined relationship comprises a spatial relationship. In one embodiment, for example where images are generally scanned orthogonally, for example from top left across and then down to bottom right in a similar fashion to a conventional television raster, the predetermined relationship may be such that the associated image portion is substantially adjacent to the image portion but on a preceding line and/or earlier in the same line. For example, the associated portion may comprise a portion of image of predetermined size above and/or to the left of the image portion to be coded. In the case of image portions in the first few scan lines or at the edge of the picture, the associated image portion may be trimmed or modified or prediction may be suspended.

For example (assuming conventional scanning), for image portions in the top few lines, prediction may be based only on an associated portion to the left of the image portion and for image portions at the far left, prediction may be based only on an associated image portion above the image portion. For other image portions, prediction may be based on an L-shaped associated image portion comprising an area of picture above and to the left of the image portion. For the image portion at the top left corner, prediction may be omitted entirely, or may be based on a preceding image.
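By way of illustration only, the position-dependent choice of associated image portion described above might be sketched as follows. The function name `associated_portion`, the square block, and the `thickness` parameter (the width of the strips above and to the left) are assumptions made for this sketch and do not appear in the specification.

```python
def associated_portion(x, y, block=2, thickness=2):
    """Return the (px, py) pixels forming the associated image portion for the
    block whose top-left pixel is (x, y), assuming raster-order coding.

    Hypothetical helper: the strip thickness and square block are assumptions.
    """
    has_above = y >= thickness   # rows above the block are already coded
    has_left = x >= thickness    # columns to the left are already coded
    pixels = []
    if has_above:
        # Strip above the block; it extends over the corner when a left strip exists.
        x0 = x - thickness if has_left else x
        for py in range(y - thickness, y):
            for px in range(x0, x + block):
                pixels.append((px, py))
    if has_left:
        # Strip to the left of the block, on the same rows as the block.
        for py in range(y, y + block):
            for px in range(x - thickness, x):
                pixels.append((px, py))
    # An empty list (top-left block) means no spatial prediction is possible.
    return pixels
```

With these assumptions, a block in the top row receives only a left-hand strip, a block in the left column receives only a strip above, interior blocks receive the full L-shape, and the top-left block receives an empty portion so that it must be coded by some other means.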

The predetermined relationship may (additionally or alternatively) comprise a temporal relationship. That is, prediction may be based on portions of a preceding image in the sequence of images.

It is preferable, however, to base prediction on a spatial relationship, preferably to an adjacent portion of the same image.

As an alternative to sequential orthogonal scanning, a coarsely spaced array grid of pixels may be encoded first (for example every n pixels) and then the pixels filling in the spaces in the coarse array may be encoded based on the nearby pixels which have already been encoded. This may be combined with use of prediction based on L-shaped portions as the spaces between the points of the coarse array are filled.

A pyramid coding algorithm may be used in which a picture is sub-sampled and the difference from sub-samples encoded; in such a case the sub-samples may be transmitted first to form an earlier part of the sequence (and these may themselves be predicted in some cases) and then prediction based on the sub-samples carried out.

Comparing may comprise matching the associated image portion to a portion of the reference image (typically the immediately preceding image) and determining apparent motion based on the position where a best match is found, for example using a block matching technique. The block matching technique will be similar to conventional block matching, with the exception that the associated image portion may be irregular. Thus, comparing may comprise identifying a matching associated image portion in the reference image. If no suitable match is found, for example if the best match is below a threshold, the method may comprise coding the image portion without prediction, for example by intra-coding, optionally after repeating the step of comparing for alternative associated image portions.
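A minimal sketch of such a comparison is given below, assuming a sum-of-absolute-differences (SAD) criterion and an exhaustive search over a small window; neither choice is mandated by the text. The name `find_vector` and the parameters `search` and `threshold` are hypothetical. Because the associated portion is passed as an arbitrary list of pixel coordinates, it may be irregular in shape.

```python
def find_vector(cur, ref, template, search=2, threshold=None):
    """Match an irregular associated portion of `cur` against `ref`.

    cur, ref : 2-D lists of pixel values (rows of ints), same size.
    template : list of (x, y) pixel coordinates in `cur`.
    Returns the (dx, dy) offset into `ref` giving the lowest SAD, or None
    when no candidate fits inside `ref` or the best SAD exceeds `threshold`
    (the caller would then fall back to coding without prediction).
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip offsets that would push any template pixel outside the reference.
            if any(not (0 <= x + dx < w and 0 <= y + dy < h) for x, y in template):
                continue
            sad = sum(abs(cur[y][x] - ref[y + dy][x + dx]) for x, y in template)
            if best is None or sad < best[0]:
                best = (sad, dx, dy)
    if best is None or (threshold is not None and best[0] > threshold):
        return None
    return best[1], best[2]
```

A return value of `None` corresponds to the case in which no suitable match is found, so that the image portion would be intra-coded or the comparison repeated with an alternative associated portion.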

Prediction may be based on the assumption that the block of the reference image which has the same predetermined relationship to the matched portion has become the image portion. For example, if an L-shaped block above and to the left of the image portion (or in other words so that the image portion forms a "missing" corner of a block of which the associated image portion forms the remainder) is used as an associated image portion, it may be assumed that the image portion is based on the portion of the reference image which corresponds to the "missing" corner. Predicting may comprise identifying a portion of the reference image which has the same relationship to the matched associated image portion as the image portion has to the associated image portion.

Encoding may comprise determining the difference between the prediction and the image portion. The difference itself may be compressed, for example using known techniques for compressing picture information (e.g. a discrete cosine transform (DCT) or a wavelet transform), for example as are used in MPEG II compression for encoding difference information.

Preferably, the image portions are encoded as groups of pixels, preferably rectangular blocks.

This reduces the number of prediction steps compared to predicting for each pixel, but prediction for each pixel may be carried out for smaller images or where ample processing power is available. The smaller the blocks, the more accurate the prediction is likely to be but the greater the processing power required. Such considerations will be familiar to those conversant with MPEG II and similar compression techniques, and a preferred implementation may use blocks of the size used in MPEG II compression.

As will be appreciated, in practical implementations, the method is preferably repeated a plurality of times for a plurality of image portions of each image to be coded. A practical coding algorithm will therefore normally comprise defining a plurality of image portions (e.g. blocks) for an image to be coded and performing the above method for at least some of the image portions.

The method may include determining at least one excluded image portion which is not to be coded according to the above method (e.g. the first image portion/block and/or first row and/or column of image portions/blocks) and coding the or each excluded image portion according to an alternative method (e.g. intra coding). Determining may be based on availability of suitable associated image portions to form a useful prediction (for the first portion in an image, prediction is not possible based on any information in that image).

The method may include varying the predetermined relationship, for example by varying the shape of the associated image portion; this may comprise setting a default associated image portion shape (e.g. L-shape) for a major part of the image and modifying the shape when the associated image portion shape cannot be formed within the current image around a particular image portion (for example for a particular image portion near the periphery of the image). It will be appreciated that steps such as determining image portions and modifying or setting associated image portion shapes may be implicitly performed; for example, code which loops through a series of blocks of an image, intra codes a first block and then predictively codes subsequent blocks on the basis of a block above and to the left of the current block, if such blocks are already available, effectively alters the associated image portion shape.

Either or both of the shape and size of the associated image portion may vary between image portions of an image. For later image portions of an image, more information will generally be available and this may enable a larger associated image portion to be used or prediction to be enhanced. Furthermore, information already generated in prediction may be re-used to improve efficiency or enhance prediction. For example, estimates of motion previously formed may be stored and re-used or re-calculated to be more accurate. By the time the last image portion of an image is coded, information for the remainder of the image will be available.

In a practical implementation, coding will normally be repeated for further images in a sequence; thus coding may be performed as an essentially continuous process on a stream of images of which only a few may be stored at any time.

In the preferred embodiment, the template image preferably comprises the current image and the predetermined relationship preferably comprises a spatial relationship; that is to say, the image portion is coded based on an associated image portion within the same image. The reference image preferably comprises the image immediately preceding the current image; prediction may be based on comparing the position of the associated portion in the current frame to a matching block in the immediately preceding frame. However, more than one reference frame may be employed and considerations applied in, for example, MPEG II compression concerning revealed and obscured background and multi-frame prediction algorithms may be employed.

It is to be noted that the present invention is not limited to "forward prediction". That is to say, although it is required that the prediction be based on a portion of the sequence that will already be available to a decoder, it is possible for the images to be encoded in a sequence which does not correspond directly to the sequence of playback. Those skilled in the art will be familiar with the MPEG principle of intra coding a frame (to generate an I-frame), forward predicting a frame to encode a frame several frames further along (a P-frame) and then bi-directionally encoding one or more intervening frames (B-frames). A similar technique may be applied with the invention. Furthermore, estimations of motion, such as motion vectors, which have been previously generated may be re-used in both the coding and decoding processes.

One implementation, but by no means the only one, may be based on an MPEG II coding algorithm, modified so that at least some motion vectors are not transmitted but determined by a defined algorithm which can be reproduced in the decoder based on information that will be available to the coder at the time the motion vector is required. Thus, an I-frame may be encoded in a conventional manner, a P-frame may be encoded by intra-coding the first row and column of blocks and estimating motion vectors the

In a complementary aspect, the invention provides a method of decoding a serially accessible compressed sequence, the method comprising: receiving a first portion of the serially accessible compressed sequence; subsequently receiving a second portion of the serially accessible compressed sequence encoding a portion of an image; decoding said portion of the image based on predicting the portion of an image in accordance with a predetermined algorithm based on information available from the first portion of the serially accessible compressed sequence and the information encoded in said second portion.

In this way, the decoder itself performs prediction based on the information it has available rather than requiring the results of the prediction.

Preferably, in both the coding and decoding aspects, the information encoded for a portion of an image comprises difference information encoding the difference between a prediction of the image portion based on the available information and the image portion to be coded.

A preferred decoding embodiment comprises a method of decoding a serially accessible compressed sequence to produce a sequence of video frames, the method comprising: receiving an earlier part of the compressed sequence; storing picture information including at least a first image portion obtained from the earlier part; receiving a later part of the compressed sequence; for a second image portion of a current image: determining an associated image portion of a template image, the associated image portion having a predetermined relationship to said second image portion, wherein the associated image portion is derived from the stored picture information obtained from the earlier part of the compressed sequence; comparing the associated image portion to a reference image, wherein the reference image is derived from the stored picture information obtained from the earlier part of the compressed sequence; forming a prediction according to a predetermined algorithm of the second image portion based on the results of said comparing and said predetermined relationship; decoding the second image portion based on the prediction and encoded picture information for the second image portion obtained from the later part of the compressed sequence.

The invention further provides a method of transmitting a sequence of video images as a serially accessible compressed sequence, the method comprising, at a transmission site: forming a portion of the serially accessible compressed sequence by compressing a portion of an image based on forming a prediction of said portion in accordance with a predetermined algorithm and information available from a preceding portion of the serially accessible compressed sequence; and at a reception site: receiving a first portion of the serially accessible compressed sequence; subsequently receiving a second portion of the serially accessible compressed sequence encoding a portion of an image; decoding said portion of the image based on predicting the portion of an image in accordance with said predetermined algorithm based on information available from the first portion of the serially accessible compressed sequence and the information encoded in said second portion.

The invention extends to apparatus, for example a coder or decoder, for performing any of the above methods, to a computer program or computer program product comprising instructions for performing any such method and to a compressed video sequence generated by such a method.

It is noteworthy that recent work on improving compression techniques has concentrated on providing increasingly sophisticated coding algorithms so as better to detect actual motion. These coding algorithms for predicting motion are often proprietary and of course it is not necessary for the decoder to be aware of the coding algorithms used. Basic coding algorithms include block matching, in which a search is made for a block corresponding to a block of pixels in an original image, and phase correlation, both of which are well known methods in the art. Both of these methods have advantages and disadvantages and, generally, proprietary methods may perform better for many types of picture. However, pursuant to the invention, it has been appreciated that however the coding algorithm is optimised, there would still be the necessity to carry the results of the prediction using conventional methods. In a radically different approach, the invention proposes that a specified prediction method is used, which method can be reproduced at the decoder. Thus, although the prediction method may not necessarily be the most sophisticated prediction method available, the fact that it is reproducible in fact allows an overall improvement to be gained.

In a further aspect, there is provided a method of coding a sequence of video images, the method comprising predicting portions of images from available portions of images according to a first specified method; updating the prediction method; communicating the updated prediction method to the coder and to a downstream decoder receiving the output from the coder.

An embodiment of the invention will now be described, by way of example, with reference to the accompanying drawings in which:

Fig. 1 is a schematic overview of a transmission system including a coder and decoder in accordance with a first embodiment;
Fig. 2 schematically depicts a typical L-shaped associated image portion used in an embodiment to predict an image portion;
Fig. 3 schematically depicts a pyramid coder; and
Fig. 4 schematically depicts a multi-level pyramid coder.

Referring to Fig. 1, coding and decoding of a picture in accordance with a first embodiment will be explained. As will be apparent from the above, coding is based on use of information that will be available at the decoder and the coder therefore reproduces the core functions of the decoder. The coder will thus be explained in detail first, following which operation of the decoder will become clear.

A subtractor 10 forms the difference signal between an incoming picture input and a "predicted picture" input (to be explained). This is compressed by passing to a forward transform engine 12 and a quantiser 14 to provide a coded picture, which may be further compressed in an entropy coder 16 (or other lossless compressor) to provide an output compressed sequence. The quantised signal is passed through an inverse quantiser 20a and an inverse transform engine 22a to recreate the difference signal but with any quantising and coding losses that would be present in the decoder's version of the difference signal (as will be explained further, the decoder has a corresponding inverse quantiser 20b and an inverse transform engine 22b to recreate the difference signal from the incoming sequence). However, since the entropy coding can be assumed to be lossless, it is not necessary to take the output of the entropy coder and decode that to reproduce the signal available at the decoder. If the signal were not quantised and lossless compression were used instead, the difference signal could be used directly.

The recreated difference signal is then passed to adder 24a where it is summed with a "predicted picture" signal to form a decoded picture, which is stored in picture store 26a. Preceding frames are stored in picture store 27a. As will become apparent, the process is somewhat circular, with the coder decoding its own output and carrying out further coding based on what it has previously coded; it is this feature which enables the decoder to operate similarly without having to transmit the results of prediction. Of course, there will be circumstances where there is no previously decoded picture or picture portion. Where no (relevant) picture is stored in the picture store, switch 28a functions to supply a null signal to the subtractor as the "predicted picture". In such a case, the difference signal will encode a greater error with corresponding increase in the amount of information to be transmitted.
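The circular arrangement described above can be illustrated with a deliberately reduced sketch. It assumes a one-dimensional sequence of samples, a prediction that is simply the previously decoded sample, and a uniform quantiser with no transform; these simplifications, and the names `quantise`, `encode` and `decode`, are assumptions for illustration and are not the embodiment itself. The comments indicate which element of Fig. 1 each step loosely corresponds to.

```python
def quantise(residual, step):
    """Round the residual to the nearest quantiser level (integer arithmetic)."""
    return (2 * residual + step) // (2 * step)

def encode(samples, step):
    """Closed-loop coder: predicts from its own decoded output, not the input."""
    levels, recon, store = [], [], None
    for s in samples:
        predicted = 0 if store is None else store   # switch 28a: null prediction if store empty
        level = quantise(s - predicted, step)       # subtractor 10 and quantiser 14
        levels.append(level)
        store = predicted + level * step            # inverse quantiser 20a and adder 24a
        recon.append(store)                         # contents of the picture store
    return levels, recon

def decode(levels, step):
    """Decoder: replicates the coder's inverse path only."""
    out, store = [], None
    for level in levels:
        predicted = 0 if store is None else store
        store = predicted + level * step            # inverse quantiser 20b and adder 24b
        out.append(store)
    return out
```

Because the coder predicts from what it has itself decoded, the decoder's output is identical to the coder's internal reconstruction, and under these assumptions the quantisation error at each sample stays within half a quantiser step rather than accumulating.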

To predict the picture, search engine 30a searches the preceding image stored in picture store 27a to find a match for the image portion on which prediction is based (to be described in more detail below). If a match is found, the appropriate block from the preceding image is provided as a prediction; if not, the switch 28a is controlled to provide a null signal as the "predicted picture", which results in intra-coding of the block.

For the first picture in a sequence, and also preferably at regular intervals to allow easy "locking on" to the compressed sequence, intra-coded frames are transmitted; these can be generated by asserting the intra-frame coding flag which, via OR gate 32a, controls the switch 28a to supply a null "predicted picture".

It will be appreciated that the coder generates a replica of the input as a sequence of decoded pictures at the output of summer 24a. This corresponds to the decoded picture sequence provided by a decoder. Thus it will be appreciated that the decoder simply comprises a replication of the core functions of the coder (elements 20b to 30b), performing the same functions, but of course omitting the forward transform coding and quantisation and input processing. The decoder also includes at the input an entropy decoder 18 to undo any lossless compression. Decoded output can be taken from the output of summer 24b.

It will be appreciated that a practical coder and decoder will include other control logic functions and that these will normally differ. For example, the coder will include logic for deciding when to insert I-frames and for packaging the output sequence, as well as for processing the input video. Conversely, the decoder will include logic for unpackaging the received video sequence and, rather than deciding when to insert I-frames, will deduce where I-frames are included, for example from information carried with the sequence. The details of such functions are not germane to the invention and may be based on, for example, basic schemes used in MPEG II coders and decoders, with the exception of course that motion vectors do not need to be carried in the output sequence.

The prediction process will now be explained in more detail. Referring to Fig. 2, an associated image portion, here comprising an L-shaped block 100 to the left of and above an image portion to be coded 110 (here a rectangular block), will already have been coded and will be available in the picture store, as will the complete preceding image. This L-shaped block is matched to the preceding image using a block matching process and, assuming a match is found, the motion vector giving apparent motion of the L-shaped block from the preceding image is determined. This is used to predict the block to be coded (the "missing corner") from the preceding image on the assumption that the motion is the same for the block to be coded as for the L-shaped block. Depending on the block sizes, image complexity and motion, this may be more or less accurate, but in general may represent a reasonable estimate of motion. More complex algorithms for prediction may be employed; for example, possible motion vectors for each of the three corners (or for an extended L-shaped block extending further to the right than the block in question) may be determined and the most appropriate one of those may be assigned. More than one preceding image may be used in the determination of a motion vector for the image portion.

Figure 2 illustrates how a basic implementation works. The figure shows three rather small frames of 64 pixels each. In this example the picture is coded as a set of 2x2 blocks of pixels. The small highlighted block 110 in the current frame, frame n, is to be coded. Frames prior to the current frame have already been coded. In order to code the highlighted block a prediction for it is required. In a conventional motion compensated system the block would be matched with a previous frame at the coder, which would code a displacement giving a good prediction as auxiliary data. However, the embodiment seeks to avoid sending the auxiliary displacement information and so forms a prediction using only data that is available at the decoder. Clearly the block itself cannot be matched because the decoder will not yet have it. The embodiment instead matches the surrounding region that has already been coded.

Fig. 2 shows a sequence of images that have each been scanned from top left to bottom right. The dark (blue) region, surrounded by a light L-shaped region, is the region to be coded. This embodiment relies on an assumption that if the surrounding L-shaped region matches part of a previously coded picture then the small block to be coded will also match. In most pictures the region to be coded will be part of the image of an object and, surprisingly, this assumption will hold in many cases. In order to form a prediction we search for a matching L-shaped region in previously coded video. Once a match has been found for the L-shaped region, the corresponding block at the bottom right corner of the matching region is taken as prediction for the block to be coded.

Once a prediction has been obtained, a difference is calculated between the prediction and the block to be coded and that difference is coded. The difference may be coded directly using entropy coding. Alternatively the difference may be transform coded, for example using a DCT, and the transformed block entropy coded. This is analogous to an MPEG coder and has two advantages. Firstly, if the match fails (e.g. if the assumption above does not hold) we still achieve compression from the transform itself (equivalent to I-frame coding in MPEG). Secondly, by performing a frequency analysis we make it possible to match quantisation to the characteristics of the human visual system. This allows a perceptual coding advantage.

The decoder has exactly the same information available to it as has been used in the coder.

Provided both encoder and decoder use the same method to find the prediction both will find the same predictor for the block. This avoids the need to send auxiliary information to specify the prediction.
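The point that no auxiliary information is needed can be illustrated with a one-dimensional toy analogue. This is a sketch under stated assumptions, not the disclosed coder: the preceding `ctx` samples stand in for the L-shaped region, residuals are kept lossless (no transform, quantisation or entropy coding), and the names `predict`, `encode` and `decode` are hypothetical.

```python
def predict(decoded, ctx=2):
    """Predict the next sample using already decoded samples only."""
    n = len(decoded)
    if n <= ctx:
        return 0  # nothing to match against yet
    template = decoded[n - ctx:]
    best_cost, best_end = None, None
    for end in range(ctx, n):  # decoded[end] follows decoded[end - ctx:end]
        cost = sum(abs(a - b)
                   for a, b in zip(template, decoded[end - ctx:end]))
        if best_cost is None or cost < best_cost:
            best_cost, best_end = cost, end
    return decoded[best_end]


def encode(samples, ctx=2):
    """Return residuals only; no displacement is ever written out."""
    decoded, residuals = [], []
    for sample in samples:
        residuals.append(sample - predict(decoded, ctx))
        decoded.append(sample)  # lossless, so the decoder will hold the same
    return residuals


def decode(residuals, ctx=2):
    """Repeat the coder's search on the samples decoded so far."""
    decoded = []
    for residual in residuals:
        decoded.append(predict(decoded, ctx) + residual)
    return decoded
```

Because `predict` is deterministic and sees the same history on both sides, `decode(encode(samples))` returns the original samples, and for a repetitive input the residuals fall to zero once the pattern has been seen.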

An alternative embodiment may be provided based on pyramid coding, as will now be outlined with reference to Figs. 3 and 4 which illustrate basic conventional pyramid coding. In such an embodiment, rather than matching a surrounding region with previously coded regions, it is possible to match a previously coded low resolution image with a low resolution version of the current image. The resulting motion vectors could be used to predict a higher resolution version of the current image. This scheme avoids the asymmetry of the implementation described above and has other advantages described below. It provides a multiresolution approach to compression and so ties in naturally with pyramid coding.

To decode the current frame a previously coded "reference" frame is used. The reference frame could be coded as an "intraframe" using conventional two dimensional pyramid coding described above. Periodic intra frame coded reference frames are desirable for practical reasons as described above.

The elements of a suitable coding procedure will be described: it will be appreciated that each feature may be modified or replaced or provided independently unless otherwise stated or clear from the context.

The current frame is preferably decomposed into a multiresolution pyramid, that is a low resolution image plus a sequence of higher resolution difference images. The highest level of the pyramid (i.e. the lowest resolution) is coded first. This could be coded using just quantisation and entropy coding, or transform coding could be used as well. Motion vectors can be measured from the low resolution images of the current and the reference frames.
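The decomposition into a low resolution image plus higher resolution difference images can be sketched in one dimension as below. The filters are assumptions chosen for brevity (pair averaging to reduce resolution, sample repetition to interpolate), and the function names are hypothetical; a practical coder would use better filters and two-dimensional images.

```python
def downsample(signal):
    """Halve the resolution by averaging adjacent pairs (even length assumed)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the resolution by sample repetition (a deliberately crude interpolator)."""
    return [value for value in signal for _ in range(2)]


def build_pyramid(signal, levels):
    """Return (low_res, differences); differences[0] is the finest level."""
    differences = []
    current = list(signal)
    for _ in range(levels):
        low = downsample(current)
        predicted = upsample(low)
        differences.append([c - p for c, p in zip(current, predicted)])
        current = low
    return current, differences


def reconstruct(low_res, differences):
    """Rebuild the signal, starting from the lowest resolution."""
    current = list(low_res)
    for difference in reversed(differences):
        current = [p + d for p, d in zip(upsample(current), difference)]
    return current
```

Without quantisation the reconstruction is exact, whatever interpolator is chosen, because each difference level records precisely what the interpolated lower resolution image failed to predict.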

The resulting motion vectors can then be interpolated to the higher resolution of the next level of the pyramid. The interpolator could use the same interpolation technique as used in the pyramid decomposition, or another technique. The interpolated motion vectors can be used to predict the next layer of the pyramid for the current frame. This can be obtained by motion compensating the corresponding level of reference frame pyramid. The difference between the predicted and actual layer of the pyramid can be coded using a combination of transform coding, quantisation and entropy coding. At this stage another level of the pyramid has been coded without the need to code auxiliary information. The process can be repeated to code successively lower levels of the pyramid. At the decoder the pyramid can be decoded then reconstructed to regenerate the original image.
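The step of interpolating vectors and motion compensating the reference pyramid might look as follows in one dimension. This is only a sketch: the convention that a vector `v` at sample `i` means "fetch the reference at `i + v`", the repetition interpolator, the edge clamping and all function names are assumptions made for illustration.

```python
def interpolate_vectors(vectors):
    """Double the density and the magnitude of a 1-D vector field.

    Each low resolution vector is repeated for the two samples it covers and
    scaled by two, since a shift of one sample at low resolution corresponds
    to a shift of two samples at the next level down."""
    return [2 * v for v in vectors for _ in range(2)]


def motion_compensate(reference_level, vectors):
    """Predict a level by fetching reference_level[i + v] for each sample i.

    Positions outside the level are clamped to its edges (an assumed
    boundary rule)."""
    last = len(reference_level) - 1
    return [reference_level[min(max(i + v, 0), last)]
            for i, v in enumerate(vectors)]


def predict_next_level(reference_level, low_res_vectors):
    """Prediction for the next pyramid level of the current frame."""
    return motion_compensate(reference_level,
                             interpolate_vectors(low_res_vectors))
```

The coder would then code the difference between this prediction and the actual level; the decoder, holding the same low resolution images and the same reference pyramid, can form the identical prediction.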

Motion estimation cannot be guaranteed to succeed for all parts of the picture. In particular it may not be possible to form a prediction for a newly revealed part of the picture. Most motion estimators provide some indication that they have failed to measure a valid motion vector. For a block matching motion estimator a high value for the minimum displaced frame difference indicates an unreliable vector. In this scheme both coder and decoder use the same motion estimation scheme and so both would detect unreliable vectors in the same part of the picture.

If the motion estimation fails in the above scheme, an alternative prediction is needed for the pyramid. A value of zero could be used, so that the corresponding parts of the pyramid are sent directly; no prediction would be used for these parts. In these circumstances, the technique degenerates into 2D pyramid coding. That is, intra coding would be used for those parts of the picture for which motion estimation failed.
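Failure detection and the zero fallback can be sketched in one dimension as below. The threshold on the minimum displaced frame difference, the sum of absolute differences cost and the names `estimate_vector` and `predict_or_zero` are assumptions for illustration; the text does not fix any of them.

```python
def estimate_vector(current_low, reference_low, start, size, search, threshold):
    """Block match current_low[start:start + size] against reference_low.

    Returns (vector, reliable). The vector is flagged unreliable when the
    minimum displaced frame difference exceeds `threshold`."""
    best_cost, best_vector = None, 0
    for vector in range(-search, search + 1):
        origin = start + vector
        if origin < 0 or origin + size > len(reference_low):
            continue  # candidate falls outside the reference
        cost = sum(abs(current_low[start + i] - reference_low[origin + i])
                   for i in range(size))
        if best_cost is None or cost < best_cost:
            best_cost, best_vector = cost, vector
    reliable = best_cost is not None and best_cost <= threshold
    return best_vector, reliable


def predict_or_zero(reference_level, start, size, vector, reliable):
    """Motion compensated prediction, or zeros where estimation failed."""
    if not reliable:
        return [0] * size  # this part of the level is then coded directly
    origin = start + vector
    return reference_level[origin:origin + size]
```

Since both inputs to `estimate_vector` are images the decoder already holds, the `reliable` flag is recomputed at the decoder rather than transmitted.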

An advantageous feature of preferred embodiments is that motion estimation failure can be detected at the decoder and need not be explicitly coded.

A number of suitable methods could be used for motion estimation. There are no particular constraints, other than that the method must be reproducible; if a method involved an element of randomness (which would be highly unorthodox), this might present problems. High resolution motion vectors can be generated. The use of motion vectors with the same resolution as the image, i.e. one vector per pixel (or a block size of one), should yield a lower prediction error than one vector per block. Furthermore, high precision, sub-pixel, motion vectors could be generated, further reducing the prediction error. It is noteworthy that these advantages may be possible because motion vectors are generated at the decoder and it is no longer necessary to consider the amount of data required to code the motion vectors. Coding sub-pixel motion vectors for a high resolution vector field would be prohibitive, in terms of data rate, for conventional compression systems in which motion vectors are sent as auxiliary data. It is found particularly advantageous to use motion vectors with pixel accuracy or more preferably sub-pixel accuracy, something which is essentially prohibited in conventional compression schemes.

The use of error feedback between levels in pyramid decomposition allows great control of the quantisation noise at all levels of the pyramid. The quantisation error at any level of the pyramid is fed back to the next lower level where it is corrected. The ultimate quantisation error for the image is governed purely by quantisation applied at the lowest level. Typically, for image compression, quantisation noise is preferably "blue", i.e. rising with frequency, to match the characteristics of the human visual system. This can be achieved using (spatial) error feedback at the lowest level of the pyramid.
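The inter-level feedback can be illustrated with a one-dimensional sketch in which each difference level is formed against the interpolated decoded (already quantised) level above, so that the error made there is absorbed into the next difference. The filters, the uniform quantiser and the names used are assumptions; spatial error feedback for noise shaping within a level is not shown.

```python
def downsample(signal):
    """Halve the resolution by averaging adjacent pairs (even length assumed)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the resolution by sample repetition."""
    return [value for value in signal for _ in range(2)]


def quantise(values, step):
    """Uniform quantiser: the nearest multiple of `step`."""
    return [step * round(v / step) for v in values]


def code_with_feedback(signal, steps):
    """steps[0] applies to the finest level, steps[-1] to the coarsest.

    Returns (coded_levels, decoded_signal), coarsest coded level first."""
    images = [list(signal)]
    for _ in range(len(steps) - 1):
        images.append(downsample(images[-1]))
    decoded = quantise(images[-1], steps[-1])
    coded_levels = [decoded]
    for level in range(len(steps) - 2, -1, -1):
        # Predict from what the decoder holds, not from the clean image,
        # so the error made above is carried into this difference.
        predicted = upsample(decoded)
        difference = [a - p for a, p in zip(images[level], predicted)]
        coded = quantise(difference, steps[level])
        coded_levels.append(coded)
        decoded = [p + c for p, c in zip(predicted, coded)]
    return coded_levels, decoded
```

In this sketch the final error never exceeds half of `steps[0]`, however coarse the quantisers at the higher levels are, which is the sense in which the ultimate error is governed by the lowest level alone.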

In a lossy compression system each layer of the pyramid will be degraded by quantisation noise. The motion estimation process described above operates between a reference frame and the current frame. If either frame is degraded by quantisation noise the accuracy of the motion estimate will be degraded too. Pyramid coding, with inter-level error feedback, allows control of the quantisation noise at each level, for example by using spatial error feedback. Motion estimation algorithms may perform better with "pink" (predominantly low frequency) noise. This is because images contain most energy at low frequencies, so that a good signal to noise ratio is maintained at all frequencies. Furthermore "clean" (noise free) high frequency information is required to generate high resolution and high precision motion vector fields. Human viewers, by contrast, are likely to prefer "blue" noise.

A compromise is to quantise to give "blue" noise and then (mildly) low pass filter the image. By filtering out a small proportion, for example approximately the top quarter, of the spectrum almost all the noise would be removed, because it is clustered at high frequencies. This would leave a "clean", albeit slightly lower resolution, image with which to perform motion estimation. An advantageous point here is that the use of pyramid coding, with inter-level error feedback, allows quantisation noise to be optimised for one or both of motion estimation and perceptual quality.

Pyramid coding inherently decomposes an image into a series of levels with different resolutions or, equivalently, sizes. It is therefore well suited to "scalable" coding. This is where a decoder may choose to reconstruct only part of the coded image. For example if a high definition image were coded, a standard definition display system might choose to decode only a standard definition approximation to the original image. Quantisation can be controlled at each level of the pyramid to optimise the perceived quality of more than one display resolution. This is a particularly advantageous feature.

The use of pyramid coding facilitates an efficient, multiresolution approach to motion estimation. Motion vectors can first be estimated based on low resolution images. The results of this estimation process can be used as a "seed" for motion estimation at a higher resolution, corresponding to a lower level of the pyramid. So if a particular motion vector were measured at a low resolution a search could be performed around that velocity in the higher resolution image. This is a computationally efficient hierarchical approach to motion estimation. Although this approach is advantageous it is not essential. Other approaches to motion estimation could also be used.
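A seeded, two-stage search might be sketched as follows. For brevity it estimates a single global shift of a one-dimensional signal rather than a vector field; the pair-averaging filter, the mean absolute difference cost, the convention that `current[i]` is matched to `reference[i + shift]`, and the function names are all assumptions.

```python
def downsample(signal):
    """Halve the resolution by averaging adjacent pairs (even length assumed)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def match_cost(current, reference, shift):
    """Mean absolute difference between current[i] and reference[i + shift]
    over the samples where both exist."""
    pairs = [(current[i], reference[i + shift])
             for i in range(len(current))
             if 0 <= i + shift < len(reference)]
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def hierarchical_shift(current, reference, coarse_search=2, refine=1):
    """Coarse search at half resolution, then a narrow search around the seed."""
    low_current, low_reference = downsample(current), downsample(reference)
    coarse = min(range(-coarse_search, coarse_search + 1),
                 key=lambda s: match_cost(low_current, low_reference, s))
    seed = 2 * coarse  # one low resolution sample spans two samples here
    return min(range(seed - refine, seed + refine + 1),
               key=lambda s: match_cost(current, reference, s))
```

With the defaults, five candidates are tried at half resolution and three at full resolution, yet displacements of up to five samples can be found; `min` returns the first minimum, so the result is reproducible at the decoder.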

The compression scheme described in this section might, succinctly, be described as pyramid coding using implicit motion vectors.

Alternative Implementation

The compression scheme considered above simply describes deriving a new frame from a single reference frame. Practical compression schemes, such as MPEG 2 (ISO/IEC 13818-2), allow more flexible ways of predicting/interpolating images. These techniques, described briefly below, can also be used with the proposed compression algorithm. MPEG uses bi-directionally predicted frames, or "B frames". In B frames a picture is interpolated from both a preceding and a following reference frame.

Obviously the following reference frame must be coded first, necessitating reordering the picture sequence. A bi-directionally interpolated frame is constructed by forming a (possibly weighted) average of predictions from both reference frames. Similarly, and in contrast to the pyramid coding scheme, it is possible to use more than one "B frame" between an "I frame" and a "P frame". This is done in "long GOP" MPEG coding, where the "GOP" (group of pictures) describes the structure of P and B frames between consecutive I frames. The use of B frames generally yields a higher compression ratio for two reasons. Firstly the prediction is more accurate, being an average of predictions from two reference frames. Secondly B frames can efficiently code revealed and obscured background. This latter advantage arises because regions of revealed and obscured background are available in either the preceding or following frames, even if they are not available in both.
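The (possibly weighted) average, together with a fallback to a single reference for revealed or obscured background, could be sketched as below. The reliability flags, the zero prediction when neither reference is usable, and the name `combine_predictions` are assumptions for illustration.

```python
def combine_predictions(from_previous, from_following,
                        previous_ok=True, following_ok=True,
                        weight_previous=0.5):
    """Form a bi-directional prediction from two motion compensated
    predictions, each given as a list of samples."""
    if previous_ok and not following_ok:
        return list(from_previous)      # e.g. background about to be obscured
    if following_ok and not previous_ok:
        return list(from_following)     # e.g. newly revealed background
    if not previous_ok and not following_ok:
        return [0] * len(from_previous)  # assumed fallback: no prediction
    weight_following = 1.0 - weight_previous
    return [weight_previous * a + weight_following * b
            for a, b in zip(from_previous, from_following)]
```

A weight other than one half might be chosen, for example, when the frame being coded lies closer in time to one reference than to the other.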

In the pyramid coding embodiment it was proposed to perform motion estimation on a low resolution image and interpolate the motion vectors to predict a higher resolution image. This has a drawback of requiring interpolation of motion vector fields. Whilst this is possible, using linear interpolators or otherwise, the interpolated vectors may not correspond to the actual movement of any object in the image. This is because motion vectors may contain discontinuities between regions corresponding to different objects. Errors in the interpolated motion vectors may result in degraded predictions.

An alternative to interpolating motion vectors is to derive them from interpolated images instead. In this scheme, images constructed from a particular pyramid level (say level n) and above are first interpolated to the resolution corresponding to the next lower (higher resolution) level (level n-1). Motion vectors are measured between the two interpolated images. These motion vectors are then used to predict level n-1 of the pyramid for the current frame from the reference frame. An advantage of this scheme is that no vector interpolation is required, resulting in more accurate vector fields. This in turn may improve the compression ratio. A drawback is the additional computation required to perform motion estimation on higher resolution images.

The disclosed compression algorithms can effectively adopt a GOP structure, similar to that used in MPEG coding, for efficient coding. Again, the advantage of not requiring explicit coding of motion vectors can give rise to more accurate predictions and a greater compression ratio.

The use of implicit motion vectors allows the use of high precision motion vectors. Using high precision (sub-pixel) motion vectors would improve the accuracy of the predicted images and, thereby, improve compression. Because the motion vectors are derived from the images and not explicitly coded, there is no need to consider their data rate. The high data rate makes the use of high precision motion vectors impractical in systems where they are explicitly coded as auxiliary data. One way to estimate sub-pixel motion vectors would be to use a gradient motion estimation scheme. For example, the motion vector may be known to integer pixel accuracy, perhaps having used a block-matching algorithm. Then the fractional part of the motion vector could be estimated from the ratio of the spatial and temporal image gradients. This is gradient motion estimation and is well known, but not practically applied to compression schemes owing to data rate problems. The spatial image gradients can be calculated with simple spatial linear filters and the temporal gradient from the brightness difference between the two frames. Actually the sub-pixel component of the "motion vector", calculated in this simple way, would not necessarily correspond to the true motion. Instead, it would be the component of the motion parallel to the direction of the image gradient. Nevertheless, it would still give an improvement in the prediction and, therefore, the compression ratio that can be achieved.
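The ratio of temporal to spatial gradients can be sketched in one dimension as follows. It assumes that the integer part of the displacement has already been removed, that `current[i]` is approximately `reference[i + d]` for a small fractional `d`, and that a least squares combination over the block is acceptable; the central difference filter and the name `subpixel_shift` are likewise assumptions.

```python
def subpixel_shift(current, reference):
    """Estimate a fractional displacement d with current[i] ~ reference[i + d].

    A first order expansion gives current - reference ~ d * (spatial gradient),
    so d is the least squares ratio of temporal to spatial gradient."""
    numerator = 0.0
    denominator = 0.0
    for i in range(1, len(reference) - 1):
        spatial = (reference[i + 1] - reference[i - 1]) / 2  # central difference
        temporal = current[i] - reference[i]                 # frame difference
        numerator += spatial * temporal
        denominator += spatial * spatial
    # A flat block carries no gradient information; report zero displacement.
    return numerator / denominator if denominator else 0.0
```

For a linear ramp the estimate is exact; for real image content it is an approximation that holds only for small displacements, which is why an integer-accuracy vector is assumed to have been found first.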

The pyramid coding scheme described above predicts a pyramid level for the current frame from the corresponding level in a reference frame (or frames). The pyramid level, in this context, is a high pass signal as described above. An alternative would be to predict the actual image corresponding to the pyramid level from the corresponding image in a reference frame (or frames). These images are simply the sum of the current level of the pyramid plus the (suitably interpolated) higher levels. The predicted pyramid level for the current frame would then be the predicted image minus the (interpolated) lower level image for the current picture. Although this provides an alternative implementation, a drawback is that it is more complex (requiring pyramid levels to be recombined to form an image).

Thus it can be seen from the above that several embellishments and alternative implementations for the compression technique, based on pyramid coding, presented in the previous sub-section are possible.

Above are suggestions for further compression schemes based on the notion of implicit motion vectors disclosed herein, in addition to specific examples of compression schemes specifically described.

Below some further important novel features, which may be provided independently, are summarised.

An important feature is the use of implicit motion vectors to predict an image (or part of an image). Implicit motion vectors are derived purely from parts of the image that have already been coded. This is in contrast to compression algorithms, such as MPEG, that perform motion estimation in the coder and explicitly code this motion information as auxiliary data, in addition to the actual image data. Furthermore, other decisions about the way in which a prediction is calculated can also be derived solely from already coded picture information. One such decision would be whether to use a motion vector or to fall back to intra frame coding if motion estimation had failed. Another decision would be whether to use both preceding and following reference frames, in forming a bi-directional prediction, or just one reference frame to allow for revealed or obscured background. The use of implicit motion vectors and related decisions does not preclude the use of auxiliary information derived at the coder if this would improve compression. The notion is simply that such auxiliary information is not necessary.

By using implicit motion vectors there is no need explicitly to code motion vectors, thereby saving capacity for additional picture information. Hence, either the picture quality can be improved or the bit rate reduced.

Since there is no requirement to explicitly code motion vectors, high resolution, high precision motion vectors may be used. The use of such motion vectors should improve the accuracy of the prediction and therefore the compression ratio.

Once a prediction has been formed standard compression techniques, such as transform coding, quantisation and entropy coding, may be used to compress the difference signal. The difference signal is the difference between the prediction and the actual picture. More precisely, in the pyramid scheme, it is the difference between the prediction of a level of a pyramid and the actual pyramid level.

A straightforward compression technique is disclosed in which previously coded picture information is searched for a good match to form a prediction for part of the picture. The key assumption is that the neighbourhood surrounding a region of the picture will match when the region itself matches. This allows the search to be conducted before the region itself has been coded.

Another compression technique, based on pyramid coding, is disclosed in which motion estimation is performed using low resolution versions of an image. Motion compensation is then applied to a higher resolution reference image to form a prediction. This allows the advantages inherent in pyramid coding to be exploited.

The use of pyramid coding facilitates scalable coding and computationally efficient hierarchical motion estimation.

The use of inter-level error feedback in pyramid coding allows control of quantisation noise. Quantisation noise at each level of the pyramid can be controlled to optimise perceived quality and/or motion estimation accuracy. By preventing the accumulation of quantisation noise, the efficacy of motion estimation can be maintained.

The advantages of long GOP coding, as used in MPEG compression, are easily incorporated into the compression scheme based on pyramid coding with implicit motion vectors.

The search range can include any previously coded picture information. So, we may search parts of the current image that have already been coded, as well as the previously coded image. This would help in coding a frame without reference to a previous frame, i.e. "intraframe coding" analogous to "I frames" in MPEG coding. A practical scheme would normally require sending periodic intra-frames in a sequence to give the decoder somewhere to start and to allow for video editing and transmission errors. Furthermore the search range could include searching a plurality of previously coded frames. This would, in principle, allow efficient coding of revealed background if that image had ever been transmitted in the past.

There is little constraint on the size of blocks used to code the picture. Typical block sizes for compression are 8x8 pixels and this is a convenient size for DCTs. Smaller blocks and even single pixel blocks are also possible.

There are certain practical constraints or considerations relevant to possible implementations.

The computation needed to search for the prediction is non-trivial. The computational cost increases approximately as the square of the search range so using a large search range quickly becomes computationally costly. Multiresolution and multistep searches may alleviate this problem. It may be difficult to achieve high precision "motion vectors" by matching L-shaped regions, primarily due to the spatial bias in measuring the displacement because the L-shaped region is offset from the block being predicted. That is, the measured displacement is more likely to apply to the centre of the L-shaped region than to the centre of the predicted block. There may be issues associated with the asymmetry of the first technique, that is searching for a predictor using a search region on only one side of the predicted block.
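The growth of the search cost can be made concrete by counting candidate displacements. This is only a rough illustration: it counts candidates, not arithmetic operations, and the two-stage figure assumes a half resolution coarse search followed by a refinement of plus or minus `refine` pixels. Both function names are hypothetical.

```python
def candidate_count(search_range):
    """Integer displacements within +/- search_range pixels, both horizontally
    and vertically, for an exhaustive search."""
    return (2 * search_range + 1) ** 2


def two_stage_candidate_count(search_range, refine=1):
    """Candidates for a coarse search at half resolution (where the same
    motion spans half as many pixels) plus a narrow refinement search."""
    return candidate_count(search_range // 2) + candidate_count(refine)
```

Doubling the range from 8 to 16 pixels raises the exhaustive count from 289 to 1089 candidates, whereas the two-stage count for a range of 16 is 298.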

It will be appreciated that a large number of techniques have been disclosed and furthermore that considerations and features to enable yet further techniques have been discussed. The embodiments are disclosed by way of example only and the features disclosed may be provided independently or in other combinations, unless explicitly limited to the context in which they are disclosed.

Claims

1. A method of compressing a sequence of video images to provide a serially accessible compressed sequence, the method comprising: forming a portion of the serially accessible compressed sequence by compressing a portion of an image based on forming a prediction of said portion in accordance with a predetermined algorithm and information available from a preceding portion of the serially accessible compressed sequence.

2. A method according to Claim 1, wherein forming a prediction comprises predicting said portion based on a previously encoded associated image portion having a predetermined relationship to said portion.

3. A method of compressing a sequence of video frames to produce a serially accessible compressed sequence, the method comprising: receiving a sequence of video images; for at least a first image portion, storing picture information for an earlier part of the compressed sequence; for a second image portion of a current image: determining an associated image portion of a template image, the associated image portion having a predetermined relationship to said second image portion, wherein the associated image portion is derivable from the stored picture information for the earlier part of the compressed sequence; comparing the associated image portion to a reference image, wherein the reference image is derivable from the stored picture information for the earlier part of the compressed sequence; forming a prediction according to a predetermined algorithm of the second image portion based on the results of said comparing and said predetermined relationship; encoding the second image portion based on the prediction; storing encoded picture information for the second image portion as a later part of the compressed sequence; providing sequentially said earlier part and said later part.

4. A method according to any preceding claim, wherein the predetermined relationship comprises a spatial relationship.

5. A method according to Claim 4 as dependent on Claim 3, wherein the associated image portion is substantially adjacent to the image portion.

6. A method according to Claim 3 or 5 wherein comparing comprises identifying a matching associated image portion in the reference image.

7. A method according to Claim 3, 5 or 6, wherein predicting comprises identifying a portion of the reference image which has the same relationship to the matched associated image portion as the image portion has to the associated image portion.

8. A method according to any preceding claim, wherein encoding comprises determining the difference between the prediction and the image portion.

9. A method according to Claim 8 wherein the difference is compressed.

10. A method according to any preceding claim wherein the image portions are encoded as groups of pixels.

11. A method according to any preceding claim repeated a plurality of times for a plurality of image portions of each image to be coded.

12. A coding method comprising defining a plurality of image portions for an image to be coded and performing a method according to any preceding claim for at least some of the image portions.

13. A method according to Claim 12 including determining at least one excluded image portion which is not to be coded according to a method according to any of Claims 1 to 11.

14. A method according to Claims 2 or 3 or any claim dependent thereon including varying the predetermined relationship.

15. A method of decoding a serially accessible compressed sequence, the method comprising: receiving a first portion of the serially accessible compressed sequence; subsequently receiving a second portion of the serially accessible compressed sequence encoding a portion of an image; decoding said portion of the image based on predicting the portion of an image in accordance with a predetermined algorithm based on information available from the first portion of the serially accessible compressed sequence and the information encoded in said second portion.

16. A method according to any preceding claim wherein the information encoded for a portion of an image comprises difference information encoding the difference between a prediction of the image portion based on the available information and the image portion to be coded.

17. A method of decoding a serially accessible compressed sequence to produce a sequence of video frames, the method comprising: receiving an earlier part of the compressed sequence; storing picture information including at least a first image portion obtained from the earlier part; receiving a later part of the compressed sequence; for a second image portion of a current image: determining an associated image portion of a template image, the associated image portion having a predetermined relationship to said second image portion, wherein the associated image portion is derived from the stored picture information obtained from the earlier part of the compressed sequence; comparing the associated image portion to a reference image, wherein the reference image is derived from the stored picture information obtained from the earlier part of the compressed sequence; forming a prediction according to a predetermined algorithm of the second image portion based on the results of said comparing and said predetermined relationship; decoding the second image portion based on the prediction and encoded picture information for the second image portion obtained from the later part of the compressed sequence.

18. A method of transmitting a sequence of video images as a serially accessible compressed sequence, the method comprising, at a transmission site: forming a portion of the serially accessible compressed sequence by compressing a portion of an image based on forming a prediction of said portion in accordance with a predetermined algorithm and information available from a preceding portion of the serially accessible compressed sequence; and at a reception site: receiving a first portion of the serially accessible compressed sequence; subsequently receiving a second portion of the serially accessible compressed sequence encoding a portion of an image; decoding said portion of the image based on predicting the portion of an image in accordance with said predetermined algorithm based on information available from the first portion of the serially accessible compressed sequence and the information encoded in said second portion.

19. A method of coding a sequence of video images, the method comprising predicting portions of images from available portions of images according to a first specified method; updating the prediction method; communicating the updated prediction method to the coder and to a downstream decoder receiving the output from the coder.

20. A coder arranged to perform a method according to any of Claims 1 to 14.

21. A decoder arranged to perform a method according to Claim 15 or 17.

22. A computer program or computer program product comprising instructions for performing a method according to any of Claims 1 to 19.

23. A compressed video sequence generated by a method according to any of Claims 1 to 14.

24. A coder or decoder substantially as any one herein described or as illustrated in Fig. 1 or with reference to Fig. 2 or as described with reference to Figs. 3 and 4 of the accompanying drawings.

25. A method of coding a video sequence comprising coding the sequence based on predicting and encoding at least a part of a video image based on information that will be available to a decoder at the time of receipt of the information encoding said part.