GB2368220A

GB2368220A - Compression of motion vectors

Info

Publication number: GB2368220A
Application number: GB0024709A
Authority: GB
Inventors: Michael James Knee; Violet Snell
Original assignee: Snell and Wilcox Ltd
Current assignee: Snell Advanced Media Ltd
Priority date: 2000-10-09
Filing date: 2000-10-09
Publication date: 2002-04-24
Also published as: GB0024709D0

Abstract

In a process for reducing the average data rate of motion vector information, the components of the motion vectors are separately or in combination compressed losslessly by exploitation of the correlation of motion vectors in the horizontal and at least one other dimension, in particular using a spatial transform followed by a variable-length encoder. Typically, the transform is a Discrete Cosine Transform. Optionally, where a set of motion vectors is associated with a macroblock of an image, the vectors within the set are translated to equivalent velocity values and passed through a linear transform.

Description

COMPRESSION OF MOTION VECTORS This invention relates to the compression of motion vectors, especially for use in motion compensated video compression schemes, such as MPEG.

One of the key features of both MPEG-2 and MPEG-4 compression is the use of motion compensated prediction, in which blocks of video samples are predicted from other frames or fields that have already been encoded and decoded. Highly efficient compression is achieved by transmitting the prediction error rather than the original samples. The prediction process compensates for the motion between predicted and reference pictures by applying a motion vector to each block. It is then necessary to transmit those motion vectors as part of the compressed bitstream, because the decoder needs to reconstruct the prediction that was made in the encoder.

The transmission of motion vectors incurs an inevitable overhead in bit rate. In the MPEG-2 standard, this overhead is reduced by compressing the motion vectors. The motion vector compression algorithm in the MPEG-2 standard consists of two elements: forming differences between motion vectors in adjacent blocks ; and transmitting these differences using variablelength coding.

This algorithm is a compromise between efficiency and the relatively limited hardware complexity available in an MPEG-2 decoder.

Another application of motion vector compression is in generating a compressed version of the Information Stream. The Information Stream [see for example PCT/GB97/01862] is a data stream consisting of MPEG-2 coding parameters, formatted in such a way that it can be used to aid MPEG-2 reencoding of decoded video signals in order to avoid cascading impairments.

There is in certain cases a requirement for a compressed version of the Information Stream, because the channel capacity available may be limited.

One method of reducing the bit rate of the Information Stream is to use the MPEG-2 syntax itself to transmit the coding parameters. In this case it is sometimes found necessary to reduce the bit rate even further, and this can be done by selectively modifying some motion vectors, as described PCT/GB99/00227. The method of compressing vector information for the

Information Stream by selectively changing them has the disadvantage that it no longer remains lossless. While this compromise between quality, bandwidth and complexity is clearly acceptable for some applications, it remains more desirable to retain the exact values of the motion vectors where possible.

It is therefore one object of the present invention to provide an improved method for the compression of motion vectors.

Accordingly, the present invention consists in one aspect in a process for compressing motion vectors, which exploits the correlation of motion vectors horizontally within a picture and in at least one further dimension.

Preferably, the process exploits the correlation of motion vectors horizontally and vertically within a picture.

In another aspect, the present invention consists in a process for reducing the average data rate of motion vector information, in which the components of the motion vectors are separately or in combination compressed losslessly using a spatial transform followed by a variable-length encoder.

Preferably, the transform is a Discrete Cosine Transform.

In a further aspect, the present invention consists in a method of compressing motion vectors which derive from a process of motion measurement which comprises the identification of a number of candidate vectors and the preliminary assignment of one or more of said candidate vectors to specific picture elements, the method of compressing comprising the steps of defining said candidate vectors as a set of representative values and quantizing the assigned vectors with reference to said set of representative values.

Preferably, the method further comprises the step of run length coding.

Suitably, the method further comprises the steps of noting any error in the quantization of assigned vectors with reference to said set of representative values and additionally coding any such quantization errors.

The invention will now be described by way of example with reference to the accompanying drawings, in which:

Figures 1, 2 and 3 are block diagrams illustrating three embodiments of the present invention.

In one embodiment of the invention, the basic approach is to scan the motion vectors in blocks, typically 8x8, and to use a two-dimensional transform such as the DCT to compress the vectors.

Reference is directed initially to Figure 1.

Let us assume that we are encoding motion vectors in a bidirectionally predicted picture (B-picture) according to the MPEG-2 standard. In this case, each macroblock has up to four motion vectors. The diagram shows the process applied to these four vectors in parallel. The horizontal and vertical components of the motion vectors may either be treated separately and encoded independently in parallel, or they may be considered as the real and imaginary parts of a complex number and encoded together.

The row scanning format of each input vector is first converted to block-based scanning, for example with 8x8 blocks. Note that each 8x8 block of vectors at this stage covers an area corresponding to 8x8 macroblocks, or 128x128 pixels, of the original image.

In the MPEG-2 standard, not every macroblock in a B-picture has all four motion vectors. For example, a particular macro block might be intracoded and have no vectors, or forward predicted using one frame vector only, or at the other extreme it may be bidirectionally field-predicted with four vectors. The DCT will require a value for each of its 64 inputs. Where no such value exists, it will be necessary to calculate an interpolated value. This could be done, for example, by simply repeating an adjacent value or by taking an average or median of neighbouring values.

The method then follows the standard MPEG-2 coding process in the application of the transform, zigzag scan and variable-length coding, the main difference being that there is no quantisation because the motion vector compression is intended to be lossless. Both the shape of the zigzag scan and the design of the variable-length coder may be varied from the MPEG-2 standard in order to maximize the efficiency of the algorithm.

In the case of forward predicted pictures (P-pictures), the above description applies but there are only two motion vectors per macroblock.

In all cases, the decoding process is a direct reversal of the encoding process and requires no further description.

This embodiment of the present invention offers important advantages of the invention. It offers a more efficient method of compressing motion vectors than the prior art, yet the most computationally intensive processing blocks are already available in a standard MPEG-2 encoder.

One possible improvement to the basic method is to exploit the correlation between the four motion vectors of each macroblock. A general way of achieving this is shown in Figure 2.

The sets of four vectors for each macroblock, referred to here as quartets, are passed through a linear transform in order to reduce their entropy and hence the number of bits needed to encode them. In order to do this efficiently, it is necessary first to compensate for the different time intervals and directions over which the vectors are defined, by calculating the velocity to which each one corresponds. Coding of the transformed quartets is therefore carried out in the velocity domain rather than the displacement domain.

The transform that might be applied to the quartets can be expressed as a 4x4 matrix multiplication. There now follow two examples of suitable transforms between velocities vs and transformed velocities Xi : Example 1

Here, the first velocity in the quartet is taken as a prediction and the other three vectors are encoded as prediction errors with respect to the first velocity.

Example 2

In the second example, a prediction is formed by taking the average of the four velocity values and three of the vectors are encoded as prediction errors with respect to that average value.

Further variations to the these examples of the invention are possible.

For example, another transform might be used instead of the 8x8 DCT. For example, a DCT with larger blocks (up to and including the whole picture), a Discrete Fourier Transform (DFT) or any other linear transform known in the art. The spatial transform and the transform used on the vector quartets may be combined in one single linear operation. The values assigned to nonexistent motion vectors may be calculated in such a way as to minimize the energy or entropy of the transform output, or some other estimate of the output bit-rate.

It will be understood that techniques other than spatial transforms and run length coding may be employed to exploit the correlation of motion vectors horizontally and vertically within a picture. It will for example be convenient to employ two dimensional predictive coding of motion vectors The methods so far described exploit correlation between vectors within a picture but do not make any reference to vectors in preceding or subsequent pictures. This is important in an application such as a compressed Information Stream, where the information for each picture is required to stand alone. However, if that independence is not required, the method can work by encoding inter-frame differences between motion vectors, or by some other method involving inter-frame prediction. Again, the vectors can be normalised according to velocity. Where it is desired to use methods that do not involve spatial transforms, the exploitation of motion

vector correlation horizontally, vertically and temporally can be combined in a three-dimensional predictive coding scheme for motion vectors.

The examples of the invention so far described can be used in conjunction with any motion estimation scheme. However, the choice of motion estimation scheme will have a substantial bearing on the efficiency of the invention. For example, some motion estimation schemes based on block matching, as described for example in the MPEG Test Model [ISO/IEC, 1996. Information technology-generic coding of moving pictures and associated audio information: Software simulations. International standard ISO/IEC 13818-5], produce a motion vector field that typically has large variations between adjacent macroblocks. Such schemes can lead to a high motion vector bit rate when the vectors are encoded according to the MPEG standard, and even with the increased efficiency offered by the invention the resulting bit rate might still be relatively high. If, however, a motion estimation scheme that measures more accurately the highly correlated true motion of the picture sequence is used, the bit rate of the compressed motion vectors may be much lower. Such a motion estimation scheme is phase correlation, described in Lau, H and Lyon, D. Motion compensated processing for enhanced slow-motion and standards conversion. IBC, Amsterdam, 1992. lEE Conference Publication no. 358, pp 62-66] and applied to MPEG coding via techniques known as vector tracing and vector refinement [Thomas, G and Dancer, S. Improved motion estimation for MPEG coding within the RACE'COUGAR'project. IBC, Amsterdam, 1995. EE Conference Publication no. 413, pp238-243] There now follows a description of a further improvement to the invention, which is particularly appropriate when a motion estimation scheme such as phase correlation is used. The first part of the phase correlation technique is to generate a small"menu"containing only two or three different velocities, from which the velocities for each pixel in a large region of the image are selected. These velocities are then converted to vectors suitable for MPEG encoding by the vector tracing and vector refinement processes. It follows that the MPEG vectors will usually be clustered around a few distinct velocities corresponding to the original phase correlation menus.

The improved method uses the well-known technique of vector quantisation to encode suitable representative velocities, followed by the DCT technique described above to encode residual errors between the representative vectors and the actual vectors.

A block diagram of the improved technique is shown in Figure 3: For each picture or picture region, a set of representative vectors is calculated. In the general case, this is done using the input vectors themselves, using a vector quantization codebook generation technique such as the Linde-Buzo-Grey algorithm [Y. Linde, A. Buzo and R. M. Gray, "An algorithm for vector quantizer design", IEEE Trans. on Commun., Vol. COM28, No. 1, pp. 84.95, January 1980.]. In the specific case that the vectors are known to originate from phase correlation, the menu vectors themselves can be used in the calculation of representative vectors.

Each vector is then compared to the set of representative vectors and the nearest one chosen. The output of this stage is typically constant over objects or regions in the picture, and hence may efficiently be encoded using a run-length encoding technique. Meanwhile, in order to achieve lossless encoding of vectors, the selected representative vectors are subtracted from the actual vectors and the resulting errors passed through the existing coding algorithm.

It will be recognised that this technique may be employed independently of the spatial transform of motion vectors and will offer particular advantage where the motion measurement technique operates-as in phase correlation-to identify a number of candidate vectors and then to assign one or more of said candidate vectors to specific picture elements.

Those picture elements may be pixels or blocks. The assignment may be preliminary in that actual vectors are refined to increase accuracy. In which case, there is the option of noting any quantization error and additionally coding it such that the scheme remains lossless.

Still other modifications will occur to those skilled in the art.

Claims

CLAIMS 1. A process for compressing motion vectors, which exploits the correlation of motion vectors horizontally within a picture and in at least one further dimension.
2. A process according to Claim 1, which exploits the correlation of motion vectors horizontally and vertically within a picture
3. A process for reducing the average data rate of motion vector information, in which the components of the motion vectors are separately or in combination compressed losslessly using a spatial transform followed by a variable-length encoder.
4. A process according to Claim 3, in which the transform is a Discrete Cosine Transform.
5. A process according to Claim 3 or Claim 4, in which a set of motion vectors is associated with a macroblock of an image and in which the vectors within the set are translated to equivalent velocity values and passed through a linear transform.
6. A process according to Claim 5, in which the linear transform is equivalent to forming a prediction and a set of prediction errors.
7. A process according to Claim 5 or Claim 6, in which calculated values are substituted for vectors that do not appear at the input to the process.
8. A process according to Claim 7, in which the calculation is an average of neighbouring values.
9. A process according to Claim 7, in which the calculation follows an algorithm that minimizes a measure of the output bit rate.
10. A process according to any of the preceding claims, in which the input is the difference between a motion vector and a selected representative vector and in which the representative vectors are separately encoded.
11. A process according to Claim 10, in which the representative vectors are calculated as a function of the input vectors.
12. A process according to Claim 11, in which the representative vectors are calculated as a function of externally provided information such as a vector menu.
13. A method of compressing motion vectors which derive from a process of motion measurement which comprises the identification of a number of candidate vectors and the preliminary assignment of one or more of said candidate vectors to specific picture elements, the method of compressing comprising the steps of defining said candidate vectors as a set of representative values and quantizing the assigned vectors with reference to said set of representative values.
14. A method according to Claim 13, comprising the further step of run length coding.
15. A method according to Claim 13 or Claim 14, comprising the further steps of noting any error in the quantization of assigned vectors with reference to said set of representative values and additionally coding any such quantization errors.