
WO2002078327A1 - Method, system, computer program and computer memory means for stabilising video image - Google Patents


Info

Publication number
WO2002078327A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
motion
horizontal
camera
vertical components
Application number
PCT/FI2002/000259
Other languages
French (fr)
Inventor
Jarno Tulkki
Original Assignee
Hantro Products Oy
Application filed by Hantro Products Oy filed Critical Hantro Products Oy
Publication of WO2002078327A1 publication Critical patent/WO2002078327A1/en


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51: Motion estimation or motion compensation
    • H04N19/527: Global motion vector estimation
    • H04N23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60: Control of cameras or camera modules
    • H04N23/68: Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
    • H04N23/681: Motion detection
    • H04N23/6811: Motion detection based on the image signal
    • H04N5/00: Details of television systems
    • H04N5/14: Picture signal circuitry for video frequency region
    • H04N5/144: Movement detection
    • H04N5/145: Movement estimation


Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a method, a system, a computer program and computer memory means for stabilising video image. The method comprises: filming (702) individual images forming video image with a camera by utilising a filming scene which is larger than the image scene used in the images; predicting (708) that the camera motion between the current image and the next image continues in the same direction and is of the same magnitude as was determined for the camera motion between the previous and the current image; and compensating (710) the camera motion between the current image and the next image by shifting the image scene within the filming scene in the next image in a direction opposite to the predicted motion so that the shift magnitude corresponds to the prediction.

Description

METHOD, SYSTEM, COMPUTER PROGRAM AND COMPUTER MEMORY MEANS FOR STABILISING VIDEO IMAGE
FIELD
[0001] The invention relates to a method, a system, a computer program and computer memory means for stabilising video image formed of consecutive still images.
BACKGROUND
[0002] Video image is encoded and decoded in order to reduce the amount of data so that the video image can be stored more efficiently in memory means or transferred using a telecommunication connection. An example of a video coding standard is MPEG-4 (Moving Pictures Expert Group), where the idea is to send video image in real time on a wireless channel. This is a very ambitious aim: if the image to be sent is, for example, of cif size (352 x 288 pixels) and the transmission frequency is 15 images per second, then 36.5 million bits have to be packed into 64 kilobits every second. The packing ratio would in such a case be extremely high, 570:1. Another typically used image size is qcif size, being 176 x 144 pixels.
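As a check on these figures (assuming, as the numbers imply, an uncompressed source of about 24 bits per pixel):

352 x 288 pixels x 24 bits x 15 images/s = 36 495 360 bits/s ≈ 36.5 million bits/s,
36 495 360 / 64 000 ≈ 570, giving the packing ratio 570:1.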
[0003] In order to transfer an image, the image is typically divided into image blocks, the size of which is selected to be suitable for the system. The image block information generally comprises information about the brightness, colour and location of an image block in the image itself. The data in the image blocks are compressed block-by-block using a desired coding method. Compression is based on discarding the less significant data. The compression methods are mainly divided into three different categories: spectral redundancy reduction, spatial redundancy reduction and temporal redundancy reduction. Typically various combinations of these methods are employed for compression.
[0004] In order to reduce spectral redundancy, a YUV colour model is for instance applied. The YUV model takes advantage of the fact that the human eye is more sensitive to the variation in luminance, or brightness, than to the changes in chrominance, or colour. The YUV model comprises one luminance component (Y) and two chrominance components (U, V). The chrominance components can also be referred to as cb and cr components. For example, the size of a luminance block according to the H.263 video coding standard is 16 x 16 pixels, and the size of each chrominance block covering the same area as the luminance block is 8 x 8 pixels. In this standard the combination of one luminance block and two chrominance blocks is referred to as a macro block. The macro blocks are generally read from the image line-by-line. Each pixel in both the luminance and chrominance blocks may obtain a value ranging between 0 and 255, meaning that eight bits are required to present one pixel. For example, value 0 of the luminance pixel refers to black and value 255 refers to white.
[0005] What is used to reduce spatial redundancy is, for example, the discrete cosine transform (DCT). In the discrete cosine transform, the pixel presentation in the image block is transformed into a spatial frequency presentation. Only the signal frequencies actually present in the image block have high-amplitude coefficients, while the coefficients of frequencies not present in the block are close to zero. The discrete cosine transform is basically a lossless transform, and interference is caused to the signal only in quantization.
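As an illustration (not part of the patent), a minimal C sketch of the 8 x 8 two-dimensional forward DCT such coders use; the direct O(N^4) formulation is shown for clarity instead of the fast factorisations used in practice:

#include <math.h>

#define N 8

/* Naive 8x8 forward DCT-II: pixels in[y][x] -> coefficients out[v][u]. */
void dct8x8(const double in[N][N], double out[N][N])
{
    for (int v = 0; v < N; v++) {
        for (int u = 0; u < N; u++) {
            double sum = 0.0;
            for (int y = 0; y < N; y++)
                for (int x = 0; x < N; x++)
                    sum += in[y][x]
                         * cos((2 * x + 1) * u * M_PI / (2.0 * N))
                         * cos((2 * y + 1) * v * M_PI / (2.0 * N));
            double cu = (u == 0) ? sqrt(1.0 / N) : sqrt(2.0 / N);
            double cv = (v == 0) ? sqrt(1.0 / N) : sqrt(2.0 / N);
            out[v][u] = cu * cv * sum;  /* out[0][0] is the DC coefficient */
        }
    }
}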
[0006] Temporal redundancy tends to be reduced by taking advantage of the fact that consecutive images generally resemble one another, and therefore instead of compressing each individual image, the motion data in the image blocks is generated. The basic principle is the following: for each image block to be encoded, the best possible previously encoded reference block is searched; the motion between the reference block and the block to be encoded is modelled, and the calculated motion vectors are sent to the receiver. The difference between the block to be encoded and the reference block is indicated as a prediction error component. A reference picture previously stored in memory can be used in motion vector prediction of the image block. Such coding is referred to as inter-coding, which means utilizing the similarities between the images in the same image sequence.
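A minimal C sketch (again not from the patent) of the block matching described here, using the sum of absolute differences (SAD) as the matching cost and an exhaustive search over a symmetric window; practical encoders replace the exhaustive search with faster patterns:

#include <stdlib.h>
#include <limits.h>

#define B 16  /* luminance block size */

/* SAD between the BxB block of cur at (cx,cy) and of ref at (rx,ry);
 * stride is the image width in pixels. */
static int sad(const unsigned char *cur, const unsigned char *ref,
               int stride, int cx, int cy, int rx, int ry)
{
    int s = 0;
    for (int y = 0; y < B; y++)
        for (int x = 0; x < B; x++)
            s += abs(cur[(cy + y) * stride + cx + x] -
                     ref[(ry + y) * stride + rx + x]);
    return s;
}

/* Exhaustive search in a +-range window; writes the best motion vector. */
void find_motion_vector(const unsigned char *cur, const unsigned char *ref,
                        int stride, int height, int cx, int cy, int range,
                        int *mvx, int *mvy)
{
    int best = INT_MAX;
    *mvx = *mvy = 0;
    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            int rx = cx + dx, ry = cy + dy;
            if (rx < 0 || ry < 0 || rx + B > stride || ry + B > height)
                continue;  /* keep the candidate block inside the image */
            int cost = sad(cur, ref, stride, cx, cy, rx, ry);
            if (cost < best) { best = cost; *mvx = dx; *mvy = dy; }
        }
    }
}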
[0007] A discrete cosine transformed block is "quantized", or each element therein is basically divided using a constant. This constant may vary between different macro blocks. The "quantization parameter", from which said dividers are calculated, ranges between 1 and 31. The more zeroes the block contains, the better the block is packed, since zeroes are not sent to the channel. Different coding methods can also be performed on the quantized blocks, and finally a bit stream can be formed thereof that is sent to a decoder. An inverse quantization and an inverse discrete cosine transform are performed on the quantized blocks within the encoder, thus forming a reference image, from which the blocks of the following images can be predicted. Hereafter the encoder sends the difference data between the following block and the reference blocks as well as motion vectors. Consequently the packing efficiency improves. After decoding the bit stream and performing the decoding methods, a decoder basically carries out the same measures as the encoder when forming a reference image, meaning that similar measures are carried out in the blocks as in the encoder, but inversely.
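A toy C sketch of the divide-and-truncate step (the actual H.263 dividers are derived from the quantization parameter in a standard-specific way not shown here); it makes visible why larger dividers produce more zeroes:

/* Quantize an 8x8 block of DCT coefficients with a single divider.
 * Larger dividers map more coefficients to zero, improving packing
 * at the cost of reconstruction error. */
void quantize(const double dct[8][8], int out[8][8], int divider)
{
    for (int v = 0; v < 8; v++)
        for (int u = 0; u < 8; u++)
            out[v][u] = (int)(dct[v][u] / divider);  /* truncates toward 0 */
}

/* Inverse quantization: the decoder (and the encoder's reference-image
 * path) multiplies back, recovering the coefficients only approximately. */
void dequantize(const int in[8][8], double dct[8][8], int divider)
{
    for (int v = 0; v < 8; v++)
        for (int u = 0; u < 8; u++)
            dct[v][u] = in[v][u] * (double)divider;
}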
[0008] Undesired movement of the camera used for filming, caused by shaking of the cameraman's hands, for instance, is a big problem in video filming. Various mechanical and electronic solutions have been designed for stabilising image, since a stable image looks much more pleasant than an image which is swaying. In addition, stable video image is compressed more efficiently than a swaying one.
[0009] US Patent 5 317 685, for instance, describes a solution in which motion vectors are used for detecting camera movements. Pixels are interpolated between the pixels of an image, and so the image can, in a way, be shifted within the image in a direction opposite to that of the motion in order to stabilise the image. However, the problem of the solution is that it causes the image to be zoomed and information on the edges of the image will be left out.
[0010] US Patent 5 973 733 describes a similar solution in which, in addition to interpolation, image warping can also be used to stabilise the image. The problem still remains that the image is zoomed in. The patent also states that one solution for stabilising the image is that the camera utilises a filming scene that is larger than the actual image scene, whereupon the stabilisation can be carried out by shifting the image scene within the filming scene in a direction opposite to that of the motion of the camera. The problem of this solution is, however, that the CCD (Charge-Coupled Device), which is bigger than usual and is used to form an image, causes additional manufacturing costs.
[0011] A problem of the solutions described in the US Patents is also that they require a high calculation efficiency: in a way, motion compensation is performed in real time, meaning that as soon as the motion is detected, it tends to be compensated in the actual image immediately.
BRIEF DESCRIPTION
[0012] It is an object of the invention to provide an improved method for stabilising video image, an improved system for stabilising video image, an improved computer program for stabilising video image and improved com- puter memory means for stabilising video image.
[0013] As an aspect of the invention a method according to claim 1 is provided for stabilising video image. A computer program according to claim 6 is also provided as an aspect of the invention. As a further aspect of the invention computer memory means according to claim 7 are provided. As a still further aspect of the invention a system according to claim 8 is provided for stabilising video image. Further preferred embodiments of the invention are disclosed in the dependent claims.
[0014] The invention is based on the idea that coding for compressing video image and utilising temporal redundancy is utilised for detecting camera motion. Camera motion is thus detected fairly easily by performing only a few additional calculations, since the motion vectors have already been calculated for compressing the image. Motion compensation is not performed immediately when the motion is detected, but it is predicted that the motion continues as it is and the compensation is performed for the predicted camera motion. Moreover, instead of interpolation the invention utilises a filming scene within which the motion is compensated and which is larger than the image scene, because the inventor observed that as to the achieved advantage, or a better picture quality, costs caused by a larger filming scene are not significant. Also, if it is generally possible to use different image sizes, e.g. qcif size (176 x 144 pixels) and cif size (352 x 288 pixels), in the camera, the invention does not cause any additional costs, if a smaller image size is used as an image scene within a filming scene formed by a larger image size.
[0015] The invention thus combines elements known from image stabilisation with compression methods used in modern video coding in a novel and inventive manner. What is essential in this combination is to understand the fact that the camera motion need not be compensated immediately, but the predicted camera motion, based on the detected camera motion, is compensated in the next image.
[0016] With the invention, image stabilisation can preferably be implemented in cost-efficient apparatuses providing video image, such as terminals implementing a video telephone call in a mobile telephone system.
LIST OF FIGURES
[0017] The preferred embodiments of the invention are described by way of example below with reference to the accompanying drawings, in which:
Figure 1 shows apparatuses for encoding and decoding video image,
Figure 2 shows in more detail the apparatus for encoding video image,
Figure 3 shows how a still image is divided into blocks,
Figure 4 illustrates a filming scene and an image scene,
Figures 5A, 5B and 5C illustrate how a moving camera affects the filming scene and the image scene,
Figures 6A, 6B and 6C illustrate the compensation of camera motion, and
Figure 7 is a flow chart illustrating a method for stabilising video image.
DESCRIPTION OF PREFERRED EMBODIMENTS
[0018] With reference to Figure 1, apparatuses for encoding and decoding video image are described. The description is simplified, as video encoding is known to those skilled in the art on the basis of standards and textbooks, such as the one incorporated herein by reference, Vasudev Bhaskaran and Konstantinos Konstantinides: "Image and Video Compression Standards: Algorithms and Architectures, Second Edition", Kluwer Academic Publishers 1997, chapter 6: "The MPEG video standards".
[0019] The face of a person 100 is filmed using a video camera 102. The camera 102 produces video image of individual consecutive still images, whereof one still image 104 is shown in the Figure. The camera 102 forms a matrix describing the image 104 as pixels, for example as described above, where both luminance and chrominance are provided with specific matrices. A data flow 106 depicting the image 104 as pixels is next applied to an encoder 108. It is naturally also possible to provide such an apparatus, in which the data flow 106 is applied to the encoder 108, for instance along a data transmission connection or from computer memory means. In such a case, the idea is to compress un-compressed video image 106 using the encoder 108, for instance in order to be forwarded or stored. A video camera 102, or camera, refers in this connection to any means known to a person skilled in the art for forming video image.
[0020] The encoder 108 comprises discrete cosine transform means 110 for performing discrete cosine transform for the pixels in each still image 104. A data flow 112 formed using discrete cosine transform is applied to quantization means 114 that carry out quantization using a selected quantization ratio. Other types of coding can also be performed to a quantized data flow 116 that are not further described in this context. The compressed video image formed using the encoder 108 is transferred over a channel 118 to a decoder 120. How the channel 118 is implemented is not described herein, since the different implementation alternatives are apparent for those skilled in the art. The channel 118 may for instance be a fixed or wireless data transmission connection. The channel 118 can also be interpreted as a transmission path, by means of which the video image is stored in memory means, for example on a laser disc, and by means of which the video image is read from the memory means and processed using the decoder 120.
[0021] The decoder 120 comprises inverse quantization means 122, which are used to decode the quantization performed in the encoder 108. An inverse quantized data flow 124 is next applied to inverse discrete cosine transform means 126, which carry out inverse discrete cosine transform to the pixels in each still image 104. A data flow 128 obtained is then applied through other possible decoding processes onto a display 130, which shows the video image formed of still images 104.
[0022] The encoder 108 and decoder 120 can be placed into different apparatuses, such as computers, subscriber terminals of various radio systems like mobile stations, or into other apparatuses where video image is to be processed. The encoder 108 and the decoder 120 can also be connected to the same apparatus, which can in such a case be referred to as a video codec.
[0023] Since our interest lies in the compression to be carried out in the encoder 108 to reduce temporal redundancy, the encoder 108 is next described in greater detail with reference to Figure 2. The moving video image 106 to be supplied to the encoder 108 is temporarily stored image-by-image in a frame buffer 200. The first image is what is known as an intra image, meaning that no coding is performed thereto to reduce temporal redundancy, even though it is processed using the discrete cosine transform means 110 and the quantization means 114. Intra images can also be sent after the first image, for example if the error component becomes too significant even for the best motion vector.
[0024] When processing the following images, the coding to be carried out to reduce temporal redundancy can be started. Then the previous image is inverse quantized using inverse quantization means 206 and an inverse discrete cosine transform is performed thereto using inverse discrete cosine transform means 208. If a motion vector is already calculated for the previous image, then the effect thereof is added to the image using means 210. Thus, the reconstructed previous image is stored in a frame buffer 212, i.e. the previous image is in such a form in which it is found in the decoder 120 after the processing to be performed. There are two frame buffers 200, 212; the present image from the camera is stored in the first 200 one, and the reconstructed previous image is stored in the second 212 one.
[0025] The previous reconstructed image is thereafter applied to a motion estimation block 216 from the frame buffer 212. Likewise, the present image to be encoded is applied to the motion estimation block 216 from the frame buffer 200. Then a search is carried out in the motion estimation block 216 in order to reduce temporal redundancy, where blocks are to be found from the previous image that correspond with the blocks in the present image. The shifts between the blocks are indicated as motion vectors.
[0026] The motion vectors 220, 222 found are applied to a motion compensation block 214 and to a variable-length coder 204. The previous reconstructed image is also applied to the motion compensation block 214 from the frame buffer 212. On the basis of the previous reconstructed image and the motion vector, the motion compensation block 214 is able to send a block 224, 226 found from the previous image to the means 202 and the means 210. The block found in the previous image is subtracted from the present image to be encoded in the means 202, or more precisely from at least one block thereof. Then from the present image, or to be more precise from at least one block thereof, an error component remains to be encoded, which is discrete cosine transformed and quantized.
[0027] The variable-length coder 204 thus obtains as input the discrete cosine transformed and quantized error component 116 and a motion vector 222. The output 118 of the encoder 108 thus provides compressed data representing the present image that illustrates the present image in relation to the previous image using a motion vector or motion vectors and an error term or error terms for the presentation. The motion estimation is carried out using luminance blocks, but the error components to be encoded are calculated both for the luminance and chrominance blocks.
[0028] The search for the best motion vector can be carried out as described above using two rounds, although such a process is not described in Figure 2. Hence, when the wide search area of the first round is reduced, the values of the motion vectors within a smaller search area can be more accurately calculated. In general, the sum of absolute differences is calculated with the accuracy of one pixel, but especially in the second round the calculation can be carried out with the accuracy of half a pixel. When the half pixel accuracy is used, fictitious half pixel values are interpolated between the actual pixels.
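A C sketch of one common way such fictitious half-pixel values are interpolated, as a bilinear average of the neighbouring full pixels; the rounding choice shown is an assumption:

/* Interpolate the three half-pixel neighbours of full pixel (x, y):
 * h = horizontal midpoint, v = vertical midpoint, d = diagonal midpoint.
 * stride is the image width; (x+1, y+1) must lie inside the image. */
void half_pixels(const unsigned char *img, int stride, int x, int y,
                 unsigned char *h, unsigned char *v, unsigned char *d)
{
    int a = img[y * stride + x];
    int b = img[y * stride + x + 1];
    int c = img[(y + 1) * stride + x];
    int e = img[(y + 1) * stride + x + 1];
    *h = (unsigned char)((a + b + 1) / 2);          /* midpoint of a and b */
    *v = (unsigned char)((a + c + 1) / 2);          /* midpoint of a and c */
    *d = (unsigned char)((a + b + c + e + 2) / 4);  /* centre of the square */
}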
[0029] In the following, with reference to the flow chart shown in Figure 7, the method for stabilising video image formed of consecutive still images with a camera is described. The method starts from block 700. In block 702, individual images forming video image are filmed with a camera by utilising a filming scene which is larger than the image scene used in the images. This is illustrated in Figure 4. Like in Figure 1, a person 100 is standing in front of the text "HANTRO OULU". For the sake of clarity, only a region defined by a frame 400 is shown from this reality in Figure 4. A frame 402 defines a filming scene of the camera. An image scene 104 is found inside the filming scene 402.
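As a sketch of what taking the image scene from the filming scene can mean in practice (illustrative, not from the patent): with a cif filming scene and a qcif image scene, stabilisation reduces to choosing the copy offset:

#define FILM_W 352  /* cif filming scene */
#define FILM_H 288
#define IMG_W  176  /* qcif image scene  */
#define IMG_H  144

/* Copy the image scene out of the filming scene. (off_x, off_y) is the
 * top-left corner of the image scene; (88, 72) centres it, and the
 * stabiliser moves it opposite to the predicted camera motion. */
void take_image_scene(const unsigned char film[FILM_H][FILM_W],
                      unsigned char img[IMG_H][IMG_W],
                      int off_x, int off_y)
{
    for (int y = 0; y < IMG_H; y++)
        for (int x = 0; x < IMG_W; x++)
            img[y][x] = film[off_y + y][off_x + x];
}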
[0030] In block 704, to compress video image, at least one motion vector is determined in order to detect motion between the block in the previous image and the block in the current image. Figure 3 illustrates how the image is divided into blocks. The leftmost image in Figure 3 shows the qcif-sized image, the size of which is 176 pixels in the horizontal direction and 144 pixels in the vertical direction. The image is further divided into luminance (macro) blocks 300 of 16 x 16 pixels. The number of said blocks in the horizontal direction is 11 and in the vertical direction 9. The size of the chrominance blocks is generally 8 x 8 pixels, but they are not described in Figure 3, as chrominance blocks are not utilized for detecting a motion vector.
[0031] The image in the middle of Figure 3 shows the current image to be encoded, where a block 302 to be encoded is in a position that can be indicated as the coordinates (128, 112) of the leftmost top corner pixel. Then the motion vector can be illustrated as the motion vector of the leftmost top corner pixel. The other pixels in the block naturally also move in the direction of said motion vector. Thus, the origin (0, 0) of the image is the pixel in the left bottom corner of the image.
[0032] The aim of the method described above is to find, as efficiently as possible, a motion vector for which the difference data packs well. The rightmost part in Figure 3 illustrates the previous image, from which a block 304 is found that best corresponds to the block 302 to be encoded in the present image. The block 304 in the previous image, the leftmost top corner pixel thereof being provided with a coordinate position (112, 96), is shifted in the current image to block 302, the leftmost top corner pixel thereof being provided with a coordinate position (128, 112). In video coding terminology the motions are indicated in such a manner that the motion to the right is positive, negative to the left, negative to the top and positive to the bottom. The motion vector 306 is therefore (-16, 16), in other words the motion is sixteen pixels in the X-axis direction and sixteen pixels in the Y-axis direction. As is shown in Figure 3, the block 304 found in the previous reconstructed image is subtracted from the block 302 to be encoded in the present image using the means 202, whereby the difference data, or error term 308, between the blocks is obtained.
[0033] To compress the image, the operation in block 704 is thus carried out as efficiently as possible. In block 706, the direction and magnitude of the camera motion between the previous image and the current one are determined on the basis of at least one motion vector. Here, all methods known to a person skilled in the art for detecting camera motion by means of motion vectors can be used, such as solutions according to the teachings of the previously mentioned US Patents 5 317 685 and 5 973 733, which are incorporated herein by reference.
[0034] The applicant has also developed a specific method of detecting camera motion, the advantages of which are simplicity and efficiency of the processing. Different macro blocks may have a different number of motion vectors. Since the motion vectors of the chrominance blocks are calculated from the motion vectors of the luminance blocks, the motion vectors of the chrominance blocks are not used in the calculations carried out for detecting camera motion. A motion vector consists of two components, a horizontal component and a vertical one. Camera motion is thus compensated separately for the horizontal and the vertical component of the motion vector.
[0035] In an embodiment, camera motion is detected separately for the horizontal and vertical components, so that the horizontal/vertical components are first divided into three categories according to their magnitude, and then an average value is formed for the category containing the most horizontal/vertical components, the average value indicating the magnitude of the camera motion in the direction of the particular component.
[0036] In an embodiment, the division into categories is carried out by first forming an average value of all horizontal/vertical components. Then the biggest and the smallest horizontal/vertical component are searched. Finally, the horizontal/vertical components are divided into three categories which are based on the average value of all horizontal/vertical components, the biggest horizontal/vertical component and the smallest horizontal/vertical component.
[0037] Mathematically, camera motion is detected separately for the horizontal and the vertical components in the following manner.
[0038] Horizontal/vertical components of a motion vector are marked with
\{v_1, v_2, v_3, \ldots, v_n\}   (1)

An average value is formed

m = \Big( \sum_{k=1}^{n} v_k \Big) / n   (2)

The smallest component is selected

v_{\min} = \min\{v_1, v_2, v_3, \ldots, v_n\}   (3)

The biggest component is selected

v_{\max} = \max\{v_1, v_2, v_3, \ldots, v_n\}   (4)

The value range is marked with

r = (v_{\max} - v_{\min}) / 6   (5)

Three categories are formed

C_0 = \{v_i \mid v_i < m - r\}, \quad C_1 = \{v_i \mid m - r \le v_i \le m + r\}, \quad C_2 = \{v_i \mid v_i > m + r\}   (6)

where i = 1, 2, \ldots, n.

A category is selected, for which

|C_i| > |C_j|   (7)

is valid, where i \in \{0, 1, 2\} = Q and j \in Q \setminus \{i\}. If such a category does not exist, the category C_1 is selected.
The magnitude of the camera motion is formed:

if |C_1| = 0 and |C_0| = |C_2|, then the component gmv = 0; otherwise the component gmv = \Big( \sum_{v_k \in C_i} v_k \Big) / |C_i|   (8)
[0039] This algorithm can also be illustrated by a pseudo code which resembles the C programming language and is presented in Appendix 1.
[0040] The value of the motion vector of an intra-encoded image is zero. If the motion vectors are formed with the accuracy of half a pixel, the values of the motion vectors are divided by two when the camera motion is determined.
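Appendix 1 itself is not legible in this extract; the following C sketch is only a reconstruction that implements equations (1) to (8) directly for one component. The function name and the integer arithmetic are assumptions:

/* One component (horizontal or vertical) of all luminance motion vectors
 * goes in v[0..n-1]; the return value is the gmv component. Intra blocks
 * contribute zero vectors; half-pixel vectors are divided by two before
 * this is called. */
int general_motion_vector(const int *v, int n)
{
    if (n <= 0)
        return 0;
    int min = v[0], max = v[0], sum = 0;
    for (int k = 0; k < n; k++) {
        sum += v[k];
        if (v[k] < min) min = v[k];
        if (v[k] > max) max = v[k];
    }
    int m = sum / n;                 /* equation (2) */
    int r = (max - min) / 6;         /* equation (5) */

    int count[3] = { 0, 0, 0 };
    int csum[3]  = { 0, 0, 0 };
    for (int k = 0; k < n; k++) {    /* equation (6) */
        int c = (v[k] < m - r) ? 0 : (v[k] > m + r) ? 2 : 1;
        count[c]++;
        csum[c] += v[k];
    }

    int i;                           /* equation (7) */
    if (count[0] > count[1] && count[0] > count[2])
        i = 0;
    else if (count[2] > count[0] && count[2] > count[1])
        i = 2;
    else
        i = 1;                       /* no strict majority: C1 is selected */

    if (count[1] == 0 && count[0] == count[2])
        return 0;                    /* equation (8), degenerate case */
    return csum[i] / count[i];       /* equation (8) */
}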
[0041] In block 708, it is then predicted that the camera motion between the current image and the next image continues in the same direction and is of the same magnitude as was determined for the camera motion between the previous and the current image.
[0042] In block 710, the camera motion is then compensated between the current image and the next image by shifting the image scene 104 within the filming scene 402 in the next image in a direction opposite to the predicted motion so that the shift magnitude corresponds to the prediction. Figures 5A, 5B and 5C illustrate the effect of the camera motion on the filming scene 402 and the image scene 104. Figure 5A illustrates the situation at the beginning, i.e. the first image, in which the image scene 502 is inside and in the middle of the filming scene 500. Figure 5B illustrates the second image, in which the camera has moved to the right in the direction of arrow 504, and the person to be filmed has shifted to the left side of the image scene 508 and also of the filming scene 506. Figure 5C illustrates the third image, in which the camera has moved further to the right according to arrow 510. The person has disturbingly shifted to the left side in the image scene 514, as both the image scene 514 and the filming scene 512 have shifted to the right according to arrow 510. Figures 5A, 5B and 5C thus illustrate how the image is impaired due to the unintended camera motion, if no motion compensation is available.
[0043] Figures 6A, 6B and 6C illustrate the compensation of camera motion by employing the shown method. The content of Figures 6A and 6B corresponds to the content of Figures 5A and 5B. On the basis of the motion vectors, the direction and magnitude of the camera motion between the previous image 502 in Figure 6A and the current image 508 in Figure 6B are calculated. For the sake of simplicity, our example only includes the horizontal camera motion 504. It is predicted according to the method that the camera motion between the current image 508 and the next image 600 continues to have the same magnitude and direction as between the previous image 502 and the current image 508. The camera motion 510 is compensated between the current image 508 and the next image 600 by shifting the image scene within the filming scene 512 in the next image 600 in a direction opposite to the predicted motion, so that the shift magnitude corresponds to the prediction. Comparing Figures 5C and 6C, it is detected that by using the compensation, the person to be filmed and the text behind him have not shifted further to the side. It is true that motion has taken place in the previous images, because the motion is not compensated immediately but in the next image on the basis of the prediction. However, according to the Applicant's tests, the method improves the picture quality considerably when compared to not using the compensation at all.
[0044] The algorithm executing the compensation is also illustrated by a pseudo code which resembles the C programming language and is found in Appendix 2.
[0045] The described method is carried out in the encoder described in Figure 2 using motion estimation means 216 and motion stabilisation means 230, which can briefly be referred to as processing means 216, 230. The motion stabilisation means 230 thus inform the frame buffer 200 where in the filming scene 402 the image scene 104 to be used will be taken.
[0046] The processing means 216, 230 can be implemented as a computer program operating in the processor, whereby for instance each required operation is implemented as a specific program module. The computer program thus comprises the routines for implementing the steps of the method. In order to promote the sales of the computer program, said program can be stored in computer memory means, such as a CD-ROM (Compact Disc Read Only Memory). The computer program can be designed to operate also in a standard general-purpose personal computer, in a portable computer, in a computer network server or in another prior art computer.
[0047] The processing means 216, 230 can also be implemented as an equipment solution, for example as one or more application-specific integrated circuits (ASIC) or as operation logic composed of discrete components. When selecting the way to implement the means, a person skilled in the art observes for example the required processing power and the manufacturing costs. Different hybrid implementations formed of software and equipment are also possible.
[0048] The motion stabilisation can also be illustrated by another simple example. It is assumed that the camera is only moving in the horizontal direction. In the first frame, illustrated in Table 1, the filming scene of the camera contains the entire text "HANTRO OULU". The point marked with a cross, i.e. "H" at position 0, is the image scene.
Table 1 : First frame
[0049] In the second frame, illustrated in Table 2, the camera has moved to the right, whereby the image scene has also shifted to the right, but the image scene is still taken from position 0.
Table 2: Second frame
[0050] A horizontal motion to the right, having the magnitude of one position, has thus been detected between the first and the second frame. It is predicted that the motion continues with the same magnitude in the third frame, whereby a compensation is carried out in the manner shown in Table 3: the image scene is taken from position -1, that is, one position against the motion that is predicted to continue to the right with the magnitude of one position. As can be seen from the image scene, the motion has, however, accelerated to the right so that it has the magnitude of two positions.
Table 3: Third frame

[0051] In the next frame, the image scene would thus be taken from position -3 in order to compensate the motion.
[0052] In principle, after the encoding of the video image has been started, the first frame in which gmv differs from zero is the frame from which on the stabilisation of the video image can be applied to the following frames. The camera moves in the direction in which gmv points, and the motion speed is the length of gmv. Hence, it can be predicted that the movement continues in the observed direction and with the observed magnitude. The point from which the frame is taken is thus changed in the buffer. Let a variable bp represent a place in the buffer; the value zero of bp means that the frame is taken from the middle of the buffer, whereby the image scene is in the middle of the filming scene. A variable add is also used, which is calculated for each frame by using the gmv:

add = (add + gmv) / 2 (9)

[0053] The variable add denotes the speed of the camera motion, and the average of its previous value and the gmv is calculated to steady the predicted motion. The buffer point from which the image scene is taken within the filming scene is thus

bp = bp - add (10)

[0054] Even though the invention has above been described with reference to the example in the accompanying drawings, it is apparent that the invention is not restricted thereto but can be modified in various ways within the scope of the inventive idea disclosed in the attached claims. It is apparent to those skilled in the art that various known coding methods can be combined with the described basic solution in order to achieve the desired coding efficiency and quality.
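As a numeric check of equations (9) and (10), the following small C program replays the "HANTRO OULU" example above: gmv = 1 is measured between the first and the second frame, gmv = 2 between the second and the third. The away-from-zero rounding of add mirrors the Appendix 2 pseudo code below; the program itself is an illustrative sketch, not part of the publication.

#include <stdio.h>

int main(void)
{
    int add = 0;                      /* smoothed camera speed, equation (9) */
    int bp  = 0;                      /* buffer place of the image scene, equation (10) */
    int gmv_per_frame[] = { 1, 2 };   /* measured horizontal motion, frames 1-2 and 2-3 */

    for (int f = 0; f < 2; f++) {
        add += gmv_per_frame[f];
        add = add < 0 ? (add - 1) / 2 : (add + 1) / 2;  /* (add + gmv) / 2, rounded */
        bp -= add;                                       /* bp = bp - add */
        printf("after gmv %d: add = %d, bp = %d\n", gmv_per_frame[f], add, bp);
    }
    /* Output: after gmv 1: add = 1, bp = -1  (frame 3 taken from position -1)
     *         after gmv 2: add = 2, bp = -3  (next frame taken from position -3),
     * matching the worked example in the text. */
    return 0;
}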
APPENDIX 1: PSEUDO CODE FOR DETECTING CAMERA MOTION

for(i=0; i<99; i++)                    /* 99 components */
{
    ka += mv[i];                       /* Calculate sum of all components */
}
if(ka>0)
    ka = ka/99;                        /* Calculate average value */
if(ka<0)
    ka = ka/99;

min = mv[0];                           /* Find min and max of components */
max = mv[0];
for(i=0; i<99; i++)
{
    if(mv[i]<min) min = mv[i];
    if(mv[i]>max) max = mv[i];
}

rang = (max-min)/6;                    /* Determine value range */

for(i=0; i<99; i++)                    /* Divide components into three categories */
{
    if(mv[i] <= (ka-rang)) category[i] = 0;
    else if(mv[i] >= (ka+rang)) category[i] = 2;
    else category[i] = 1;
}

for(i=0; i<99; i++)                    /* Calculate sums and magnitudes of categories */
{
    if(category[i] == 0)
    {
        sum0 += mv[i];
        n0++;
    }
    if(category[i] == 1)
    {
        sum1 += mv[i];
        n1++;
    }
    if(category[i] == 2)
    {
        sum2 += mv[i];
        n2++;
    }
}

n = n1;                                /* Select biggest category */
sum = sum1;
if(n0 > n)
{
    sum = sum0;
    n = n0;
}
if(n2 > n)
{
    sum = sum2;
    n = n2;
}

if(n1 == 0 && n0 == n2)
{
    sum = 0;
}
else
{
    sum = sum/n;                       /* Calculate camera motion */
}
gmv = sum;

APPENDIX 2: PSEUDO CODE FOR COMPENSATING CAMERA MOTION
List of variables:

variable         type   purpose
i                int    0 for horizontal and 1 for vertical prediction
buffer_place[i]  int    the calculated buffer place
gmv[i]           int    general motion vector for rows and columns
scale[i]         int    motion limits
add[i]           int    motion speed

for(i=0; i<2; i++)                     /* Separate prediction for rows and columns */
{
    gmv[i] = General_motion_vector(i); /* Calculate gmv with Appendix 1 function */

    add[i] = add[i] + gmv[i];
    if(add[i]<0)                       /* Average value */
        add[i] = (add[i]-1)/2;
    if(add[i]>0)
        add[i] = (add[i]+1)/2;

    buffer_place = buffer_place[i] - add[i];

    if(buffer_place > scale[i])        /* Keep motion in limits */
    {
        buffer_place += add[i];
        add[i] = 0;
    }
    if(buffer_place < -scale[i])
    {
        buffer_place += add[i];
        add[i] = 0;
    }
    buffer_place[i] = buffer_place;
}

Claims

1. A method for stabilising video image produced by a camera and formed of consecutive still images comprising the steps of: to compress video image, determining (704) at least one motion vector in order to detect motion between the block in the previous image and the block in the current image; determining (706) the direction and magnitude of the camera motion between the previous image and the current image on the basis of at least one motion vector; characterized by filming (702) individual images forming video image with a camera by utilising a filming scene which is larger than the image scene used in the images; predicting (708) that the camera motion between the current image and the next image continues in the same direction and is of the same magnitude as was determined for the camera motion between the previous and the current image; and compensating (710) the camera motion between the current image and the next image by shifting the image scene within the filming scene in the next image in a direction opposite to the predicted motion so that the shift magnitude corresponds to the prediction.
2. A method as claimed in claim 1, characterized in that the camera motion is compensated separately for the horizontal and vertical components of the motion vector.
3. A method as claimed in claim 2, characterized in that the camera motion is detected separately for the horizontal and vertical components in the following manner: dividing the horizontal/vertical components into three categories according to their magnitude; forming an average value for the category containing most horizontal/vertical components, the average value indicating the magnitude of the camera motion in the direction of the particular component.
4. A method as claimed in claim 3, characterized in that the division into categories is performed in the following manner: forming an average value of all horizontal/vertical components; searching the biggest and the smallest horizontal/vertical component; dividing the horizontal/vertical components into three categories which are based on the average value of all horizontal/vertical components, the biggest horizontal/vertical component and the smallest horizontal/vertical component.
5. A method as claimed in claim 4, characterized in that the camera motion is detected separately for the horizontal and vertical components in the following manner:
- the horizontal/vertical components of a motion vector are marked with {v1, v2, v3, ..., vn}
- an average value is formed m = (v1 + v2 + ... + vn) / n
- the smallest component is selected vmin = min{v1, v2, v3, ..., vn}
- the biggest component is selected vmax = max{v1, v2, v3, ..., vn}
- the value range is marked with r = (vmax - vmin) / 6
- three categories are formed
C0 = {vi | vi ≤ m - r}
C1 = {vi | m - r < vi < m + r}
C2 = {vi | vi > m + r}
where i = 1, 2, ..., n
- a category is selected for which
|Ci| > |Cj|
is valid, where i ∈ {0, 1, 2} = Q and j ∈ Q \ {i}, and if such a category does not exist, the category C1 is selected
- the magnitude of the camera motion is formed:
if |C1| = 0 and |C0| = |C2|, then component gmv = 0,
otherwise component gmv = Σ{vk | vk ∈ Ci} / |Ci|
6. A computer program as claimed in any one of the preceding claims, characterized in that it comprises routines for implementing the steps of the method.
7. Computer memory means as claimed in claim 6, characterized in that it comprises the computer program as claimed in claim 6.
8. A system for stabilising video image produced by a camera and formed of consecutive still images, comprising a camera (102) for producing video image of individual consecutive still images and processing means (216, 230) which are arranged to: for compressing video image, determine at least one motion vector in order to detect motion between the block in the previous image and the block in the current image; determine the direction and magnitude of the camera motion between the previous image and the current image on the basis of at least one motion vector; characterized in that the camera (102) is arranged to film individual images forming video image by utilising a filming scene which is larger than the image scene used in the images; the processing means (230) are arranged to predict that the camera (102) motion between the current image and the next image continues in the same direction and is of the same magnitude as was determined for the camera (102) motion between the previous and the current image; and the processing means (230) are arranged to compensate the camera (102) motion between the current image and the next image by shifting the image scene within the filming scene in the next image in a direction opposite to the predicted motion so that the shift magnitude corresponds to the prediction.
9. A system as claimed in claim 8, characterized in that the processing means (230) are arranged to compensate the camera motion separately for the horizontal and vertical components of the motion vector.
10. A system as claimed in claim 9, characterized in that the processing means (230) are arranged to detect the camera motion separately for the horizontal and vertical components in the following manner: dividing the horizontal/vertical components into three categories according to their magnitude; forming an average value for the category containing most horizontal/vertical components, the average value indicating the magnitude of the camera motion in the direction of the particular component.
11. A system as claimed in claim 10, characterized in that the processing means (230) are arranged to perform the division into categories in the following manner: forming an average value of all horizontal/vertical components; searching the biggest and the smallest horizontal/vertical component; dividing the horizontal/vertical components into three categories which are based on the average value of all horizontal/vertical components, the biggest horizontal/vertical component and the smallest horizontal/vertical component.
12. A system as claimed in claim 11, characterized in that the processing means (230) are arranged to detect the camera motion separately for the horizontal and vertical components in the following manner:
- the horizontal/vertical components of a motion vector are marked with {v1, v2, v3, ..., vn}
- an average value is formed m = (v1 + v2 + ... + vn) / n
- the smallest component is selected vmin = min{v1, v2, v3, ..., vn}
- the biggest component is selected vmax = max{v1, v2, v3, ..., vn}
- the value range is marked with r = (vmax - vmin) / 6
- three categories are formed
C0 = {vi | vi ≤ m - r}
C1 = {vi | m - r < vi < m + r}
C2 = {vi | vi > m + r}
where i = 1, 2, ..., n
- a category is selected for which
|Ci| > |Cj|
is valid, where i ∈ {0, 1, 2} = Q and j ∈ Q \ {i}, and if such a category does not exist, the category C1 is selected
- the magnitude of the camera motion is formed:
if |C1| = 0 and |C0| = |C2|, then component gmv = 0,
otherwise component gmv = Σ{vk | vk ∈ Ci} / |Ci|