WO2010032229A1 - Frame rate up conversion method and system - Google Patents
Frame rate up conversion method and system
- Publication number
- WO2010032229A1 (PCT/IB2009/055097)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- frames
- predicted
- frame
- estimate
- interpolation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/01—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
- H04N7/0127—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level by changing the field or frame frequency of the incoming video signal, e.g. frame rate converter
- H04N7/0135—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving interpolation processes
- H04N7/014—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving interpolation processes involving the use of motion vectors
- H04N7/0145—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving interpolation processes the interpolation being class adaptive, i.e. it uses the information of class which is determined for a pixel based upon certain characteristics of the neighbouring pixels
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Television Systems (AREA)
Abstract
A method for adding frames in a digital video flow of frames, the added frames comprising a first predicted frame added between a first and a second frames of the digital video flow, and a second predicted frame added between a second and a third frames of said digital video flow, the method further comprising the acts of: a) initializing an intermediate estimate of the first and second predicted frames, and for each intermediate estimate of the first and second predicted frames: b1) defining a new intermediate estimate of the first predicted frame using a first interpolation function applied to the first and second frames, b2) defining a new intermediate estimate of the second predicted frame using a second interpolation function applied to the second and third frames, b3) defining a prediction of the second frame with a third interpolation function applied to the intermediate estimate of the first and second predicted frames, b4) optimizing the interpolation functions while minimizing the distortion between the new intermediate estimate and the intermediate estimate of the predicted frames and between the second frame and its prediction, b5) computing the intermediate estimates of the first and second predicted frames using the optimized first and second interpolation functions, the first and second predicted frames being set to the intermediate estimates of the first and second predicted frames when the distortion is smaller than a given threshold or a given number of intermediate estimates of the first and second predicted frames has been computed.
Description
FRAME RATE UP CONVERSION METHOD AND SYSTEM
Field of the Invention
The present invention relates in general to image processing and more specifically to Frame Rate Up-Conversion (FRUC) image processing.
Background of the Invention
In image processing, Frame Rate Up-Conversion (FRUC) is a widely investigated technique to up-convert the temporal resolution of a media sequence (e.g. a video), i.e. to increase the number of frames (or pictures or images) in a media sequence in order to increase its temporal resolution. FRUC has many applications, among which the most practical one is to enhance the visual quality of low bit rate coded video. In low bit rate coding applications, it is possible to reduce the frame rate to half, or an even lower ratio, of the original frame rate, and then encode only the low rate frames with better visual quality. When decoding the media, FRUC may be performed to restore the loss of temporal resolution with higher visual quality. One of the most common cases is the double up-conversion of the frame rate, an example of which is illustrated in Figure 1, where, for instance, the original frame rate is 15 fps (frames per second) and the up-converted frame rate is 30 fps. The added frames are interpolated using the redundancies between successive frames along the temporal axis.
Besides the low bit rate coding, FRUC is also applicable for instance to Phase Alternating Line-National Television System Committee (PAL-NTSC) conversion and complex video editing. In addition, FRUC may be of great help for slow-motion playback and the rate allocation policy of scalable video coding schemes.
Numerous FRUC algorithms have been developed to convert frame rates. A simple and straightforward FRUC algorithm consists in combining adjacent video frames without taking object motion into account. The object motion is the change of a given object, captured from a scene, from one frame to another (i.e. between adjacent frames). Examples of such simple FRUC methods are frame repetition and frame averaging.
However, such methods work well only if there is little or no motion between successive or adjacent frames. As the intensity of motion increases, the frame repetition method will cause jerkiness and the resulting image will look choppy and/or not smooth. When applying a frame averaging method, images will look blurry when motion occurs, and sometimes ghost artifacts may also be perceived.
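By way of illustration only (this sketch is not part of the patent text; all names are illustrative), the two naive methods above can be written in a few lines of Python/NumPy:

```python
import numpy as np

def upconvert_2x_repetition(frames):
    """Double the frame rate by repeating each frame.

    frames: list of HxW (or HxWx3) uint8 arrays.
    Under motion this causes the jerkiness described above.
    """
    out = []
    for f in frames:
        out.extend([f, f.copy()])  # the inserted frame is a plain copy
    return out

def upconvert_2x_averaging(frames):
    """Double the frame rate by inserting the average of adjacent frames.

    Blurs moving regions and can create ghost artifacts.
    """
    out = [frames[0]]
    for prev, nxt in zip(frames, frames[1:]):
        mid = (prev.astype(np.float32) + nxt.astype(np.float32)) / 2.0
        out.append(mid.astype(prev.dtype))  # interpolated in-between frame
        out.append(nxt)
    return out
```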
As opposed to frame repetition and frame averaging, another kind of FRUC method, called Motion Compensation FRUC (MC-FRUC), is based on Motion Estimation (ME) and performs frame interpolation along the motion trajectory of an object from frame to frame to achieve better visual quality. This method uses motion vectors, i.e. vectors used for inter-prediction that provide an offset from the coordinates in the decoded picture to the coordinates in a reference picture. Since motion plays a significant role in MC-FRUC, many algorithms have been developed to derive accurate motion vectors. For example, the Block-Matching Algorithm (BMA) detailed in "A method for motion adaptive frame rate up-conversion", R. Castagno, P. Haavisto, and G. Ramponi, IEEE Trans. on Circuits and Systems for Video Technology, vol. 6, No. 5, pp. 436-446, Oct. 1996 and "Adaptive MC interpolation for frame rate up-conversion", S.-H. Lee, Y.-C. Shin, S.-J. Yang, H.-H. Moon and R.-H. Park, IEEE Trans. on Consumer Electronics, vol. 48, No. 3, pp. 444-450, Aug. 2002, has a broad application in MC-FRUC. Since motion vectors derived by BMA are often not accurate enough, several approaches for more faithful motion estimation, described in "New frame rate up-conversion using bi-directional motion estimation", B. T. Choi, S. H. Lee, and S. J. Ko, IEEE Trans. on Consumer Electronics, vol. 46, No. 3, pp. 603-609, Aug. 2000 and "Motion compensated frame interpolation based on H.264 decoder", Z. Gan, L. Qi, and X. Zhu, Electronics Letters, vol. 43, No. 2, pp. 96-98, Jan. 2007, have also been proposed in recent works. One of these FRUC algorithms, "New frame rate up-conversion using bi-directional motion estimation", B. T. Choi, S. H. Lee, and S. J. Ko, IEEE Trans. on Consumer Electronics, vol. 46, No. 3, pp. 603-609, Aug. 2000, uses a bi-directional ME to derive more faithful motion vectors. In addition, a hierarchical MC technique was proposed to achieve better visual quality in "Hierarchical motion compensated frame rate up-conversion based
on the Gaussian/Laplacian pyramid", G. I. Lee, B. W. Jeon, R. H. Park, and S. H. Lee, Proc. IEEE Int. Conf. Consumer Electronics, 2003, pp. 350-351. Both methods achieve better performance than BMA does. However, they are all based on the assumption of a translational motion with constant velocity, which is not always true. Furthermore, constant acceleration was exploited to derive more reliable motion trajectories. However, the assumption of a constant acceleration also does not always hold true for all the regions of an image, e.g. for the regions of non-rigid objects (for instance clouds and waves, which change their shape between images). As opposed to its use in video coding where it is used with actual frames,
Motion Estimation may be performed for FRUC in the absence of actual frames, i.e. it allows creating frames between and from existing ones. However, as the derived motion trajectory may sometimes not be consistent, a median filter is usually applied to the motion vector field (i.e. the field of motion vectors of all the blocks) after performing ME, in which the motion vector field needs to go through the median filter operation in e.g. a 3x3 window (the current block and its 8 neighboring blocks located in a 3x3 window) before MC. However, for the areas of small objects, irregularly shaped objects, and object boundaries, fixed size block MC usually does not work well. To avoid this situation, variable size block MC was proposed at object boundaries to reconstruct edge information with a higher quality. Furthermore, a method called Overlapped Block MC (OBMC) may be applied to suppress the blocking artifacts, which are usually observed when a block has a significantly different motion vector compared with its neighboring blocks. However, OBMC may sometimes over-smooth the edges of the image and degrade the image quality. To reduce the over-smoothing effect of OBMC, an Adaptive OBMC (AOBMC) method was proposed, in which the coefficients of the OBMC are adjusted based on the reliability of neighboring motion vectors. However, AOBMC still has poor ability to represent some complex motions, such as zooming, rotation, and local deformation.
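A minimal sketch (assumed, not taken from the patent) of the 3x3 median filtering of the motion vector field mentioned above, here applying a component-wise median over each block and its 8 neighbours, which is one common variant:

```python
import numpy as np

def median_filter_mv_field(mv_field):
    """Smooth a block-level motion-vector field with a 3x3 median filter.

    mv_field: (H_blocks, W_blocks, 2) float array of (vx, vy) per block.
    """
    h, w, _ = mv_field.shape
    padded = np.pad(mv_field, ((1, 1), (1, 1), (0, 0)), mode='edge')
    out = np.empty_like(mv_field)
    for i in range(h):
        for j in range(w):
            window = padded[i:i + 3, j:j + 3]            # 3x3 neighbourhood
            out[i, j] = np.median(window.reshape(-1, 2), axis=0)
    return out
```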
Today there is a need for a FRUC solution that can be easily implemented on the existing communication infrastructures, overcoming the drawbacks of the prior art.
Summary of Invention
It is an object of the present system to overcome disadvantages and/or make improvements over the prior art. To that end, the invention proposes a method for adding frames in a digital video flow of frames, the added frames comprising a first predicted frame added between a first and a second frames of the digital video flow, and a second predicted frame added between a second and a third frames of said digital video flow, the method further comprising the acts of: a) initializing an intermediate estimate of the first and second predicted frames, and for each intermediate estimate of the first and second predicted frames: b1) defining a new intermediate estimate of the first predicted frame using a first interpolation function applied to the first and second frames, b2) defining a new intermediate estimate of the second predicted frame using a second interpolation function applied to the second and third frames, b3) defining a prediction of the second frame with a third interpolation function applied to the intermediate estimate of the first and second predicted frames, b4) optimizing the interpolation functions while minimizing the distortion between the new intermediate estimate and the intermediate estimate of the predicted frames and between the second frame and its prediction, b5) computing the intermediate estimates of the first and second predicted frames using the optimized first and second interpolation functions, the first and second predicted frames being set to the intermediate estimates of the first and second predicted frames when the distortion is smaller than a given threshold or a given number of intermediate estimates of the first and second predicted frames has been computed.
The invention also relates to a system for adding frames in a digital video flow of frames, the added frames comprising a first predicted frame added between a first and a second frames of the digital video flow, and a second predicted frame added between a second and a third frames of said digital video flow, said system comprising:
- an emitting device operable to send said digital video flow,
- a receiving device operable to: a) initialize an intermediate estimate of the first and second predicted frames, and for each intermediate estimate of the first and second predicted frames: b1) define a new intermediate estimate of the first predicted frame using a first interpolation function applied to the first and second frames, b2) define a new intermediate estimate of the second predicted frame using a second interpolation function applied to the second and third frames, b3) define a prediction of the second frame with a third interpolation function applied to the intermediate estimate of the first and second predicted frames, b4) optimize the interpolation functions while minimizing the distortion between the new intermediate estimate and the intermediate estimate of the predicted frames and between the second frame and its prediction, b5) compute the intermediate estimates of the first and second predicted frames using the optimized first and second interpolation functions, the first and second predicted frames being set to the intermediate estimates of the first and second predicted frames when the distortion is smaller than a given threshold or a given number of intermediate estimates of the first and second predicted frames has been computed; - a network operable to communicate between the emitting device and the receiving device.
The invention also relates to an interpolating device for adding frames in a digital video flow of frames, the added frames comprising a first predicted frame added between a first and a second frames of the digital video flow, and a second
predicted frame added between a second and a third frames of the digital video flow, said device being operable to receive said digital video flow of frames, the device being further operable to: a) initialize an intermediate estimate of the first and second predicted frames, and for each intermediate estimate of the first and second predicted frames: b1) define a new intermediate estimate of the first predicted frame using a first interpolation function applied to the first and second frames, b2) define a new intermediate estimate of the second predicted frame using a second interpolation function applied to the second and third frames, b3) define a prediction of the second frame with a third interpolation function applied to the intermediate estimate of the first and second predicted frames, b4) optimize the interpolation functions while minimizing the distortion between the new intermediate estimate and the intermediate estimate of the predicted frames and between the second frame and its prediction, b5) compute the intermediate estimates of the first and second predicted frames using the optimized first and second interpolation functions, the first and second predicted frames being set to the intermediate estimates of the first and second predicted frames when the distortion is smaller than a given threshold or a given number of intermediate estimates of the first and second predicted frames has been computed.
The invention also relates to a computer program providing computer executable instructions stored on a computer readable medium, which when loaded onto a data processor causes the data processor to perform a method for adding frames in a digital video flow of frames according to claims 1 to 3.
The method according to the invention proposes a model for improving the quality of interpolated frames in FRUC. The present method could be described as a Spatiotemporal Auto-Regressive (STAR) model.
Indeed, the proposed method exploits the correlations or similarities (of pixels) between adjacent frames (i.e in the temporal domain) and within a same frame (i.e. spatial domain).
The method according to the invention is able to adaptively exploit the redundancies both in the spatial and temporal domains by tuning the interpolation functions (i.e. their interpolation weights) in order to optimize them, thus making interpolated frames more reliable. Indeed, the STAR model allows learning motion information between successive frames by adjusting the interpolation weights adaptively according to the characteristics of the pixels within a local spatiotemporal neighborhood.
Another advantage of the method according to the invention is that it achieves sub-pixel (i.e. pixels located between the space of integer pixels) accuracy with spatially varying interpolation coefficients or weights. As traditional FRUC algorithms use the same interpolation coefficients for all the frames of a sequence, it is not possible to consider non-stationary statistical properties of video signals (i.e. frames or areas with different motion intensity, for example, the boundary of motion objects, where one part of the object is stationary and the other part is moving, or the boundary of foreground and background regions, or the covered and uncovered regions between successive frames). In contrast, the STAR model is able to tune its interpolation coefficients adaptively to match non-stationary statistical properties of video signals.
Brief Description of the Drawings
Embodiments of the present invention will now be described solely by way of example and only with reference to the accompanying drawings, where like parts are provided with corresponding reference numerals, and in which:
Figure 1 schematically illustrates a simple example of Frame Rate Up- Conversion (FRUC);
Figure 2 schematically illustrates a system according to an embodiment of the present invention;
Figure 3 schematically illustrates an example of pixels interpolation according to an embodiment of the present invention;
Figure 4 schematically illustrates the method according to an embodiment of the present invention;
Figure 5 schematically illustrates the self-feedback weight training method according to an embodiment of the present invention; Figure 6 schematically illustrates an example of pixels interpolation according to an embodiment of the present invention;
Figure 7 schematically illustrates an example of the Motion Compensation Interpolation (MCI) method according to an embodiment of the present invention;
Figure 8 schematically illustrates a simulation according to an embodiment of the present invention;
Figure 9 schematically illustrates a system comprising an emitting device, a network and a receiving device according to an embodiment of the present invention.
Description of Preferred Embodiments
The method according to the invention proposes a model for improving the quality of interpolated frames in FRUC, and more specifically a Spatiotemporal Auto-Regressive (STAR) model.
In the STAR model, each pixel is approximated as a weighted linear combination or interpolation (through interpolation functions) of pixels within a spatial neighborhood in the current frame as well as a weighted linear combination of pixels within a temporal neighborhood in the previous and following frames. Due to the absence of some actual pixels in the missing frames and in order to improve the accuracy of the model, the method according to the invention proposes an iterative self-feedback weight training algorithm to optimize the interpolation weights (i.e. the coefficients) of the linear combinations. An initialization step allows interpolating at least one pixel in a given missing frame to be interpolated, hereafter called first and second predicted frames. Then, in each iteration, pixels in said given missing frame are interpolated using pixels in the previous and following frames as well as available pixels in the given missing frame. Actual original low rate frames are also predicted (i.e. estimated) using the same interpolation weights in the same iteration. Finally, after a pre-set number of iterations or when the distortion between pixels from one iteration to another is minimized (i.e. under a given threshold), then the optimal interpolation weights are
derived. The distortion is calculated between the pixels approximated using the interpolation weights in the current iteration and those in the previous iteration as well as the actual pixels in the original low rate frame.
According to an exemplary embodiment of the present invention, the STAR model is based on a spatiotemporal scheme, which is depicted in Figure 3.
Along the temporal axis, model samples are taken at five successive frames, where $F_{t-2}$, $F_t$, and $F_{t+2}$ denote the successive original low rate frames, respectively called first, second and third frames, and $\hat{F}_{t-1}$ and $\hat{F}_{t+1}$ denote the interpolated or added frames at time instances t-1 and t+1, respectively referred to as first and second predicted frames.
The pixel value to be interpolated at spatial location (m,n) of a current frame, the current frame being either $\hat{F}_{t-1}$ or $\hat{F}_{t+1}$, depends on pixel samples within a temporal neighbourhood taken in the previous and following frames as well as pixels available within a spatial neighbourhood of said current frame. When the current frame is $\hat{F}_{t-1}$, the previous and following frames are respectively the first frame $F_{t-2}$ and the second frame $F_t$. When the current frame is $\hat{F}_{t+1}$, the previous and following frames are respectively the second frame $F_t$ and the third frame $F_{t+2}$.
The temporal neighbourhood $N_T$ can be defined as a region taken in the frames previous or following the current frame. The spatial neighbourhood $N_S$ can be defined as a region taken in the current frame where pixels are already available, as they have been previously defined.
The size of the temporal neighbourhood $N_T$ in the previous and following frames may be defined as a $(2L+1)\times(2L+1)$ square bounded by $(m \pm L, n \pm L)$. To be consistent with $N_T$, the size of the spatial neighbourhood $N_S$ may be chosen as the same size as $N_T$ or smaller.
Parameter L is referred to as the spatiotemporal order of the STAR model.
Figure 3 shows an exemplary illustration of the STAR model using a spatiotemporal order L=1. The method according to the invention may be performed on a block within the frame, the block being defined as the supporting region S of the STAR model. It may correspond for example to a block of pixels (for instance 16x16 or 32x32 pixel blocks) or to the whole picture. As seen hereafter, within a block the interpolation weights are identical per iteration and over $N_T$ or $N_S$. In other words, for every (m,n) in the supporting region, the contribution of the pixels of $N_T$ or $N_S$ in the first to third frames only depends upon their distance to (m,n).
Figure 2 describes an illustrative embodiment of the system according to the invention.
A self-feedback weight training unit 200 allows deriving the optimal interpolation weights by iteration. Each pixel is approximated as a weighted linear combination of pixels (through interpolation functions) within a spatial neighborhood in the current frame as well as a weighted linear combination of pixels within a temporal neighborhood in the previous and following frames. At the end of each iteration, optimal interpolation weights are derived. The self-feedback weight training process is later described in detail hereunder with reference to Figure 5. After getting the optimal interpolation weights, a first unit 205 allows interpolating the missing frame at time instance t-1 and a second unit 210 allows interpolating the missing frame at time instance t+1.
Figure 4 describes an illustrative embodiment of the method according to the invention.
According to the spatial resolution of the input sequences (i.e. frame resolution), the size of the supporting region is first determined in an act 400. The supporting region is the area of a frame on which the method according to the invention may be performed. According to an illustrative embodiment of the present invention, the determination process may be as follows: if the resolution is QCIF (Quarter Common Intermediate Format), the size of the supporting region may be set to 16x16; if the resolution is larger than QCIF, the size of the supporting region is set to 32x32. Then, as long as the frame interpolation process is not completed (act 410), the spatiotemporal order L of the STAR model is computed in an act 420 as

$$L = \max_{i \in S} \left\lfloor \left| mv_i \right| \right\rfloor \qquad (1)$$

where S represents the supporting region and $\lfloor \cdot \rfloor$ is the floor function, which maps the motion vector $mv_i$ of the i-th block in S (i.e. the block used to interpolate the to-be-interpolated frame by the MCI method; in other words, the blocks used to generate $\hat{F}_{t-1}$ and $\hat{F}_{t+1}$ as shown in equation (7)) to the full-pixel position. After the supporting region and the spatiotemporal order have been determined, the self-feedback weight training process is performed in an act 430 to derive the optimal interpolation weights. Finally, the missing frames $\hat{F}_{t-1}$ and $\hat{F}_{t+1}$ are interpolated by STAR interpolation using the optimal weights in an act 440.
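As a hedged illustration of acts 400 and 420 (not patent text; the reading of equation (1) as the floored maximum motion-vector magnitude, and the minimum order of 1, are assumptions made for this sketch):

```python
import numpy as np

def supporting_region_size(width, height):
    """Act 400: pick the supporting-region size from the input resolution.

    QCIF is 176x144; anything larger gets a 32x32 supporting region.
    """
    return 16 if width * height <= 176 * 144 else 32

def spatiotemporal_order(motion_vectors):
    """Act 420: derive the order L from the block motion vectors in S.

    motion_vectors: (N, 2) array of (vx, vy) for the blocks of the
    supporting region.  L is taken as the floor of the largest motion
    magnitude, clamped to at least 1 (both choices are assumptions).
    """
    mv = np.asarray(motion_vectors, dtype=np.float64)
    if mv.size == 0:
        return 1
    mags = np.linalg.norm(mv, axis=1)
    return max(1, int(np.floor(mags.max())))
```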
Figure 5 describes an illustrative embodiment of the self-feedback weight training process of act 430 according to the invention. This training process is an iterative process. In the hereafter description, the supporting region S or block is assumed to be the whole frame. The present teaching may be easily transposed by the person skilled in the art to a supporting region taken within a frame.
Along the temporal axis, the three successive first, second and third frames of the original low rate frames (respectively $F_{t-2}$, $F_t$ and $F_{t+2}$) are selected in an act 500.
$\hat{F}^i_{t-1}$ and $\hat{F}^i_{t+1}$ denote the intermediate values or estimates of the first and second predicted frames for the i-th iteration, i represents the iteration counter, and $w^i$ represents the optimal weight vector after the i-th iteration.
Before iterations begin, a preliminary act 510 of initialization is performed. The act 510 allows initializing $\hat{F}^0_{t-1}$ and $\hat{F}^0_{t+1}$. Various solutions may be implemented to initialize $\hat{F}^0_{t-1}$ and $\hat{F}^0_{t+1}$ (for example, initial values may be pre-set or pre-defined). In this illustrative embodiment according to the present invention, MCI (Motion Compensation Interpolation) is used. MCI is further described hereunder with reference to Figure 7. The iteration counter i is also initialized to 0.
The iterative process is then started as follows. In an additional act 520, the iteration counter i is compared to a pre-defined maximum value iMax. If the maximum value has not been reached, an optimal weight vector $w^i$ is further derived in an act 530. This weight vector $w^i$ allows computing the interpolated values of frames $\hat{F}^{i+1}_{t-1}$ and $\hat{F}^{i+1}_{t+1}$, respectively according to equations (2) and (3) here below:

$$\hat{f}^{i+1}_{t-1}(m,n) = \sum_{(u,v) \in N_T} f_{t-2}(m+u, n+v)\, w_f^i(u,v) + \sum_{(u,v) \in N_T} f_t(m+u, n+v)\, w_b^i(u,v) + \sum_{(e,f) \in N_S} \hat{f}^{i+1}_{t-1}(m+e, n+f)\, w_s^i(e,f) \qquad (2)$$

$$\hat{f}^{i+1}_{t+1}(m,n) = \sum_{(u,v) \in N_T} f_t(m+u, n+v)\, w_f^i(u,v) + \sum_{(u,v) \in N_T} f_{t+2}(m+u, n+v)\, w_b^i(u,v) + \sum_{(e,f) \in N_S} \hat{f}^{i+1}_{t+1}(m+e, n+f)\, w_s^i(e,f) \qquad (3)$$

wherein any f(i,j) denotes the pixel value at (i,j) for frame F and (m,n) a pixel of the chosen supporting region of a frame.
In an additional embodiment of the method according to the invention, the first and the second predicted frames at iteration i ($\hat{F}^i_{t-1}$ and $\hat{F}^i_{t+1}$) may be used to predict the second frame $F_t$ according to equation (4):

$$\hat{f}^{i+1}_t(m,n) = \sum_{(u,v) \in N_T} \hat{f}^i_{t-1}(m+u, n+v)\, w_f^i(u,v) + \sum_{(u,v) \in N_T} \hat{f}^i_{t+1}(m+u, n+v)\, w_b^i(u,v) + \sum_{(e,f) \in N_S} \hat{f}^{i+1}_t(m+e, n+f)\, w_s^i(e,f) \qquad (4)$$
In equations (2) to (4), $w_f^i(u,v)$, $w_b^i(u,v)$ and $w_s^i(e,f)$ represent the forward, backward, and spatial weight components of the weight vector $w^i$, corresponding to the previous, following, and current frames, respectively. $w_f^i(u,v)$ and $w_b^i(u,v)$ are defined over $N_T$ as they correspond to the weights for the previous and following frames, while $w_s^i(e,f)$ is defined over $N_S$ as it corresponds to the weights for the current frame in the interpolation functions of equations (2) to (4). These equations (2) to (4) can be seen as weighted sums or interpolation functions applied to frames from the sequence of original low rate frames. In the present illustration, the interpolation functions are chosen as identical functions for equations (2), (3) and (4). More generally, three interpolation functions may be chosen for (2), (3) and (4) respectively.
Furthermore, in equations (2), (3) and (4), the interpolation function is applied to the already defined pixels. This indeed corresponds to the spatial neighborhood $N_S$ as illustrated in Figure 3.
For instance, as $\hat{F}^{i+1}_{t-1}$ is defined through (2), $N_S$ increases in size as more and more pixels are defined in $\hat{F}^{i+1}_{t-1}$; consequently the contribution of $\hat{F}^{i+1}_{t-1}$ pixels in its own definition increases. In an additional embodiment of the present method, the size of $N_S$ is limited to $N_T$, as mentioned before, to be consistent with the size of the temporal neighborhood.
In a further embodiment, if the pixels in a supporting region S are defined line by line or column by column, always in the same direction (left to right, right to left, top to bottom, ...), then in the current spatial neighborhood only $\frac{1}{2}\left[(2L+1)\times(2L+1)-1\right]$ pixels have been defined through equations (2), (3) and (4). Therefore, $N_S$ may be limited in size to these $\frac{1}{2}\left[(2L+1)\times(2L+1)-1\right]$ defined pixels.
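The following Python sketch (illustrative only; argument names and the border handling are assumptions) shows the weighted sum of equations (2)/(3) for one pixel, with the spatial term restricted to the already-defined pixels of the current frame:

```python
def star_interpolate_pixel(prev_f, foll_f, cur_partial, defined_mask,
                           m, n, w_f, w_b, w_s, L):
    """One STAR interpolation step, following equations (2)/(3).

    prev_f, foll_f : previous/following original frames (2-D float arrays)
    cur_partial    : the predicted frame being built, partially filled in
    defined_mask   : boolean array, True where cur_partial already holds a value
    w_f, w_b       : (2L+1)x(2L+1) forward/backward temporal weights over N_T
    w_s            : (2L+1)x(2L+1) spatial weights over N_S
    Call with L <= m, n < size - L (borders are ignored for brevity).
    """
    val = 0.0
    for u in range(-L, L + 1):
        for v in range(-L, L + 1):
            val += prev_f[m + u, n + v] * w_f[u + L, v + L]   # temporal, previous frame
            val += foll_f[m + u, n + v] * w_b[u + L, v + L]   # temporal, following frame
            if defined_mask[m + u, n + v]:                    # spatial, already-defined pixels only
                val += cur_partial[m + u, n + v] * w_s[u + L, v + L]
    return val
```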
In equation (4), one may see that $\hat{F}^{i+1}_t$ is a weighted combination of the intermediate values of the first and second predicted frames from the previous iteration.
The weight components are variables that may be optimized through equation (5):

$$w^i = \arg\min_w \left( \left\| \hat{F}^{i+1}_{t-1} - \hat{F}^i_{t-1} \right\|^2 + \left\| \hat{F}^{i+1}_{t+1} - \hat{F}^i_{t+1} \right\|^2 + \left\| F_t - \hat{F}^{i+1}_t \right\|^2 \right) \qquad (5)$$

wherein the argmin function gives the optimal weight vector $w^i$ minimizing the sum, or distortion. Equation (5) allows optimizing the interpolation functions while minimizing the distortion between the new intermediate estimates and the intermediate estimates of the predicted frames (elements $\| \hat{F}^{i+1}_{t-1} - \hat{F}^i_{t-1} \|^2$ and $\| \hat{F}^{i+1}_{t+1} - \hat{F}^i_{t+1} \|^2$ of (5)) and between the second frame and its prediction (element $\| F_t - \hat{F}^{i+1}_t \|^2$ of (5)).
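Because the estimates are linear in the weight vector once the neighbour pixels are taken as fixed values from the previous pass, equation (5) can be posed as an ordinary least-squares problem. The sketch below assumes this linearised reading (not stated as such in the patent); `samples` and `targets` are hypothetical names for the stacked regression rows:

```python
import numpy as np

def solve_star_weights(samples, targets):
    """Solve equation (5) as linear least squares.

    samples : (P, K) array; each row collects the K neighbourhood pixels
              that multiply the weight vector for one output pixel.
    targets : (P,) array; the value each pixel should reproduce (the
              previous-iteration estimate, or the actual pixel of F_t).
    Returns the K-dimensional weight vector minimising the summed
    squared distortion of equation (5).
    """
    w, *_ = np.linalg.lstsq(np.asarray(samples), np.asarray(targets),
                            rcond=None)
    return w
```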
The resulting optimal weights are used in a further act 540 to generate the intermediate values of the first and second predicted frames, respectively $\hat{F}^{i+1}_{t-1}$ and $\hat{F}^{i+1}_{t+1}$, as well as the prediction of the second frame $\hat{F}^{i+1}_t$, for iteration i.
To decide whether the exemplary embodiment of the present method will be terminated in the current iteration i, a distortion is computed taking into account the frames $\hat{F}^{i+1}_{t-1}$, $\hat{F}^{i+1}_{t+1}$ and $\hat{F}^{i+1}_t$ calculated in act 540, as well as the intermediate values of the first and second predicted frames, respectively $\hat{F}^i_{t-1}$ and $\hat{F}^i_{t+1}$, from the previous iteration. Specifically, the distortion D may be computed according to equation (6) as follows:

$$D = \min_w \left( \left\| \hat{F}^{i+1}_{t-1} - \hat{F}^i_{t-1} \right\|^2 + \left\| \hat{F}^{i+1}_{t+1} - \hat{F}^i_{t+1} \right\|^2 + \left\| F_t - \hat{F}^{i+1}_t \right\|^2 \right) \qquad (6)$$

The distortion D is actually the minimum of the sum $\| \hat{F}^{i+1}_{t-1} - \hat{F}^i_{t-1} \|^2 + \| \hat{F}^{i+1}_{t+1} - \hat{F}^i_{t+1} \|^2 + \| F_t - \hat{F}^{i+1}_t \|^2$, i.e. the sum of equation (5) evaluated using the optimal weight components.
The distortion takes into account the distortion between the intermediate values of the predicted frames from one iteration to the other, as well as the distortion between the prediction of the second frame and the second frame itself.
If D is smaller than a preset threshold in act 560, or i is larger than the pre-defined maximum iteration number in act 520, the iteration is terminated and the current weight vector $w^i$ is set to be the ultimate weight vector of the STAR model. Thus, the interpolated pixel values generated by $w^i$ are considered to be the final interpolated pixel values that define the first and second predicted frames. Otherwise, i is increased by 1 and the self-feedback weight training algorithm moves to the next iteration.
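Putting acts 510 to 560 together, the training loop of Figure 5 may be sketched as follows (illustrative only; `solve_weights`, `interpolate` and `predict_cur` are assumed helpers standing in for equations (5), (2)/(3) and (4) respectively):

```python
import numpy as np

def self_feedback_training(f_prev, f_cur, f_next, init_prev, init_next,
                           solve_weights, interpolate, predict_cur,
                           i_max=10, threshold=1.0):
    """Iterative self-feedback weight training, per Figure 5.

    f_prev, f_cur, f_next : first, second and third original frames
    init_prev, init_next  : MCI initialisations of the two predicted frames
    """
    est_prev, est_next = init_prev, init_next
    for _ in range(i_max):                                            # act 520
        w = solve_weights(f_prev, f_cur, f_next, est_prev, est_next)  # act 530, eq (5)
        new_prev = interpolate(f_prev, f_cur, est_prev, w)            # eq (2)
        new_next = interpolate(f_cur, f_next, est_next, w)            # eq (3)
        pred_cur = predict_cur(est_prev, est_next, w)                 # eq (4)
        d = (np.sum((new_prev - est_prev) ** 2)                       # eq (6)
             + np.sum((new_next - est_next) ** 2)
             + np.sum((f_cur - pred_cur) ** 2))
        est_prev, est_next = new_prev, new_next
        if d < threshold:                                             # act 560
            break
    return est_prev, est_next
```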
Figure 6 describes an illustrative embodiment of the self-feedback weight training process according to the present invention. The interpolated pixels in frames t-1 and t+1 are used to approximate the actual pixels in the original low rate frame t. As shown in Figure 6, the actual pixels within temporal layer t may be approximated according to equation (4) described here above.
Figure 7 describes the MCI method. MCI is a very common FRUC method, which is performed block by block. It is composed of two processes: motion estimation and motion compensation. The motion estimation is performed by a bidirectional motion estimation method, which is shown in Figure 7. Starting from the centre of the current to-be-interpolated block, bidirectional motion estimation finds two matching blocks in the previous and following frames in a reverse way. Next, the pixels in the to-be-interpolated frames are computed by motion compensation according to equation (7) as follows:

$$\hat{F}_{t-1}(x,y) = \tfrac{1}{2}\left( F_{t-2}(x+v_x, y+v_y) + F_t(x-v_x, y-v_y) \right) \qquad (7.1)$$
$$\hat{F}_{t+1}(x,y) = \tfrac{1}{2}\left( F_t(x+v_x, y+v_y) + F_{t+2}(x-v_x, y-v_y) \right) \qquad (7.2)$$

where $(v_x, v_y)$ is the motion vector found in the bidirectional motion estimation process.
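A sketch of the motion compensation step of equation (7) for one block (not patent text; bounds checking is omitted and names are illustrative):

```python
import numpy as np

def mci_block(f_prev, f_next, x0, y0, vx, vy, block=16):
    """Motion-compensated interpolation of one block, per equation (7).

    The interpolated block averages the block displaced by (vx, vy) in
    the previous frame and by (-vx, -vy) in the following frame, where
    (vx, vy) is the integer motion vector from bidirectional ME.
    """
    out = np.empty((block, block), dtype=np.float32)
    for dy in range(block):
        for dx in range(block):
            x, y = x0 + dx, y0 + dy
            out[dy, dx] = 0.5 * (f_prev[y + vy, x + vx]
                                 + f_next[y - vy, x - vx])
    return out
```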
Figure 8 describes an example of simulation performed using the method according to the invention, i.e. the STAR model. Various standard video sequences, with varying sizes (QCIF, CIF), have been tested in an attempt to shed some light on the quality of the interpolated frames. To compare the proposed model with other FRUC algorithms, every other frame from the test sequences listed in Figure 8 is skipped and interpolated by the STAR model as well as by other FRUC methods. In the self-feedback weight training method, the interpolated frames yielded by the MCI-FRUC method are set to be the initial pixels within frames t-1 and t+1 before the iteration starts, in a similar way to what is done in the present method in relation with Figure 7. The interpolated frames are then compared with the actual frames within the original sequences. The peak signal-to-noise ratio (PSNR) is used as an objective image quality measure to evaluate the performance of the interpolated frames yielded by the MCI, OBMC-MCI, AOBMC-MCI and STAR models. The PSNR gains of OBMC-MCI, AOBMC-MCI, as well as the STAR model compared with the MCI method, averaged over 50 frames within all the test sequences, are depicted in Figure 8. It may be seen that the gain of the STAR model is very significant, especially for the QCIF sequence Mobile and the CIF sequence Flower, where the gain can be up to 2.832 dB and 1.824 dB, respectively. This is because the Mobile sequence contains camera motions such as zooming and panning, and consequently traditional FRUC methods, e.g. MCI, OBMC-MCI and AOBMC-MCI, prove inefficient at representing the camera motions. The Flower sequence is full of non-rigid objects and consequently cannot be approximated well under the assumption that all the pixels within one block have a unique motion, which is the foundation of traditional FRUC methods. For the majority of the test sequences, OBMC and AOBMC achieve higher PSNR than the MCI method due to the application of multi-hypothesis prediction, which alleviates the artifacts when the shapes of the objects are not aligned with the blocks. AOBMC achieves better performance than OBMC, since it is capable of adjusting the coefficients adaptively according to the reliability of neighbouring blocks. However, the improvement of AOBMC is rather poor for some sequences, e.g. Miss America and Akiyo. In these two sequences, the motions of neighbouring blocks are very similar to that of the current block, and the superiority of the adaptive coefficients is thus impaired. Since the proposed STAR model fully exploits the correlations in both the spatial and temporal domains within adjacent frames, it outperforms MCI, OBMC-MCI and AOBMC-MCI for all the test sequences, whatever the format.
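The PSNR figure used above is the standard objective measure for 8-bit frames and is not specific to this disclosure; it may be computed as:

```python
import numpy as np

def psnr(original, interpolated, peak=255.0):
    """Peak signal-to-noise ratio, in dB, between a skipped original
    frame and its interpolated reconstruction."""
    mse = np.mean((original.astype(np.float64)
                   - interpolated.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)
```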
The method according to the invention may be carried out by a device. Said device may be comprised in a system. As described in Figure 9, said system may comprise for instance an emitting device 900 (e.g. an encoder), a receiving device 920 (e.g. a decoder) and a network 910 for communication between the emitting device 900 and the receiving device 920. The method according to the invention may be implemented either on the emitting device 900 or on the receiving device 920.
Claims
1. A method for adding frames in a digital video flow of frames, the added frames comprising a first predicted frame added between a first and a second frame of the digital video flow, and a second predicted frame added between a second and a third frame of said digital video flow, the method further comprising the acts of:
a) initializing an intermediate estimate of the first and second predicted frames, and for each intermediate estimate of the first and second predicted frames:
b1) defining a new intermediate estimate of the first predicted frame using a first interpolation function applied to the first and second frames,
b2) defining a new intermediate estimate of the second predicted frame using a second interpolation function applied to the second and third frames,
b3) defining a prediction of the second frame with a third interpolation function applied to the intermediate estimates of the first and second predicted frames,
b4) optimizing the interpolation functions while minimizing the distortion between the new intermediate estimates and the intermediate estimates of the predicted frames and between the second frame and its prediction,
b5) computing the intermediate estimates of the first and second predicted frames using the optimized first and second interpolation functions,
the first and second predicted frames being set to the intermediate estimates of the first and second predicted frames when the distortion is smaller than a given threshold or a given number of intermediate estimates of the first and second predicted frames has been computed.
2. The method according to claim 1, wherein the first, second and third interpolation functions are identical.
3. The method according to one of the previous claims, wherein in acts b1), b2) and b3), the defining act is carried out pixel by pixel, and wherein the interpolation function is also applied to the already defined pixels.
4. An interpolating device for adding frames in a digital video flow of frames, the added frames comprising a first predicted frame added between a first and a second frame of the digital video flow, and a second predicted frame added between a second and a third frame of the digital video flow, said device being operable to receive said digital video flow of frames, the device being further operable to:
a) initialize an intermediate estimate of the first and second predicted frames, and for each intermediate estimate of the first and second predicted frames:
b1) define a new intermediate estimate of the first predicted frame using a first interpolation function applied to the first and second frames,
b2) define a new intermediate estimate of the second predicted frame using a second interpolation function applied to the second and third frames,
b3) define a prediction of the second frame with a third interpolation function applied to the intermediate estimates of the first and second predicted frames,
b4) optimize the interpolation functions while minimizing the distortion between the new intermediate estimates and the intermediate estimates of the predicted frames and between the second frame and its prediction,
b5) compute the intermediate estimates of the first and second predicted frames using the optimized first and second interpolation functions,
the first and second predicted frames being set to the intermediate estimates of the first and second predicted frames when the distortion is smaller than a given threshold or a given number of intermediate estimates of the first and second predicted frames has been computed.
5. The interpolating device according to claim 4, wherein the first, second and third interpolation functions are identical.
6. The interpolating device according to any of the previous claims 4 and 5, wherein said device is further operable to carry out acts b1), b2) and b3) pixel by pixel, and wherein the interpolation function is also applied to the already defined pixels.
7. A system for adding frames in a digital video flow of frames, the added frames comprising a first predicted frame added between a first and a second frame of the digital video flow, and a second predicted frame added between a second and a third frame of said digital video flow, said system comprising:
- an emitting device operable to send said digital video flow,
- a receiving device operable to:
a) initialize an intermediate estimate of the first and second predicted frames, and for each intermediate estimate of the first and second predicted frames:
b1) define a new intermediate estimate of the first predicted frame using a first interpolation function applied to the first and second frames,
b2) define a new intermediate estimate of the second predicted frame using a second interpolation function applied to the second and third frames,
b3) define a prediction of the second frame with a third interpolation function applied to the intermediate estimates of the first and second predicted frames,
b4) optimize the interpolation functions while minimizing the distortion between the new intermediate estimates and the intermediate estimates of the predicted frames and between the second frame and its prediction,
b5) compute the intermediate estimates of the first and second predicted frames using the optimized first and second interpolation functions,
the first and second predicted frames being set to the intermediate estimates of the first and second predicted frames when the distortion is smaller than a given threshold or a given number of intermediate estimates of the first and second predicted frames has been computed,
- a network operable to communicate between the emitting device and the receiving device.
8. The system according to claim 7, wherein the first, second and third interpolation functions are identical.
9. The system according to any of the previous claims 7 and 8, wherein said system is further operable to carry out the defining acts b1), b2) and b3) pixel by pixel, and wherein the interpolation function is also applied to the already defined pixels.
10. A computer program providing computer executable instructions stored on a computer readable medium, which when loaded onto a data processor causes the data processor to perform a method for adding frames in a digital video flow of frames according to any of claims 1 to 3.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNPCT/CN2008/072447 | 2008-09-22 | | |
| CN2008072447 | 2008-09-22 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2010032229A1 (en) | 2010-03-25 |
Family
ID=41510600
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/IB2009/055097 Ceased WO2010032229A1 (en) | 2008-09-22 | 2009-09-22 | Frame rate up conversion method and system |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2010032229A1 (en) |
Non-Patent Citations (1)
| Title |
|---|
| YONGBING ZHANG ET AL: "A Spatio-Temporal Autoregressive Frame Rate Up Conversion Scheme", IMAGE PROCESSING, 2007. ICIP 2007. IEEE INTERNATIONAL CONFERENCE ON, IEEE, PI, 1 September 2007 (2007-09-01), pages I - 441, XP031157773, ISBN: 978-1-4244-1436-9 * |
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2012001520A3 (en) * | 2010-06-30 | 2012-03-01 | France Telecom | Pixel interpolation method and system |
| CN102685438A (en) * | 2012-05-08 | 2012-09-19 | 清华大学 | Up-conversion method of video frame rate based on time-domain evolution |
| CN102685438B (en) * | 2012-05-08 | 2015-07-29 | 清华大学 | A kind of up-conversion method of video frame rate based on time-domain evolution |
| CN105392000A (en) * | 2015-10-29 | 2016-03-09 | 无锡天脉聚源传媒科技有限公司 | Alignment method and device of video frame rate conversion |
| CN105392000B (en) * | 2015-10-29 | 2018-11-20 | 无锡天脉聚源传媒科技有限公司 | A kind of alignment schemes and device of video frame rate conversion |
| CN106303546A (en) * | 2016-08-31 | 2017-01-04 | 四川长虹通信科技有限公司 | Conversion method and system in a kind of frame rate |
| CN106303546B (en) * | 2016-08-31 | 2019-05-14 | 四川长虹通信科技有限公司 | Conversion method and system in a kind of frame rate |
| CN108900856A (en) * | 2018-07-26 | 2018-11-27 | 腾讯科技(深圳)有限公司 | A kind of video frame rate prediction technique, device and equipment |
| CN108900856B (en) * | 2018-07-26 | 2020-04-28 | 腾讯科技(深圳)有限公司 | Video frame rate prediction method, device and equipment |
| CN116452628A (en) * | 2023-03-29 | 2023-07-18 | 上海顺久电子科技有限公司 | Image processing method, device, equipment and medium |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Choi et al. | Motion-compensated frame interpolation using bilateral motion estimation and adaptive overlapped block motion compensation | |
| JP3393832B2 (en) | Image Data Interpolation Method for Electronic Digital Image Sequence Reproduction System | |
| EP1747678B1 (en) | Method and apparatus for motion compensated frame rate up conversion | |
| US6192079B1 (en) | Method and apparatus for increasing video frame rate | |
| EP0720383B1 (en) | Method and apparatus for detecting motion vectors in a frame decimating video encoder | |
| WO2010032229A1 (en) | Frame rate up conversion method and system | |
| EP0721284B1 (en) | An image processing system using pixel-by-pixel motion estimation and frame decimation | |
| US5862261A (en) | Current frame prediction method and apparatus for use in an image signal encoding system | |
| Zhang et al. | A spatio-temporal auto regressive model for frame rate upconversion | |
| KR100584597B1 (en) | Motion Estimation Method Applying Adaptive Weight and Frame Rate Conversion Apparatus | |
| CN102340664A (en) | Techniques for motion estimation | |
| JP4906864B2 (en) | Scalable video coding method | |
| Kubasov et al. | Mesh-based motion-compensated interpolation for side information extraction in distributed video coding | |
| Fujiwara et al. | Motion-compensated frame rate up-conversion based on block matching algorithm with multi-size blocks | |
| KR100393063B1 (en) | Video decoder having frame rate conversion and decoding method | |
| JP2006279917A (en) | Moving picture encoding apparatus, moving picture decoding apparatus, and moving picture transmission system | |
| US6061401A (en) | Method and apparatus for selectively encoding/decoding a video signal | |
| US9549184B2 (en) | Image prediction method and system | |
| WO2010049917A2 (en) | Image prediction method and system | |
| CN102204256B (en) | Image prediction method and system | |
| Min et al. | Side information generation using adaptive search range for distributed video coding | |
| Zhao et al. | Frame rate up-conversion based on edge information | |
| Pan et al. | Sparse spatio-temporal representation with adaptive regularized dictionaries for super-resolution based video coding | |
| Alfonso et al. | Bi-directionally motion-compensated frame-rate up-conversion for H.264/AVC decoders | |
| Song et al. | High-resolution image scaler using hierarchical motion estimation and overlapped block motion compensation |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 09761007; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 09761007; Country of ref document: EP; Kind code of ref document: A1 |