WO2010032229A1 - Frame rate up conversion method and system - Google Patents
Frame rate up conversion method and system
- Publication number
- WO2010032229A1 (PCT/IB2009/055097)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- frames
- predicted
- frame
- estimate
- interpolation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/01—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
- H04N7/0127—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level by changing the field or frame frequency of the incoming video signal, e.g. frame rate converter
- H04N7/0135—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving interpolation processes
- H04N7/014—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving interpolation processes involving the use of motion vectors
- H04N7/0145—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving interpolation processes the interpolation being class adaptive, i.e. it uses the information of class which is determined for a pixel based upon certain characteristics of the neighbouring pixels
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Television Systems (AREA)
Abstract
A method for adding frames in a digital video flow of frames, the added frames comprising a first predicted frame added between a first and a second frames of the digital video flow, and a second predicted frame added between a second and a third frames of said digital video flow, the method further comprising the acts of: a) initializing an intermediate estimate of the first and second predicted frames, and for each intermediate estimate of the first and second predicted frames: b1) defining a new intermediate estimate of the first predicted frame using a first interpolation function applied to the first and second frames, b2) defining a new intermediate estimate of the second predicted frame using a second interpolation function applied to the second and third frames, b3) defining a prediction of the second frame with a third interpolation function applied to the intermediate estimate of the first and second predicted frames, b4) optimizing the interpolation functions while minimizing the distortion between the new intermediate estimate and the intermediate estimate of the predicted frames and between the second frame and its prediction, b5) computing the intermediate estimates of the first and second predicted frames using the optimized first and second interpolation functions, the first and second predicted frames being set to the intermediate estimates of the first and second predicted frames when the distortion is smaller than a given threshold or a given number of intermediate estimates of the first and second predicted frames has been computed.
Description
FRAME RATE UP CONVERSION METHOD AND SYSTEM
Field of the Invention
The present invention relates in general to image processing and more specifically to Frame Rate Up-Conversion (FRUC) image processing.
Background of the Invention
In image processing, Frame Rate Up-Conversion (FRUC) is a widely investigated technique to up-convert the temporal resolution of a media sequence (e.g. a video), i.e. to increase the number of frames (or pictures or images) in a media sequence in order to increase its temporal resolution. FRUC has many applications, among which the most practical one is to enhance the visual quality of low bit rate coded video. In low bit rate coding applications, it is possible to reduce the frame rate to half, or an even lower ratio, of the original frame rate, and then encode only the low rate frames with better visual quality. When decoding the media, FRUC may be performed to restore the loss of temporal resolution with higher visual quality. One of the most common cases is the double up-conversion of the frame rate, an example of which is illustrated in Figure 1, where, for instance, the original frame rate is 15 fps (frames per second) and the up-converted frame rate is 30 fps. The added frames are interpolated using the redundancies between successive frames along the temporal axis.
Besides the low bit rate coding, FRUC is also applicable for instance to Phase Alternating Line-National Television System Committee (PAL-NTSC) conversion and complex video editing. In addition, FRUC may be of great help for slow-motion playback and the rate allocation policy of scalable video coding schemes.
Numerous FRUC algorithms have been developed to convert frame rates. A simple and straightforward FRUC algorithm consists in combining adjacent video frames without taking object motion into account. The object motion is the change of a given object, captured from a scene, from one frame to another (i.e. between adjacent frames). Examples of such simple FRUC methods are frame repetition and frame averaging.
However, such methods work well only if there is little or no motion between successive or adjacent frames. As the intensity of motion increases, the frame repetition method will cause jerkiness and the resulting image will look choppy and/or not smooth. When applying a frame averaging method, images will look blurry when motion occurs, and sometimes ghost artifacts may also be perceived.
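By way of illustration only (this sketch is not part of the patent text; all names are illustrative), the two naive methods above can be written in a few lines of Python/NumPy:

```python
import numpy as np

def upconvert_2x_repetition(frames):
    """Double the frame rate by repeating each frame.

    frames: list of HxW (or HxWx3) uint8 arrays.
    Under motion this causes the jerkiness described above.
    """
    out = []
    for f in frames:
        out.extend([f, f.copy()])  # the inserted frame is a plain copy
    return out

def upconvert_2x_averaging(frames):
    """Double the frame rate by inserting the average of adjacent frames.

    Blurs moving regions and can create ghost artifacts.
    """
    out = [frames[0]]
    for prev, nxt in zip(frames, frames[1:]):
        mid = (prev.astype(np.float32) + nxt.astype(np.float32)) / 2.0
        out.append(mid.astype(prev.dtype))  # interpolated in-between frame
        out.append(nxt)
    return out
```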
As opposed to frame repetition and frame averaging, another kind of FRUC method, called Motion Compensation FRUC (MC-FRUC), is based on Motion Estimation (ME) and performs frame interpolation along the motion trajectory of an object from frame to frame to achieve better visual quality. This method uses motion vectors, i.e. vectors used for inter-prediction that provide an offset from the coordinates in the decoded picture to the coordinates in a reference picture. Since motion plays a significant role in MC-FRUC, many algorithms have been developed to derive accurate motion vectors. For example, the Block-Matching Algorithm (BMA) detailed in "A method for motion adaptive frame rate up-conversion", R. Castagno, P. Haavisto, and G. Ramponi, IEEE Trans. on Circuits and Systems for Video Technology, vol. 6, No. 5, pp. 436-446, Oct. 1996 and "Adaptive MC interpolation for frame rate up-conversion", S.-H. Lee, Y.-C. Shin, S.-J. Yang, H.-H. Moon and R.-H. Park, IEEE Trans. on Consumer Electronics, vol. 48, No. 3, pp. 444-450, Aug. 2002, has a broad application in MC-FRUC. Since motion vectors derived by BMA are often not accurate enough, several approaches for more faithful motion estimation, described in "New frame rate up-conversion using bi-directional motion estimation", B. T. Choi, S. H. Lee, and S. J. Ko, IEEE Trans. on Consumer Electronics, vol. 46, No. 3, pp. 603-609, Aug. 2000 and "Motion compensated frame interpolation based on H.264 decoder", Z. Gan, L. Qi, and X. Zhu, Electronics Letters, vol. 43, No. 2, pp. 96-98, Jan. 2007, have also been proposed in recent works. One of these FRUC algorithms, "New frame rate up-conversion using bi-directional motion estimation", B. T. Choi, S. H. Lee, and S. J. Ko, IEEE Trans. on Consumer Electronics, vol. 46, No. 3, pp. 603-609, Aug. 2000, uses a bi-directional ME to derive more faithful motion vectors. In addition, a hierarchical MC technique was proposed to achieve better visual quality in "Hierarchical motion compensated frame rate up-conversion based
on the Gaussian/Laplacian pyramid", G. I. Lee, B. W. Jeon, R. H. Park, and S. H. Lee, Proc. IEEE Int. Conf. Consumer Electronics, 2003, pp. 350-351. Both methods achieve better performance than BMA does. However, they are all based on the assumption of a translational motion with constant velocity, which is not always true. Furthermore, constant acceleration was exploited to derive more reliable motion trajectories. However, the assumption of a constant acceleration also does not always hold true for all the regions of an image, e.g. for the regions of non-rigid objects (for instance clouds and waves, which change their shape between images). As opposed to its use in video coding where it is used with actual frames,
Motion Estimation may be performed for FRUC in the absence of actual frames, i.e. it allows creating frames between and from existing ones. However, as the derived motion trajectory may sometimes not be consistent, a median filter is usually applied to the motion vector field (i.e. the field of motion vectors of all the blocks) after performing ME, in which the motion vector field needs to go through the median filter operation in e.g. a 3x3 window (the current block and its 8 neighboring blocks located in a 3x3 window) before MC. However, for the areas of small objects, irregularly shaped objects, and object boundaries, fixed size block MC usually does not work well. To avoid this situation, variable size block MC was proposed at object boundaries to reconstruct edge information with a higher quality. Furthermore, a method called Overlapped Block MC (OBMC) may be applied to suppress the blocking artifacts, which are usually observed when a block has a significantly different motion vector compared with its neighboring blocks. However, OBMC may sometimes over-smooth the edges of the image and degrade the image quality. To reduce the over-smoothing effect of OBMC, an Adaptive OBMC (AOBMC) method was proposed, in which the coefficients of the OBMC are adjusted based on the reliability of neighboring motion vectors. However, AOBMC still has poor ability to represent some complex motions, such as zooming, rotation, and local deformation.
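A minimal sketch (assumed, not taken from the patent) of the 3x3 median filtering of the motion vector field mentioned above, here applying a component-wise median over each block and its 8 neighbours, which is one common variant:

```python
import numpy as np

def median_filter_mv_field(mv_field):
    """Smooth a block-level motion-vector field with a 3x3 median filter.

    mv_field: (H_blocks, W_blocks, 2) float array of (vx, vy) per block.
    """
    h, w, _ = mv_field.shape
    padded = np.pad(mv_field, ((1, 1), (1, 1), (0, 0)), mode='edge')
    out = np.empty_like(mv_field)
    for i in range(h):
        for j in range(w):
            window = padded[i:i + 3, j:j + 3]            # 3x3 neighbourhood
            out[i, j] = np.median(window.reshape(-1, 2), axis=0)
    return out
```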
Today there is a need for a FRUC solution that can be easily implemented on the existing communication infrastructures, overcoming the drawbacks of the prior art.
Summary of Invention
It is an object of the present system to overcome disadvantages and/or make improvements over the prior art. To that end, the invention proposes a method for adding frames in a digital video flow of frames, the added frames comprising a first predicted frame added between a first and a second frames of the digital video flow, and a second predicted frame added between a second and a third frames of said digital video flow, the method further comprising the acts of: a) initializing an intermediate estimate of the first and second predicted frames, and for each intermediate estimate of the first and second predicted frames: b1) defining a new intermediate estimate of the first predicted frame using a first interpolation function applied to the first and second frames, b2) defining a new intermediate estimate of the second predicted frame using a second interpolation function applied to the second and third frames, b3) defining a prediction of the second frame with a third interpolation function applied to the intermediate estimate of the first and second predicted frames, b4) optimizing the interpolation functions while minimizing the distortion between the new intermediate estimate and the intermediate estimate of the predicted frames and between the second frame and its prediction, b5) computing the intermediate estimates of the first and second predicted frames using the optimized first and second interpolation functions, the first and second predicted frames being set to the intermediate estimates of the first and second predicted frames when the distortion is smaller than a given threshold or a given number of intermediate estimates of the first and second predicted frames has been computed.
The invention also relates to a system for adding frames in a digital video flow of frames, the added frames comprising a first predicted frame added between a first and a second frames of the digital video flow, and a second predicted frame added between a second and a third frames of said digital video flow, said system comprising:
- an emitting device operable to send said digital video flow,
- a receiving device operable to: a) initialize an intermediate estimate of the first and second predicted frames, and for each intermediate estimate of the first and second predicted frames: b1) define a new intermediate estimate of the first predicted frame using a first interpolation function applied to the first and second frames, b2) define a new intermediate estimate of the second predicted frame using a second interpolation function applied to the second and third frames, b3) define a prediction of the second frame with a third interpolation function applied to the intermediate estimate of the first and second predicted frames, b4) optimize the interpolation functions while minimizing the distortion between the new intermediate estimate and the intermediate estimate of the predicted frames and between the second frame and its prediction, b5) compute the intermediate estimates of the first and second predicted frames using the optimized first and second interpolation functions, the first and second predicted frames being set to the intermediate estimates of the first and second predicted frames when the distortion is smaller than a given threshold or a given number of intermediate estimates of the first and second predicted frames has been computed; - a network operable to communicate between the emitting device and the receiving device.
The invention also relates to an interpolating device for adding frames in a digital video flow of frames, the added frames comprising a first predicted frame added between a first and a second frames of the digital video flow, and a second
predicted frame added between a second and a third frames of the digital video flow, said device being operable to receive said digital video flow of frames, the device being further operable to: a) initialize an intermediate estimate of the first and second predicted frames, and for each intermediate estimate of the first and second predicted frames: b1) define a new intermediate estimate of the first predicted frame using a first interpolation function applied to the first and second frames, b2) define a new intermediate estimate of the second predicted frame using a second interpolation function applied to the second and third frames, b3) define a prediction of the second frame with a third interpolation function applied to the intermediate estimate of the first and second predicted frames, b4) optimize the interpolation functions while minimizing the distortion between the new intermediate estimate and the intermediate estimate of the predicted frames and between the second frame and its prediction, b5) compute the intermediate estimates of the first and second predicted frames using the optimized first and second interpolation functions, the first and second predicted frames being set to the intermediate estimates of the first and second predicted frames when the distortion is smaller than a given threshold or a given number of intermediate estimates of the first and second predicted frames has been computed.
The invention also relates to a computer program providing computer executable instructions stored on a computer readable medium, which when loaded onto a data processor causes the data processor to perform a method for adding frames in a digital video flow of frames according to claims 1 to 3.
The method according to the invention proposes a model for improving the quality of interpolated frames in FRUC. The present method could be described as a Spatiotemporal Auto-Regressive (STAR) model.
Indeed, the proposed method exploits the correlations or similarities (of pixels) between adjacent frames (i.e in the temporal domain) and within a same frame (i.e. spatial domain).
The method according to the invention is able to adaptively exploit the redundancies both in the spatial and temporal domains by tuning the interpolation functions (i.e. their interpolation weights) in order to optimize them, thus making interpolated frames more reliable. Indeed, the STAR model allows learning motion information between successive frames by adjusting the interpolation weights adaptively according to the characteristics of the pixels within a local spatiotemporal neighborhood.
Another advantage of the method according to the invention is that it achieves sub-pixel (i.e. pixels located between the space of integer pixels) accuracy with spatially varying interpolation coefficients or weights. As traditional FRUC algorithms use the same interpolation coefficients for all the frames of a sequence, it is not possible to consider non-stationary statistical properties of video signals (i.e. frames or areas with different motion intensity, for example, the boundary of motion objects, where one part of the object is stationary and the other part is moving, or the boundary of foreground and background regions, or the covered and uncovered regions between successive frames). In contrast, the STAR model is able to tune its interpolation coefficients adaptively to match non-stationary statistical properties of video signals.
Brief Description of the Drawings
Embodiments of the present invention will now be described solely by way of example and only with reference to the accompanying drawings, where like parts are provided with corresponding reference numerals, and in which:
Figure 1 schematically illustrates a simple example of Frame Rate Up- Conversion (FRUC);
Figure 2 schematically illustrates a system according to an embodiment of the present invention;
Figure 3 schematically illustrates an example of pixels interpolation according to an embodiment of the present invention;
Figure 4 schematically illustrates the method according to an embodiment of the present invention;
Figure 5 schematically illustrates the self-feedback weight training method according to an embodiment of the present invention; Figure 6 schematically illustrates an example of pixels interpolation according to an embodiment of the present invention;
Figure 7 schematically illustrates an example of the Motion Compensation Interpolation (MCI) method according to an embodiment of the present invention;
Figure 8 schematically illustrates a simulation according to an embodiment of the present invention;
Figure 9 schematically illustrates a system comprising an emitting device, a network and a receiving device according to an embodiment of the present invention.
Description of Preferred Embodiments
The method according to the invention proposes a model for improving the quality of interpolated frames in FRUC, and more specifically a Spatiotemporal Auto-Regressive (STAR) model.
In the STAR model, each pixel is approximated as a weighted linear combination or interpolation (through interpolation functions) of pixels within a spatial neighborhood in the current frame as well as a weighted linear combination of pixels within a temporal neighborhood in the previous and following frames. Due to the absence of some actual pixels in the missing frames and in order to improve the accuracy of the model, the method according to the invention proposes an iterative self-feedback weight training algorithm to optimize the interpolation weights (i.e. the coefficients) of the linear combinations. An initialization step allows interpolating at least one pixel in a given missing frame to be interpolated, hereafter called first and second predicted frames. Then, in each iteration, pixels in said given missing frame are interpolated using pixels in the previous and following frames as well as available pixels in the given missing frame. Actual original low rate frames are also predicted (i.e. estimated) using the same interpolation weights in the same iteration. Finally, after a pre-set number of iterations or when the distortion between pixels from one iteration to another is minimized (i.e. under a given threshold), then the optimal interpolation weights are
derived. The distortion is calculated between the pixels approximated using the interpolation weights in the current iteration and those in the previous iteration as well as the actual pixels in the original low rate frame.
According to an exemplary embodiment of the present invention, the STAR model is based on a spatiotemporal scheme, which is depicted in Figure 3.
Along the temporal axis, model samples are taken at five successive frames, where $F_{t-2}$, $F_t$, and $F_{t+2}$ denote the successive original low rate frames, respectively called first, second and third frames, and $\hat{F}_{t-1}$ and $\hat{F}_{t+1}$ denote the interpolated or added frames at time instances t-1 and t+1, respectively referred to as first and second predicted frames.
The pixel value to be interpolated at spatial location (m,n) of a current frame, the current frame being either $\hat{F}_{t-1}$ or $\hat{F}_{t+1}$, depends on pixel samples within a temporal neighbourhood taken in the previous and following frames as well as pixels available within a spatial neighbourhood of said current frame. When the current frame is $\hat{F}_{t-1}$, the previous and following frames are respectively the first frame $F_{t-2}$ and the second frame $F_t$. When the current frame is $\hat{F}_{t+1}$, the previous and following frames are respectively the second frame $F_t$ and the third frame $F_{t+2}$.
The temporal neighbourhood $N_T$ can be defined as a region taken in the frames previous or following the current frame. The spatial neighbourhood $N_S$ can be defined as a region taken in the current frame where pixels are already available, as they have been previously defined.
The size of the temporal neighbourhood $N_T$ in the previous and following frames may be defined as a $(2L+1)\times(2L+1)$ square bounded by $(m \pm L, n \pm L)$. To be consistent with $N_T$, the size of the spatial neighbourhood $N_S$ may be chosen as the same size as $N_T$ or smaller.
Parameter L is referred to as the spatiotemporal order of the STAR model.
Figure 3 shows an exemplary illustration of the STAR model using a spatiotemporal order L=1. The method according to the invention may be performed on a block within the frame, the block being defined as the supporting region S of the STAR model. It may correspond for example to a block of pixels (for instance 16x16 or 32x32 pixel blocks) or to the whole picture. As seen hereafter, within a block the interpolation weights are identical per iteration and over $N_T$ or $N_S$. In other words, for every (m,n) in the supporting region, the contribution of the pixels of $N_T$ or $N_S$ in the first to third frames only depends upon their distance to (m,n).
Figure 2 describes an illustrative embodiment of the system according to the invention.
A self-feedback weight training unit 200 allows deriving the optimal interpolation weights by iteration. Each pixel is approximated as a weighted linear combination of pixels (through interpolation functions) within a spatial neighborhood in the current frame as well as a weighted linear combination of pixels within a temporal neighborhood in the previous and following frames. At the end of each iteration, optimal interpolation weights are derived. The self-feedback weight training process is later described in detail hereunder with reference to Figure 5. After getting the optimal interpolation weights, a first unit 205 allows interpolating the missing frame at time instance t-1 and a second unit 210 allows interpolating the missing frame at time instance t+1.
Figure 4 describes an illustrative embodiment of the method according to the invention.
According to the spatial resolution of the input sequences (i.e. frame resolution), the size of the supporting region is first determined in an act 400. The supporting region is the area of a frame on which the method according to the invention may be performed. According to an illustrative embodiment of the present invention, the determination process may be as follows: if the resolution is QCIF (Quarter Common Intermediate Format), the size of the supporting region may be set to 16x16; if the resolution is larger than QCIF, the size of the supporting region is set to 32x32. Then, as long as the frame interpolation process is not completed (act 410), the spatiotemporal order L of the STAR model is computed in an act 420 as

$$L = \max_{i \in S} \left\lfloor \left| mv_i \right| \right\rfloor \qquad (1)$$

where S represents the supporting region and $\lfloor \cdot \rfloor$ is the floor function, which maps the motion vector $mv_i$ of the i-th block in S (i.e. the block used to interpolate the to-be-interpolated frame by the MCI method; in other words, the blocks used to generate $\hat{F}_{t-1}$ and $\hat{F}_{t+1}$ as shown in equation (7)) to the full-pixel position. After the supporting region and the spatiotemporal order have been determined, the self-feedback weight training process is performed in an act 430 to derive the optimal interpolation weights. Finally, the missing frames $\hat{F}_{t-1}$ and $\hat{F}_{t+1}$ are interpolated by STAR interpolation using the optimal weights in an act 440.
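As a hedged illustration of acts 400 and 420 (not patent text; the reading of equation (1) as the floored maximum motion-vector magnitude, and the minimum order of 1, are assumptions made for this sketch):

```python
import numpy as np

def supporting_region_size(width, height):
    """Act 400: pick the supporting-region size from the input resolution.

    QCIF is 176x144; anything larger gets a 32x32 supporting region.
    """
    return 16 if width * height <= 176 * 144 else 32

def spatiotemporal_order(motion_vectors):
    """Act 420: derive the order L from the block motion vectors in S.

    motion_vectors: (N, 2) array of (vx, vy) for the blocks of the
    supporting region.  L is taken as the floor of the largest motion
    magnitude, clamped to at least 1 (both choices are assumptions).
    """
    mv = np.asarray(motion_vectors, dtype=np.float64)
    if mv.size == 0:
        return 1
    mags = np.linalg.norm(mv, axis=1)
    return max(1, int(np.floor(mags.max())))
```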
Figure 5 describes an illustrative embodiment of the self-feedback weight training process of act 430 according to the invention. This training process is an iterative process. In the hereafter description, the supporting region S or block is assumed to be the whole frame. The present teaching may be easily transposed by the person skilled in the art to a supporting region taken within a frame.
Along the temporal axis, the three successive first, second and third frames of the original low rate frames (respectively $F_{t-2}$, $F_t$ and $F_{t+2}$) are selected in an act 500.
$\hat{F}^i_{t-1}$ and $\hat{F}^i_{t+1}$ denote the intermediate values or estimates of the first and second predicted frames for the i-th iteration, i represents the iteration counter, and $w^i$ represents the optimal weight vector after the i-th iteration.
Before iterations begin, a preliminary act 510 of initialization is performed. The act 510 allows initializing $\hat{F}^0_{t-1}$ and $\hat{F}^0_{t+1}$. Various solutions may be implemented to initialize $\hat{F}^0_{t-1}$ and $\hat{F}^0_{t+1}$ (for example, initial values may be pre-set or pre-defined). In this illustrative embodiment according to the present invention, MCI (Motion Compensation Interpolation) is used. MCI is further described hereunder with reference to Figure 7. The iteration counter i is also initialized to 0.
The iterative process is then started as follows. In an additional act 520, the iteration counter i is compared to a pre-defined maximum value iMax. If the maximum value has not been reached, an optimal weight vector $w^i$ is further derived in an act 530. This weight vector $w^i$ allows computing the interpolated values of frames $\hat{F}^{i+1}_{t-1}$ and $\hat{F}^{i+1}_{t+1}$, respectively according to equations (2) and (3) here below:

$$\hat{f}^{i+1}_{t-1}(m,n) = \sum_{(u,v) \in N_T} f_{t-2}(m+u, n+v)\, w_f^i(u,v) + \sum_{(u,v) \in N_T} f_t(m+u, n+v)\, w_b^i(u,v) + \sum_{(e,f) \in N_S} \hat{f}^{i+1}_{t-1}(m+e, n+f)\, w_s^i(e,f) \qquad (2)$$

$$\hat{f}^{i+1}_{t+1}(m,n) = \sum_{(u,v) \in N_T} f_t(m+u, n+v)\, w_f^i(u,v) + \sum_{(u,v) \in N_T} f_{t+2}(m+u, n+v)\, w_b^i(u,v) + \sum_{(e,f) \in N_S} \hat{f}^{i+1}_{t+1}(m+e, n+f)\, w_s^i(e,f) \qquad (3)$$

wherein any f(i,j) denotes the pixel value at (i,j) for frame F and (m,n) a pixel of the chosen supporting region of a frame.
In an additional embodiment of the method according to the invention, the first and the second predicted frames at iteration i ($\hat{F}^i_{t-1}$ and $\hat{F}^i_{t+1}$) may be used to predict the second frame $F_t$ according to equation (4):

$$\hat{f}^{i+1}_t(m,n) = \sum_{(u,v) \in N_T} \hat{f}^i_{t-1}(m+u, n+v)\, w_f^i(u,v) + \sum_{(u,v) \in N_T} \hat{f}^i_{t+1}(m+u, n+v)\, w_b^i(u,v) + \sum_{(e,f) \in N_S} \hat{f}^{i+1}_t(m+e, n+f)\, w_s^i(e,f) \qquad (4)$$
In equations (2) to (4), $w_f^i(u,v)$, $w_b^i(u,v)$ and $w_s^i(e,f)$ represent the forward, backward, and spatial weight components of the weight vector $w^i$, corresponding to the previous, following, and current frames, respectively. $w_f^i(u,v)$ and $w_b^i(u,v)$ are defined over $N_T$ as they correspond to the weights for the previous and following frames, while $w_s^i(e,f)$ is defined over $N_S$ as it corresponds to the weights for the current frame in the interpolation functions of equations (2) to (4). These equations (2) to (4) can be seen as weighted sums or interpolation functions applied to frames from the sequence of original low rate frames. In the present illustration, the interpolation functions are chosen as identical functions for equations (2), (3) and (4). More generally, three interpolation functions may be chosen for (2), (3) and (4) respectively.
Furthermore, in equations (2), (3) and (4), the interpolation function is applied to the already defined pixels. This indeed corresponds to the spatial neighborhood $N_S$ as illustrated in Figure 3.
For instance, as $\hat{F}^{i+1}_{t-1}$ is defined through (2), $N_S$ increases in size as more and more pixels are defined in $\hat{F}^{i+1}_{t-1}$; consequently the contribution of $\hat{F}^{i+1}_{t-1}$ pixels in its own definition increases. In an additional embodiment of the present method, the size of $N_S$ is limited to $N_T$, as mentioned before, to be consistent with the size of the temporal neighborhood.
In a further embodiment, if the pixels in a supporting region S are defined line by line or column by column, always in the same direction (left to right, right to left, top to bottom, ...), then in the current spatial neighborhood only $\frac{1}{2}\left[(2L+1)\times(2L+1)-1\right]$ pixels have been defined through equations (2), (3) and (4). Therefore, $N_S$ may be limited in size to these $\frac{1}{2}\left[(2L+1)\times(2L+1)-1\right]$ defined pixels.
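The following Python sketch (illustrative only; argument names and the border handling are assumptions) shows the weighted sum of equations (2)/(3) for one pixel, with the spatial term restricted to the already-defined pixels of the current frame:

```python
def star_interpolate_pixel(prev_f, foll_f, cur_partial, defined_mask,
                           m, n, w_f, w_b, w_s, L):
    """One STAR interpolation step, following equations (2)/(3).

    prev_f, foll_f : previous/following original frames (2-D float arrays)
    cur_partial    : the predicted frame being built, partially filled in
    defined_mask   : boolean array, True where cur_partial already holds a value
    w_f, w_b       : (2L+1)x(2L+1) forward/backward temporal weights over N_T
    w_s            : (2L+1)x(2L+1) spatial weights over N_S
    Call with L <= m, n < size - L (borders are ignored for brevity).
    """
    val = 0.0
    for u in range(-L, L + 1):
        for v in range(-L, L + 1):
            val += prev_f[m + u, n + v] * w_f[u + L, v + L]   # temporal, previous frame
            val += foll_f[m + u, n + v] * w_b[u + L, v + L]   # temporal, following frame
            if defined_mask[m + u, n + v]:                    # spatial, already-defined pixels only
                val += cur_partial[m + u, n + v] * w_s[u + L, v + L]
    return val
```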
In equation (4), one may see that $\hat{F}^{i+1}_t$ is a weighted combination of the intermediate values of the first and second predicted frames from the previous iteration.
The weight components are variables that may be optimized through equation (5):

$$w^i = \arg\min_w \left( \left\| \hat{F}^{i+1}_{t-1} - \hat{F}^i_{t-1} \right\|^2 + \left\| \hat{F}^{i+1}_{t+1} - \hat{F}^i_{t+1} \right\|^2 + \left\| F_t - \hat{F}^{i+1}_t \right\|^2 \right) \qquad (5)$$

wherein the argmin function gives the optimal weight vector $w^i$ minimizing the sum, or distortion. Equation (5) allows optimizing the interpolation functions while minimizing the distortion between the new intermediate estimates and the intermediate estimates of the predicted frames (elements $\| \hat{F}^{i+1}_{t-1} - \hat{F}^i_{t-1} \|^2$ and $\| \hat{F}^{i+1}_{t+1} - \hat{F}^i_{t+1} \|^2$ of (5)) and between the second frame and its prediction (element $\| F_t - \hat{F}^{i+1}_t \|^2$ of (5)).
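Because the estimates are linear in the weight vector once the neighbour pixels are taken as fixed values from the previous pass, equation (5) can be posed as an ordinary least-squares problem. The sketch below assumes this linearised reading (not stated as such in the patent); `samples` and `targets` are hypothetical names for the stacked regression rows:

```python
import numpy as np

def solve_star_weights(samples, targets):
    """Solve equation (5) as linear least squares.

    samples : (P, K) array; each row collects the K neighbourhood pixels
              that multiply the weight vector for one output pixel.
    targets : (P,) array; the value each pixel should reproduce (the
              previous-iteration estimate, or the actual pixel of F_t).
    Returns the K-dimensional weight vector minimising the summed
    squared distortion of equation (5).
    """
    w, *_ = np.linalg.lstsq(np.asarray(samples), np.asarray(targets),
                            rcond=None)
    return w
```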
The resulting optimal weights are used in a further act 540 to generate the intermediate values of the first and second predicted frames, respectively $\hat{F}^{i+1}_{t-1}$ and $\hat{F}^{i+1}_{t+1}$, as well as the prediction of the second frame $\hat{F}^{i+1}_t$, for iteration i.
To decide whether the exemplary embodiment of the present method will be terminated in the current iteration i, a distortion is computed taking into account the frames $\hat{F}^{i+1}_{t-1}$, $\hat{F}^{i+1}_{t+1}$ and $\hat{F}^{i+1}_t$ calculated in act 540, as well as the intermediate values of the first and second predicted frames, respectively $\hat{F}^i_{t-1}$ and $\hat{F}^i_{t+1}$, from the previous iteration. Specifically, the distortion D may be computed according to equation (6) as follows:

$$D = \min_w \left( \left\| \hat{F}^{i+1}_{t-1} - \hat{F}^i_{t-1} \right\|^2 + \left\| \hat{F}^{i+1}_{t+1} - \hat{F}^i_{t+1} \right\|^2 + \left\| F_t - \hat{F}^{i+1}_t \right\|^2 \right) \qquad (6)$$

The distortion D is actually the minimum of the sum $\| \hat{F}^{i+1}_{t-1} - \hat{F}^i_{t-1} \|^2 + \| \hat{F}^{i+1}_{t+1} - \hat{F}^i_{t+1} \|^2 + \| F_t - \hat{F}^{i+1}_t \|^2$, i.e. the sum of equation (5) evaluated using the optimal weight components.
The distortion takes into account the distortion between the intermediate values of the predicted frames from one iteration to the other, as well as the distortion between the prediction of the second frame and the second frame itself.
If D is smaller than a preset threshold in act 560, or i is larger than the pre-defined maximum iteration number in act 520, the iteration is terminated and the current weight vector $w^i$ is set to be the ultimate weight vector of the STAR model. Thus, the interpolated pixel values generated by $w^i$ are considered to be the final interpolated pixel values that define the first and second predicted frames. Otherwise, i is increased by 1 and the self-feedback weight training algorithm moves to the next iteration.
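Putting acts 510 to 560 together, the training loop of Figure 5 may be sketched as follows (illustrative only; `solve_weights`, `interpolate` and `predict_cur` are assumed helpers standing in for equations (5), (2)/(3) and (4) respectively):

```python
import numpy as np

def self_feedback_training(f_prev, f_cur, f_next, init_prev, init_next,
                           solve_weights, interpolate, predict_cur,
                           i_max=10, threshold=1.0):
    """Iterative self-feedback weight training, per Figure 5.

    f_prev, f_cur, f_next : first, second and third original frames
    init_prev, init_next  : MCI initialisations of the two predicted frames
    """
    est_prev, est_next = init_prev, init_next
    for _ in range(i_max):                                            # act 520
        w = solve_weights(f_prev, f_cur, f_next, est_prev, est_next)  # act 530, eq (5)
        new_prev = interpolate(f_prev, f_cur, est_prev, w)            # eq (2)
        new_next = interpolate(f_cur, f_next, est_next, w)            # eq (3)
        pred_cur = predict_cur(est_prev, est_next, w)                 # eq (4)
        d = (np.sum((new_prev - est_prev) ** 2)                       # eq (6)
             + np.sum((new_next - est_next) ** 2)
             + np.sum((f_cur - pred_cur) ** 2))
        est_prev, est_next = new_prev, new_next
        if d < threshold:                                             # act 560
            break
    return est_prev, est_next
```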
Figure 6 describes an illustrative embodiment of the self-feedback weight training process according to the present invention. The interpolated pixels in frames t-1 and t+1 are used to approximate the actual pixels in the original low rate frame t. As shown in Figure 6, the actual pixels within temporal layer t may be approximated according to equation (4) described here above.
Figure 7 describes the MCI method. MCI is a very common FRUC method, which is performed block by block. It is composed of two processes: motion estimation and motion compensation. The motion estimation is performed by a bidirectional motion estimation method, which is shown in Figure 7. Starting from the centre of the current to-be-interpolated block, bidirectional motion estimation finds two matching blocks in the previous and following frames in a reverse way. Next, the pixels in the to-be-interpolated frames are computed by motion compensation according to equation (7) as follows:

$$\hat{F}_{t-1}(x,y) = \tfrac{1}{2}\left( F_{t-2}(x+v_x, y+v_y) + F_t(x-v_x, y-v_y) \right) \qquad (7.1)$$
$$\hat{F}_{t+1}(x,y) = \tfrac{1}{2}\left( F_t(x+v_x, y+v_y) + F_{t+2}(x-v_x, y-v_y) \right) \qquad (7.2)$$

where $(v_x, v_y)$ is the motion vector found in the bidirectional motion estimation process.
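A sketch of the motion compensation step of equation (7) for one block (not patent text; bounds checking is omitted and names are illustrative):

```python
import numpy as np

def mci_block(f_prev, f_next, x0, y0, vx, vy, block=16):
    """Motion-compensated interpolation of one block, per equation (7).

    The interpolated block averages the block displaced by (vx, vy) in
    the previous frame and by (-vx, -vy) in the following frame, where
    (vx, vy) is the integer motion vector from bidirectional ME.
    """
    out = np.empty((block, block), dtype=np.float32)
    for dy in range(block):
        for dx in range(block):
            x, y = x0 + dx, y0 + dy
            out[dy, dx] = 0.5 * (f_prev[y + vy, x + vx]
                                 + f_next[y - vy, x - vx])
    return out
```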
Figure 8 describes an example of simulation performed using the method according to the invention, i.e. the STAR model. Various standard video sequences, with varying sizes (QCIF, CIF), have been tested in an attempt to shed some light on the quality of the interpolated frames. To compare the proposed model with other FRUC algorithms, every other frame from the test sequences listed in Figure 8 is skipped and interpolated by the STAR model as well as by other FRUC methods. In the self-feedback weight training method, the interpolated frames yielded by the MCI-FRUC method are set to be the initial pixels within frames t-1 and t+1 before the iteration starts, in a similar way to what is done in the present method in relation with Figure 7. The interpolated frames are then compared with the actual frames within the original sequences. The peak signal-to-noise ratio (PSNR) is used as an objective image quality measure to evaluate the performance of the interpolated frames yielded by the MCI, OBMC-MCI, AOBMC-MCI and STAR models. The PSNR gains of OBMC-MCI, AOBMC-MCI, as well as the STAR model compared with the MCI method, averaged over 50 frames within all the test sequences, are depicted in Figure 8. It may be seen that the gain of the STAR model is very significant, especially for the QCIF sequence Mobile and the CIF sequence Flower, where the gain can be up to 2.832 dB and 1.824 dB, respectively. This is because the Mobile sequence contains camera motions such as zooming and panning, and consequently traditional FRUC methods, e.g. MCI, OBMC-MCI and AOBMC-MCI, prove inefficient at representing the camera motions. The Flower sequence is full of non-rigid objects and consequently cannot be approximated well under the assumption that all the pixels within one block have a unique motion, which is the foundation of traditional FRUC methods. For the majority of the test sequences, OBMC and AOBMC achieve higher PSNR than the MCI method due to the application of multi-hypothesis prediction, which alleviates the artifacts when the shapes of the objects are not aligned with the blocks. AOBMC achieves better performance than OBMC, since it is capable of adjusting the coefficients adaptively according to the reliability of neighbouring blocks. However, the improvement of AOBMC is rather poor for some sequences, e.g. Miss America and Akiyo. In these two sequences, the motions of neighbouring blocks are very similar to that of the current block, and the superiority of the adaptive coefficients is thus impaired. Since the proposed STAR model fully exploits the correlations in both the spatial and temporal domains within adjacent frames, it outperforms MCI, OBMC-MCI and AOBMC-MCI for all the test sequences, whatever the format.
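The PSNR figure used above is the standard objective measure for 8-bit frames and is not specific to this disclosure; it may be computed as:

```python
import numpy as np

def psnr(original, interpolated, peak=255.0):
    """Peak signal-to-noise ratio, in dB, between a skipped original
    frame and its interpolated reconstruction."""
    mse = np.mean((original.astype(np.float64)
                   - interpolated.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)
```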
The method according to the invention may be carried out by a device. Said device may be comprised in a system. As described in Figure 9, said system may comprise for instance an emitting device 900 (e.g. an encoder), a receiving device 920 (e.g. a decoder) and a network 910 for communication between the emitting device 900 and the receiving device 920. The method according to the invention may be implemented either on the emitting device 900 or on the receiving device 920.
Claims
1. A method for adding frames in a digital video flow of frames, the added frames comprising a first predicted frame added between a first and a second frame of the digital video flow, and a second predicted frame added between a second and a third frame of said digital video flow, the method further comprising the acts of:
a) initializing an intermediate estimate of the first and second predicted frames, and for each intermediate estimate of the first and second predicted frames:
b1) defining a new intermediate estimate of the first predicted frame using a first interpolation function applied to the first and second frames,
b2) defining a new intermediate estimate of the second predicted frame using a second interpolation function applied to the second and third frames,
b3) defining a prediction of the second frame with a third interpolation function applied to the intermediate estimates of the first and second predicted frames,
b4) optimizing the interpolation functions while minimizing the distortion between the new intermediate estimates and the intermediate estimates of the predicted frames and between the second frame and its prediction,
b5) computing the intermediate estimates of the first and second predicted frames using the optimized first and second interpolation functions,
the first and second predicted frames being set to the intermediate estimates of the first and second predicted frames when the distortion is smaller than a given threshold or a given number of intermediate estimates of the first and second predicted frames has been computed.
2. The method according to claim 1, wherein the first, second and third interpolation functions are identical.
3. The method according to one of the previous claims, wherein in acts b1), b2) and b3), the defining act is carried out pixel by pixel, and wherein the interpolation function is also applied to the already defined pixels.
4. An interpolating device for adding frames in a digital video flow of frames, the added frames comprising a first predicted frame added between a first and a second frame of the digital video flow, and a second predicted frame added between a second and a third frame of the digital video flow, said device being operable to receive said digital video flow of frames, the device being further operable to:
a) initialize an intermediate estimate of the first and second predicted frames, and for each intermediate estimate of the first and second predicted frames:
b1) define a new intermediate estimate of the first predicted frame using a first interpolation function applied to the first and second frames,
b2) define a new intermediate estimate of the second predicted frame using a second interpolation function applied to the second and third frames,
b3) define a prediction of the second frame with a third interpolation function applied to the intermediate estimates of the first and second predicted frames,
b4) optimize the interpolation functions while minimizing the distortion between the new intermediate estimates and the intermediate estimates of the predicted frames and between the second frame and its prediction,
b5) compute the intermediate estimates of the first and second predicted frames using the optimized first and second interpolation functions,
the first and second predicted frames being set to the intermediate estimates of the first and second predicted frames when the distortion is smaller than a given threshold or a given number of intermediate estimates of the first and second predicted frames has been computed.
5. The interpolating device according to claim 4, wherein the first, second and third interpolation functions are identical.
6. The interpolating device according to any of the previous claims 4 and 5, wherein said device is further operable to carry out acts b1), b2) and b3) pixel by pixel, and wherein the interpolation function is also applied to the already defined pixels.
7. A system for adding frames in a digital video flow of frames, the added frames comprising a first predicted frame added between a first and a second frame of the digital video flow, and a second predicted frame added between a second and a third frame of said digital video flow, said system comprising:
- an emitting device operable to send said digital video flow,
- a receiving device operable to:
a) initialize an intermediate estimate of the first and second predicted frames, and for each intermediate estimate of the first and second predicted frames:
b1) define a new intermediate estimate of the first predicted frame using a first interpolation function applied to the first and second frames,
b2) define a new intermediate estimate of the second predicted frame using a second interpolation function applied to the second and third frames,
b3) define a prediction of the second frame with a third interpolation function applied to the intermediate estimates of the first and second predicted frames,
b4) optimize the interpolation functions while minimizing the distortion between the new intermediate estimates and the intermediate estimates of the predicted frames and between the second frame and its prediction,
b5) compute the intermediate estimates of the first and second predicted frames using the optimized first and second interpolation functions,
the first and second predicted frames being set to the intermediate estimates of the first and second predicted frames when the distortion is smaller than a given threshold or a given number of intermediate estimates of the first and second predicted frames has been computed,
- a network operable to communicate between the emitting device and the receiving device.
8. The system according to claim 7, wherein the first, second and third interpolation functions are identical.
9. The system according to any of the previous claims 7 and 8, wherein said system is further operable to carry out the defining acts b1), b2) and b3) pixel by pixel, and wherein the interpolation function is also applied to the already defined pixels.
10. A computer program providing computer executable instructions stored on a computer readable medium, which when loaded onto a data processor causes the data processor to perform a method for adding frames in a digital video flow of frames according to any of claims 1 to 3.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNPCT/CN2008/072447 | 2008-09-22 | | |
| CN2008072447 | 2008-09-22 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2010032229A1 (en) | 2010-03-25 |
Family
ID=41510600
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/IB2009/055097 Ceased WO2010032229A1 (en) | 2008-09-22 | 2009-09-22 | Frame rate up conversion method and system |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2010032229A1 (en) |
Non-Patent Citations (1)
| Title |
|---|
| YONGBING ZHANG ET AL: "A Spatio-Temporal Autoregressive Frame Rate Up Conversion Scheme", IMAGE PROCESSING, 2007. ICIP 2007. IEEE INTERNATIONAL CONFERENCE ON, IEEE, PI, 1 September 2007 (2007-09-01), pages I - 441, XP031157773, ISBN: 978-1-4244-1436-9 * |
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2012001520A3 (en) * | 2010-06-30 | 2012-03-01 | France Telecom | Pixel interpolation method and system |
| CN102685438A (en) * | 2012-05-08 | 2012-09-19 | 清华大学 | Up-conversion method of video frame rate based on time-domain evolution |
| CN102685438B (en) * | 2012-05-08 | 2015-07-29 | 清华大学 | A kind of up-conversion method of video frame rate based on time-domain evolution |
| CN105392000A (en) * | 2015-10-29 | 2016-03-09 | 无锡天脉聚源传媒科技有限公司 | Alignment method and device of video frame rate conversion |
| CN105392000B (en) * | 2015-10-29 | 2018-11-20 | 无锡天脉聚源传媒科技有限公司 | A kind of alignment schemes and device of video frame rate conversion |
| CN106303546A (en) * | 2016-08-31 | 2017-01-04 | 四川长虹通信科技有限公司 | Conversion method and system in a kind of frame rate |
| CN106303546B (en) * | 2016-08-31 | 2019-05-14 | 四川长虹通信科技有限公司 | Conversion method and system in a kind of frame rate |
| CN108900856A (en) * | 2018-07-26 | 2018-11-27 | 腾讯科技(深圳)有限公司 | A kind of video frame rate prediction technique, device and equipment |
| CN108900856B (en) * | 2018-07-26 | 2020-04-28 | 腾讯科技(深圳)有限公司 | Video frame rate prediction method, device and equipment |
| CN116452628A (en) * | 2023-03-29 | 2023-07-18 | 上海顺久电子科技有限公司 | Image processing method, device, equipment and medium |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Choi et al. | Motion-compensated frame interpolation using bilateral motion estimation and adaptive overlapped block motion compensation | |
| JP3393832B2 (en) | Image Data Interpolation Method for Electronic Digital Image Sequence Reproduction System | |
| EP1747678B1 (en) | Method and apparatus for motion compensated frame rate up conversion | |
| US6192079B1 (en) | Method and apparatus for increasing video frame rate | |
| EP0720383B1 (en) | Method and apparatus for detecting motion vectors in a frame decimating video encoder | |
| WO2010032229A1 (en) | Frame rate up conversion method and system | |
| EP0721284B1 (en) | An image processing system using pixel-by-pixel motion estimation and frame decimation | |
| US5862261A (en) | Current frame prediction method and apparatus for use in an image signal encoding system | |
| Zhang et al. | A spatio-temporal auto regressive model for frame rate upconversion | |
| KR100584597B1 (en) | Motion Estimation Method Applying Adaptive Weight and Frame Rate Conversion Apparatus | |
| CN102340664A (en) | Techniques for motion estimation | |
| JP4906864B2 (en) | Scalable video coding method | |
| Kubasov et al. | Mesh-based motion-compensated interpolation for side information extraction in distributed video coding | |
| Fujiwara et al. | Motion-compensated frame rate up-conversion based on block matching algorithm with multi-size blocks | |
| KR100393063B1 (en) | Video decoder having frame rate conversion and decoding method | |
| JP2006279917A (en) | Moving picture encoding apparatus, moving picture decoding apparatus, and moving picture transmission system | |
| US6061401A (en) | Method and apparatus for selectively encoding/decoding a video signal | |
| US9549184B2 (en) | Image prediction method and system | |
| WO2010049917A2 (en) | Image prediction method and system | |
| CN102204256B (en) | Image prediction method and system | |
| Min et al. | Side information generation using adaptive search range for distributed video coding | |
| Zhao et al. | Frame rate up-conversion based on edge information | |
| Pan et al. | Sparse spatio-temporal representation with adaptive regularized dictionaries for super-resolution based video coding | |
| Alfonso et al. | Bi-directionally motion-compensated frame-rate up-conversion for H.264/AVC decoders | |
| Song et al. | High-resolution image scaler using hierarchical motion estimation and overlapped block motion compensation |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 09761007; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 09761007; Country of ref document: EP; Kind code of ref document: A1 |