HK1091632A - Non-integer pixel sharing for video encoding - Google Patents
Description
Technical Field
The present disclosure relates to digital video processing, and more particularly to encoding of video sequences.
Background
Digital video capabilities can be incorporated into a wide range of devices including digital televisions, digital direct broadcast systems, wireless communication devices, Personal Digital Assistants (PDAs), laptop computers, desktop computers, digital cameras, digital recording devices, cellular or satellite radiotelephones, and the like. Digital video devices provide significant improvements over conventional analog video systems in creating, modifying, transmitting, storing, recording and playing full motion video sequences.
Many different video coding standards have been established for the coding of digital video sequences. For example, the Moving Picture Experts Group (MPEG) has developed a number of standards including MPEG-1, MPEG-2, and MPEG-4. Other standards include the International Telecommunication Union (ITU) H.263 standard, QuickTime™ technology developed by Apple Computer, Inc. of Cupertino, California, Windows™ Media Video developed by Microsoft Corporation of Redmond, Washington, Indeo™ developed by Intel Corporation, RealVideo™ from RealNetworks, Inc. of Seattle, Washington, and Cinepak™ developed by SuperMac, Inc. New standards continue to emerge and evolve, including the ITU H.264 standard and a number of proprietary standards.
Many video coding standards allow for improved transmission rates of video sequences by encoding data in a compressed manner. Compression can reduce the total amount of data that needs to be transmitted for effective transmission of video frames. For example, most video coding standards utilize graphics and video compression techniques designed to facilitate the transmission of video and images over a narrower bandwidth than would be needed without compression.
For example, the MPEG standards and ITU h.263 and ITU h.264 standards support video coding techniques that exploit similarities between successive video frames, referred to as temporal or inter-frame correlation, to provide inter-frame compression. The inter-frame compression technique exploits data redundancy across frames of video by converting pixel-based representations to motion representations. In addition, some video coding techniques may further compress video frames using similarities within frames referred to as spatial or intra-frame correlation.
To support compression, a digital video device typically includes an encoder to compress a digital video sequence, and a decoder to decompress the digital video sequence. In many cases, the encoder and decoder form an integrated encoder/decoder (CODEC) that operates on blocks of pixels within frames that define a sequence of video images. For example, in the MPEG-4 standard, the encoder typically divides a video frame to be transmitted into "macroblocks" comprising a 16 × 16 array of pixels. The ITU H.264 standard supports 16 × 16 video blocks, 16 × 8 video blocks, 8 × 16 video blocks, 8 × 8 video blocks, 8 × 4 video blocks, 4 × 8 video blocks, and 4 × 4 video blocks.
For each video block in a video frame, the encoder looks for similarly sized video blocks of one or more immediately preceding video frames (or subsequent frames) to identify the most similar video block, which is referred to as "best prediction". The process of comparing the current video block to the video blocks of other frames is commonly referred to as motion estimation. Once the "best prediction" for a video block is identified, the encoder may encode the difference between the current video block and the best prediction. The process of encoding the difference between the current video block and the best prediction includes a process known as motion compensation. Motion compensation involves the process of creating a difference block that represents the difference between the current video block to be encoded and the best prediction. Motion compensation generally refers to the act of taking the best predicted block using a motion vector and then subtracting the best predicted block from an input block to produce a difference block.
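As an illustrative sketch of the motion compensation step described above (the function name and block layout are assumptions, not the patent's implementation), creating the difference block amounts to an element-wise subtraction of the best prediction, located by the motion vector, from the input block:

```python
def difference_block(current, reference, mv_x, mv_y, block_size=4):
    # current: 2D list holding the block to be encoded.
    # reference: 2D list holding the reference frame.
    # (mv_x, mv_y): column/row of the best prediction's top-left corner
    # in the reference frame, as located by motion estimation.
    return [
        [current[r][c] - reference[mv_y + r][mv_x + c] for c in range(block_size)]
        for r in range(block_size)
    ]
```

The decoder reverses this subtraction: it fetches the same prediction via the received motion vector and adds the decoded difference block back to reconstruct the video block.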
After motion compensation has created the difference block, a series of further encoding steps are typically performed to encode the difference block. These additional encoding steps may depend on the encoding standard used. In an MPEG-4 compatible encoder, for example, these additional encoding steps may include an 8 × 8 discrete cosine transform, followed by scalar quantization, a raster-to-zigzag reordering, run-length encoding, and Huffman encoding.
The encoded difference block may be transmitted along with a motion vector that indicates which video block from the previous frame was used for encoding. A decoder receives the motion vector and the encoded difference block and decodes the received information to reconstruct the video sequence.
In many standards, half-pixel values are also generated during motion estimation and motion compensation. For example, in MPEG-4, a half-pixel value is generated as the average of two adjacent pixels. The half-pixels used in a candidate video block may form part of the best prediction identified in the motion estimation process. Relatively simple 2-tap filters can be used to generate the half-pixel values as they are needed in the motion estimation and motion compensation processes. The generation of non-integer pixel values can improve the precision of inter-frame correlation, but typically complicates the encoding and decoding processes.
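A minimal sketch of such a 2-tap half-pixel filter, assuming the common (a + b + 1) >> 1 rounding convention (the exact rounding rule is standard-dependent):

```python
def half_pixel_2tap(a, b):
    # Rounded average of two adjacent integer pixel values.
    return (a + b + 1) >> 1

# Half-pixel positions between each pair of neighbors in a row of pixels:
row = [10, 20, 30]
halves = [half_pixel_2tap(row[i], row[i + 1]) for i in range(len(row) - 1)]
```

Because this filter is so cheap, it can be re-run whenever a half-pixel value is needed rather than storing its output.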
Disclosure of Invention
This disclosure describes video encoding techniques and video encoding devices that implement these techniques. The described techniques may be used with a wide variety of encoding standards that allow the use of non-integer pixel values in motion estimation and motion compensation. In particular, video coding standards such as the ITU H.264 standard, which utilizes half-pixel and quarter-pixel values in motion estimation and motion compensation, may specifically benefit from the techniques described herein. More generally, any standard that specifies a filter of 3 or more taps for the generation of non-integer pixel values in a given dimension, such as the vertical or horizontal direction, may benefit from the techniques described herein. These techniques are particularly useful for portable devices, where processing overhead can greatly affect the size of the device and the drain on its battery.
In one embodiment, the present disclosure describes a video encoding device comprising a motion estimator that generates non-integer pixel values for motion estimation, the motion estimator comprising a filter that receives an input of at least 3 integer pixel values. The device also includes a memory that stores the non-integer pixel values generated by the motion estimator, and a motion compensator that uses the stored non-integer pixel values for motion compensation. For example, to be compatible with the ITU H.264 standard, the motion estimator may generate half-pixel values using a 6-tap filter and store these half-pixel values for use in both motion estimation and motion compensation. The motion estimator may also generate quarter-pixel values using a 2-tap filter and use the quarter-pixel values in motion estimation without storing them for motion compensation. In this case, the motion compensator uses the stored half-pixel values generated by the motion estimator, but regenerates the quarter-pixel values using another 2-tap filter. In some cases, separate filters may be implemented for horizontal and vertical interpolation, but the output of any large filter (3 or more taps) may be reused for motion estimation and motion compensation. In other cases, the same large filter may be used for both horizontal and vertical interpolation, although the clock speed of the encoding device may then need to be increased.
These and other techniques described herein may be implemented in a digital video device in hardware, software, firmware, or any combination thereof. If implemented in software, the techniques may be embodied in a computer-readable medium comprising program code that, when executed, performs one or more of the encoding techniques described herein. Further details of various embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
Fig. 1 is a block diagram illustrating an exemplary system in which a source digital video device transmits an encoded sequence of video data to a receiving digital video device.
Fig. 2 is an exemplary block diagram of a device including a video encoder.
Fig. 3 is another exemplary block diagram of a device including a video encoder.
FIG. 4 is a diagram of an exemplary search space formed around a location corresponding to a 4 pixel by 4 pixel video block.
FIG. 5 is a diagram of an exemplary search space including columns of half-pixel values.
FIG. 6 is a diagram of an exemplary search space including multiple rows and columns of half-pixel values.
FIG. 7 is a diagram of a search space and various pixels that can be generated from the search space to support decoding.
Fig. 8 is a flow diagram illustrating a video encoding technique.
Detailed Description
Fig. 1 is a block diagram illustrating an exemplary system 10 in which a source device 12 transmits an encoded sequence of video data to a receiving device 14 via a communication link 15. Source device 12 and receiving device 14 are both digital video devices. In particular, source device 12 encodes video data according to a video standard, such as the ITU H.264 video coding standard, that allows non-integer pixel values in motion estimation and motion compensation. System 10 implements techniques in which non-integer pixel values are generated, stored, and used for both motion estimation and motion compensation. This eliminates the need for duplicate large filters in the motion estimator and the motion compensator to produce the same non-integer pixel values. The techniques described herein are particularly useful with any published or proprietary standard that specifies a filter of 3 or more taps for the generation of non-integer pixel values in a given dimension, such as vertical or horizontal interpolation. In accordance with this disclosure, however, any non-integer pixel values produced by a smaller filter (a 2-tap filter) may be produced as needed, without storing these values for later use.
The communication link 15 may comprise a wireless link, a physical transmission line, an optical fiber, a packet-switched network such as a local area network, a wide area network, or a global network such as the internet, a Public Switched Telephone Network (PSTN), or any other communication link capable of transmitting data. Thus, communication link 15 represents any suitable communication medium, or possible combination of different networks and links, to transmit video data from source device 12 to sink device 14.
Source device 12 may be any digital video device capable of encoding and transmitting video data. Source device 12 may include a video memory 16 that stores digital video sequences, a video encoder 18 that encodes the sequences, and a transmitter 20 that transmits the encoded sequences to receiving device 14 over communication link 15. Video encoder 18 may include, for example, various hardware, software, or firmware, or one or more Digital Signal Processors (DSPs) executing programmable software modules to control video encoding techniques, as described herein. Related memory and logic circuitry may be provided to support DSP controlled video coding techniques. As will be described, video encoder 18 may be configured to generate non-integer pixel values that may be used for both motion estimation and motion compensation.
The source device 12 may also include a video capture device 23, such as a video camera, to capture video sequences and store the captured sequences in the memory 16. In particular, the video capture device 23 may comprise a Charge Coupled Device (CCD), a charge injection device, a photodiode array, a Complementary Metal Oxide Semiconductor (CMOS) device, or any other photosensitive device capable of capturing video images or digital video sequences.
As a further example, the video capture device 23 may be a video converter that converts analog video data from, for example, a television, a video cassette recorder, a video camera, or another video device into digital video data. In some embodiments, source device 12 may be configured to transmit real-time video sequences over communication link 15. In this case, receiving device 14 may receive the real-time video sequence and display the video sequence to the user. Alternatively, source device 12 may capture and encode video sequences, which may be transmitted to sink device 14 as video data files, i.e., in a non-real-time manner. In this way, source device 12 and sink device 14 may support applications such as video clip playback, video mail, or video conferencing, e.g., over a mobile wireless network. Devices 12 and 14 may include various other elements not specifically shown in fig. 1.
Receiving device 14 may take the form of any digital video device that receives and decodes video data. For example, receiving device 14 may include a receiver 22 that receives the encoded digital video sequence from transmitter 20, such as through intermediate links, routers, or other network devices. Receiving device 14 may also include a video decoder 24 to decode the sequence, and a display device 26 to display the sequence to a user. In some embodiments, however, receiving device 14 may not include an integrated display device. In such cases, receiving device 14 may act as a receiver that decodes the received video data in order to drive a separate display device, such as a television or monitor.
Examples of source device 12 and sink device 14 include servers, workstations, or other desktop computing devices located on a computer network, as well as mobile computing devices such as laptop computers or personal digital assistants (PDAs). Other examples include digital television broadcast satellites and receiving devices such as digital televisions, digital cameras, digital video cameras or other digital recording devices, digital video telephones such as mobile telephones having video capabilities, direct two-way communication devices having video capabilities, other wireless video devices, and the like.
In some cases, source device 12 and sink device 14 each include an encoder/decoder (CODEC) (not shown) to encode and decode digital video data. In particular, source device 12 and sink device 14 each may include a transmitter, a receiver, a memory, and a display. Many of the encoding techniques described below are described in the context of a digital video device that includes an encoder. It should be understood, however, that the encoder may form part of a CODEC. In that case, the CODEC may be implemented in hardware, software, firmware, a DSP, a microprocessor, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), discrete hardware components, or various combinations thereof. In addition, the encoding techniques described herein allow various digital filters or hardware components to be applied to both encoding and decoding operations.
Video encoder 18 within source device 12 operates on blocks of pixels within a series of video frames in order to encode the video data. For example, video encoder 18 may execute motion estimation and motion compensation techniques in which a video frame to be transmitted is divided into blocks of pixels (referred to as video blocks). The video blocks may comprise any size of block, and may vary within a given video sequence. As one example, the ITU H.264 standard supports 16 × 16 video blocks, 16 × 8 video blocks, 8 × 16 video blocks, 8 × 8 video blocks, 8 × 4 video blocks, 4 × 8 video blocks, and 4 × 4 video blocks. Smaller video blocks can provide better resolution in the encoding, and are particularly useful for locations of a video frame that include higher levels of detail. Also, as described below, video encoder 18 may be designed to operate on 4 × 4 video blocks in a pipelined manner, with larger video blocks being reconstructed from the 4 × 4 video blocks if desired.
Each pixel in a video block may be represented by an n-bit value, such as 8 bits, that defines visual characteristics of the pixel, such as color and intensity, in values of chrominance and luminance. However, because human vision is more sensitive to changes in luminance than in chrominance, motion estimation is often performed only on the luminance component. Accordingly, for purposes of motion estimation, the entire n-bit value may quantify the luminance of a given pixel. The principles of this disclosure, however, are not limited to such a pixel format, and may be extended to simpler formats using fewer bits or more complex formats using more bits.
For each video block in the video frame, video encoder 18 of source device 12 performs motion estimation by searching video blocks of one or more previously transmitted video frames (or subsequent video frames) stored in memory 16 in order to identify a similar video block. Once a "best prediction" is determined from the previous or subsequent video frame, video encoder 18 performs motion compensation to create a difference block indicative of the differences between the current video block to be encoded and the best prediction. Motion compensation generally refers to the act of fetching the best prediction using a motion vector and then subtracting the best prediction from an input block to generate a difference block.
After the motion compensation process has created the difference block, a series of further encoding steps are typically performed to encode the difference block. These additional encoding steps depend on the encoding standard used. For example, in an MPEG4 compatible encoder, additional encoding steps may include an 8 x 8 discrete cosine transform, followed by scalar quantization, followed by raster-to-zigzag (raster-to-zigzag) reordering, followed by run-length encoding, followed by huffman encoding.
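As a rough illustration of the raster-to-zigzag step, the conventional zigzag scan can be generated by visiting block coordinates anti-diagonal by anti-diagonal, alternating direction on each diagonal; this is a generic sketch, not the patent's implementation, and the exact scan pattern is standard-dependent:

```python
def zigzag_order(n):
    # Visit coordinates anti-diagonal by anti-diagonal; alternate the
    # direction on each diagonal so the scan zigzags from the top-left.
    coords = [(r, c) for r in range(n) for c in range(n)]
    return sorted(coords, key=lambda rc: (
        rc[0] + rc[1],
        rc[1] if (rc[0] + rc[1]) % 2 == 0 else -rc[1],
    ))

def raster_to_zigzag(block):
    # Flatten a square block of quantized coefficients into the 1D
    # zigzag order used before run-length encoding.
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

Zigzag ordering groups the low-frequency coefficients at the front of the sequence, so the long runs of zeros from high-frequency coefficients compress well under run-length encoding.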
Once encoded, the encoded difference block may be transmitted along with motion vectors that identify the video block from the previous frame (or subsequent frame) used for encoding. In this manner, instead of encoding each frame as an independent image, video encoder 18 encodes the difference between adjacent frames. Such techniques significantly reduce the amount of data required to accurately represent each frame of a video sequence.
The motion vector may define a pixel location relative to the upper left corner of the video block being encoded, although other formats for the motion vector may be used. In any case, by encoding video blocks with motion vectors, the bandwidth used for transmission of the video data stream can be greatly reduced.
In some cases, video encoder 18 supports intra-coding in addition to inter-coding. Intra-frame coding exploits intra-frame similarities, referred to as spatial or intra-frame correlation, to further compress video frames. Intra-frame compression is typically based on texture coding, such as Discrete Cosine Transform (DCT) coding, of compressed still images. Intra-frame compression is often used in conjunction with inter-frame compression, but may also be used as an alternative in some implementations.
Receiver 22 of receiving device 14 may receive the encoded video data in the form of motion vectors and encoded difference blocks indicative of the encoded differences between the video blocks being encoded and the best predictions used in motion estimation. Decoder 24 performs video decoding in order to generate a video sequence for display to a user via display device 26. The decoder 24 of receiving device 14 may also be implemented as an encoder/decoder (CODEC). In that case, both source device 12 and receiving device 14 may be capable of encoding, transmitting, receiving, and decoding digital video sequences.
In accordance with this disclosure, non-integer pixel values generated in a given dimension (horizontal or vertical) from 3 or more input pixel values during video encoding may be stored in a local memory of video encoder 18 and then used for both motion estimation and motion compensation. The stored non-integer pixel values may be separately buffered or assigned to any particular memory location, as long as the non-integer pixel values can be located and identified when needed. In contrast, non-integer pixel values generated in a given dimension from two input pixel values need not be stored for any significant amount of time, but can generally be computed as needed for motion estimation or motion compensation.
Fig. 2 is an exemplary block diagram of device 12A, which includes video encoder 18A. Device 12A in fig. 2 may correspond to device 12 in fig. 1. As shown in fig. 2, device 12A includes a video encoder 18A for encoding video sequences and a video memory 16A for storing video sequences before and after encoding. Device 12A may also include a transmitter 20A that transmits the encoded sequence to another device, and possibly a video capture device 23A, such as a video camera, to capture video sequences and store the captured sequences in memory 16A. The various elements of device 12A are communicatively coupled via a communication bus 35A. Various other elements, such as an intra-frame encoder element, various filters, or other elements may also be included in device 12A, but are not specifically illustrated for simplicity.
Video memory 16A typically includes a relatively large memory space. For example, video memory 16A may comprise Dynamic Random Access Memory (DRAM) or FLASH memory. In other examples, video memory 16A may include non-volatile memory or any other data storage device.
Video encoder 18A includes a local memory 25A, which may comprise a smaller and faster memory space relative to video memory 16A. By way of example, local memory 25A may comprise static random access memory (SRAM). Local memory 25A may also comprise "on-chip" memory integrated with the other components of video encoder 18A to provide very fast access to data during the processor-intensive encoding process. During the encoding of a given video frame, the current video block to be encoded may be loaded from video memory 16A into local memory 25A. A search space used in locating the best prediction may also be loaded from video memory 16A into local memory 25A. The search space may comprise a subset of pixels of one or more of the preceding video frames (or subsequent frames). The chosen subset may be pre-identified as a likely location for identification of a best prediction that closely matches the current video block to be encoded.
In many video standards, fractional, or non-integer, pixels are also considered in the encoding process. For example, in MPEG-4, a half-pixel value is calculated as the average between two adjacent pixels. In an MPEG-4 compatible encoder, the average between two adjacent pixels can easily be generated in a given dimension, as needed, using a simple digital filter that has two inputs and one output, commonly referred to as a 2-tap digital filter.
By way of example, in the simple case of MPEG-2 or MPEG-4, if interpolation is performed both horizontally and vertically, a 2-tap digital filter may be applied in each dimension. Alternatively, the two-dimensional interpolation can be performed as a single 4-tap averaging filter. The techniques described herein become most useful when the standard specifies more than 2 inputs in a given dimension, or more than 5 inputs for two-dimensional interpolation.
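A sketch of the single 4-tap averaging filter for two-dimensional interpolation, assuming a rounded average of the four surrounding integer pixels (the exact rounding rule is standard-dependent):

```python
def half_pixel_2d(p00, p01, p10, p11):
    # Rounded average of the four surrounding integer pixels: a single
    # 4-tap filter covering both dimensions at once.
    return (p00 + p01 + p10 + p11 + 2) >> 2
```

This one-shot 4-tap average produces the same kind of center value as cascading a horizontal and a vertical 2-tap average, without the intermediate pass.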
The tap weights of the digital filters are specified by the coding standard. To support the MPEG-4 standard, motion estimator 26A and motion compensator 28A may include similar 2-tap digital filters that generate half-pixel values for the horizontal and vertical dimensions on the fly, using the integer pixel values of the search space loaded into local memory 25A.
For some newer standards, however, the generation of non-integer pixels is more complicated. For example, many newer standards specify that half-pixel values be generated in a given dimension based on a weighted sum of more than two pixels. As a specific example, the ITU H.264 standard specifies that half-pixel values be calculated as weighted averages over 6 pixels in both the horizontal and vertical dimensions. For fractional horizontal pixels, the 3 pixels to the left of the half-pixel value are weighted similarly to the 3 pixels to the right of the half-pixel value. For fractional vertical pixels, the 3 pixels above the half-pixel value are weighted similarly to the 3 pixels below the half-pixel value. In both cases, a filter having six inputs and one output (a 6-tap digital filter) is typically needed to generate the half-pixel value.
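A sketch of such a 6-tap half-pixel filter; the tap weights (1, -5, 20, 20, -5, 1) and the (+16, >> 5) rounding with clipping to the 8-bit range follow common descriptions of the ITU H.264 luma half-pel interpolation, though the normative details are defined in the standard itself:

```python
def half_pixel_6tap(p):
    # p: six consecutive integer pixel values in one dimension, three on
    # each side of the half-pixel position. Weighted sum normalized by
    # (acc + 16) >> 5 and clipped to the 8-bit pixel range.
    weights = (1, -5, 20, 20, -5, 1)
    acc = sum(w * x for w, x in zip(weights, p))
    return min(255, max(0, (acc + 16) >> 5))
```

The negative outer weights give the filter a sharper frequency response than simple averaging, which is also why it is substantially costlier to evaluate than a 2-tap filter.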
In addition, the ITU H.264 standard also provides for the generation of quarter-pixel values, which are calculated as the average of one integer pixel and one adjacent half-pixel value. Thus, the generation of a quarter-pixel value typically involves using a 6-tap filter to generate the half-pixel value and then a 2-tap filter to generate the quarter-pixel value. Many proprietary standards also utilize other weighted averaging rules for the generation of non-integer pixels, which can add significant complexity to the generation of non-integer pixel values.
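The quarter-pixel step can then be sketched as a 2-tap average of an integer pixel and an adjacent half-pixel value (rounding convention assumed):

```python
def quarter_pixel(integer_pixel, half_pixel):
    # Rounded 2-tap average of an integer pixel and an adjacent
    # half-pixel value.
    return (integer_pixel + half_pixel + 1) >> 1
```

Because this final step is cheap, only the 6-tap half-pixel output is worth caching; the quarter-pixel values can be regenerated from it whenever needed.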
In accordance with this disclosure, non-integer pixel values generated from 3 or more input pixel values in a given dimension may be stored as part of the search space in local memory 25A. The stored non-integer pixel values may be separately cached or assigned to any particular storage location, as long as the non-integer pixel values can be located and identified when needed. In contrast, non-integer pixel values generated from two input pixel values need not be stored for any significant amount of time, but can simply be computed when needed.
This disclosure recognizes a tradeoff between the additional storage space in local memory 25A needed to store any non-integer pixel values for a significant amount of time, and the hardware or processing power needed to filter the inputs and generate the non-integer pixel values. A 2-tap filter is very simple to implement in one dimension, so 2-tap filters can be used in many locations of the video encoder to generate non-integer pixel values from two inputs, as needed. Filters having 3 or more inputs in one dimension, however, and in particular the 6-tap filters used for compliance with the ITU H.264 standard, are much more complex. When these larger filters are needed, it can be advantageous to implement a single filter that receives the 3 or more inputs, and to store or buffer the output of the larger filter in local memory 25A for reuse later in the encoding process, as needed.
For example, video encoder 18A includes a motion estimator 26A and a motion compensator 28A that perform motion estimation and motion compensation, respectively, during the video encoding process. As shown in fig. 2, motion estimator 26A and motion compensator 28A each include one or more non-integer pixel computation units 32A and 36A. Non-integer pixel computation units 32A and 36A may comprise one or more digital filters. However, although 2-tap filters may be duplicated in both non-integer pixel computation units 32A and 36A, any N-tap filter (where N represents an integer greater than or equal to 3) need be implemented in only one of units 32A and 36A. The output of a filter having 3 or more inputs can be stored in local memory 25A for use and reuse later in the encoding process.
In some cases, separate filters may be implemented for horizontal and vertical interpolation, but the output of any large filter (3 or more taps) may be reused for motion estimation and motion compensation. In other cases, the same large filter may be used for both horizontal and vertical interpolation, and the output of the large filter may be stored for both motion estimation and motion compensation. In these latter cases, however, the clock speed may need to be increased because a single filter is used for both horizontal and vertical interpolation, which may increase power consumption.
Local memory 25A is loaded with the current video block to be encoded and a search space, which comprises some or all of one or more different video frames used in inter-frame coding. Motion estimator 26A compares the current video block to various video blocks in the search space in order to identify a best prediction. In some cases, however, an adequate match for the encoding may be identified more quickly without checking every possible candidate; in that case, the match may not actually be the "best" prediction, though it may be sufficient for effective video encoding.
Motion estimator 26A supports encoding schemes that utilize non-integer pixel values. In particular, non-integer pixel computation unit 32A may generate non-integer pixel values that expand the search space to fractional or non-integer pixel values. Both horizontal and vertical non-integer pixel values may be generated. Any non-integer pixel values generated from the two inputs may be used and then discarded or overwritten in local memory 25A, as these non-integer pixel values generated from the two inputs may be easily regenerated when needed. However, any non-integer pixel values generated from 3 or more inputs may be used and maintained in local memory 25A for later use in the encoding process, as the generation and regeneration of these non-integer pixel values generated from 3 or more inputs is more complex.
Video block matching unit 34A performs comparisons between the current video block to be encoded and candidate video blocks in the search space of memory 25A, including any candidate video blocks that include non-integer pixel values generated by non-integer pixel calculation unit 32A. For example, video block matching unit 34A may include a difference processor, or a software routine, that performs difference computations to identify the best prediction (or simply an appropriate prediction).
As an example, video block matching unit 34A may perform sum of absolute difference (SAD) techniques, sum of squared difference (SSD) techniques, or other comparison techniques, as desired. The SAD technique involves the task of performing absolute difference computations between the pixel values of the current video block to be encoded and the pixel values of the candidate video block to which the current block is compared. The results of these absolute difference computations are summed, i.e., accumulated, in order to define a difference value indicative of the difference between the current video block and the candidate video block. For an 8 × 8 pixel image block, 64 differences may be computed and summed, and for a 16 × 16 pixel macroblock, 256 differences may be computed and summed. The total sum over all of the computations defines the difference value for the candidate video block.
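A minimal sketch of the SAD computation described above (names are illustrative, not the patent's implementation):

```python
def sad(current, candidate):
    # Accumulate |current - candidate| over every pixel position; a lower
    # total indicates a closer match between the two blocks.
    return sum(
        abs(c - p)
        for cur_row, cand_row in zip(current, candidate)
        for c, p in zip(cur_row, cand_row)
    )
```

For an 8 × 8 block this accumulates 64 absolute differences, and for a 16 × 16 macroblock 256, matching the counts given above.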
A lower difference value generally indicates that a candidate video block is a better match, and thus, a better candidate, than other candidate video blocks that produce higher difference values, i.e., increased distortion. In some cases, these calculations may be terminated when the accumulated difference exceeds a defined threshold, or when a sufficient match is identified at an early stage, even if other candidate video blocks have not been considered.
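The SAD accumulation and the early-termination option described above can be sketched as follows; the function and variable names are illustrative, not taken from any standard.

```python
def sad_with_early_exit(current, candidate, threshold):
    """Sum of absolute differences (SAD) between two equal-size blocks,
    each given as a list of rows. Accumulation stops early once the
    running total exceeds `threshold`, mirroring the early-termination
    option described above."""
    total = 0
    for cur_row, cand_row in zip(current, candidate):
        for c, p in zip(cur_row, cand_row):
            total += abs(c - p)
            if total > threshold:
                return total  # candidate rejected before full accumulation
    return total

# 4 x 4 example: up to 16 absolute differences are accumulated
cur = [[10, 12, 11, 13]] * 4
cand = [[11, 12, 10, 13]] * 4
print(sad_with_early_exit(cur, cand, threshold=1000))  # -> 8
```

A hardware difference processor performs the same accumulation with dedicated adders; the threshold check allows poor candidates to be abandoned without completing all 64 or 256 difference computations.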
SSD techniques likewise involve difference computations between the pixel values of the current video block to be encoded and the pixel values of the candidate video block. In SSD techniques, however, the results of the difference computations are squared, and the squared values are then summed, i.e., accumulated, to define a difference value indicative of the difference between the current video block and the candidate video block to which it is compared. Alternatively, video block matching unit 34A may utilize other comparison techniques, such as mean square error (MSE), normalized cross-correlation function (NCCF), or another suitable comparison algorithm.
Finally, video block matching unit 34A may identify a "best prediction" that is the candidate video block that best matches the video block to be encoded. It will be appreciated that in many cases a sufficient match may be located before the best prediction, in which case the sufficient match may be used for encoding. In the following description, reference is made to "best prediction" identified by video block matching unit 34A, but it is understood that the disclosure is not limited in this respect and that any sufficient matches may be utilized and may be identified faster than the best prediction.
In some embodiments, video block matching unit 34A may operate in a pipelined manner. For example, video block matching unit 34A may include a processing pipeline that processes more than one video block simultaneously. Additionally, in some cases, the processing pipeline may be designed to operate on 4 pixel by 4 pixel video blocks even if the video block to be encoded is larger than 4 pixels by 4 pixels. In that case, the difference computations for adjacent 4 pixel by 4 pixel candidate video blocks may be added together to represent the difference computation for a larger video block, such as a 4 pixel by 8 pixel video block comprising two 4 pixel by 4 pixel candidate blocks, an 8 pixel by 4 pixel video block comprising two candidate blocks, an 8 pixel by 8 pixel video block comprising four candidate blocks, an 8 pixel by 16 pixel video block comprising eight candidate blocks, a 16 pixel by 8 pixel video block comprising eight candidate blocks, a 16 pixel by 16 pixel video block comprising sixteen candidate blocks, and so forth.
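The aggregation of 4 pixel by 4 pixel partial results into larger-block difference values can be modeled in software as follows; this is a simplified sketch of the pipelined scheme, with illustrative names.

```python
def sad(cur, cand):
    # plain SAD over two equal-size blocks (lists of rows)
    return sum(abs(a - b) for ra, rb in zip(cur, cand) for a, b in zip(ra, rb))

def sub_block_sads(current, candidate, n=4):
    """SAD of each aligned n x n sub-block; the SAD of any larger block
    built from these sub-blocks is simply the sum of the partial results."""
    h, w = len(current), len(current[0])
    parts = {}
    for r in range(0, h, n):
        for c in range(0, w, n):
            cur = [row[c:c + n] for row in current[r:r + n]]
            cand = [row[c:c + n] for row in candidate[r:r + n]]
            parts[(r // n, c // n)] = sad(cur, cand)
    return parts

# 8 x 8 block: the whole-block SAD equals the sum of its four 4 x 4 partial SADs
cur = [[i + j for j in range(8)] for i in range(8)]
cand = [[i + j + 1 for j in range(8)] for i in range(8)]
parts = sub_block_sads(cur, cand)
print(sum(parts.values()) == sad(cur, cand))  # -> True
```

This is why a 4 x 4 pipeline suffices for every block size listed above: each larger partition reuses the same partial sums rather than recomputing pixel differences.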
In any case, once the best prediction for a video block is identified by video block matching unit 34A, motion compensator 28A creates a difference block that represents the difference between the current video block and the best prediction. Difference block encoder 39A may further encode the difference block to compress it, and the encoded difference block is sent to transmitter 20A for transmission to another device, along with a motion vector that indicates which candidate video block from the search space was used for the encoding. For simplicity of description, the additional components used to perform encoding after motion compensation are generalized as difference block encoder 39A, since the specific components will vary depending on the particular standard being supported. In other words, difference block encoder 39A may perform one or more conventional encoding techniques on the difference block generated as described above.
Motion compensator 28A includes a non-integer pixel computation unit 36A that produces any non-integer pixel values needed for the best prediction. As noted above, however, non-integer pixel computation unit 36A of motion compensator 28A includes only 2-tap digital filters in a given dimension and typically does not include larger digital filters, since the outputs of any larger digital filters of non-integer pixel computation unit 32A of motion estimator 26A are stored in local memory 25A for use in both motion estimation and motion compensation. Thus, the need to implement a digital filter requiring three or more inputs in a given dimension in motion compensator 28A may be avoided.
Difference block calculation unit 38A generates a difference block that generally represents the difference between the current video block and the best prediction. This difference block may also be referred to as a "prediction matrix" or a "residual". The difference block is typically a matrix of values representing the differences between the best prediction and the pixel values of the current video block. Namely:
difference block = (best prediction pixel values) - (current video block pixel values)
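A minimal sketch of the difference-block computation, following the sign convention of the equation above (names are illustrative):

```python
def difference_block(best_prediction, current_block):
    """Element-by-element residual: best-prediction pixel values minus
    current-video-block pixel values, per the convention above."""
    return [[p - c for p, c in zip(prow, crow)]
            for prow, crow in zip(best_prediction, current_block)]

pred = [[100, 102], [98, 101]]
cur = [[101, 100], [98, 99]]
print(difference_block(pred, cur))  # -> [[-1, 2], [0, 2]]
```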
Difference block encoder 39A encodes the difference block to compress it, and the encoded video block is then sent to transmitter 20A for transmission to another device. In some cases, the encoded video blocks may be temporarily stored in video memory 16A, where they are accumulated and then transmitted as a stream of video frames by transmitter 20A. In any case, the encoded video block may take the form of an encoded difference block and a motion vector. The difference block represents the difference between the pixel values of the best prediction and those of the current video block. The motion vector identifies the location of the best prediction, whether at an integer pixel location within a frame or at a fractional pixel location generated from the frame. Different video standards identify the frame to which the motion vector applies in different ways. For example, in the H.264 standard, a reference picture index is used, which is carried in macroblock header information, unlike in MPEG-4 or MPEG-2.
Fig. 3 is another exemplary block diagram of device 12B, which includes a video encoder 18B. Device 12B in fig. 3 may correspond to device 12 in fig. 1, similar to device 12A in fig. 2. Device 12B in fig. 3 represents a more specific embodiment than device 12A shown in fig. 2. For example, device 12B may be compatible with the ITU h.264 video coding standard.
As shown in fig. 3, device 12B includes a video encoder 18B that encodes video sequences and a video memory 16B that stores video sequences before and after encoding. The device 12B may also include a transmitter 20B that transmits the encoded sequence to another device and possibly a video capture device 23B, such as a video camera, to capture the video sequence and store the captured sequence in the memory 16B. The various elements of device 12B may be communicatively coupled via a communication bus 35B. Various other components, such as intra-frame encoder components, various filters, or other components may also be included in device 12B, but are not specifically shown for simplicity of description.
Video memory 16B typically includes a relatively large memory space. For example, video memory 16B may include DRAM, FLASH memory, possibly non-volatile memory, or any other data storage device.
Video encoder 18B includes a local memory 25B, and memory 25B may comprise a smaller and faster memory space than video memory 16B. By way of example, local memory 25B may comprise static random access memory (SRAM). Local memory 25B may also comprise "on-chip" memory integrated with the other components of video encoder 18B to provide fast data access during the processor-intensive encoding process. During the encoding of a given video frame, the current video block being encoded may be loaded from video memory 16B into local memory 25B.
Motion estimator 26B compares the current video block to various video blocks in the search space to identify a best prediction. Motion estimator 26B supports the ITU H.264 coding scheme with half-pixel values and 1/4 pixel values. Specifically, non-integer pixel calculation unit 32B may include a 6-tap filter 31 for half-pixel interpolation and a 2-tap filter 33 for 1/4 pixel interpolation. Horizontal and vertical half-pixel and 1/4 pixel values may be generated.
According to the ITU H.264 video coding standard, 6-tap filter 31 generates half-pixel values as a weighted average of six consecutive pixels. The 1/4 pixel values are generated by 2-tap filter 33 as an average of an integer pixel value and an adjacent half-pixel value. The tap weights of the filters may be specified by the ITU H.264 video coding standard, although the disclosure is not limited in this respect.
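The two interpolation stages can be sketched as follows, assuming the H.264 luma half-sample tap weights (1, -5, 20, 20, -5, 1) with rounding and a 5-bit right shift, and a rounded average for the 2-tap stage; the clipping helper and all names are illustrative.

```python
def clip255(x):
    # clip to the 8-bit sample range
    return max(0, min(255, x))

def half_pel(e, f, g, h, i, j):
    """6-tap half-pixel interpolation over six consecutive pixels, using
    weights (1, -5, 20, 20, -5, 1), rounding, and a 5-bit shift."""
    return clip255((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)

def quarter_pel(full, half):
    """2-tap quarter-pixel interpolation: rounded average of an integer
    pixel value and an adjacent half-pixel value."""
    return (full + half + 1) >> 1

row = [10, 20, 30, 40, 50, 60]   # six consecutive integer pixels
b = half_pel(*row)               # half-pixel between the 30 and the 40
print(b, quarter_pel(30, b))     # -> 35 33
```

Note the asymmetry that motivates the storage policy described below: the 6-tap stage needs six multiplies and adds per value, while the 2-tap stage needs only an add and a shift.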
In some cases, separate 6-tap filters are implemented in motion estimator 26B for horizontal and vertical interpolation, and the outputs of both 6-tap filters can be used for motion estimation and motion compensation. In other cases, the same 6-tap filter may be used for both horizontal and vertical interpolation, but in that case the clock speed may need to be increased, which increases power consumption. Thus, it may be more desirable to employ two 6-tap digital filters for separate horizontal and vertical interpolation in motion estimation, and then reuse the outputs of those filters for horizontal and vertical interpolation in motion compensation. Regardless of whether motion estimator 26B implements two 6-tap filters for horizontal and vertical half-pixel interpolation or uses a single 6-tap filter for both, a single 2-tap digital filter for 1/4 pixel interpolation may be implemented in each of motion estimator 26B and motion compensator 28B. Additional 2-tap filters may be included, however, to increase processing speed.
In any event, the half-pixel output of the 6-tap filter 31 is available for both motion estimation and motion compensation in accordance with the present disclosure. In other words, the half-pixel output of the 6-tap filter 31 is used for motion estimation and then stored in the memory 25B for subsequent use in motion compensation. In contrast, the 1/4 pixel output of 2-tap filter 33 is used for motion estimation only and is then discarded or overwritten in memory 25B.
Video block matching unit 34B performs comparisons between the current video block to be encoded and candidate video blocks in the search space of memory 25B, including any candidate video blocks that include 1/4 or half-pixel values generated by non-integer pixel calculation unit 32B. For example, video block matching unit 34B may include a difference processor, or a software routine, that performs difference computations to identify the best prediction (or simply a sufficient prediction). As an example, video block matching unit 34B may perform SAD techniques, SSD techniques, or other comparison techniques such as mean square error (MSE), normalized cross-correlation function (NCCF), or another suitable comparison algorithm.
Finally, video block matching unit 34B may identify a "best prediction," that is, the candidate video block that best matches the video block to be encoded. In some embodiments, video block matching unit 34B may operate in a pipelined manner. For example, video block matching unit 34B may include a processing pipeline that processes more than one video block simultaneously. Additionally, in some cases, the processing pipeline may be designed to operate on 4 pixel by 4 pixel video blocks even if the video block to be encoded is larger than 4 pixels by 4 pixels. In this pipelined embodiment, once a 1/4 pixel value has been considered in the pipeline, the memory allocated for its storage can be overwritten, which can reduce the amount of memory required. The half-pixel values, of course, may be stored for later use as outlined herein.
Once a best prediction for a video block is identified by video block matching unit 34B, motion compensator 28B may create a difference block that represents the difference between the current video block and the best prediction. Motion compensator 28B may then forward the difference block to difference block encoder 39B, which performs various additional encoding supported by the ITU H.264 coding standard. Difference block encoder 39B sends the encoded difference block to transmitter 20B over bus 35B for transmission to another device, along with a motion vector indicating which video block was used for the encoding.
Motion compensator 28B includes a non-integer pixel computation unit 36B to generate any non-integer pixel values for the best prediction that are not stored in local memory 25B. Non-integer pixel computation unit 36B of motion compensator 28B includes only a 2-tap digital filter 37 that produces 1/4 pixel values, and typically does not include a 6-tap digital filter to produce half-pixel values, because the half-pixel output of 6-tap digital filter 31 of motion estimator 26B is stored in local memory 25B for use in both motion estimation and motion compensation. Thus, the need for a 6-tap digital filter in motion compensator 28B may be avoided. Moreover, a 2-tap digital filter can be implemented very easily and without a large chip circuit area, whereas a 6-tap digital filter is considerably more complex. The additional memory space required to buffer the half-pixel output of 6-tap digital filter 31 during the encoding of a given video block is therefore worthwhile, as it eliminates the need for an additional 6-tap digital filter.
Difference block calculation unit 38B generates a difference block that generally represents the difference between the current video block and the best prediction. Also, the difference block is typically computed as follows:
difference block = (best prediction pixel values) - (current video block pixel values)
Motion compensator 28B sends the difference block to difference block encoder 39B, which encodes and compresses the difference block and sends the encoded difference block to transmitter 20B for transmission to another device. The transmitted information may take the form of an encoded difference block and a motion vector. The difference block represents the difference between the pixel values of the best prediction and those of the current video block. The motion vector identifies the location of the best prediction, whether at an integer pixel location within a frame or at a fractional pixel location generated from the frame.
Fig. 4 is a diagram of an exemplary search space 40 formed around a location corresponding to a 4 pixel by 4 pixel video block. In particular, the search space 40 may include pixels of a preceding or following video frame. The current video block to be encoded may comprise a 4 pixel by 4 pixel video block of the current frame that corresponds to the location of the center-most pixel 42 of search space 40.
Fig. 5 is a diagram of an exemplary search space 50 including columns of half-pixel values. The pixel values labeled "Bxx" correspond to horizontal half-pixel values, which, as described herein, may be generated by a 6-tap digital filter. For example, pixel B00 may comprise a weighted average of pixels A00-A05 (FIG. 4). The tap weights of the filter define the weighting given to the different integer pixels and may be specified by the standard being supported. The horizontal half-pixel values labeled "Bxx" may be stored in local memory, as described herein, and reused for both motion estimation and motion compensation. The actual storage scheme may vary in different implementations. In one example, a horizontal buffer is maintained in local memory specifically for storing the horizontal half-pixel values, i.e., those labeled "Bxx".
Fig. 6 is another illustration of an exemplary search space 60 including a plurality of rows and columns of half-pixel values. As described herein, the pixel values labeled "Cxx" correspond to vertical half-pixel values and may be generated by a 6-tap digital filter of the motion estimator. For example, pixel C00 may include a weighted average of pixels A02-A52 (FIG. 5), and pixel C01 may include a weighted average of pixels B00-B05 (FIG. 5). As described herein, the vertical half-pixel values labeled "Cxx" may be stored in local memory and reused for both motion estimation and motion compensation. However, the storage scheme may vary in different implementations. In one example, a vertical buffer is maintained in local memory, specifically for storing vertical half-pixel values, i.e., those labeled "Cxx".
Another buffer may be allocated for 1/4 pixel values, but this buffer may be more limited in size. The 1/4 pixel values may be stored in the 1/4 pixel buffer but, once considered, overwritten with other 1/4 pixel values. The present disclosure recognizes that, from a chip implementation perspective, a 2-tap digital filter is less costly than the additional memory space that would otherwise be required to store every generated 1/4 pixel value for the entire encoding process of a given video block.
In addition, the same hardware may be used for both encoding and decoding. Decoding is typically less exhaustive than encoding, and pixel values generally need to be generated only as required. The same digital filters used in the motion estimator and motion compensator may also be used during decoding to produce any needed non-integer pixel values in accordance with the present disclosure.
Fig. 7 is an illustration of a search space 70 that may be used to decode a 4 pixel by 4 pixel video block. In this case, if any horizontal or vertical half-pixel value needs to be generated based on search space 70, the 6-tap digital filter of the motion estimator may be used. A set of pixels 72 defines all of the horizontal half-pixel values that can be generated from search space 70 when a 6-tap digital filter is used for the ITU H.264 standard. As shown, pixel B00 comprises a weighted sum of pixels A00-A05, and pixel B31 comprises a weighted sum of pixels A31-A36. Vertical half-pixel values corresponding to weighted sums of integer pixels may be generated in a similar manner, but are not specifically shown for simplicity.
Likewise, additional vertical half-pixel values may be generated from the set of pixels 72 to define another set of pixels 74. For example, pixel C03 may comprise a weighted sum of pixels B03-B53. Any 1/4 pixel values can similarly be generated, as needed, using a 2-tap digital filter whose inputs are an integer pixel value and an adjacent half-pixel value. For example, the 1/4 pixel value between pixels A02 and A03 that is closer to pixel A02 would be the average of A02 and B00. Similarly, the 1/4 pixel value between pixels A02 and A03 that is closer to pixel A03 would be the average of B00 and A03.
Importantly, the same hardware used for encoding, i.e., the 6-tap digital filter and the various 2-tap digital filters, can be used to produce any non-integer pixel values needed for decoding based on search space 70. Thus, the encoding techniques described herein are fully consistent with a decoding scheme in which the same hardware is used for both encoding and decoding.
Fig. 8 is a flow diagram illustrating a video encoding technique. For purposes of illustration, FIG. 8 will be described from the perspective of device 12B in FIG. 3. The video coding technique may include all of the steps shown in FIG. 8, or a subset of them. As shown in FIG. 8, video encoder 18B loads integer pixels of the search area from video memory 16B into local memory 25B (81). Video block matching unit 34B may then immediately begin motion estimation difference calculations for integer video blocks, i.e., video blocks having only integer pixel values (82). At the same time, 6-tap digital filter 31 generates half-pixel values based on weighted sums of various subsets of the integer pixels (83). Importantly, video encoder 18B stores the generated half-pixel values for use not only in motion estimation but also in subsequent motion compensation (84).
At this point, video block matching unit 34B may perform motion estimation difference calculations for half-pixel video blocks, i.e., any video blocks that include half-pixel values (85). The 2-tap digital filter 33 generates 1/4 pixel values, for example, as an average of an integer pixel value and an adjacent half-pixel value (86). The 1/4 pixel values may be used for motion estimation, but need not be stored for any subsequent use. Video block matching unit 34B may then perform motion estimation difference calculations for 1/4 pixel video blocks, i.e., any video blocks that include 1/4 pixel values (87).
Once each candidate video block, including half-pixel blocks and 1/4 pixel blocks, has been compared to the current video block to be encoded, motion estimator 26B identifies a best prediction (88). As noted above, however, the disclosure also contemplates the use of a sufficient match, which need not be the "best" match but is adequate for effective video encoding and compression. Motion compensation is then performed.
During motion compensation, motion compensator 28B uses the half-pixel values generated by 6-tap filter 31 and stored in local memory 25B (89). The 2-tap filter 37, however, produces any 1/4 pixel values required for motion compensation (90). In this case, 2-tap filter 37 may regenerate at least some of the 1/4 pixel values previously generated by 2-tap digital filter 33. Difference block calculation unit 38B generates a difference block, e.g., representing the difference between the current video block to be encoded and the best prediction video block (91). The difference block may then be encoded and transmitted along with a motion vector that identifies the location of the candidate video block used for the video encoding.
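The storage policy that distinguishes half-pixel from 1/4 pixel values can be modeled as a simple cache; this is a software sketch only (the scheme described above uses dedicated buffers and filters), and the class and method names are illustrative.

```python
class InterpolationCache:
    """Software model of the storage policy above: half-pixel values are
    computed once by the 6-tap filter and kept for reuse in motion
    compensation, while 1/4 pixel values are cheap 2-tap results that
    are regenerated on demand rather than stored."""

    TAPS = (1, -5, 20, 20, -5, 1)  # assumed H.264-style half-sample weights

    def __init__(self):
        self.half = {}  # persists from motion estimation into motion compensation

    def half_pel(self, key, samples):
        if key not in self.half:
            s = sum(w * p for w, p in zip(self.TAPS, samples))
            self.half[key] = max(0, min(255, (s + 16) >> 5))
        return self.half[key]

    def quarter_pel(self, full, half):
        # never cached: one add and one shift per value
        return (full + half + 1) >> 1

cache = InterpolationCache()
b = cache.half_pel(("B", 0, 0), [10, 20, 30, 40, 50, 60])  # computed once, stored
q = cache.quarter_pel(30, b)                               # regenerated whenever needed
print(b, q)  # -> 35 33
```

The design choice mirrors the hardware trade-off stated above: buffering the expensive 6-tap results avoids a second 6-tap filter, while the 2-tap results are cheaper to recompute than to store.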
A number of different embodiments have been described. These techniques can improve video encoding by achieving an effective balance between local memory space and the hardware used to perform non-integer pixel computations. In these and other ways, the techniques may improve video encoding according to a standard such as ITU H.264 or any other video coding standard that uses non-integer pixel values, including any of a wide variety of proprietary standards. In particular, the techniques are especially useful when the video coding standard requires 3-tap or larger filters for the generation of non-integer pixel values in a given dimension, i.e., for one-dimensional interpolation. The techniques may also be useful when a standard calls for 2-dimensional interpolation implemented with a 5-tap filter or larger. The particular standard being supported may specify the tap weights for the various filters.
These techniques may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the techniques may be embodied on a computer-readable medium comprising program code that, when executed in a device that encodes video sequences, performs one or more of the methods mentioned above. In that case, the computer-readable medium may comprise random access memory (RAM) such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, and the like.
The program code may be stored on the memory in the form of computer readable instructions. In this case, a processor, such as a DSP, may execute instructions stored in memory for performing one or more of the techniques described herein. In some cases, these techniques may be performed by a DSP that invokes various hardware components, such as a motion estimator, to accelerate the encoding process. In other cases, the video encoder may be implemented as a microprocessor, one or more Application Specific Integrated Circuits (ASICs), one or more Field Programmable Gate Arrays (FPGAs), or some other hardware-software combination. These and other embodiments are intended to be within the scope of the appended claims.
Claims (35)
1. A video encoding apparatus comprising:
a motion estimator for generating non-integer pixel values for motion estimation, the motion estimator comprising a filter for one-dimensional interpolation receiving an input of at least 3 integer pixel values;
a memory storing the non-integer pixel values generated by the motion estimator; and
a motion compensator for motion compensation using the stored non-integer pixel values.
2. The apparatus of claim 1, wherein the non-integer pixel values comprise half-pixel values, and wherein:
said motion estimator using said half-pixel values to generate quarter-pixel values for said motion estimation without storing said quarter-pixel values for said motion compensation; and
the motion compensator regenerates the quarter-pixel values for the motion compensation using the half-pixel values.
3. The apparatus of claim 2, wherein:
said motion estimator comprising a 6-tap filter for generating said half-pixel values for motion estimation and motion compensation, and a 2-tap filter for generating said quarter-pixel values for motion estimation; and
the motion compensator comprises a further 2-tap filter to regenerate the quarter-pixel values for motion compensation.
4. The apparatus of claim 2, wherein:
said motion estimator comprises two 6-tap filters which generate said half-pixel values for horizontal and vertical interpolation for motion estimation and motion compensation, and a 2-tap filter which generates said quarter-pixel values for motion estimation for horizontal and vertical interpolation; and
the motion compensator comprises a further 2-tap filter which regenerates the quarter-pixel values for motion compensation for horizontal and vertical interpolation.
5. The apparatus of claim 4, wherein the apparatus is compatible with an ITU H.264 video coding standard, the tap weights of the 6-tap and 2-tap filters being specified by the ITU H.264 video coding standard.
6. The apparatus of claim 1, wherein said motion estimator comprises a second filter that generates additional non-integer pixel values for motion estimation based on said stored non-integer pixel values.
7. The apparatus of claim 6, wherein said motion estimator generates said additional non-integer pixel values for motion estimation without storing said additional non-integer pixel values for motion compensation, said motion compensator comprising a third filter to regenerate said additional non-integer pixel values for motion compensation.
8. The device of claim 1, wherein the device performs the motion estimation and motion compensation on a 4-pixel by 4-pixel video block.
9. The apparatus of claim 8, wherein the apparatus performs the motion estimation and motion compensation in a pipelined manner, generating motion vectors and difference matrices for video blocks that are larger than the 4 pixel by 4 pixel sub-video block.
10. The device of claim 1, wherein the device comprises at least one of a digital television, a wireless communication device, a personal digital assistant, a laptop computer, a desktop computer, a digital camera, a digital recording device, a cellular radiotelephone having video capabilities, and a satellite radiotelephone having video capabilities.
11. The apparatus of claim 1, wherein the memory comprises a local on-chip memory, the apparatus further comprising an off-chip video memory electrically coupled to the local on-chip memory by a bus.
12. The apparatus of claim 1, further comprising a transmitter that transmits video frames encoded by said motion estimator and said motion compensator.
13. The device of claim 12, further comprising a video capture device that captures video frames in real time, said motion estimator and said motion compensator being configured to encode said video frames in real time, and said transmitter being configured to transmit said encoded video frames in real time.
14. The apparatus of claim 1, wherein the generation of the non-integer pixel value comprises a horizontal or vertical pixel interpolation.
15. A video encoding device comprising:
a first filter receiving an input of at least three integer pixel values to generate non-integer pixel values for motion estimation and motion compensation;
a second filter receiving an input of said non-integer pixel values to generate further non-integer pixel values for said motion estimation; and
a third filter receiving an input of the non-integer pixel values to generate further non-integer pixel values for the motion compensation.
16. The apparatus of claim 15, wherein:
said first filter comprises a 6-tap filter that receives an input of six integer pixel values to generate said non-integer pixel values for said motion estimation and said motion compensation;
said second filter comprises a 2-tap filter receiving inputs of two of said non-integer pixel values to generate said further non-integer pixel values for said motion estimation; and
said third filter comprises a 2-tap filter receiving an input of two of said non-integer pixel values to generate said further non-integer pixel values for said motion compensation.
17. The apparatus of claim 15, wherein the apparatus is compatible with an ITU H.264 video coding standard, the tap weights of the 6-tap and 2-tap filters being specified by the ITU H.264 video coding standard.
18. The apparatus of claim 15, wherein the filter produces the non-integer pixel values for horizontal interpolation.
19. The apparatus of claim 18, further comprising:
another first filter comprising a 6-tap filter that receives an input of six integer pixel values to generate non-integer pixel values for said motion estimation and said motion compensation for vertical interpolation;
a second further filter comprising a 2-tap filter receiving an input of two of said non-integer pixel values to generate said further non-integer pixel values for said motion estimation for vertical interpolation; and
a further third filter comprising a 2-tap filter receiving an input of two of said non-integer pixel values for generating said further non-integer pixel values for said motion compensation for vertical interpolation.
20. A method of video encoding comprising:
generating non-integer pixel values for motion estimation using a filter that receives an input of at least three integer pixel values for horizontal or vertical interpolation;
using the non-integer pixel values for motion estimation;
storing the non-integer pixel values; and
using said stored non-integer pixel values for motion compensation.
21. The method of claim 20, wherein the non-integer pixel values comprise half-pixel values, the method further comprising:
generating quarter-pixel values for said motion estimation without storing said quarter-pixel values for motion compensation; and
the quarter-pixel values are regenerated for the motion compensation.
22. The method of claim 21, wherein:
generating the half-pixel values includes applying a 6-tap filter; and
generating the quarter-pixel values includes applying a 2-tap filter.
23. The method of claim 22, wherein the method is compatible with an ITU H.264 video coding standard, the tap weights for the 6-tap and 2-tap filters being specified by the ITU H.264 video coding standard.
24. The method of claim 20, further comprising generating additional non-integer pixel values for motion estimation based on the stored non-integer pixel values.
25. The method of claim 24, further comprising generating the additional non-integer pixel values for motion estimation without storing the additional non-integer pixel values for motion compensation, and regenerating the additional non-integer pixel values for motion compensation.
26. The method of claim 20, further comprising performing the motion estimation and the motion compensation on a 4 pixel by 4 pixel video block.
27. The method of claim 26, further comprising performing the motion estimation and the motion compensation in a pipelined manner to generate motion vectors and difference matrices for video blocks larger than the 4 pixel by 4 pixel video block.
28. A computer-readable medium comprising instructions that, when executed, perform:
generating non-integer pixel values using a filter that receives an input of at least three integer pixel values for horizontal or vertical interpolation;
using the non-integer pixel values for motion estimation;
storing the non-integer pixel values; and
using said stored non-integer pixel values for motion compensation.
29. The computer-readable medium of claim 28, wherein the non-integer pixel values comprise half-pixel values, the computer-readable medium further comprising instructions to perform:
generating quarter-pixel values for said motion estimation without storing said quarter-pixel values for motion compensation; and
regenerating the quarter-pixel values for the motion compensation.
30. The computer-readable medium of claim 29, wherein the instructions perform:
generating said half-pixel values by applying a 6-tap filter; and
generating said quarter-pixel values by applying a 2-tap filter.
31. The computer-readable medium of claim 28, further comprising instructions to perform the motion estimation and the motion compensation on a 4 pixel by 4 pixel video block.
32. The computer-readable medium of claim 31, further comprising instructions to perform the motion estimation and the motion compensation in a pipelined manner to generate motion vectors and difference matrices for video blocks larger than the 4 pixel by 4 pixel video block.
33. An apparatus comprising:
means for generating non-integer pixel values for motion estimation using an input of at least three integer pixel values for vertical or horizontal interpolation;
means for using said non-integer pixel values in motion estimation;
means for storing the non-integer pixel values; and
means for using said stored non-integer pixel values for motion compensation.
34. The apparatus of claim 33, further comprising:
means for generating said non-integer pixel values for said motion estimation using an input of six integer pixel values;
means for generating further non-integer pixel values for said motion estimation using an input of two stored non-integer pixel values; and
means for generating said further non-integer pixel values for said motion compensation using inputs of two stored non-integer pixel values.
35. A video encoding apparatus comprising:
a motion estimator that generates non-integer pixel values for motion estimation, the motion estimator comprising a filter for two-dimensional interpolation that receives an input of at least five integer pixel values;
a memory that stores said non-integer pixel values generated by said motion estimator; and
a motion compensator using said stored non-integer pixel values for motion compensation.
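The sharing scheme of claims 20-23 can be sketched in Python. The 6-tap weights (1, -5, 20, 20, -5, 1) with rounding shift, and the 2-tap rounded average, are the ITU H.264 luma interpolation filters that claim 23 references; the function names and the sample pixel row are illustrative, not taken from the patent.

```python
def half_pixel(p, i):
    """H.264-style 6-tap half-pel filter over integer pixels p[i-2..i+3]."""
    acc = p[i-2] - 5*p[i-1] + 20*p[i] + 20*p[i+1] - 5*p[i+2] + p[i+3]
    return min(255, max(0, (acc + 16) >> 5))  # round and clip to 8 bits

def quarter_pixel(a, b):
    """H.264-style 2-tap quarter-pel filter: rounded average of two values."""
    return (a + b + 1) >> 1

row = [10, 12, 40, 200, 220, 90, 30, 14]  # integer pixels along one row

# Motion estimation: run the expensive 6-tap filter once and *store* the
# half-pel value (claim 20), between row[3] and row[4].
stored_half = half_pixel(row, 3)

# The cheap quarter-pel value is used for estimation but NOT stored (claim 21).
q_me = quarter_pixel(row[3], stored_half)

# Motion compensation: reuse the stored half-pel, avoiding a second 6-tap
# pass, and regenerate the quarter-pel with the 2-tap filter.
q_mc = quarter_pixel(row[3], stored_half)
assert q_me == q_mc  # identical results with half the 6-tap work
```

The design point the claims capture is that the 6-tap filter dominates interpolation cost, so its outputs are worth buffering across estimation and compensation, while 2-tap outputs are cheaper to recompute than to store.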
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US10/975,731 | 2004-10-27 |  |  |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK1091632A true HK1091632A (en) | 2007-01-19 |
Similar Documents
| Publication | Title |
|---|---|
| JP5415599B2 (en) | An adaptive frame skip technique for rate controlled video coding |
| CN1196340C (en) | Efficient, flexible motion estimation architecture for real time MPEG2 compliant encoding |
| JP4847521B2 (en) | Block noise removal filtering technology for video encoding according to multiple video standards |
| CN101133648B (en) | Mode selection method and device for intra-frame prediction video coding |
| CN1290342C (en) | Device and method capable of performing block comparison motion compensation and global motion compensation |
| US20110170611A1 (en) | Video encoding and decoding techniques |
| US20090274213A1 (en) | Apparatus and method for computationally efficient intra prediction in a video coder |
| US20060120612A1 (en) | Motion estimation techniques for video encoding |
| EP1653744A1 (en) | Non-integer pixel sharing for video encoding |
| US20090274211A1 (en) | Apparatus and method for high quality intra mode prediction in a video coder |
| CN1666529A (en) | Computationally constrained video encoding |
| CN1213613C (en) | Method and device for predicting motion vector in video codec |
| CN1675933A (en) | Video encoding and decoding techniques |
| US20060062298A1 (en) | Method for encoding and decoding video signals |
| US7804901B2 (en) | Residual coding in compliance with a video standard using non-standardized vector quantization coder |
| HK1091632A (en) | Non-integer pixel sharing for video encoding |
| US20130170565A1 (en) | Motion Estimation Complexity Reduction |
| CN1535025A (en) | Method for obtaining image reference block under fixed reference frame number coding mode |
| HK1079940A (en) | Computationally constrained video encoding |
| HK1097676A (en) | Residual coding in compliance with a video standard using non-standardized vector quantization coder |
| HK1079378A (en) | Early exit techniques for digital video motion estimation |
| HK1108309A (en) | Motion estimation techniques for video encoding |
| HK1080655A (en) | Motion estimation techniques for video encoding |
| HK1079937A (en) | Video encoding and decoding techniques |
| HK1111289A (en) | Mode selection techniques for intra-prediction video encoding |