
CN1658673A - Video compression coding-decoding method - Google Patents


Info

Publication number
CN1658673A
CN1658673A (application CN200510038537.8A)
Authority
CN
China
Prior art keywords
frame
coding
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200510038537.8A
Other languages
Chinese (zh)
Inventor
马国强
徐苏珊
吴金勇
徐健键
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN200510038537.8A priority Critical patent/CN1658673A/en
Publication of CN1658673A publication Critical patent/CN1658673A/en
Pending legal-status Critical Current

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A video compression encoding and decoding method comprising the following procedures for encoding a compressed video signal: discrete cosine transform (DCT); transformation and quantization; a channel buffer, equipped with a control mechanism, placed before the coded bit stream enters the channel; motion estimation, in which the positional offset is described by a motion vector representing displacement in the horizontal and vertical directions, and in which a P-frame uses the most recently decoded I-frame or P-frame as its reference picture (forward prediction); motion compensation, in which the motion vector computed by motion estimation moves the macroblock of the reference frame to the corresponding horizontal and vertical position to generate a prediction of the compressed image; and a sub-pixel search. Quantization, storage, and motion search after the DCT of the sampled signal are all completed in the frequency domain; the video encoder performs all computation in the frequency domain. The compression ratio is high and the computational load is small.

Description

Video compression coding and decoding method
1. Technical Field
The invention relates to a video compression coding algorithm and an AVCS video conference system based on the algorithm.
2. Background Art
At present, comparable foreign video conference terminals generally adopt encoding technologies such as H.263 and H.264. Products based on H.263 have low computational complexity, are easy to realize on low-cost hardware, and are cheap to produce, but their compression ratio for video data is very low, so they occupy more bandwidth and increase network operating costs. Products using H.264 achieve a very high compression ratio and occupy fewer network resources, but they bring very high computational overhead, forcing such products onto rather costly hardware platforms.
In 1995, after the Video Coding Experts Group (VCEG) of the International Telecommunication Union completed work on the H.263 standard, a new low-bit-rate video communication standard was developed and named H.26L. In 2001, the potential advantages of H.26L were recognized by the ISO Moving Picture Experts Group (MPEG), which in cooperation with VCEG established the Joint Video Team (JVT). The result of this group's work was Advanced Video Coding (AVC), released in the second quarter of 2003; in the ITU-T family of standards, AVC is referred to as H.264. Since the end of 2003, the appeal of H.264 has spread quickly among video users who had been suffering the expense of dedicated bandwidth, since half the bandwidth now suffices to achieve the original image quality. In the domestic market, video manufacturers including Dingshitong, Zhongtai, Zhongxing, Kodao, and TANDBERG have released, or are about to release, new video products supporting the H.264 standard. The greatest advantage of H.264 over H.263 is that it is a very low bit-rate coding scheme: theoretically, at equivalent restored image quality, H.264 saves half the bit rate of H.263. In other words, the image quality of a video segment coded at 384 kbps with H.264 equals that of the same segment coded at 768 kbps with H.263. This makes high-quality images at low bandwidth possible for users with tight bandwidth resources. H.264 was also designed for hierarchical coded transmission across different network resources. It has stronger fault tolerance than H.263 and obtains better video quality in network environments of unstable quality. As video communication applications gradually migrate from government and enterprise private networks to public networks, the interference-rejection characteristics of H.264 will play a key role.
Another significant difference between H.264 and the H.261/H.263 encoding schemes is that it supports finer sub-pel motion vectors in motion-compensated prediction. In contrast to the 1/2-pixel prediction of H.263, H.264 can predict at the 1/4-pixel level, which yields higher-quality encoded video. These benefits are not free: the computational complexity of H.264 is much higher than that of H.263. Under the same conditions, the decoding complexity of H.264 is about twice that of H.263, while the encoding complexity is more than three times as high. This increase in computational complexity has limited the deployment of H.264. A simple example is the latest product of a well-known international video terminal manufacturer, which can support a 2 Mbps bit rate under H.263 but at most 512 kbps under H.264.
Of course, as a new coding standard, H.264 has its limitations in application. Since its original design objective was good image quality at low bandwidth, practical tests show no significant difference in image quality between H.264 and H.263 at high bit rates. The available network bandwidth is therefore a factor users must weigh when purchasing H.264 products: since 1 Mbps of bandwidth can usually be guaranteed when a videoconference runs on a private network, there is no need to invest more in H.264 in that setting. Because the H.264 standard had been released for only one year, most terminal manufacturers advertising H.264 support mainly support its baseline profile. The increased complexity of H.264 codecs also challenges the video processing capability of terminal vendors: existing platforms either cannot perform H.264 coding and decoding at all or cannot support it at high bit rates. Moreover, the major terminal manufacturers implement H.264 differently, so terminals of different brands have difficulty establishing connections over H.264 and interoperability is hard to ensure. These objective factors all pose great obstacles to the rapid popularization of H.264.
However, H.264 is technically advanced, and as an emerging codec standard its high-efficiency coding performance helps improve resource utilization and saves large investments in network bandwidth. By 2003 the penetration of broadband in China was rising steadily, and demand for video communication at low bandwidths such as DSL was gradually increasing; there is good reason to believe that H.264 will play a key role in popularizing video communication.
H.264 was born because, in video communication and storage applications, the video coding and decoding standard occupies the core position of the technology. There have been two standardization systems for video coding: the MPEG series of standards (e.g., MPEG-1, MPEG-2, MPEG-4) led by ISO/IEC, and the H.26x series of standards (e.g., H.261 and H.263) led by ITU-T. The MPEG standards are widely used in video storage, on-demand delivery, and forwarding; the VCD video format, for example, was developed on the basis of MPEG-1 technology. On the recommendation of the International Telecommunication Union, the H.26x standards are likewise widely used in the field of video communication and are adopted by a wide range of operators and equipment providers.
Patent applications for video coding methods include the following. CN200410012857.1, an integer transform matrix selection method for video coding and a related integer transform method, concerns the integer transform used for image-data compression in a video codec. Aimed at AVS, the first audio/video coding standard then being formulated in China, it adopts an 8 x 8 integer DCT and provides a method for selecting the transform basis of the integer transform by jointly evaluating the decorrelation efficiency and energy-compaction rate of the basis together with its dynamic range and computational complexity. Through this method it provides two groups of 8 x 8 integer transform bases with excellent performance, (5, 6, 4, 1) and (4, 5, 3, 1), and derives a fast integer-transform algorithm based on these two groups of bases.
CN03157077.1 discloses a bidirectional prediction method for video coding. When bidirectionally predicting at the coding end, first, for each image block of the current B-frame, a given candidate forward motion vector of the current block is obtained; then candidate backward motion vectors are computed, and candidate bidirectional prediction reference blocks are obtained by a bidirectional prediction method; matches are calculated within a given search range and/or a given match threshold; finally, the best matching block is selected to determine the final forward motion vector, backward motion vector, and block residual. Combined with forward and backward predictive coding, the method realizes a new predictive coding type and is suitable for the established AVS standard.
CN200310116090.2 provides a method for determining the reference image block in direct coding mode. It can be implemented without division while maintaining an accurate motion vector, thereby improving the computational accuracy of the motion vector, representing the motion of objects in the video more faithfully, and obtaining more accurate motion vector prediction; combined with forward and backward predictive coding, it implements a new predictive coding type that ensures the high efficiency of direct-mode coding, facilitates hardware implementation, obtains an effect similar to traditional B-frame coding, and can be used for the formulated AVS standard. CN98123036.9 discloses an error-resilient video coding-decoding (CODEC) method, a computer-readable medium containing a program for the method, and a video CODEC apparatus. The method provides greater resilience against channel errors, making communication less susceptible to them: a header data bit region, a motion vector data bit region, and a discrete cosine transform data bit region are divided from each macroblock of error-recovery-mode video data; the divided bit regions are variable-length coded; reversible variable-length coding is applied to a bit region selected from the variable-length coded regions according to its recovery priority; and a flag is inserted in the variable-length or reversible variable-length coded bit region. None of these existing approaches, however, focuses on solving the problem of computational load.
3. Summary of the Invention
The purpose of the invention is to adopt an autonomously designed video compression coding algorithm system that balances network overhead against computational load, combining a high compression ratio with a low computational load: it can provide a compression ratio close to that of H.264 while reducing the computational load to a level close to that of H.263.
The video compression coding and decoding method is characterized by comprising the following procedures for coding a video compression signal. Discrete cosine transform (DCT): the DCT is a spatial transform performed block by block to generate blocks of DCT coefficients; for a typical image it concentrates the energy of each block on a few low-frequency DCT coefficients. Transformation and quantization: quantization is applied to the DCT coefficients, dividing them by a certain quantization step, and different quantization precisions are used for the 64 coefficients of a DCT transform block so that the block retains as much of its specific DCT spatial-frequency information as possible without exceeding the required precision. Among the DCT coefficients, the low-frequency coefficients matter more to visual perception, so they are assigned finer quantization precision; the high-frequency coefficients matter less, so they are assigned coarser precision, and after quantization most high-frequency coefficients in a DCT block become zero;
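The frequency-dependent quantization described above can be sketched as follows. This is a minimal one-dimensional illustration in Python; the 8-point DCT-II, the example block, and the step sizes are illustrative choices, not values taken from the patent.

```python
import math

def dct_1d(x):
    """Orthonormal 8-point DCT-II: concentrates block energy in low frequencies."""
    N = len(x)
    out = []
    for k in range(N):
        c = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        s = sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N)) for n in range(N))
        out.append(c * s)
    return out

def quantize(coeffs, steps):
    """Coarser steps for high frequencies drive most of them to zero."""
    return [round(c / q) for c, q in zip(coeffs, steps)]

# A smooth 8-sample block: energy should end up in the first few coefficients.
block = [100, 102, 104, 106, 108, 110, 112, 114]
steps = [4, 8, 8, 16, 16, 32, 32, 32]   # finer steps for low freq, coarser for high
q = quantize(dct_1d(block), steps)
print(q)  # most high-frequency entries quantize to zero
```

On this smooth block only the DC and first AC coefficients survive quantization, which is exactly what makes the subsequent run-length stage effective.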
A channel buffer must be placed before the coded bit stream enters the channel. The entropy coder writes data into the buffer at a variable bit rate, while data is read out into the channel at the nominally constant bit rate of the transmission system. The buffer size, or capacity, is fixed, but the instantaneous output bit rate of the encoder is often significantly higher or lower than the transmission band, which may cause buffer overflow or underflow. The buffer therefore needs a control mechanism that adjusts the encoder's bit rate through feedback control of the compression algorithm, so that the buffer's write and read data rates tend toward balance. The buffer controls the compression algorithm through the quantization step of the quantizer: when the instantaneous output rate of the encoder is too high and the buffer is about to overflow, the quantization step is increased to reduce the coded data rate, at a correspondingly greater loss in image quality; when the instantaneous output rate of the encoder is too low and the buffer is about to underflow, the quantization step is decreased to increase the coded data rate.
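The feedback loop between buffer occupancy and quantization step can be sketched as below. The thresholds, the doubling/halving policy, and the step cap are all hypothetical choices for illustration; the patent specifies only the direction of adjustment, not these constants.

```python
def adjust_step(qstep, occupancy, capacity, low=0.25, high=0.75):
    """Feedback control: raise the step when the buffer fills, lower it when it drains."""
    fill = occupancy / capacity
    if fill > high:
        return min(qstep * 2, 62)   # coarser quantization -> fewer bits produced
    if fill < low:
        return max(qstep // 2, 1)   # finer quantization -> more bits produced
    return qstep                    # occupancy in the comfortable band: no change

print(adjust_step(8, 900, 1000))  # near overflow: step doubles
print(adjust_step(8, 100, 1000))  # near underflow: step halves
print(adjust_step(8, 500, 1000))  # balanced: step unchanged
```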
Motion estimation: when motion estimation is used in inter-frame coding, an estimate of the compressed image is generated by reference to a frame image. Motion estimation proceeds macroblock by macroblock, calculating the positional shift between macroblocks at corresponding positions of the compressed image and the reference image. This positional offset is described by a motion vector, which represents displacement in both the horizontal and vertical directions. In motion estimation, P-frame and B-frame pictures use different reference frames. A P-frame uses the most recently decoded I-frame or P-frame as its reference picture, called forward prediction; a B-frame uses two pictures as prediction references, called bidirectional prediction, where one reference frame precedes the coded frame in display order (forward prediction) and the other follows it in display order (backward prediction). The reference frame of a B-frame is in any case an I-frame or a P-frame;
Motion compensation: the motion vector calculated by motion estimation is used to move the macroblock of the reference frame image to the corresponding position in the horizontal and vertical directions, generating a prediction of the compressed image. In most natural scenes motion is orderly, so the difference between the prediction image generated by such motion compensation and the compressed image is small.
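The conventional integer-pel block matching that underlies the motion estimation/compensation steps above can be sketched as follows. This shows the generic spatial-domain full search with a sum-of-absolute-differences criterion (the patent's own contribution replaces the sub-pixel stage with a frequency-domain search); the frame contents and sizes are invented for the example.

```python
def sad(block, ref, top, left):
    """Sum of absolute differences between a block and a reference-frame window."""
    return sum(abs(block[i][j] - ref[top + i][left + j])
               for i in range(len(block)) for j in range(len(block[0])))

def full_search(block, ref, y, x, rng):
    """Integer-pel full search around (y, x); returns the best motion vector (dy, dx)."""
    h, w = len(block), len(block[0])
    best_mv, best_cost = None, float("inf")
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            t, l = y + dy, x + dx
            if 0 <= t <= len(ref) - h and 0 <= l <= len(ref[0]) - w:
                cost = sad(block, ref, t, l)
                if cost < best_cost:
                    best_mv, best_cost = (dy, dx), cost
    return best_mv

# Reference frame with a bright 2x2 patch at rows 2-3, cols 3-4.
ref = [[0] * 8 for _ in range(8)]
ref[2][3] = ref[2][4] = ref[3][3] = ref[3][4] = 200
# Current 2x2 block sits at (3, 5): its best match is one row up, two columns left.
cur = [[200, 200], [200, 200]]
print(full_search(cur, ref, 3, 5, 2))
```

Motion compensation then simply copies the reference window addressed by the recovered vector as the prediction of the current block.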
The invention is characterized in that, with the motion search algorithm operating in the frequency domain, the quantization, storage, and motion search of the sampled signal after the DCT are all completed in the frequency domain: the video encoder performs all computation in the frequency domain.
The basis of the invention also includes run-length coding, in which only non-zero coefficients are coded. The code for a non-zero coefficient consists of two parts: the first represents the number of consecutive zero coefficients preceding it (called the run), and the second is the non-zero coefficient itself. This exploits the advantage of zig-zag scanning, which in most cases produces long runs of zeros, so run-length coding is highly efficient. When all remaining DCT coefficients at the end of the one-dimensional sequence are zero, the encoding of the 8 x 8 transform block can end, indicated by an "end of block" (EOB) flag, and the resulting compression is very significant.
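The (run, level) pairing with an EOB terminator described above can be sketched directly. This is a minimal illustration; the symbol used for EOB and the sample coefficient sequence are arbitrary.

```python
EOB = ("EOB",)

def run_length_encode(coeffs):
    """Emit (run-of-zeros, nonzero-level) pairs; trailing zeros collapse into EOB."""
    out, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1            # count zeros preceding the next nonzero level
        else:
            out.append((run, c))
            run = 0
    out.append(EOB)             # all remaining coefficients are zero
    return out

seq = [42, 0, 0, -3, 1, 0, 0, 0]
print(run_length_encode(seq))   # [(0, 42), (2, -3), (0, 1), ('EOB',)]
```

Note how the three trailing zeros cost a single EOB symbol rather than three codes, which is where the "very significant" compression of sparse quantized blocks comes from.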
Subjective evaluation of digital image quality: the conditions for subjective evaluation include panel composition, viewing distance, test images, ambient illumination, background tint, and so on. The evaluation panel consists of a certain number of observers, with professionals and non-professionals each in a fixed proportion. The viewing distance is 3 to 6 times the diagonal dimension of the display. The test material consists of a number of image sequences with specified image detail and motion. Subjective evaluation reflects the statistical average of many observers' judgments of picture quality.
Zig-zag scan and run-length coding: the DCT produces an 8 x 8 two-dimensional array that must be converted to a one-dimensional arrangement for transmission. There are two conversion modes, or scan modes: zig-zag scanning and alternate scanning, of which zig-zag is the more common. Since most non-zero DCT coefficients concentrate in the upper-left corner of the 8 x 8 matrix, i.e. the low-frequency region, the zig-zag scan gathers these non-zero coefficients at the front of the one-dimensional array, followed by long strings of quantized-to-zero coefficients, which creates the conditions for run-length coding. Entropy coding then gives an efficient discrete representation of the DCT coefficients generated by quantization, producing the digital bit stream for transmission. Entropy coding exploits the statistical properties of the coded signal to reduce the average bit rate; the runs and non-zero coefficients can be entropy coded independently or jointly. In the Huffman coding used for entropy coding, a code table is built after the probabilities of all coded symbols are determined: frequently occurring, high-probability symbols are represented with fewer bits and rare, low-probability symbols with more bits, so that the average length of the whole code stream tends toward the minimum.
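The zig-zag traversal can be generated compactly by sorting block coordinates by anti-diagonal, alternating direction on odd and even diagonals. The sketch below is a standard construction, shown on a 4 x 4 block for brevity (the patent works with 8 x 8 blocks; the function takes the size as a parameter).

```python
def zigzag_order(n=8):
    """Zig-zag traversal of an n x n block, low frequencies first.
    Within each anti-diagonal d = i + j, odd diagonals run top-right to
    bottom-left (i ascending) and even diagonals the reverse (j ascending)."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def zigzag_scan(block):
    """Flatten a 2-D coefficient block into the 1-D zig-zag order."""
    return [block[i][j] for i, j in zigzag_order(len(block))]

# Values 1..16 were placed so that the zig-zag scan reads them in order.
block4 = [[1,  2,  6,  7],
          [3,  5,  8, 13],
          [4,  9, 12, 14],
          [10, 11, 15, 16]]
print(zigzag_scan(block4))  # [1, 2, 3, ..., 16]
```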
The invention is further characterized in that the performance of the motion search step itself is improved. In a traditional video coding system the encoder must convert repeatedly between the spatial domain and the frequency domain: motion search uses a spatial-domain algorithm, while residual coding must be performed in the frequency domain so that coefficient energy concentrates in the low-frequency region for convenient quantization. This frequent switching between the spatial and frequency domains is quite resource-consuming.
The invention is also characterized in that it provides a complete video conference system. The technical core of video conferencing is the video coding and decoding algorithm system. The invention develops deep research in this field, proposes an innovative frequency-domain sub-pixel motion search algorithm, and establishes an efficient and stable video coding algorithm system that not only codes efficiently but also has far lower computational complexity than comparable algorithms, making it easy to realize on a low-cost hardware platform. In the sub-pixel step of the motion search, the invention provides an original search algorithm that reduces computational complexity to below 10 percent of the conventional approach while guaranteeing sufficient accuracy of the search result.
In addition, the system of the present invention implements the following functions:
Performs sub-pixel motion search in the frequency domain using a novel video coding algorithm.
Provides a remote electronic whiteboard, remote slides, and data sharing.
Offers a 12-inch liquid crystal touch screen on which any figure can be drawn and text exchanged.
Supports mobile phone access, delivering conference data to mobile terminals in time.
Includes a built-in web server whose user interface allows the coding parameters to be modified.
Includes a built-in disk video recorder that records video images over a very long period.
Provides a USB interface, facilitating data exchange and connection of USB digital cameras.
Includes a built-in high-sensitivity motion detection algorithm that can serve for security monitoring.
4. Description of the Drawings
FIG. 1: video compression coding algorithm block diagram
FIG. 2: leaky bucket model
FIG. 3: delta response of an object as it moves, where FIG. 3(a) is the delta response of an object translated right by s and FIG. 3(b) is the delta response of an object translated left by s-1
FIG. 4: sub-pixel spatial locations
FIG. 5: comparison of computational performance under various standard test sequences
FIG. 6: flow of video encoding based entirely on the frequency domain
FIG. 7: distribution of a 4 x 4 block's pixels and its surrounding pixels
FIG. 8: intra 4 x 4 prediction modes
FIG. 9: intra 4 x 4 fast prediction mode selection flow chart
FIG. 10: 4 x 4 small blocks
FIG. 11: sub-sampling of the current block and neighboring blocks
FIG. 12: the 4 x 4 block of the current frame and the 4 x 4 block at the same location in the previous frame
FIG. 13: system software composition block diagram
FIG. 14: system hardware composition block diagram
5. Detailed Description of the Invention
1. Video compression coding algorithm
Fig. 1 is a block diagram of a video compression coding algorithm employed in the present invention.
The introduction of each algorithm module in the block diagram is as follows:
a. motion search (motion estimation)
Motion search (also called motion estimation) is one of the core technologies of video compression coding, and is also the algorithm module that consumes the most system computing resources in video coding. The video coding scheme of the invention adopts a conventional hybrid search algorithm for the integer-pixel search; for the sub-pixel search, the invention realizes an original search technique, described in detail later.
b. Intra prediction
In a video stream, each frame may be encoded as an I-frame (intra-frame predicted frame) or a P-frame (inter-frame predicted frame). When a P-frame is coded, information in the frame itself is not used directly as the coded data source; instead, a motion search is performed in previously coded images to find motion information as the basis for inter-frame prediction, and then the difference between the two frames is coded. This greatly reduces the number of bits used to describe the image and thereby achieves compression. An I-frame is encoded without the aid of any previous image: the pixels of the already-encoded part of the frame predict the values of the not-yet-encoded part. I-frames compress less efficiently than P-frames but are an important building block of video streams because they provide the ability to resynchronize. If a frame loses packets during transmission, subsequent P-frames predicted from it cannot be decoded correctly, but because an I-frame is self-contained and references no previous image, the code stream resynchronizes there and errors are confined within a limited range. Owing to the importance of I-frames, intra-prediction algorithms are a major research point of any video coding scheme. The invention proposes below a novel intra-frame prediction algorithm that delivers efficient and stable intra-prediction performance at limited computational cost.
c. Rate distortion optimization
An optimal scheme is selected among the candidate coding modes. Video coding poses many coding-mode and parameter decisions, for example what value the motion vector should take in inter prediction and what the search accuracy should be; the selection of these coding parameters and modes depends on the rate-distortion optimization algorithm. This algorithm evaluates each candidate coding mode or parameter and then picks the optimal one according to a fixed rule, generally weighing both coding efficiency (i.e. compression performance) and the signal-to-noise ratio after compression. The relationship between these two performance indicators is non-linear, so to increase computation speed and reduce system overhead, the video compression coding scheme of the invention adopts a Lagrangian operator to realize a linear approximation. The following equation is the Lagrangian cost in this scheme, where DREC is the distortion, RREC is the coding rate after prediction, Sk and Q are the candidate coding mode and parameter, and JMODE is the total cost value; the coding mode and parameter with the minimum JMODE value are the optimal choice.
JMODE(Sk, Q, λ) = DREC(Sk, Q) + λ · RREC(Sk, Q)
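The mode decision J = D + λR can be sketched as a simple minimization over candidates. The mode names and the distortion/rate numbers below are invented purely to show how λ trades distortion against rate; they are not values from the patent.

```python
def best_mode(candidates, lam):
    """Pick the candidate minimizing the Lagrangian cost J = D + lambda * R."""
    return min(candidates, key=lambda m: m["D"] + lam * m["R"])

# Hypothetical candidates: (distortion D, rate R) pairs for three coding modes.
modes = [
    {"name": "intra16", "D": 120.0, "R": 40.0},
    {"name": "intra4",  "D": 60.0,  "R": 95.0},
    {"name": "skip",    "D": 300.0, "R": 2.0},
]
print(best_mode(modes, lam=0.5)["name"])  # small lambda favors low distortion
print(best_mode(modes, lam=5.0)["name"])  # large lambda favors low rate
```

A small λ weights distortion heavily and picks the accurate but expensive mode; a large λ weights rate heavily and picks the cheap mode, which is exactly the linear approximation the text describes.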
d. Code rate control
Monitors the channel condition and decides the allocation of bit rate. This algorithm module uses the leaky bucket model shown in FIG. 2 to detect the transmission condition of the channel.
e. Memory management
Handles the logical and physical management of memory and is responsible for reference-frame queue management. When encoding a P-frame, motion search must reference images that were encoded or decoded earlier, so a reference frame queue must be established and reference frame data stored during both encoding and decoding. The encoder and decoder use the same memory logic model, each independently maintaining its own queue of reference frames and exchanging only minimal information for synchronization.
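A fixed-capacity reference-frame queue of the kind described can be sketched with a bounded deque. The capacity, frame identifiers, and method names are illustrative assumptions, not the patent's data structures.

```python
from collections import deque

class ReferenceQueue:
    """Fixed-capacity reference-frame queue; the oldest frame is evicted first."""
    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)

    def push(self, frame_id):
        self.frames.append(frame_id)   # store each frame after encode/decode

    def newest(self):
        return self.frames[-1]         # P-frames predict from the latest reference

q = ReferenceQueue(2)
for f in ["I0", "P1", "P2"]:
    q.push(f)
print(list(q.frames))  # ['P1', 'P2'] -- 'I0' was evicted when capacity was hit
print(q.newest())      # 'P2'
```

Because the encoder and decoder both push frames in the same order with the same capacity, their queues stay in lockstep without exchanging the frames themselves, which is the synchronization property the text relies on.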
f. Entropy coding
Methods of compressing video sequences center on three aspects: eliminating temporal redundancy, eliminating spatial redundancy, and eliminating statistical redundancy. Inter and intra prediction address temporal and spatial redundancy respectively; the method of eliminating statistical redundancy is known as entropy coding. The video coding algorithm system of the invention adopts the mature Huffman algorithm as its entropy coder.
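A minimal Huffman table construction, of the general kind the text refers to, can be sketched with a priority queue. The input string is a toy example; a real coder would build the table from coefficient statistics.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code table: frequent symbols get shorter codewords."""
    freq = Counter(symbols)
    # Heap entries: [weight, tiebreak, {symbol: partial codeword}]
    heap = [[w, i, {s: ""}] for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    nxt = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)       # two least-probable subtrees merge
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], nxt, merged])
        nxt += 1
    return heap[0][2]

codes = huffman_code("aaaabbc")
print(codes)  # 'a' occurs most often, so it gets the shortest codeword
```

With frequencies a:4, b:2, c:1, the symbol 'a' receives a 1-bit codeword while 'b' and 'c' receive 2-bit codewords, minimizing the average code length exactly as the text describes.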
g. Transformation and quantization
The residual data is time-frequency transformed and quantized in the frequency domain.
1.1 sub-pel motion search
Motion search (or referred to as motion estimation) is one of the core techniques in the field of video compression coding. Video signals, after analog-to-digital conversion, have a huge data size and cannot be directly stored or used for communication. However, natural objects appearing in video images are slowly varying with respect to the high sampling frequency, which results in a great redundancy in the original video information both in the time domain and in the spatial domain. The basic principle of the motion search technology is to search adjacent images in a video sequence, find out motion information and motion vectors, and replace the original information of the corresponding images with data representing the motion of an object, thereby greatly eliminating time redundancy and achieving the purpose of data compression.
The accuracy of modern motion search algorithms is no longer limited to whole pixels. Experiments show that at sub-pixel precision of half a pixel or finer, the coded bit rate drops markedly. Under low-noise conditions, each doubling of search precision can improve the compression ratio by about 0.5 bit/sample and reduce the average coded bit rate by 24.41-36.92%. However, once the search accuracy reaches 1/8 pixel or finer, the gain in compression ratio is no longer significant because noise effects dominate. Mainstream video coding standards now adopt sub-pixel search to improve coding performance: H.263 and MPEG-2 introduced half-pixel motion search, while MPEG-4 and the newly established H.264 use motion search with 1/4-pixel precision.
Among existing sub-pixel search algorithms, the widely used techniques are the spatial-domain full search and its various fast variants. These algorithms search for the best matching block within a search window, block by block, using the mean squared error or the sum of absolute differences as the decision rule; the search requires repeated filtering and interpolation and repeated evaluation of the cost function, so the computational complexity is very high. Experiments show that once sub-pixel precision is required, the computational cost of the motion search is often more than double that of the original whole-pixel search. Moreover, matching accuracy depends on the precision of the interpolation algorithm, which affects coding efficiency to a certain extent. The invention provides a novel search algorithm that predicts and searches motion vectors using phase correlation in the frequency domain: the sub-pixel search needs almost no interpolation and no cost-function evaluation, greatly reducing the computational expense of spatial-domain search, and it is well suited to embedded platforms providing video content services.
1.1.1 frequency Domain phase and object space translation
As is well known, in the Fourier transform domain, a change in phase corresponds to a translation of the object in the time/space domain:
F{x(s − τ)} = e^(−jωτ) F{x(s)}    (1)
In equation (1), F{·} denotes the Fourier transform of a discrete signal and s denotes spatial displacement (in the time domain, substitute t; only the spatial domain is discussed below). Through this property of the Fourier transform, motion information in the spatial domain can easily be resolved in the frequency domain: if the Fourier transform were used in a video coding scheme, searching for motion information in the frequency domain would become very convenient and accurate. However, the Fourier transform has poor energy-compaction performance and cannot effectively remove spatial redundancy after transformation, which makes it inapplicable to practical video coding algorithms. At present every video coding standard adopts the DCT, whose energy compaction approaches that of the K-L transform: it concentrates most of the energy in the DC and low-frequency parts, so image quality can be maintained at high compression ratios after low-pass filtering. In view of this, the invention uses the DCT for the time-frequency transform and calculates spatial translation from the phase of the DCT domain; owing to the particular structure of the DCT, there is no simple correspondence in the DCT domain such as exists for the Fourier transform.
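The shift property of equation (1) can be verified numerically in its discrete form: for a circular shift by m samples, the DFT of the shifted signal equals the original DFT multiplied by e^(−j2πkm/N). The sketch below uses a naive DFT on an invented 8-sample signal purely to confirm the identity; it is the DFT analogue of equation (1), not the patent's DCT-domain method.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

N, m = 8, 3
x = [1.0, 4.0, 2.0, -1.0, 0.0, 3.0, -2.0, 5.0]
shifted = [x[(n - m) % N] for n in range(N)]   # circular shift right by m

X, Xs = dft(x), dft(shifted)
# Shift theorem: Xs[k] == exp(-j*2*pi*k*m/N) * X[k] for every frequency bin k.
ok = all(abs(Xs[k] - cmath.exp(-2j * cmath.pi * k * m / N) * X[k]) < 1e-9
         for k in range(N))
print(ok)  # True
```

The identity holds exactly (up to rounding), which is why a spatial shift can be read off the frequency-domain phase; the remainder of the section works out why the DCT needs a more careful treatment.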
Suppose a one-dimensional discrete signal \{x_1(n) \mid n \in [0, N-1]\}, where N is the size of the search window; after moving m samples to the right it forms the signal \{x_2(n) \mid n \in [0, N-1]\}:
x_2(n) = \begin{cases} x_1(n-m), & n \ge m \\ 0, & n < m \end{cases} \qquad (2)
Following [2], the DCT and DST transforms below are defined:
X_2^C(k) = \frac{2}{N} C(k) \sum_{n=0}^{N-1} x_2(n) \cos\!\left(\frac{k\pi}{N}(n+0.5)\right), \quad k \in [0, N-1] \qquad (3)
X_2^S(k) = \frac{2}{N} C(k) \sum_{n=0}^{N-1} x_2(n) \sin\!\left(\frac{k\pi}{N}(n+0.5)\right), \quad k \in [0, N-1] \qquad (4)
Z_1^C(k) = \frac{2}{N} C(k) \sum_{n=0}^{N-1} x_1(n) \cos\!\left(\frac{k\pi}{N}\,n\right), \quad k \in [0, N-1] \qquad (5)
Z_1^S(k) = \frac{2}{N} C(k) \sum_{n=0}^{N-1} x_1(n) \sin\!\left(\frac{k\pi}{N}\,n\right), \quad k \in [0, N-1] \qquad (6)
In the above formulas,
C(k) = \begin{cases} 1/\sqrt{2}, & k \in \{0, N\} \\ 1, & k \in [1, N-1] \end{cases} \qquad (7)
It is readily shown that these four transforms satisfy the following relation:
\begin{bmatrix} X_2^C(k) \\ X_2^S(k) \end{bmatrix} = \begin{bmatrix} Z_1^C(k) & -Z_1^S(k) \\ Z_1^S(k) & Z_1^C(k) \end{bmatrix} \begin{bmatrix} g_m^C(k) \\ g_m^S(k) \end{bmatrix} \qquad (8)
where g_m^S = \sin\!\big((k\pi/N)(m+0.5)\big) and g_m^C = \cos\!\big((k\pi/N)(m+0.5)\big). These two frequency-domain quantities contain the translation information m. Given the signals x_1(n) and x_2(n), if a fast algorithm can be found to solve for g_m^C and g_m^S and to extract m from them, motion search in the DCT domain becomes feasible.
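Relation (8) can be verified numerically, as in the sketch below (Python/NumPy; the test signal is arbitrary, with a zero tail so that the truncated shift of equation (2) matches the full-window sums (5)-(6) exactly):

```python
import numpy as np

N, m = 16, 5
rng = np.random.default_rng(1)
x1 = np.zeros(N)
x1[:N - m] = rng.uniform(1, 10, N - m)   # zero tail: shifted-out samples are 0
x2 = np.zeros(N)
x2[m:] = x1[:N - m]                      # eq (2): x2(n) = x1(n - m) for n >= m

k = np.arange(N)[:, None]
n = np.arange(N)[None, :]
C = np.where(np.arange(N) == 0, 1 / np.sqrt(2), 1.0)   # eq (7), k in [0, N-1]

X2C = (2 / N) * C * (x2 * np.cos(k * np.pi / N * (n + 0.5))).sum(1)   # eq (3)
X2S = (2 / N) * C * (x2 * np.sin(k * np.pi / N * (n + 0.5))).sum(1)   # eq (4)
Z1C = (2 / N) * C * (x1 * np.cos(k * np.pi / N * n)).sum(1)           # eq (5)
Z1S = (2 / N) * C * (x1 * np.sin(k * np.pi / N * n)).sum(1)           # eq (6)

kk = np.arange(N)
gC = np.cos(kk * np.pi / N * (m + 0.5))   # g_m^C(k)
gS = np.sin(kk * np.pi / N * (m + 0.5))   # g_m^S(k)

# eq (8), checked per frequency k
assert np.allclose(X2C, Z1C * gC - Z1S * gS)
assert np.allclose(X2S, Z1S * gC + Z1C * gS)
```

The zero tail matters: if the samples that slide out of the window were nonzero, the truncated sums of (3)-(4) would no longer equal the full-window sums of (5)-(6) and (8) would hold only approximately.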
Rewrite equation (8) as \vec{X}(k) = Z(k)\vec{\Omega}(k). It can be shown that Z(k) is an orthogonal matrix (up to the scale factor λ), satisfying:
\lambda Z^T(k) Z(k) = I_2 \qquad (9)
where I_2 is the 2 × 2 identity matrix. In this way, the equation can be solved:
\vec{\Omega}(k) = \lambda Z^T(k) \vec{X}(k) \qquad (10)
from which g_m^C and g_m^S can be obtained.
From the orthogonality of the sine and cosine functions, the following identities hold [4]:
\frac{2}{N}\sum_{k=1}^{N} C^2(k) \sin\!\left(\frac{k\pi}{N}(m+0.5)\right) \sin\!\left(\frac{k\pi}{N}(n+0.5)\right) = \delta(m-n) - \delta(m+n+1) \qquad (11)
\frac{2}{N}\sum_{k=0}^{N-1} C^2(k) \cos\!\left(\frac{k\pi}{N}(m+0.5)\right) \cos\!\left(\frac{k\pi}{N}(n+0.5)\right) = \delta(m-n) + \delta(m+n+1) \qquad (12)
where δ(n) is the discrete impulse function.
Combining equations (8) and (10)-(12), we obtain:
\frac{2}{N}\sum_{k=1}^{N} C^2(k)\, g_m^S \sin\!\left(\frac{k\pi}{N}(n+0.5)\right) = \delta(m-n) - \delta(m+n+1) \qquad (13)
\frac{2}{N}\sum_{k=0}^{N-1} C^2(k)\, g_m^C \cos\!\left(\frac{k\pi}{N}(n+0.5)\right) = \delta(m-n) + \delta(m+n+1) \qquad (14)
Analyzing equation (13): when m > 0 and lies within the search window [0, N), a positive δ response always appears at n = m, together with a negative δ response at n = −m − 1 in the negative mirror [−N, 0) of the window; when m < 0, the positive δ response falls at n = m in the negative mirror, while the negative δ response at n = −m − 1 falls inside the window. As shown in fig. 3, the gray area is the search window: a positive δ response found at position s inside the window means the object has translated to the right with displacement s, while a negative δ response at position s means the object has translated to the left with displacement −(s + 1). See fig. 3(a) for the δ response of an object translated right by s and fig. 3(b) for that of an object translated left by s + 1. FIG. 4 is a schematic view of the spatial location of the sub-pixels.
In the concrete computation, a simplified form is used in place of \frac{2}{N}\sum_{k=1}^{N} C^2(k)\, g_m^S \sin\!\left(\frac{k\pi}{N}(n+0.5)\right) to reduce the computational complexity.
1.1.2 Flow of the frequency-domain sub-pixel search algorithm
Based on the above derivation, the flow of the frequency domain-based sub-pixel search algorithm is as follows:
1) Set the search window size to N. In the x direction, extract the one-dimensional signal x_1(n) starting from the integer pixel point F of the reference image, and the signal x_2(n) at the corresponding position in the current image.
2) According to equations (3)-(6), compute the four discrete DCT/DST transform coefficients of x_1(n) and x_2(n).
3) Compute g_m^S over the interval [1, N], obtained from equations (3)-(6) and (8):
g_m^S(k) = \begin{cases} 1, & k = N \\ \dfrac{Z_1^C(k)\,X_2^S(k) - Z_1^S(k)\,X_2^C(k)}{\big(Z_1^C(k)\big)^2 + \big(Z_1^S(k)\big)^2}, & k \in [1, N) \end{cases} \qquad (15)
4) From equation (13), obtain the translation direction d_x and the displacement s_x in the x direction.
5) Repeat the above steps in the y direction to obtain d_y and s_y.
6) Using the parameters m_x and m_y, look up Table 1 to determine the matching point in fig. 4 and the half-pel motion vector.
TABLE 1. m and the corresponding motion vectors

m_x   m_y   Matching point   Motion vector
>0    >0    3                (0.5, 0.5)
>0    <0    8                (0.5, -0.5)
>0    =0    5                (0.5, 0)
<0    >0    1                (-0.5, 0.5)
<0    <0    6                (-0.5, -0.5)
<0    =0    4                (-0.5, 0)
=0    >0    2                (0, 0.5)
=0    <0    7                (0, -0.5)
=0    =0    F                (0, 0)
7) If motion vectors of 1/4-pixel accuracy are required, apply bilinear interpolation to the pixel block indicated by the motion vector obtained in 6), and repeat steps 1)-6) on the result.
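The flow above can be sketched end-to-end in one dimension (Python/NumPy; the window size, seed and test signal are arbitrary choices, the window tail is zero-padded so that the shift model of equation (2) holds exactly, and the table lookup of step 6) and the quarter-pel refinement of step 7) are omitted):

```python
import numpy as np

def dct_dst(x1, x2):
    """Transforms (3)-(6) of the two windowed signals, k in [0, N-1]."""
    N = len(x1)
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.where(np.arange(N) == 0, 1 / np.sqrt(2), 1.0)
    X2C = (2 / N) * C * (x2 * np.cos(k * np.pi / N * (n + 0.5))).sum(1)
    X2S = (2 / N) * C * (x2 * np.sin(k * np.pi / N * (n + 0.5))).sum(1)
    Z1C = (2 / N) * C * (x1 * np.cos(k * np.pi / N * n)).sum(1)
    Z1S = (2 / N) * C * (x1 * np.sin(k * np.pi / N * n)).sum(1)
    return X2C, X2S, Z1C, Z1S

def find_shift(x1, x2):
    """Steps 2)-4): solve g_m^S via (15), project via (13), locate the delta."""
    N = len(x1)
    X2C, X2S, Z1C, Z1S = dct_dst(x1, x2)
    denom = Z1C**2 + Z1S**2 + 1e-12          # tiny guard; numerator shrinks too
    gS = (Z1C * X2S - Z1S * X2C) / denom     # eq (15) for k in [1, N)
    gS_full = np.concatenate([gS[1:], [1.0]])  # k = 1..N-1, then k = N -> 1
    k = np.arange(1, N + 1)
    C2 = np.where(k == N, 0.5, 1.0)          # C(k)^2 per eq (7)
    n = np.arange(N)
    # eq (13): the projection shows a positive delta response at n = m
    p = (2 / N) * (C2 * gS_full * np.sin(np.outer(n + 0.5, k * np.pi / N))).sum(1)
    return int(np.argmax(p)), p

N, m_true = 16, 3
rng = np.random.default_rng(2)
x1 = np.zeros(N)
x1[:N // 2] = rng.uniform(1, 10, N // 2)     # zero tail keeps (8) exact
x2 = np.zeros(N)
x2[m_true:] = x1[:N - m_true]                # current window: x1 shifted right

m_hat, p = find_shift(x1, x2)
assert m_hat == m_true and p[m_hat] > 0.5    # positive delta => right shift by m
```

Note that no cost function is evaluated and no interpolation is performed: the displacement is read directly from the position of the δ peak. A negative peak at position s would instead indicate a left shift of s + 1, per the sign analysis of equation (13).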
FIG. 5 compares the computational complexity of the present algorithm in sub-pel search with that of the full search algorithm under various standard test sequences. Since the image content and the computing environment differ between test sequences, the computational complexity of the full search algorithm on each test sequence is normalized to 1 as the comparison reference.
DCT stands for Discrete Cosine Transform, which converts a set of light-intensity data into frequency data so that the intensity variations can be analyzed. If the high-frequency components are modified and the data is then converted back to its original form, the result obviously differs from the original data, but human eyes cannot easily perceive the difference. During compression, the original image data is divided into 8 × 8 data unit matrices; for example, the first matrix of luminance values has the following contents:
y00 y01 y02 y03 y04 y05 y06 y07
y10 y11 y12 y13 y14 y15 y16 y17
y20 y21 y22 y23 y24 y25 y26 y27
y30 y31 y32 y33 y34 y35 y36 y37
y40 y41 y42 y43 y44 y45 y46 y47
y50 y51 y52 y53 y54 y55 y56 y57
y60 y61 y62 y63 y64 y65 y66 y67
y70 y71 y72 y73 y74 y75 y76 y77
JPEG treats the luminance matrix together with the chrominance Cb matrix and the Cr matrix as a basic unit called an MCU. The number of matrices contained in each MCU must not exceed 10. For example, if the luminance is sampled at twice the chrominance rate in both rows and columns, each MCU will contain four luminance matrices, one Cb matrix and one Cr matrix.
After the image data is divided into 8 × 8 matrices, 128 must be subtracted from each value before the values are substituted one by one into the DCT transform formula. The image data values must be shifted by 128 because the DCT transform formula accepts numbers in the range −128 to +127.
DCT transform formula:
F(u,v) = \frac{1}{4}\, c(u)\, c(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\!\left(\frac{(2x+1)u\pi}{16}\right) \cos\!\left(\frac{(2y+1)v\pi}{16}\right)
Here x, y denote the coordinate position of a value within the image data matrix, and f(x, y) denotes the value at that position; u, v denote the coordinate position of a value in the DCT-transformed matrix, and F(u, v) denotes the value there.
c(u) = c(v) = 1/1.414 (= 1/\sqrt{2}) when u = 0 and v = 0;
c(u) = c(v) = 1 when u > 0 or v > 0.
The values of the matrix after the DCT transform are frequency coefficients. The coefficient F(0, 0) has the largest value and is called the DC coefficient; the other 63 frequency coefficients are mostly positive and negative floating-point numbers close to 0, collectively called AC coefficients.
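The level shift and transform described above can be sketched as follows (Python/NumPy implementation of the 8 × 8 DCT formula; the constant input block is an arbitrary example chosen to make the energy compaction visible):

```python
import numpy as np

def dct2_8x8(block):
    """2-D 8x8 DCT: F(u,v) = (1/4) c(u) c(v)
    sum_{x,y} f(x,y) cos((2x+1)u*pi/16) cos((2y+1)v*pi/16)."""
    c = np.ones(8)
    c[0] = 1 / np.sqrt(2)
    idx = np.arange(8)
    # basis[u, x] = cos((2x+1) * u * pi / 16)
    basis = np.cos((2 * idx[None, :] + 1) * idx[:, None] * np.pi / 16)
    return 0.25 * np.outer(c, c) * (basis @ block @ basis.T)

# Level shift: subtract 128 so the samples lie in [-128, 127]
block = np.full((8, 8), 130.0)
F = dct2_8x8(block - 128)

# A constant block puts all of its energy into the DC coefficient F(0,0):
# (1/4) * (1/2) * 64 * (130 - 128) = 16, and every AC coefficient is 0.
assert np.isclose(F[0, 0], 16.0)
assert np.allclose(F[1:, :], 0) and np.allclose(F[0, 1:], 0)
```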
1.1.3 Summary
For video coding, performing the motion search in the frequency domain is advantageous beyond improving the motion search step itself. In a traditional video coding system, the encoder must convert repeatedly between the spatial domain and the frequency domain: a spatial-domain algorithm is used for motion search, while residual coding must be performed in the frequency domain so that the coefficient energy is concentrated in the low-frequency region for convenient quantization. Such frequent switching between the two domains consumes considerable resources. After adopting the frequency-domain motion search algorithm, the video encoder completes all computations in the frequency domain; the encoding process is shown in fig. 6. Compared with video coding that searches for motion vectors in the spatial domain, the quantization, storage and motion search following the DCT of the sampled signal in fig. 6 are all completed in the frequency domain. This removes the inverse DCT step of the spatial-domain coding process, more effectively reduces the storage space required, and benefits the optimization of both the encoder and the decoder.
1.2 Fast selection algorithm for the intra prediction mode
1.2.1 Intra coding prediction modes
If there is no strong temporal correlation between the current picture and the previously input picture, the frame is typically encoded as an I frame using an intra coding mode. In conventional video coding standards, an I-frame image is coded directly without prediction: the macroblock data is directly transformed, quantized, coded and transmitted, so the amount of data after coding an I-frame image is very large. To improve coding efficiency more effectively, the video coding system of the invention fully exploits the spatial redundancy among the pixels within an image and defines 16 × 16 and 4 × 4 prediction units.
In the intra prediction module of the invention, if the current macroblock is intra coded, the prediction for the macroblock comes from the neighbouring already-coded and reconstructed macroblocks. The luminance component may use a 16 × 16 macroblock or a 4 × 4 small block as the basic unit of intra prediction coding: with a 16 × 16 macroblock as the coding unit, 4 prediction modes are available; with a 4 × 4 small block, 9 prediction modes are available. The two chroma components use an 8 × 8 block as the basic unit of intra prediction coding, with 4 prediction modes available, and the mode selected for the two chroma components must be the same. Since the 4 × 4 small blocks are the finer-grained case, the computational complexity is concentrated in this unit.
The distribution of pixels in a 4 × 4 small block and its surrounding pixels is shown in fig. 7, where the lower-case letters a to p denote the 16 pixels inside the block and the upper-case letters A to M denote the pixels surrounding it. Intra 4 × 4 uses 9 prediction modes, where mode 2 is DC prediction; the directions of the remaining modes are shown in fig. 8 (intra 4 × 4 prediction modes). For example, if mode 1 (horizontal prediction) is selected, the predicted values within the block come from pixels I, J, K, L.
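The horizontal mode just described can be sketched as follows (Python/NumPy; the neighbour values are hypothetical, and the letters follow fig. 7):

```python
import numpy as np

# Reconstructed left neighbours I, J, K, L of the 4x4 block (hypothetical values)
I, J, K, L = 90, 100, 110, 120

# Mode 1 (horizontal): every pixel in row r is predicted from the
# reconstructed left neighbour of that row.
left = np.array([I, J, K, L])
pred = np.repeat(left[:, None], 4, axis=1)

assert (pred[0] == I).all() and (pred[3] == L).all()
assert pred.shape == (4, 4)
```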
1.2.2 fast intra prediction coding mode selection algorithm
The intra prediction mode selection algorithm provided by the invention uses the boundary direction histogram, a context model, and the prediction coding mode of the co-located small block in the previous frame to quickly select the candidate prediction modes; it pre-codes according to the pre-selected modes and then selects the optimal prediction mode with a Lagrangian cost function. To further reduce the amount of computation, the raw data is sub-sampled before the boundary direction vectors are computed. Taking intra 4 × 4 as an example, the fast intra prediction mode selection process is shown in fig. 9, a flow chart of fast intra 4 × 4 prediction mode selection; each part of the process is described separately below.
1.2.2.1 Pixel sub-sampling
The input original pixel data is sub-sampled 2:1, so the number of sampled pixels is 1/2 of the original and the time consumed computing the boundary direction vectors is roughly halved. The pixel sub-sampling method employed here is illustrated in fig. 10 for the 4 × 4 blocks; in the sub-sampled image, the filled circles represent the retained sample pixels.
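A sketch of the 2:1 sub-sampling (Python/NumPy; the exact pattern of retained pixels is given by fig. 10, which is not reproduced here, so the checkerboard pattern below is only an assumed illustration of a pattern that keeps half the pixels evenly spread):

```python
import numpy as np

# 2:1 sub-sampling of a 4x4 block: keep the pixels where (row + col) is even,
# a checkerboard (quincunx) pattern -- an assumption in place of fig. 10.
block = np.arange(16).reshape(4, 4)
mask = (np.add.outer(np.arange(4), np.arange(4)) % 2) == 0
samples = block[mask]

assert samples.size == 8        # half of the 16 pixels survive
```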
1.2.2.2 mode selection based on boundary direction
Images are spatially continuous and correlated, and the pixels composing an image are correlated along the 8 spatial prediction directions; this property can be exploited to reduce spatial redundancy. If the direction with the strongest correlation can be found and the pixel values are coded with intra prediction along it, the best intra coding effect is achieved. The Sobel operator [3-5] is used here to compute the boundary direction vectors of the sub-sampled pixels; the Sobel kernels are

\begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix},

used to compute the vertical component dy_{i,j} and the horizontal component dx_{i,j} of the boundary vector, respectively.
For a sub-sampled pixel p_{i,j}, the corresponding boundary vector is \vec{D}_{i,j} = \{dx_{i,j}, dy_{i,j}\}, where dx_{i,j} and dy_{i,j} denote the horizontal and vertical components of the boundary vector. They are given in equation (1) below, where p_{i-1,j+1} etc. refer to the neighbouring pixels of p_{i,j} in the original image.
dx_{i,j} = p_{i-1,j+1} + 2p_{i,j+1} + p_{i+1,j+1} - p_{i-1,j-1} - 2p_{i,j-1} - p_{i+1,j-1}
dy_{i,j} = p_{i+1,j-1} + 2p_{i+1,j} + p_{i+1,j+1} - p_{i-1,j-1} - 2p_{i-1,j} - p_{i-1,j+1} \qquad (1)
For ease of calculation, the norm of the boundary direction vector is defined as:
Amp(\vec{D}_{i,j}) = |dx_{i,j}| + |dy_{i,j}| \qquad (2)
The direction of the boundary direction vector is:
Ang(\vec{D}_{i,j}) = \arctan\!\left(\frac{dy_{i,j}}{dx_{i,j}}\right)
The amplitudes of the vectors sharing the same direction within a small block are summed to obtain the corresponding edge direction histogram; the construction of the intra 4 × 4 edge direction histogram is given in equation (3) below, and the direction with the largest magnitude in the histogram is selected as the candidate prediction direction.
Histo(k) = \sum_{(m,n) \in SET(k)} Amp(\vec{D}_{m,n}), \quad SET(k) = \{(i,j) \mid Ang(\vec{D}_{i,j}) \in a_k\}, \qquad (3)

where the angle intervals (in degrees) are

a_0 = (-103.3, -76.7]
a_1 = (-13.3, 13.3]
a_3 = (35.8, 54.2]
a_4 = (-54.2, -35.8]
a_5 = (-76.7, -54.2]
a_6 = (-35.8, -13.3]
a_7 = (54.2, 76.7]
a_8 = (13.3, 35.8]

(mode 2, DC prediction, has no associated direction, so there is no a_2).
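The histogram construction of equation (3) can be sketched as follows (Python/NumPy; how angles near ±90° are folded into bin a_0 is our assumption, since the text does not spell it out, and the test image is an arbitrary vertical luminance ramp):

```python
import numpy as np

BINS = {  # angle ranges a_k in degrees, from eq (3); a2 (DC) has no angle
    0: (-103.3, -76.7), 1: (-13.3, 13.3), 3: (35.8, 54.2), 4: (-54.2, -35.8),
    5: (-76.7, -54.2), 6: (-35.8, -13.3), 7: (54.2, 76.7), 8: (13.3, 35.8),
}

def edge_histogram(p, i0, j0):
    """Edge-direction histogram of the 4x4 block whose top-left pixel is (i0, j0)."""
    histo = dict.fromkeys(BINS, 0.0)
    for i in range(i0, i0 + 4):
        for j in range(j0, j0 + 4):
            # Sobel components, eq (1)
            dx = (p[i-1, j+1] + 2*p[i, j+1] + p[i+1, j+1]
                  - p[i-1, j-1] - 2*p[i, j-1] - p[i+1, j-1])
            dy = (p[i+1, j-1] + 2*p[i+1, j] + p[i+1, j+1]
                  - p[i-1, j-1] - 2*p[i-1, j] - p[i-1, j+1])
            amp = abs(dx) + abs(dy)                     # eq (2)
            ang = np.degrees(np.arctan2(dy, dx))
            if ang > 76.7:                              # fold so +/-90 share a0
                ang -= 180.0
            elif ang <= -103.3:
                ang += 180.0
            for k, (lo, hi) in BINS.items():
                if lo < ang <= hi:
                    histo[k] += amp
                    break
    return histo

# A vertical luminance ramp: horizontal stripes, so the gradient is vertical
# (dx = 0, dy = 80 for every interior pixel) and bin a0 should dominate.
img = np.repeat(10.0 * np.arange(8)[:, None], 8, axis=1)
h = edge_histogram(img, 2, 2)
assert max(h, key=h.get) == 0
```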
1.2.2.3 mode selection based on context model
There is spatial correlation between the small blocks of an image, so the coding mode of the current small block can be predicted from the coding modes of its neighbouring blocks. As shown in fig. 11 (the current block and its neighbouring blocks), C denotes the current 4 × 4 block, while A and B denote the 4 × 4 blocks above and to the left of the current block. The maximum of the prediction modes of A and B is used as a candidate prediction mode for the current block.
1.2.2.4 Mode selection based on the co-located block in the previous frame
This selection is made according to the coding mode of the 4 × 4 small block at the position in the previous frame corresponding to the current small block: if that co-located block used an intra coding mode, its coding mode is selected as a candidate coding mode for the current 4 × 4 block, as shown in fig. 12, a schematic of the current 4 × 4 block and the co-located 4 × 4 block of the previous frame.
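Taken together with the two preceding subsections, the candidate-set assembly can be sketched as below (plain Python; the mode numbers are hypothetical, and always including DC mode 2 as a fallback is our assumption, not stated in the text):

```python
def candidate_modes(hist_mode, mode_a, mode_b, prev_frame_mode):
    """Collect candidate intra 4x4 prediction modes from the three sources:
    the edge-direction histogram, the context model max(A, B), and the
    co-located block of the previous frame (only if it was intra coded)."""
    candidates = {2}                       # DC fallback -- our assumption
    candidates.add(hist_mode)              # dominant boundary direction
    candidates.add(max(mode_a, mode_b))    # context model of blocks A and B
    if prev_frame_mode is not None:        # co-located block, intra only
        candidates.add(prev_frame_mode)
    return candidates

c = candidate_modes(hist_mode=0, mode_a=1, mode_b=4, prev_frame_mode=None)
assert c == {0, 2, 4}
```

Only this small candidate set is then pre-coded, which is where the saving over trying all 9 modes comes from.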
1.2.2.5 precoding and Performance comparison
Pre-coding uses the pixels around the current small block to predictively code the current block with each selected candidate prediction mode in turn, and the optimal prediction mode is selected with a Lagrangian cost function:
J(s, c, IMODE \mid QP, \lambda_{MODE}) = SSD(s, c, IMODE \mid QP) + \lambda_{MODE} \cdot R(s, c, IMODE \mid QP) \qquad (4)
Here IMODE denotes one of the selectable intra prediction directions; SSD is the sum of squared differences between the original pixel values s and the reconstructed pixel values c of the intra 4 × 4 block; and R(s, c, IMODE | QP) is the size of the code stream produced by coding with mode IMODE, using variable-length Huffman coding. Video coding quality is measured by the peak signal-to-noise ratio (PSNR); equation (5) gives its definition:
PSNR = 10 \log_{10}\!\left(\frac{255^2}{MSE}\right) \qquad (5)
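A sketch of the selection criterion (4) and the quality metric (5) (Python/NumPy; the candidate SSD/rate pairs and the λ value are made-up illustrative numbers):

```python
import numpy as np

def psnr(orig, recon):
    """Eq (5): peak signal-to-noise ratio for 8-bit images."""
    mse = np.mean((orig.astype(float) - recon.astype(float)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

def best_mode(candidates, lam):
    """Eq (4): J = SSD + lambda * R, minimised over the candidate modes."""
    return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])

# Hypothetical (SSD, rate-in-bits) pairs for three pre-coded candidate modes
candidates = {0: (400.0, 60), 1: (380.0, 72), 2: (520.0, 40)}
assert best_mode(candidates, lam=1.0) == 1   # J: 460 vs 452 vs 560

orig = np.full((4, 4), 100.0)
recon = orig + 5.0                           # constant error of 5 => MSE = 25
assert np.isclose(psnr(orig, recon), 34.15, atol=0.01)
```

Note how the winner depends on λ: a larger λ penalises rate more heavily and would shift the choice towards cheaper-to-code modes.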
1.2.3 Experimental results
The test sequences used in the experiment were Mobile, Tempete, Bus and Paris, all of QCIF size; only the luminance component was tested. The test results are shown in Table 2.
TABLE 2. Variation of coding performance under different test sequences

Test sequence   Change in first I-frame coding time (%)   Average change in per-frame bit rate (%)   Change in average per-frame coding time (%)   Change in PSNR (dB)
Mobile          -70.25                                    0.12                                       -33.56                                        -0.016
Tempete         -69.78                                    0.26                                       -32.14                                        -0.014
Bus             -69.58                                    0.39                                       -24.34                                        -0.024
Paris           -71.03                                    0.42                                       -31.76                                        -0.021
2 System software block diagram (FIG. 13, system software block diagram)
In the software system, the core modules are the video encoder and decoder; these two parts form the main body of the whole software architecture and constitute the principal innovation of the invention. The video conference system designed by the invention uses the RTP/RTCP protocols to transmit video and voice data: the real-time transport protocol (RTP) is responsible for packetizing and sending the media data, while RTCP coordinates the sending and receiving parties of the video and voice data streams and carries feedback and time-synchronization information.
3 System hardware block diagram (shown in fig. 14); the system adopts an embedded design.
In summary, video conferencing is a rapidly growing market, but since the industrial standards are not yet fully unified, Western countries have not achieved a monopoly on the core technology, and China faces a great opportunity to develop in this field. At present, some domestically produced video conference network equipment, such as MCUs and gatekeepers, is internationally advanced or even leading; for video conference terminal equipment, however, China still lacks competitive products, and the market is almost completely occupied by foreign products. The AVCS-II video conference system developed by the Institute of Applied Physics of Nanjing University is, to a certain extent, a new attempt and breakthrough for our country in the field of video conference terminal products, especially in video codec technology, and is expected to open up the domestic and foreign video conference markets. The frequency-domain sub-pixel motion search algorithm provided by the invention is technically innovative; experiments and practical use by customers have shown that the algorithm has high accuracy and extremely low computational complexity and can rapidly match the optimal motion vector. Beyond its distinctive video coding system, the system designed by the invention provides a rich set of video conference tools, constructing a complete video and data interaction platform for users.

Claims (3)

1. A video compression coding and decoding method, comprising the following procedures for coding a video compression signal. Discrete cosine transform (DCT): the DCT is a spatial transform that generates blocks of DCT coefficients in units of blocks; for typical images it concentrates the energy of a block on a few low-frequency DCT coefficients. Transformation and quantization: quantization is applied to the DCT transform coefficients with a certain quantization step size; among the DCT coefficients, the low-frequency coefficients are more important to visual perception and are therefore assigned finer quantization precision, while the high-frequency coefficients are less important to visual perception and are assigned coarser quantization precision. Before the coded bit stream enters the channel, a channel buffer must be provided: data from the entropy coder is written into the buffer at a variable bit rate and read out by the transmission system at a nominally constant bit rate before being sent into the channel; the buffer must have a control mechanism that adjusts the encoder bit rate through feedback control of the compression algorithm, so that the write and read data rates of the buffer tend toward balance. Motion estimation: when the interframe coding mode is used, the estimate of the compressed image is generated from a reference frame image; motion estimation is performed in units of macroblocks by computing the position offset between corresponding macroblocks of the compressed image and the reference image; this position offset is described by motion vectors, one motion vector representing the displacement in the horizontal and vertical directions. In motion estimation, a P-frame picture uses the most recently decoded I frame or P frame as its
reference picture, which is called forward prediction; a B-frame picture uses two frames as prediction references, which is called bidirectional prediction, where one reference frame precedes the coded frame in display order (forward prediction) and the other follows it in display order (backward prediction), the reference frames of a B frame being in any case I frames or P frames. Motion compensation: the motion vector calculated by motion estimation is used to move the macroblock of the reference frame image to the corresponding position in the horizontal and vertical directions, thereby generating the prediction of the compressed image; and a search calculation is performed on the sub-pixels;
the method is characterized in that: a frequency-domain motion search algorithm is adopted, so that the quantization, storage, and motion search of the sampled signal after DCT transformation are all completed in the frequency domain, and the video encoder performs all computation in the frequency domain.
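As an illustration of the transform-and-quantization procedure above, the following sketch applies an orthonormal 8×8 DCT to a block and quantizes the coefficients with frequency-dependent step sizes. It is a minimal example only, not the patented encoder; in particular, the step schedule `base_step * (1 + u + v)` is an assumed stand-in for "finer precision at low frequencies, coarser at high frequencies."

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis: dct2(B) = C @ B @ C.T
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def dct2(block):
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

def quantize(coeffs, base_step=4.0):
    # Step size grows with frequency index u + v, so low-frequency
    # coefficients keep finer precision than high-frequency ones.
    n = coeffs.shape[0]
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    steps = base_step * (1 + u + v)
    return np.round(coeffs / steps).astype(int)

block = np.full((8, 8), 100.0)   # a flat block: all energy lands in the DC term
coeffs = dct2(block)
quantized = quantize(coeffs)
```

For a flat block the transform concentrates all energy in the DC coefficient, which survives quantization while every AC coefficient rounds to zero; that energy compaction is what makes the subsequent entropy coding effective.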
2. The video compression coding and decoding method of claim 1, wherein the flow of the frequency-domain-based sub-pixel search algorithm is as follows:
1) Determine the search window N; extract the one-dimensional signal x_1(n) starting from the integer pixel point F of the reference image in the x direction, and the signal x_2(n) at the corresponding position in the current image.
2) According to formulas (3)-(6), calculate the four discrete DCT/DST transform coefficients of x_1(n) and x_2(n).
3) Calculate g_m^S(k) on the interval [1, N], obtained from formulas (3)-(6) and (8):
<math> <mrow> <msubsup> <mi>g</mi> <mi>m</mi> <mi>S</mi> </msubsup> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>=</mo> <mfenced open='{' close='-'> <mtable> <mtr> <mtd> <mn>1</mn> <mo>,</mo> <mi>k</mi> <mo>=</mo> <mi>N</mi> </mtd> </mtr> <mtr> <mtd> <mrow> <mo>(</mo> <msubsup> <mi>Z</mi> <mn>1</mn> <mi>C</mi> </msubsup> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>&CenterDot;</mo> <msubsup> <mi>X</mi> <mn>2</mn> <mi>S</mi> </msubsup> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>-</mo> <msubsup> <mi>Z</mi> <mn>1</mn> <mi>S</mi> </msubsup> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>&CenterDot;</mo> <msubsup> <mi>X</mi> <mn>2</mn> <mi>C</mi> </msubsup> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> <mo>/</mo> <mo>(</mo> <msup> <mrow> <mrow> <mo>(</mo> <msubsup> <mi>Z</mi> <mn>1</mn> <mi>C</mi> </msubsup> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> </mrow> <mo>)</mo> </mrow> <mn>2</mn> </msup> <mo>+</mo> <msup> <mrow> <mo>(</mo> <msubsup> <mi>Z</mi> <mn>1</mn> <mi>S</mi> </msubsup> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> <mn>2</mn> </msup> <mo>)</mo> <mo>,</mo> <mi>k</mi> <mo>&Element;</mo> <mrow> <mo>(</mo> <mn>1</mn> <mo>,</mo> <mi>N</mi> <mo>)</mo> </mrow> </mtd> </mtr> </mtable> </mfenced> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>15</mn> <mo>)</mo> </mrow> </mrow> </math>
4) Obtain the translation direction d_x and the displacement s_x in the x direction from equation (13).
5) Repeat the above steps in the y direction to obtain d_y and s_y.
6) With the parameters m_x and m_y, look up Table 1 to determine the matching point in FIG. 4 and thereby the half-pel motion vector.
Table 1. Matching point and half-pel motion vector as a function of m_x and m_y

m_x   m_y   Matching point   Motion vector
>0    >0    3                (0.5, 0.5)
>0    <0    8                (0.5, -0.5)
>0    =0    5                (0.5, 0)
<0    >0    1                (-0.5, 0.5)
<0    <0    6                (-0.5, -0.5)
<0    =0    4                (-0.5, 0)
=0    >0    2                (0, 0.5)
=0    <0    7                (0, -0.5)
=0    =0    F                (0, 0)
7) If motion vectors of 1/4-pixel accuracy are required, interpolate the pixel block at the motion vector obtained in step 6) with a bilinear filter and repeat steps 1)-6) on the result.
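The claimed algorithm derives the displacement from DCT/DST pseudo-phases (formulas (3)-(6), (8), and (15)). As an illustrative analogue only, and assuming DFT phase correlation in place of the claimed DCT/DST construction, a one-dimensional frequency-domain shift estimate can be sketched as:

```python
import numpy as np

def estimate_shift(ref, cur):
    """Estimate the circular shift s such that cur ~= np.roll(ref, s).

    The shift appears as a linear phase ramp in the cross-power
    spectrum; normalizing away the magnitude and inverse-transforming
    turns that ramp into a correlation peak at lag s.
    """
    R, C = np.fft.fft(ref), np.fft.fft(cur)
    cross = np.conj(R) * C
    cross /= np.maximum(np.abs(cross), 1e-12)  # keep phase, drop magnitude
    corr = np.fft.ifft(cross).real
    return int(np.argmax(corr))

rng = np.random.default_rng(0)
ref = rng.standard_normal(64)   # stand-in for the 1-D reference signal x_1(n)
cur = np.roll(ref, 5)           # current-frame signal x_2(n), shifted by 5 samples
shift = estimate_shift(ref, cur)
```

Running the same estimate on the y-direction signal yields the second motion-vector component; half-pel and quarter-pel refinement would then proceed as in steps 6) and 7).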
3. The video compression coding and decoding method according to claim 1, wherein the intra-frame prediction mode selection algorithm uses the boundary direction histogram, the context model, and the prediction coding mode of the co-located small block in the previous frame to quickly select the available candidate prediction modes, pre-codes with each pre-selected mode, and then selects the optimal prediction mode with a Lagrangian cost function; the original data is first sub-sampled before the boundary direction vectors are computed; pixel sub-sampling: the input original pixel data is sub-sampled 2:1, so the number of sampled pixels is 1/2 of the original, and the time consumed computing the boundary direction vectors is accordingly about halved; mode selection based on the boundary direction:
A natural image is spatially continuous and correlated, and each pixel of the image is correlated along 8 spatial prediction directions; the Sobel operator [3-5] is used to compute the boundary direction vector of each sub-sampled pixel; the Sobel operators,
[-1 -2 -1; 0 0 0; 1 2 1] and [-1 0 1; -2 0 2; -1 0 1],
compute the vertical and horizontal components of the boundary vector, respectively; for a sub-sampled pixel p_{i,j}, the corresponding boundary vector is <math> <mrow> <msub> <mover> <mi>D</mi> <mo>&RightArrow;</mo> </mover> <mrow> <mi>i</mi> <mo>,</mo> <mi>j</mi> </mrow> </msub> <mo>=</mo> <mo>{</mo> <msub> <mi>dx</mi> <mrow> <mi>i</mi> <mo>,</mo> <mi>j</mi> </mrow> </msub> <mo>,</mo> <msub> <mi>dy</mi> <mrow> <mi>i</mi> <mo>,</mo> <mi>j</mi> </mrow> </msub> <mo>}</mo> <mo>,</mo> </mrow> </math> where dx_{i,j} and dy_{i,j} are the horizontal and vertical components of the boundary vector, given by equation (1), in which p_{i-1,j+1} etc. denote the neighbors of pixel p_{i,j} in the original image:
dx_{i,j} = p_{i-1,j+1} + 2p_{i,j+1} + p_{i+1,j+1} - p_{i-1,j-1} - 2p_{i,j-1} - p_{i+1,j-1}
dy_{i,j} = p_{i+1,j-1} + 2p_{i+1,j} + p_{i+1,j+1} - p_{i-1,j-1} - 2p_{i-1,j} - p_{i-1,j+1}    (1)
For ease of calculation, the norm of the boundary direction vector is defined as:
<math> <mrow> <mi>Amp</mi> <mrow> <mo>(</mo> <msub> <mover> <mi>D</mi> <mo>&RightArrow;</mo> </mover> <mrow> <mi>i</mi> <mo>,</mo> <mi>j</mi> </mrow> </msub> <mo>)</mo> </mrow> <mo>=</mo> <mo>|</mo> <msub> <mi>dx</mi> <mrow> <mi>i</mi> <mo>,</mo> <mi>j</mi> </mrow> </msub> <mo>|</mo> <mo>+</mo> <mo>|</mo> <msub> <mi>dy</mi> <mrow> <mi>i</mi> <mo>,</mo> <mi>j</mi> </mrow> </msub> <mo>|</mo> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>2</mn> <mo>)</mo> </mrow> </mrow> </math>
the direction of the boundary direction vector is:
The moduli of the vectors with the same direction within a small block are added to obtain the corresponding edge direction histogram; the construction of the 4×4 intra edge direction histogram is given by formula (3) below, and the direction with the largest modulus in the histogram is selected as the candidate prediction direction;
<math> <mrow> <mi>Histo</mi> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>=</mo> <munder> <mi>&Sigma;</mi> <mrow> <mrow> <mo>(</mo> <mi>m</mi> <mo>,</mo> <mi>n</mi> <mo>)</mo> </mrow> <mo>&Element;</mo> <mi>SET</mi> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> </mrow> </munder> <mi>Amp</mi> <mrow> <mo>(</mo> <msub> <mover> <mi>D</mi> <mo>&RightArrow;</mo> </mover> <mrow> <mi>m</mi> <mo>,</mo> <mi>n</mi> </mrow> </msub> <mo>)</mo> </mrow> <mo>,</mo> </mrow> </math>
<math> <mrow> <mi>SET</mi> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>&Element;</mo> <mo>{</mo> <mrow> <mo>(</mo> <mi>i</mi> <mo>,</mo> <mi>j</mi> <mo>)</mo> </mrow> <mo>|</mo> <mi>Ang</mi> <mrow> <mo>(</mo> <msub> <mover> <mi>D</mi> <mo>&RightArrow;</mo> </mover> <mrow> <mi>i</mi> <mo>,</mo> <mi>j</mi> </mrow> </msub> <mo>)</mo> </mrow> <mo>&Element;</mo> <msub> <mi>a</mi> <mi>u</mi> </msub> <mo>}</mo> <mo>,</mo> </mrow> </math>
where
a0=(-103.3°,-76.7°]
a1=(-13.3°,13.3°]
a3=(35.8°,54.2°]
a4=(-54.2°,-35.8°]
a5=(-76.7°,-54.2°]
a6=(-35.8°,-13.3°]
a7=(54.2°,76.7°]
a8=(13.3°,35.8°]
(3) According to the coding mode of the 4×4 small block at the corresponding position of the current small block in the previous frame: if the co-located small block in the previous frame used an intra-frame coding mode, that block's coding mode is selected as a candidate coding mode for the current 4×4 small block;
Pre-coding uses the pixels surrounding the current small block; the current small block is predictively coded in turn with each selected candidate prediction mode, and the optimal prediction mode is selected with the Lagrangian cost function:
J(s, c, IMODE | QP, λ_MODE) = SSD(s, c, IMODE | QP) + λ_MODE · R(s, c, IMODE | QP)    (4)
where IMODE is one of the candidate intra prediction directions; SSD is the sum of squared differences between the original 4×4 pixel values s and the reconstructed pixel values c; and R(s, c, IMODE | QP) is the number of bits produced by coding the block in mode IMODE, using variable-length Huffman coding.
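A compact sketch of the boundary-direction preselection of claim 3: Sobel gradients per equation (1), amplitude per equation (2), and a magnitude-weighted direction histogram whose peak gives the candidate direction. For simplicity this uses eight equal 22.5° bins over (-90°, 90°] instead of the unequal intervals a0-a8, and omits the 2:1 sub-sampling; both are simplifying assumptions, not the claimed binning.

```python
import numpy as np

def dominant_direction(pix, n_bins=8):
    """Return (dominant edge angle in degrees, magnitude-weighted histogram)."""
    p = pix.astype(float)
    # Sobel components per equation (1), evaluated at interior pixels.
    dx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    dy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    amp = np.abs(dx) + np.abs(dy)                  # |D| per equation (2)
    ang = np.degrees(np.arctan2(dy, dx))
    ang = np.mod(ang + 90.0, 180.0) - 90.0         # fold angle into [-90, 90)
    edges = np.linspace(-90.0, 90.0, n_bins + 1)
    idx = np.clip(np.digitize(ang, edges) - 1, 0, n_bins - 1)
    hist = np.bincount(idx.ravel(), weights=amp.ravel(), minlength=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    return float(centers[int(np.argmax(hist))]), hist

v_edge = np.zeros((16, 16))
v_edge[:, 8:] = 255.0                 # vertical edge: gradient points along x
angle, _ = dominant_direction(v_edge)
```

Each candidate direction from the histogram peak would then be pre-coded and scored with the Lagrangian cost J of equation (4); the histogram merely prunes the full nine-mode search.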
CN200510038537.8A 2005-03-23 2005-03-23 Video compression coding-decoding method Pending CN1658673A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200510038537.8A CN1658673A (en) 2005-03-23 2005-03-23 Video compression coding-decoding method

Publications (1)

Publication Number Publication Date
CN1658673A true CN1658673A (en) 2005-08-24

Family

ID=35007890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200510038537.8A Pending CN1658673A (en) 2005-03-23 2005-03-23 Video compression coding-decoding method

Country Status (1)

Country Link
CN (1) CN1658673A (en)

Cited By (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100466736C (en) * 2005-12-30 2009-03-04 杭州华三通信技术有限公司 Motion picture coding control method and coding device
CN101155302B (en) * 2006-09-25 2012-03-07 张燕生 Video coding and decoding method based on image block data rotating and transforming
CN100502509C (en) * 2006-11-02 2009-06-17 中山大学 An Image Compression Coding Method Applied to Streaming Media
CN101647278B (en) * 2006-12-12 2012-05-30 梵提克斯公司 Improved video rate control for video encoding standards
CN102638687A (en) * 2006-12-12 2012-08-15 梵提克斯公司 An improved video rate control for video coding standards
CN101237579B (en) * 2007-02-02 2012-05-23 三星电子株式会社 Apparatus and method of up-converting frame rate of decoded frame
CN101266760B (en) * 2007-03-13 2010-11-10 凌阳科技股份有限公司 Method and system for integrating reverse crossing and over-driving to process image data on LCD
CN101637025B (en) * 2007-03-14 2016-03-23 日本电信电话株式会社 Quantization control method and quantization control device
CN101836453B (en) * 2007-09-03 2012-09-26 思科系统国际公司 Method for alternating entropy coding
CN101383897B (en) * 2007-09-05 2011-08-17 索尼株式会社 Image processing device and method
WO2009049533A1 (en) * 2007-10-15 2009-04-23 Huawei Technologies Co., Ltd. The video encoding and decoding method and corresponding codec based on the motion skip mode
CN101470893B (en) * 2007-12-26 2011-01-19 中国科学院声学研究所 An Acceleration Method of Vector Graphics Display Based on Bitmap Buffer
CN101478672B (en) * 2008-01-04 2012-12-19 华为技术有限公司 Video encoding, decoding method and device, and video processing system
WO2009086761A1 (en) * 2008-01-04 2009-07-16 Huawei Technologies Co., Ltd. Video encoding and decoding method and device and video processing system
CN101534436B (en) * 2008-03-11 2011-02-02 深圳市融创天下科技发展有限公司 Allocation method of video image macro-block-level self-adaptive code-rates
CN101552924B (en) * 2008-03-31 2011-08-03 深圳市融创天下科技发展有限公司 Spatial prediction method for video coding
WO2009121233A1 (en) * 2008-03-31 2009-10-08 深圳市融创天下科技发展有限公司 Spatial prediction method for video encoding
CN102077596A (en) * 2008-07-01 2011-05-25 索尼公司 Image processing device and method
CN101778296B (en) * 2009-01-09 2012-05-30 深圳市融创天下科技股份有限公司 Method for coding video signal
CN102308580B (en) * 2009-02-05 2016-05-04 汤姆森特许公司 Method and apparatus for adaptive mode video encoding and decoding
CN101895739B (en) * 2009-05-20 2012-12-19 深圳市融创天下科技股份有限公司 Block statistical characteristic-based block encoding method
US9369715B2 (en) 2009-08-17 2016-06-14 Samsung Electronics Co., Ltd. Method and apparatus for encoding video, and method and apparatus for decoding video
CN103152577A (en) * 2009-08-17 2013-06-12 三星电子株式会社 Method and apparatus for encoding video, and method and apparatus for decoding video
US9319686B2 (en) 2009-08-17 2016-04-19 Samsung Electronics Co., Ltd. Method and apparatus for encoding video, and method and apparatus for decoding video
US9313503B2 (en) 2009-08-17 2016-04-12 Samsung Electronics Co., Ltd. Method and apparatus for encoding video, and method and apparatus for decoding video
US9313502B2 (en) 2009-08-17 2016-04-12 Samsung Electronics Co., Ltd. Method and apparatus for encoding video, and method and apparatus for decoding video
US9392283B2 (en) 2009-08-17 2016-07-12 Samsung Electronics Co., Ltd. Method and apparatus for encoding video, and method and apparatus for decoding video
CN102202220A (en) * 2010-03-25 2011-09-28 佳能株式会社 Encoding apparatus and control method for encoding apparatus
CN102202220B (en) * 2010-03-25 2015-05-13 佳能株式会社 Encoding apparatus and control method for encoding apparatus
CN101835044A (en) * 2010-04-23 2010-09-15 南京邮电大学 A Classification and Combination Method in Frequency Domain Distributed Video Coding
CN101835044B (en) * 2010-04-23 2012-04-11 南京邮电大学 A Classification and Combination Method in Frequency Domain Distributed Video Coding
CN102316313A (en) * 2010-06-29 2012-01-11 凌阳科技股份有限公司 Low-complexity bit rate control method in embedded real-time video compression system
CN102316313B (en) * 2010-06-29 2013-08-28 凌阳科技股份有限公司 Low-complexity bit rate control method in embedded real-time video compression system
CN106851312B (en) * 2010-07-09 2019-09-13 三星电子株式会社 Method and device for encoding and decoding motion vectors
CN106851312A (en) * 2010-07-09 2017-06-13 三星电子株式会社 Method and apparatus for being coded and decoded to motion vector
CN106878748A (en) * 2010-08-17 2017-06-20 M&K控股株式会社 Device for decoding moving pictures
CN106878748B (en) * 2010-08-17 2019-12-06 M&K控股株式会社 Device for decoding moving pictures
CN102413324A (en) * 2010-09-20 2012-04-11 联合信源数字音视频技术(北京)有限公司 Precoding code table optimization method and precoding method
CN102143361A (en) * 2011-01-12 2011-08-03 浙江大学 Video coding method and video coding device
CN102143361B (en) * 2011-01-12 2013-05-01 浙江大学 Video coding method and video coding device
CN102158716A (en) * 2011-01-28 2011-08-17 北京视博云科技有限公司 Method for optimizing video and device
CN104837024B (en) * 2011-08-29 2016-04-27 苗太平洋控股有限公司 For the device of the movable information under merging patterns of decoding
CN105376577A (en) * 2011-08-29 2016-03-02 苗太平洋控股有限公司 Apparatus for decoding motion information in merge mode
CN104837024A (en) * 2011-08-29 2015-08-12 苗太平洋控股有限公司 Apparatus for decoding motion information in merge mode
CN102647559B (en) * 2012-04-26 2016-04-13 广州盈可视电子科技有限公司 A kind of The Cloud Terrace follows the tracks of the method and apparatus recorded
CN102647559A (en) * 2012-04-26 2012-08-22 广州盈可视电子科技有限公司 Pan-tilt tracing and recording method and device
WO2019129130A1 (en) * 2017-12-31 2019-07-04 华为技术有限公司 Image prediction method and device and codec
CN109996081A (en) * 2017-12-31 2019-07-09 华为技术有限公司 Image prediction method, device and codec
RU2772639C2 (en) * 2017-12-31 2022-05-23 Хуавей Текнолоджиз Ко., Лтд. Codec, device and method for predicting image
US11528503B2 (en) 2017-12-31 2022-12-13 Huawei Technologies Co., Ltd. Picture prediction method and apparatus, and codec
CN109996081B (en) * 2017-12-31 2023-09-12 华为技术有限公司 Image prediction method, device and codec
US12069294B2 (en) 2017-12-31 2024-08-20 Huawei Technologies Co., Ltd. Picture prediction method and apparatus, and codec
CN109495749A (en) * 2018-12-24 2019-03-19 上海国茂数字技术有限公司 A kind of coding and decoding video, search method and device
CN110351560A (en) * 2019-07-17 2019-10-18 深圳市网心科技有限公司 A kind of coding method, system and electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN1658673A (en) Video compression coding-decoding method
CN1207916C (en) Apparatus and method for compressing motion vector field
CN102084655B (en) Video encoding by filter selection
CN1254113C (en) Image encoding device, image encoding method, image decoding device, image decoding method, and communication device
CN101411195B (en) Encoding and decoding of interlaced video
CN101889405B (en) Method and apparatus for performing motion estimation
CN104994395B (en) Video encoding/decoding method
CN1280709C (en) Parameterization of fade compensation
CN1214647C (en) Image encoding method and image encoder
CN1615645A (en) Coding dynamic filters
CN1290342C (en) Device and method capable of performing block comparison motion compensation and global motion compensation
CN101621687B (en) Methodfor converting video code stream from H. 264 to AVS and device thereof
CN1910933A (en) Image information encoding device and image information encoding method
CN1691779A (en) Video transcoding method and apparatus and motion vector interpolation method
CN116456101A (en) Image coding method, image decoding method and related device
CN1605213A (en) skip macroblock coding
CN1625265A (en) Method and apparatus for scalable video encoding and decoding
CN1835595A (en) Image encoding/decoding method and device thereof
CN1240226C (en) Video transcoder with drift compensation
CN101069429A (en) Method and apparatus for multi-layered video encoding and decoding
JP2005354686A (en) Method and system for selecting optimal coding mode for each macroblock in video
CN1774930A (en) Video transcoding
CN1663258A (en) Improved interpolation of video compression frames
CN102422643A (en) Image processing apparatus, method and program
CN104811728B (en) A kind of method for searching motion of video content adaptive

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned
C20 Patent right or utility model deemed to be abandoned or is abandoned