Third, the invention
The purpose of the invention is to adopt an independently designed video compression coding algorithm system that strikes a balance between network overhead and computational load. The system combines a high compression ratio with a low computational burden: it provides a compression ratio close to that of H.264 while reducing the computational load to a level close to that of H.263.
The video compression coding and decoding method comprises the following procedures for coding a video compression signal. Discrete Cosine Transform (DCT): the DCT is a spatial transform performed block by block to generate blocks of DCT coefficients; for typical images it concentrates the energy of a block on a few low-frequency DCT coefficients. Transformation and quantization: the DCT coefficients are quantized, each coefficient being divided by a quantization step. Different quantization precisions are adopted for the 64 coefficients of a DCT block, so that the block retains as much of the relevant spatial-frequency information as possible without exceeding the required precision. Among the DCT coefficients, the low-frequency coefficients matter more to visual perception and are therefore assigned a finer quantization precision; the high-frequency coefficients matter less and are assigned a coarser precision, so that most high-frequency coefficients in a DCT block become zero after quantization.
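As a minimal illustration of frequency-dependent quantization as described above (the step weighting below is an invented example, not the quantization table of the invention):

```python
def quantize_block(coeffs, base_step=16):
    """coeffs: 8x8 list of DCT coefficients -> 8x8 quantized levels.

    The quantization step grows with spatial frequency (u + v), so
    low-frequency coefficients keep finer precision and most
    high-frequency coefficients quantize to zero.
    """
    out = []
    for u in range(8):
        row = []
        for v in range(8):
            step = base_step * (1 + u + v)   # coarser step as frequency grows (assumed weighting)
            row.append(round(coeffs[u][v] / step))
        out.append(row)
    return out
```

With a typical block whose energy sits in the DC coefficient, everything but the low-frequency corner survives as zero.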
A channel buffer is required before the coded bit stream enters the channel. The channel buffer is written with data from the entropy coder at a variable bit rate, and data is read out into the channel at the nominally constant bit rate of the transmission system. The buffer size (capacity) is fixed, but the instantaneous output bit rate of the encoder is often significantly higher or lower than the channel bandwidth, which may cause buffer overflow or underflow. The buffer therefore needs a control mechanism: the bit rate of the encoder is adjusted by feedback control of the compression algorithm, so that the write rate and read rate of the buffer tend toward balance. The buffer controls the compression algorithm through the quantization step of the quantizer: when the instantaneous output rate of the encoder is too high and the buffer is about to overflow, the quantization step is increased to reduce the encoded data rate, with a corresponding increase in image loss; when the instantaneous output rate is too low and the buffer is about to underflow, the quantization step is decreased to increase the encoded data rate.
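The feedback loop described above can be sketched as follows; the thresholds, step bounds, and class name are illustrative assumptions, not the invention's exact control law:

```python
class BufferController:
    """Buffer-fullness feedback: fullness drives the quantization step."""

    def __init__(self, capacity_bits, channel_rate_bits):
        self.capacity = capacity_bits
        self.rate = channel_rate_bits          # bits drained per frame interval
        self.fullness = capacity_bits // 2     # start half full
        self.qstep = 16

    def update(self, frame_bits):
        # write coded frame in, drain channel rate out
        self.fullness = max(0, self.fullness + frame_bits - self.rate)
        if self.fullness > 0.8 * self.capacity:      # nearing overflow
            self.qstep = min(64, self.qstep * 2)     # coarser -> fewer bits
        elif self.fullness < 0.2 * self.capacity:    # nearing underflow
            self.qstep = max(2, self.qstep // 2)     # finer -> more bits
        return self.qstep
```

A burst of large frames pushes the fullness up and doubles the step; a run of small frames lets the step relax again.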
Motion estimation: when motion estimation is used in inter-frame coding, an estimate of the image being compressed is generated by reference to another frame. Motion estimation is performed macroblock by macroblock, calculating the positional offset between a macroblock of the compressed image and the corresponding macroblock of the reference image. This offset is described by a motion vector with two components, representing the displacements in the horizontal and vertical directions. In motion estimation, the reference frames used by P-frame and B-frame pictures differ. A P-frame uses the most recently decoded I-frame or P-frame as its reference picture; this is called forward prediction. A B-frame uses two pictures as prediction references, which is called bi-directional prediction: one reference frame precedes the coded frame in display order (forward prediction) and the other follows it in display order (backward prediction). The reference frame of a B-frame is in any case an I-frame or a P-frame.
Motion compensation: the motion vector calculated by motion estimation is used to shift a macroblock of the reference frame horizontally and vertically to the corresponding position, generating a prediction of the image being compressed. Since motion in most natural scenes is orderly, the difference between the prediction generated by such motion compensation and the compressed image is small.
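A minimal sketch of the block-copy operation just described, with frames as 2-D lists of luma samples (all names are illustrative):

```python
def motion_compensate(ref, mv, top, left, size=16):
    """Predict one size x size macroblock at (top, left) of the current
    frame by copying from the reference frame at motion-vector offset
    mv = (dy, dx).  Boundary handling is omitted for clarity."""
    dy, dx = mv
    return [[ref[top + dy + i][left + dx + j] for j in range(size)]
            for i in range(size)]
```

The encoder then codes only the residual between this prediction and the actual macroblock.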
The invention is characterized in that: after the frequency-domain motion search algorithm is adopted, the quantization, storage, and motion search of the sampled signal after the DCT transform are all completed by the video encoder in the frequency domain; all computation is done in the frequency domain.
The basis of the invention also includes: in run-length coding, only non-zero coefficients are coded. The code for a non-zero coefficient consists of two parts: the first part is the number of consecutive zero coefficients preceding it (called the run), and the second part is the non-zero coefficient itself. This exploits the advantage of the zig-zag scan: because the scan produces long runs of zeros in most cases, run-length coding is highly efficient. When all remaining DCT coefficients at the end of the one-dimensional sequence are zero, the encoding of the 8 x 8 transform block can be ended with an "end of block" (EOB) flag, and the resulting compression effect is very significant.
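The run/level scheme with an end-of-block flag can be sketched as follows (the "EOB" token is an illustrative stand-in for the actual codeword):

```python
def run_length_encode(coeffs):
    """coeffs: 1-D list of quantized coefficients (zig-zag order)
    -> list of (run-of-zeros, non-zero level) pairs ended by "EOB"."""
    pairs, run = [], 0
    # find the last non-zero coefficient; everything after it is covered by EOB
    last_nz = max((i for i, c in enumerate(coeffs) if c != 0), default=-1)
    for c in coeffs[:last_nz + 1]:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    pairs.append("EOB")
    return pairs
```

For example, a sequence ending in zeros collapses to a couple of pairs plus the EOB marker.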
Subjective evaluation of digital image quality: the conditions for subjective evaluation include the composition of the evaluation panel, viewing distance, test images, ambient illumination, background tint, and the like. The evaluation panel consists of a certain number of observers, with professionals and non-professionals each in a fixed proportion. The viewing distance is 3 to 6 times the diagonal dimension of the display. The test material consists of a number of image sequences with particular image details and motion. Subjective evaluation reflects the statistical average of many observers' assessments of picture quality.
Zig-zag scan and run-length coding: the DCT produces an 8 x 8 two-dimensional array that must be converted to a one-dimensional sequence for transmission. There are two conversion (scan) modes: the zig-zag scan and the alternate scan, of which the zig-zag scan is the more common. Since most non-zero DCT coefficients are concentrated in the upper left corner of the 8 x 8 matrix, i.e. the low-frequency region, the zig-zag scan places these non-zero coefficients at the front of the one-dimensional array, followed by long strings of quantized-to-zero coefficients, which creates the conditions for run-length coding. Entropy coding turns the quantized DCT coefficients into an efficient discrete representation, producing the digital bit stream for transmission. Entropy coding exploits the statistical properties of the coded signal to reduce the average bit rate. The runs and non-zero coefficients can be entropy coded independently or jointly. In the Huffman coding used for entropy coding, a code table is produced once the probabilities of all coded symbols are determined: fewer bits are allocated to frequent, high-probability symbols and more bits to infrequent, low-probability symbols, so that the average length of the whole code stream tends to the minimum.
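The zig-zag conversion from the 8 x 8 array to a one-dimensional sequence can be sketched as follows, using the standard zig-zag order (assumed here; the invention's own scan table is not reproduced in this text):

```python
def zigzag_scan(block):
    """block: 8x8 2-D list -> 64-element 1-D list in zig-zag order.

    Walk the anti-diagonals (constant u + v); odd diagonals run downward
    (increasing row u), even diagonals run upward (increasing column v).
    """
    order = sorted(((u, v) for u in range(8) for v in range(8)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[u][v] for u, v in order]
```

The first entries of the output are the DC and nearest low-frequency coefficients, so the zeros pile up at the tail, as run-length coding requires.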
The invention is characterized in that: the performance of the motion search step itself is improved. In a traditional video coding system, the encoder repeatedly converts between the spatial domain and the frequency domain: motion search uses a spatial-domain algorithm, while residual coding must be performed in the frequency domain so that coefficient energy is concentrated in the low-frequency region for convenient quantization. This frequent switching between the spatial and frequency domains is quite resource-consuming.
The invention is also characterized in that: a complete video conference system is provided. The technical core of video conferencing is the video coding and decoding algorithm system. The invention pursues deep research in this field, proposes an innovative frequency-domain sub-pixel motion search algorithm, and establishes an efficient and stable video coding algorithm system that not only has high coding efficiency but also has far lower computational complexity than comparable algorithms, making it easy to implement on a low-cost hardware platform. In the sub-pixel stage of the motion search, the invention provides an original search algorithm that reduces the computational complexity to below 10 percent of the conventional approach while maintaining sufficient accuracy of the search result.
In addition, the system of the present invention implements the following functions:
Performs sub-pixel motion search in the frequency domain using a novel video coding algorithm.
Provides a remote electronic whiteboard, remote slides, and data sharing.
Provides a 12-inch liquid crystal touch screen on which arbitrary figures can be drawn and text exchanged.
Supports mobile phone access, sending conference data to the mobile terminal in a timely manner.
Provides a built-in web server with a user interface for modifying the coding parameters.
Provides a built-in disk video recorder that records video images over very long periods.
Provides a USB interface, facilitating data exchange and connection of a USB digital camera.
Includes a built-in high-sensitivity motion detection algorithm that can be used for security monitoring.
Fifth, detailed description of the invention
1 video compression coding algorithm
Fig. 1 is a block diagram of a video compression coding algorithm employed in the present invention.
The algorithm modules in the block diagram are introduced as follows:
a. motion search (motion estimation)
Motion search (also called motion estimation) is one of the core technologies in the field of video compression coding, and it is also the algorithm module that consumes the most system computing resources in video coding. The video coding scheme of the invention adopts a conventional hybrid search algorithm for the integer-pixel search; for the sub-pixel search, the invention implements an original search technique, described in detail later.
b. Intra prediction
In a video stream, each frame may be encoded as an I frame (intra-frame predicted frame) or a P frame (inter-frame predicted frame). When a P frame is coded, the information in the frame itself is not used directly as the coded data source; instead, a motion search is carried out in a previously coded image to find motion information, which serves as the basis for inter-frame prediction, and then the difference between the two frames is coded. This greatly reduces the number of bits needed to describe the image, thereby achieving compression. An I frame is encoded without reference to any previous image; instead, pixels of the already-encoded part of the frame are used to predict the values of pixels in the not-yet-encoded part. I frames are less efficient than P frames but are an important building block of a video stream because they provide the ability to resynchronize. If a frame loses packets during transmission, subsequent P frames predicted from it cannot be correctly decoded; but because an I frame is self-contained and does not refer to any previous image, the code stream resynchronizes there, and errors are confined within a limited range. Owing to the importance of I frames, intra-prediction algorithms for I frames are a major research point for any video coding scheme. The invention presents a novel intra-frame prediction algorithm below, providing efficient and stable intra-frame prediction at limited computational cost.
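As a generic illustration of the idea of predicting not-yet-coded pixels from already-coded ones (plain DC prediction from neighbouring samples; this is a textbook stand-in, not the invention's specific predictor, which is described later):

```python
def intra_dc_predict(frame, top, left, size=4):
    """Predict a size x size block at (top, left) as the mean (DC) of the
    already-coded row above and column to the left; fall back to a
    mid-gray value of 128 when no neighbours exist (assumed default)."""
    neighbours = []
    if top > 0:
        neighbours += [frame[top - 1][left + j] for j in range(size)]
    if left > 0:
        neighbours += [frame[top + i][left - 1] for i in range(size)]
    dc = round(sum(neighbours) / len(neighbours)) if neighbours else 128
    return [[dc] * size for _ in range(size)]
```

Only the residual between this prediction and the actual block needs to be coded.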
c. Rate distortion optimization
An optimal scheme is selected among the candidate coding modes. Video coding involves many coding mode and parameter decisions: for example, what value a motion vector should take in inter prediction, or what the search accuracy should be. The selection of these coding parameters and modes depends on the rate-distortion optimization algorithm, which evaluates each candidate coding mode or parameter and then picks the optimal one according to a fixed rule. The selection must weigh both the coding efficiency (i.e., compression performance) and the signal-to-noise ratio after compression. The relationship between these two performance indicators is non-linear; to increase computation speed and reduce the computational overhead of the system, the video compression coding scheme of the invention uses a Lagrange multiplier to realize a linear approximation. The following equation is the Lagrangian cost in this scheme, where D_REC is the distortion, R_REC is the bit rate after prediction, S_k and Q are the candidate coding modes and parameters, and J_MODE is the total cost value; the coding mode and parameter with the minimum J_MODE are the optimal choice.

J_MODE(S_k, Q, λ) = D_REC(S_k, Q) + λ·R_REC(S_k, Q)
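The mode decision by minimal J_MODE can be sketched as follows; the candidate names, distortions, and rates are invented for illustration:

```python
def best_mode(candidates, lam):
    """candidates: list of (name, distortion, rate) triples.
    Returns the name minimising J = D + lambda * R."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]

# invented candidates: (mode name, distortion D, rate R in bits)
modes = [("intra", 120.0, 300),
         ("inter_16x16", 40.0, 520),
         ("skip", 200.0, 10)]
```

A small λ favours low distortion (the inter mode wins); a large λ penalizes rate heavily (the skip mode wins), which is exactly the trade-off the Lagrangian encodes.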
d. Code rate control
Monitors the channel condition and decides the allocation of the bit rate. This module uses a leaky bucket model, as shown in fig. 2, to detect the transmission condition of the channel.
e. Memory management
Handles the logical and physical management of memory and is responsible for reference frame queue management. When encoding a P frame, a motion search must be performed with reference to previously encoded and reconstructed images, so a reference frame queue must be maintained and reference frame data stored during both encoding and decoding. The encoder and decoder use the same memory logic model, each independently maintaining its reference frame queue and exchanging only minimal information for synchronization.
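A minimal sketch of such a bounded reference-frame queue (the depth parameter is an assumed value; pushing a new reconstructed frame evicts the oldest):

```python
from collections import deque

class ReferenceQueue:
    """Bounded queue of reconstructed frames, kept identically by the
    encoder and the decoder so only minimal sync information is needed."""

    def __init__(self, depth=2):
        self.frames = deque(maxlen=depth)

    def push(self, reconstructed_frame):
        self.frames.append(reconstructed_frame)   # oldest dropped automatically

    def last(self):
        return self.frames[-1]                    # forward-prediction reference
```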
f. Entropy coding
The various methods of compressing video sequences center on three aspects: eliminating temporal redundancy, eliminating spatial redundancy, and eliminating statistical redundancy. Inter and intra prediction address temporal and spatial redundancy respectively; the method of eliminating statistical redundancy is known as entropy coding. The video coding algorithm system of the invention adopts the mature Huffman algorithm for entropy coding.
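A compact sketch of the Huffman code construction (a generic implementation of the classical algorithm, not the invention's specific code tables): repeatedly merge the two least probable symbols, so frequent symbols end up with short codes.

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """freqs: {symbol: probability or count} -> {symbol: bitstring}."""
    tie = count()                  # tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)    # two least probable subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]
```

With counts {a: 5, b: 2, c: 1, d: 1}, the frequent symbol "a" receives a 1-bit code while the rare symbols receive 3-bit codes, shortening the average code length.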
g. Transformation and quantization
The residual data is time-frequency transformed and quantized in the frequency domain.
1.1 sub-pel motion search
Motion search (also called motion estimation) is one of the core techniques in the field of video compression coding. After analog-to-digital conversion, video signals have a huge data volume and cannot be directly stored or used for communication. However, the natural objects appearing in video images change slowly relative to the high sampling frequency, so the original video information carries great redundancy in both the time domain and the spatial domain. The basic principle of motion search is to examine adjacent images in a video sequence, find the motion information (motion vectors), and replace the original image data with data describing the motion of objects, thereby largely eliminating temporal redundancy and achieving data compression.
The accuracy of modern motion search algorithms is no longer limited to whole pixels. Experiments show that at sub-pixel precision of half a pixel or finer, the coded bit rate drops markedly. Under low-noise conditions, each doubling of the search precision improves the compression ratio by about 0.5 bit/sample, and the average coded bit rate falls by 24.41-36.92%. However, when the search accuracy reaches 1/8 pixel or finer, the gain in compression ratio is no longer significant because noise dominates. The mainstream video coding standards all adopt sub-pixel search to improve coding performance: H.263 and MPEG-2 introduced half-pixel motion search, while MPEG-4 and the newer H.264 use motion search with 1/4-pixel precision.
Among existing sub-pixel search algorithms, the widely used techniques are the spatial-domain full search and its various fast variants. These algorithms search for the best matching block within a search window, pixel block by pixel block, using the sum of squared differences or the sum of absolute differences as the decision criterion. The search requires repeated filtering and interpolation and repeated evaluation of the cost function, so the computational complexity is very high. Experiments show that reaching sub-pixel precision often more than doubles the computational cost of motion search compared with whole-pixel search alone. Moreover, the matching accuracy depends on the precision of the interpolation algorithm, which affects coding efficiency to some extent. The invention provides a novel search algorithm that predicts and searches motion vectors using phase correlation in the frequency domain; the sub-pixel search requires almost no interpolation and no cost function evaluation, greatly reducing the computational expense of the spatial-domain approach, and it is well suited to embedded platforms providing video content services.
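For contrast with the frequency-domain method, the spatial-domain full search with a sum-of-absolute-differences criterion can be sketched as follows (a generic baseline; all names are illustrative):

```python
def full_search_sad(ref, cur_block, top, left, window=4):
    """Test every offset in [-window, window]^2 around (top, left) in the
    reference frame and return the (dy, dx) minimising SAD against
    cur_block.  Cost is O(window^2 * block^2) -- the expense the
    frequency-domain method avoids."""
    size = len(cur_block)
    best = (None, float("inf"))
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > len(ref) or x + size > len(ref[0]):
                continue
            sad = sum(abs(ref[y + i][x + j] - cur_block[i][j])
                      for i in range(size) for j in range(size))
            if sad < best[1]:
                best = ((dy, dx), sad)
    return best
```

Every candidate offset requires a full block comparison, and sub-pixel refinement would additionally require interpolating the reference, which is the cost the invention targets.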
1.1.1 Frequency-domain phase and object-space translation
As is well known, in the fourier transform domain, the change in phase corresponds to a translation of the object in the time/space domain:
F{x(s-τ)} = e^(-jωτ)·F{x(s)}    (1)
In equation (1), F{·} denotes the Fourier transform of a discrete signal and s denotes spatial displacement (in the time domain, t would be substituted; only the spatial domain is discussed below). Through this property of the Fourier transform, motion information in the spatial domain can easily be resolved in the frequency domain. If the Fourier transform were used in a video coding scheme, searching for motion information in the frequency domain would become very convenient and accurate. However, the Fourier transform has poor energy compaction, and spatial redundancy cannot be effectively removed after the transform, which makes it unsuitable for practical video coding algorithms. At present, the video coding standards generally adopt the DCT, which has energy compaction close to that of the K-L transform, concentrates most of the energy in the DC and low-frequency parts, and can preserve image quality at high compression ratios after low-pass filtering. In view of this, the invention adopts the DCT for time-frequency transformation and computes spatial translation from the phase of the DCT domain; owing to the particularity of the DCT, there is no simple correspondence in the DCT domain like that of equation (1) for the Fourier transform.
Suppose there is a one-dimensional discrete signal {x1(n) | n ∈ [0, N-1]}, where N is the size of the search window; after moving it m samples to the right, a signal {x2(n) | n ∈ [0, N-1]} is formed:

x2(n) = x1(n-m) for n ≥ m;  x2(n) = 0 for n < m    (2)
According to [2], the following DCT and DST transforms are defined:
X2^C(k) = (2/N)·C(k)·Σ_{n=0}^{N-1} x2(n)·cos((kπ/N)(n+0.5)),  k ∈ [0, N-1]    (3)

X2^S(k) = (2/N)·C(k)·Σ_{n=0}^{N-1} x2(n)·sin((kπ/N)(n+0.5)),  k ∈ [1, N]    (4)

Z1^C(k) = (2/N)·C(k)·Σ_{n=0}^{N-1} x1(n)·cos((kπ/N)·n),  k ∈ [0, N-1]    (5)

Z1^S(k) = (2/N)·C(k)·Σ_{n=0}^{N-1} x1(n)·sin((kπ/N)·n),  k ∈ [1, N]    (6)
In the above formulas, C(k) is the normalization factor: C(k) = 1/√2 for k = 0 or k = N, and C(k) = 1 otherwise.    (7)
It is readily shown that these four transforms satisfy the following relation:

( X2^C(k) )   ( Z1^C(k)  -Z1^S(k) ) ( g_m^C(k) )
( X2^S(k) ) = ( Z1^S(k)   Z1^C(k) ) ( g_m^S(k) )    (8)

where g_m^S(k) = sin((kπ/N)(m+0.5)) and g_m^C(k) = cos((kπ/N)(m+0.5)). These two frequency-domain quantities contain the translation information m. Given the signals x1(n) and x2(n), if a fast algorithm can be found to solve for g_m^C(k) and g_m^S(k) and extract m from them, motion search in the DCT domain is realized.
Writing (8) as X(k) = Z(k)·Ω(k), where X(k) = (X2^C(k), X2^S(k))^T and Ω(k) = (g_m^C(k), g_m^S(k))^T, it can be shown that Z(k) is orthogonal up to a scale factor:
λ·Z^T(k)·Z(k) = I2,  where λ = 1/((Z1^C(k))² + (Z1^S(k))²)    (9)
Here I2 is the 2 x 2 identity matrix. In this way, the equation can be solved:

Ω(k) = λ·Z^T(k)·X(k)    (10)
from which g_m^C(k) and g_m^S(k) can be obtained.
By the orthogonality of sinusoidal functions, the following relations hold [4]:
(2/N)·Σ_{k=1}^{N} C²(k)·sin((kπ/N)(m+0.5))·sin((kπ/N)(n+0.5)) = δ(m-n) - δ(m+n+1)    (11)

(2/N)·Σ_{k=0}^{N-1} C²(k)·cos((kπ/N)(m+0.5))·cos((kπ/N)(n+0.5)) = δ(m-n) + δ(m+n+1)    (12)
Where δ (n) is a discrete impulse function.
According to the formulas (8) and (10-12), we can obtain:
(2/N)·Σ_{k=1}^{N} C²(k)·g_m^S(k)·sin((kπ/N)(n+0.5)) = δ(m-n) - δ(m+n+1)    (13)

(2/N)·Σ_{k=0}^{N-1} C²(k)·g_m^C(k)·cos((kπ/N)(n+0.5)) = δ(m-n) + δ(m+n+1)    (14)
Analyzing equation (13): when m > 0 and lies within the search window [0, N), a positive δ response is always found at n = m, with a negative δ response at n = -m-1; when m < 0 and lies within the negative mirror [-N, 0) of the search window, a negative δ response appears inside the window at n = -m-1, with a positive δ response at n = m. As shown in fig. 3, the gray area is the search window: when a positive δ response is found at position s in the window, the object has translated to the right with displacement s; when a negative δ response is found at position s, the object has translated to the left with displacement -(s+1). See fig. 3(a), the δ response of an object translated right by s, and fig. 3(b), the δ response of an object translated left by s+1. FIG. 4 is a schematic view of the spatial locations of the sub-pixels.
In the actual computation, a fast inverse transform is used in place of directly evaluating (2/N)·Σ_{k=1}^{N} C²(k)·g_m^S(k)·sin((kπ/N)(n+0.5)), in order to reduce the computational complexity.
1.1.2 frequency domain sub-pixel search algorithm flow
Based on the above derivation, the flow of the frequency domain-based sub-pixel search algorithm is as follows:
1) determining the search window as N, extracting in the x direction to refer to the imageOne-dimensional signal x starting from a pixel point F1(n) and x of the corresponding position in the current image2(n)。
2) According to the formula (3-6), x is calculated1(n) and x2(n) four discrete DCT/DST transform coefficients.
3) Is calculated at [1, N]Interval gm SObtained from the following formulae (3 to 6) and (8):
<math> <mrow> <msubsup> <mi>g</mi> <mi>m</mi> <mi>S</mi> </msubsup> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>=</mo> <mfenced open='{' close='' separators=' '> <mtable> <mtr> <mtd> <mn>1</mn> <mo>,</mo> <mi>k</mi> <mo>=</mo> <mi>N</mi> </mtd> </mtr> <mtr> <mtd> <mrow> <mo>(</mo> <msubsup> <mi>Z</mi> <mn>1</mn> <mi>C</mi> </msubsup> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>·</mo> <msubsup> <mi>X</mi> <mn>2</mn> <mi>S</mi> </msubsup> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>-</mo> <msubsup> <mi>Z</mi> <mn>1</mn> <mi>S</mi> </msubsup> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>·</mo> <msubsup> <mi>X</mi> <mn>2</mn> <mi>C</mi> </msubsup> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> <mo>/</mo> <mrow> <mo>(</mo> <msup> <mrow> <mo>(</mo> <msubsup> <mi>Z</mi> <mn>1</mn> <mi>C</mi> </msubsup> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> <mn>2</mn> </msup> <mo>+</mo> <msup> <mrow> <mo>(</mo> <msubsup> <mi>Z</mi> <mn>1</mn> <mi>S</mi> </msubsup> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> <mn>2</mn> </msup> <mo>)</mo> </mrow> <mo>,</mo> <mi>k</mi> <mo>∈</mo> <mo>[</mo> <mn>1</mn> <mo>,</mo> <mrow> <mi>N</mi> <mo>)</mo> </mrow> </mtd> </mtr> </mtable> <mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>15</mn> <mo>)</mo> </mrow> </mrow> </mfenced> </mrow> </math>
4) From equation (13), obtain the translation direction $d_x$ and displacement $s_x$ in the x direction.
5) Repeat the above steps in the y direction to obtain $d_y$ and $s_y$.
6) With parameters $m_x$ and $m_y$, look up Table 1 to determine the matching point in fig. 4 and the half-pel motion vector.
TABLE 1  $m_x$, $m_y$ and motion vectors

| $m_x$ | $m_y$ | Matching point | Motion vector |
| >0 | >0 | 3 | (0.5, 0.5) |
| >0 | <0 | 8 | (0.5, -0.5) |
| >0 | =0 | 5 | (0.5, 0) |
| <0 | >0 | 1 | (-0.5, 0.5) |
| <0 | <0 | 6 | (-0.5, -0.5) |
| <0 | =0 | 4 | (-0.5, 0) |
| =0 | >0 | 2 | (0, 0.5) |
| =0 | <0 | 7 | (0, -0.5) |
| =0 | =0 | F | (0, 0) |
7) If motion vectors of 1/4-pixel accuracy are required, interpolate the resulting pixel block with a bilinear filter at the motion vector obtained in step 6), and repeat steps 1)–6).
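As an illustrative sketch (not the invention's actual implementation), the table lookup of step 6) reduces to mapping the signs of $m_x$ and $m_y$ to a half-pel vector, per Table 1:

```python
def half_pel_mv(mx: float, my: float) -> tuple[float, float]:
    """Map the signs of (mx, my) to the half-pel motion vector of Table 1."""
    sign = lambda v: (v > 0) - (v < 0)   # -1, 0, or +1
    return (0.5 * sign(mx), 0.5 * sign(my))
```

For example, $m_x > 0$ and $m_y < 0$ corresponds to matching point 8 with vector (0.5, -0.5).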
FIG. 5 compares the computational complexity, i.e., computational performance, of the present algorithm with that of a full-search algorithm for sub-pel search under various standard test sequences. Since each test sequence has a different image composition and computing environment, for convenience the computational complexity of the full-search algorithm in each test sequence is normalized to 1 as the comparison baseline.
DCT stands for Discrete Cosine Transform, which converts a set of light-intensity data into frequency data so that intensity variations can be analyzed. If the high-frequency data are modified and the result is converted back to the original form, the data clearly differ from the original, but human eyes cannot easily perceive the difference. During compression, the original image data are divided into 8 × 8 data matrices; for example, the first matrix of luminance values has the following contents:
y00 y01 y02 y03 y04 y05 y06 y07
y10 y11 y12 y13 y14 y15 y16 y17
y20 y21 y22 y23 y24 y25 y26 y27
y30 y31 y32 y33 y34 y35 y36 y37
y40 y41 y42 y43 y44 y45 y46 y47
y50 y51 y52 y53 y54 y55 y56 y57
y60 y61 y62 y63 y64 y65 y66 y67
y70 y71 y72 y73 y74 y75 y76 y77
JPEG treats the luminance matrix together with the two chrominance matrices Cb and Cr as a basic unit called an MCU. The number of matrices contained in each MCU must not exceed 10. For example, if the row and column sampling ratio is 4:2, each MCU will contain four luminance matrices, one Cb matrix, and one Cr matrix.
After the image data are divided into 8 × 8 matrices, 128 must be subtracted from every value, and the shifted values are then substituted one by one into the DCT transform formula. The subtraction of 128 is necessary because the DCT transform formula accepts numbers in the range −128 to +127.
DCT transform formula:

$$F(u,v)=\frac{1}{4}\,c(u)\,c(v)\sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\cos\!\frac{(2x+1)u\pi}{16}\cos\!\frac{(2y+1)v\pi}{16}$$

Here x, y denote the coordinate position of a value within the image data matrix, and f(x, y) denotes a value in that matrix; u, v denote the coordinate position of a value in the DCT-transformed matrix, and F(u, v) denotes a value in that matrix, where

when u = 0 and v = 0: c(u) = c(v) = 1/√2 ≈ 1/1.414
when u > 0 or v > 0: c(u) = c(v) = 1
The values of the matrix after the DCT transform are frequency coefficients. The coefficient F(0, 0) has the largest value and is called DC; the other 63 frequency coefficients are mostly positive and negative floating-point numbers close to 0 and are generally called AC.
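A minimal sketch of the level shift and 8 × 8 DCT described above (a direct, unoptimized evaluation of the transform formula; a real encoder would use a fast DCT algorithm):

```python
import math

def dct8x8(block):
    """Level-shift an 8x8 block by -128, then apply the 2-D DCT formula.
    F(0, 0) is the DC coefficient; the other 63 outputs are the AC coefficients."""
    shifted = [[v - 128 for v in row] for row in block]
    c = lambda u: 1 / math.sqrt(2) if u == 0 else 1.0
    F = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = sum(shifted[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / 16)
                    * math.cos((2 * y + 1) * v * math.pi / 16)
                    for x in range(8) for y in range(8))
            F[u][v] = 0.25 * c(u) * c(v) * s
    return F
```

For a uniform block (e.g., all values 200), all the signal energy lands in DC — F[0][0] = 8 × (200 − 128) = 576 and every AC coefficient is essentially 0 — illustrating the energy-concentration property.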
1.1.3 Summary
For video coding, performing motion search in the frequency domain offers advantages beyond improving the performance of the motion search step itself. In a traditional video coding system, the encoder must repeatedly convert between the spatial domain and the frequency domain: a spatial-domain algorithm is used during motion search, while residual coding must be performed in the frequency domain so that coefficient energy is concentrated in the low-frequency region, which is convenient for quantization. Frequent switching between the spatial and frequency domains consumes considerable resources. With a frequency-domain motion search algorithm, the video encoder completes all computation in the frequency domain; the encoding process is shown in fig. 6. Compared with a video coding process that searches motion vectors in the spatial domain, the quantization, storage, and motion search of the DCT-transformed sample signal in fig. 6 are all completed in the frequency domain, which not only removes the inverse-DCT step of the spatial-domain coding process but also more effectively reduces the storage space required, benefiting the optimization of both encoder and decoder.
1.2 fast selection algorithm for intra prediction mode
1.2.1 Intra coding prediction modes
If there is no strong temporal correlation between the current picture and the previous input picture, the picture is typically encoded as an I-frame using an intra coding mode. In conventional video coding standards, an I-frame image is coded directly without prediction: macroblock data are directly transformed, quantized, coded, and transmitted, so the data volume of a coded I-frame is very large. To improve coding efficiency more effectively, the video coding system of the invention fully exploits the spatial redundancy among pixels within an image and defines 16 × 16 and 4 × 4 prediction units. (Fig. 7 shows the distribution of pixels in a 4 × 4 block and its surrounding pixels.)
In the intra prediction module of the present invention, if the current macroblock's coding mode is intra, the macroblock's prediction values come from neighboring coded and reconstructed macroblocks. The luminance component may use a 16 × 16 macroblock or a 4 × 4 block as the basic unit of intra prediction coding. With a 16 × 16 macroblock as the coding unit, 4 prediction modes are available; with 4 × 4 blocks, 9 prediction modes are available. The two chroma components use an 8 × 8 block as the basic unit of intra prediction coding, with 4 prediction modes available, and the two chroma components must select the same mode. Since 4 × 4 blocks are the most fine-grained, the computational complexity is concentrated in this unit.
The distribution of pixels in a 4 × 4 block and its surrounding pixels is shown in fig. 7, where the lower-case letters a to p denote the 16 pixels inside the block and the upper-case letters A to M denote the pixels surrounding the block. Intra 4 × 4 uses 9 prediction modes, where mode 2 is DC prediction; the directions of the remaining modes are shown in fig. 8 (intra 4 × 4 prediction modes). For example, if mode 1 (horizontal prediction) is selected, the predicted values in the block come from pixels I, J, K, L.
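For instance, mode 1 of the intra 4 × 4 prediction just described copies the left-neighbour pixels I, J, K, L across their rows; a sketch (pixel names follow fig. 7):

```python
def predict_4x4_horizontal(I, J, K, L):
    """Intra 4x4 mode 1 (horizontal): each row of the predicted block is
    filled with the reconstructed pixel to its left (I, J, K or L)."""
    return [[p] * 4 for p in (I, J, K, L)]
```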
1.2.2 fast intra prediction coding mode selection algorithm
The intra-frame prediction mode selection algorithm provided by the invention utilizes the boundary direction histogram, the context model and the prediction coding mode of the small block at the same position of the previous frame to quickly select the available candidate prediction mode, carries out pre-coding according to the pre-selection mode and then utilizes the Lagrangian cost function to select the optimal prediction mode. To further reduce the amount of computation, the raw data is sub-sampled before the boundary direction vectors are computed. Taking intra 4 × 4 as an example, the fast intra prediction mode selection process is shown in fig. 9, which is a flow chart of fast intra 4 × 4 prediction mode selection, and each part of the process will be described separately below.
1.2.2.1 Pixel sub-sampling
The input original pixel data are sub-sampled 2:1; the number of sampled pixels is 1/2 of the original, so the time consumed computing boundary direction vectors on the sampled pixels is about 1/2. The pixel sub-sampling method employed herein is illustrated in fig. 10 by the sub-sampling of a 4 × 4 block, where the filled circles in the sub-sampled image represent the retained sample pixels.
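A sketch of a 2:1 sub-sampling that keeps half the pixels; the exact sampling pattern used by the invention is not specified here, so a checkerboard (quincunx) pattern is assumed for illustration:

```python
def subsample_2to1(block):
    """Keep the pixels at positions where (row + col) is even -- half of all
    pixels, in a quincunx pattern (the precise pattern is an assumption)."""
    return {(i, j): block[i][j]
            for i in range(len(block))
            for j in range(len(block[i]))
            if (i + j) % 2 == 0}
```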
1.2.2.2 mode selection based on boundary direction
Images are spatially continuous and correlated, and the pixels that make up an image exhibit correlation along 8 spatial prediction directions; this can be exploited to reduce spatial redundancy. If the direction with the strongest correlation can be found and the pixel values are intra-predicted along it, the best intra coding effect is achieved. The Sobel operator [3–5] is used herein to compute the boundary direction vector of the sub-sampled pixels; its standard horizontal and vertical kernels,
$$S_x=\begin{bmatrix}-1&0&1\\-2&0&2\\-1&0&1\end{bmatrix},\qquad S_y=\begin{bmatrix}-1&-2&-1\\0&0&0\\1&2&1\end{bmatrix},$$
are used to compute the horizontal and vertical components of the boundary vector, respectively.
For a sub-sampled pixel $p_{i,j}$, the corresponding boundary vector is $\vec{D}_{i,j}=\{dx_{i,j},\,dy_{i,j}\}$, where $dx_{i,j}$ and $dy_{i,j}$ denote the horizontal and vertical components of the boundary vector. They are given in equation (1), where $p_{i-1,j+1}$ etc. refer to the neighboring pixels of $p_{i,j}$ in the original image.
$$dx_{i,j}=p_{i-1,j+1}+2p_{i,j+1}+p_{i+1,j+1}-p_{i-1,j-1}-2p_{i,j-1}-p_{i+1,j-1}$$
$$dy_{i,j}=p_{i+1,j-1}+2p_{i+1,j}+p_{i+1,j+1}-p_{i-1,j-1}-2p_{i-1,j}-p_{i-1,j+1} \tag{1}$$
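Equation (1) evaluated directly in code (a sketch; p is a 2-D array indexed p[i][j] with i the row and j the column, and (i, j) must be an interior position):

```python
def boundary_vector(p, i, j):
    """Horizontal and vertical boundary components dx, dy of equation (1)
    for the pixel p[i][j], using its eight neighbours in the original image."""
    dx = (p[i-1][j+1] + 2 * p[i][j+1] + p[i+1][j+1]
          - p[i-1][j-1] - 2 * p[i][j-1] - p[i+1][j-1])
    dy = (p[i+1][j-1] + 2 * p[i+1][j] + p[i+1][j+1]
          - p[i-1][j-1] - 2 * p[i-1][j] - p[i-1][j+1])
    return dx, dy
```

On a flat region both components are 0; on a horizontal intensity ramp (p[i][j] = j) the result is dx = 8, dy = 0, i.e. a purely horizontal gradient.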
For ease of calculation, the norm of the boundary direction vector is defined as:
$$\mathrm{Amp}(\vec{D}_{i,j})=|dx_{i,j}|+|dy_{i,j}| \tag{2}$$
the direction of the boundary direction vector is:
$$\mathrm{Ang}(\vec{D}_{i,j})=\arctan\!\left(\frac{dy_{i,j}}{dx_{i,j}}\right)$$
the moduli of the vectors in the same direction in the small block are added to obtain a corresponding edge direction histogram (edge direction histogram), the establishment of the 4 × 4 edge direction histogram in the frame is shown in the following formula 3, and the direction with the largest modulus in the direction histogram is selected as a candidate prediction direction.
$$\mathrm{Histo}(k)=\sum_{(m,n)\in \mathrm{SET}(k)} \mathrm{Amp}(\vec{D}_{m,n}),\qquad \mathrm{SET}(k)=\{(i,j)\mid \mathrm{Ang}(\vec{D}_{i,j})\in a_k\} \tag{3}$$

where (angles in degrees)

a0 = (-103.30, -76.70]
a1 = (-13.30, 13.30]
a3 = (35.80, 54.20]
a4 = (-54.20, -35.80]
a5 = (-76.70, -54.20]
a6 = (-35.80, -13.30]
a7 = (54.20, 76.70]
a8 = (13.30, 35.80]
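A sketch of the histogram construction of equations (2)–(3). The fold of angles onto the bin range (−103.3°, 76.7°] is an assumption about the angle convention, since the arctangent alone does not cover these bins:

```python
import math

# Angle bins a_k in degrees; k is the directional intra mode (mode 2 = DC has no bin).
BINS = {0: (-103.30, -76.70), 5: (-76.70, -54.20), 4: (-54.20, -35.80),
        6: (-35.80, -13.30), 1: (-13.30, 13.30), 8: (13.30, 35.80),
        3: (35.80, 54.20), 7: (54.20, 76.70)}

def edge_direction_histogram(vectors):
    """Accumulate Amp = |dx| + |dy| (eq. 2) into the bin containing the
    vector's angle (eq. 3); returns {mode: accumulated amplitude}."""
    histo = {k: 0.0 for k in BINS}
    for dx, dy in vectors:
        ang = math.degrees(math.atan2(dy, dx))  # (-180, 180]
        if ang <= -103.30:                      # fold opposite directions
            ang += 180.0                        # onto (-103.30, 76.70]
        elif ang > 76.70:
            ang -= 180.0
        for k, (lo, hi) in BINS.items():
            if lo < ang <= hi:
                histo[k] += abs(dx) + abs(dy)
                break
    return histo

def candidate_direction(histo):
    """Direction with the largest accumulated modulus (the candidate mode)."""
    return max(histo, key=histo.get)
```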
1.2.2.3 mode selection based on context model
There is spatial correlation between the blocks of an image, so the coding mode of the current block can be predicted from the coding modes of adjacent blocks. As shown in fig. 11, C denotes the current 4 × 4 block, while A and B denote the 4 × 4 blocks above and to the left of the current block. The maximum of the A and B prediction modes is used as a candidate prediction mode for the current block.
1.2.2.4 Mode selection based on the co-located block in the previous frame
Check the coding mode of the 4 × 4 block at the position corresponding to the current block in the previous frame; if that co-located block used an intra coding mode, its coding mode is selected as a candidate coding mode for the current 4 × 4 block. Fig. 12 illustrates the current 4 × 4 block and the co-located 4 × 4 block in the previous frame.
1.2.2.5 precoding and Performance comparison
Pre-coding uses the pixels around the current block: the current block is predictively coded in turn with each selected candidate prediction mode, and the optimal prediction mode is chosen with a Lagrangian cost function:
J(s,c,IMODE|QP,λMODE)=SSD(s,c,IMODE|QP)+λMODE·R(s,c,IMODE|QP) (4)
Here IMODE ranges over the selectable intra prediction directions; SSD is the sum of squared differences between the original pixel values s and the reconstructed pixel values c of the intra 4 × 4 block; and R(s, c, IMODE|QP) is the size of the code stream produced by coding in mode IMODE with variable-length Huffman coding. Peak signal-to-noise ratio (PSNR) is used for quality measurement in video coding; equation (5) gives the peak signal-to-noise ratio:
$$\mathrm{PSNR}=10\log_{10}\frac{255^2}{\mathrm{MSE}} \tag{5}$$
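The mode decision of equation (4), sketched with a hypothetical `precode` callback that is assumed to return the (SSD, bits) pair obtained by pre-coding a candidate mode:

```python
def best_intra_mode(candidates, precode, lam):
    """Pick the candidate mode minimizing J = SSD + lambda * R (equation 4).
    `precode(mode)` is assumed to return (ssd, bits) for that mode."""
    best_mode, best_cost = None, float("inf")
    for mode in candidates:
        ssd, bits = precode(mode)
        cost = ssd + lam * bits     # Lagrangian cost J
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode
```

With a larger λ the decision shifts toward modes with a smaller rate R, trading distortion for bits.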
1.2.3 results of the experiment
The test sequences used in the experiment were Mobile, Tempete, Bus, and Paris, all of QCIF size; only the luminance component was tested. The test results are shown in table 2.
TABLE 2 variation of coding Performance under different test sequences
| Test sequence | Change in first I-frame image encoding time (%) | Average change in per-frame image bit rate (%) | Change in average per-frame image encoding time (%) | Change in image PSNR (dB) |
| Mobile | -70.25 | 0.12 | -33.56 | -0.016 |
| Tempete | -69.78 | 0.26 | -32.14 | -0.014 |
| Bus | -69.58 | 0.39 | -24.34 | -0.024 |
| Paris | -71.03 | 0.42 | -31.76 | -0.021 |
2 System software composition block diagram (FIG. 13 System software composition block diagram)
In the system's software, the core modules are the video encoder and decoder; these two parts form the main body of the whole software architecture and constitute the invention's biggest innovation. The video conference system designed by the invention uses the RTP/RTCP protocols to transmit video and voice data. The real-time transport protocol (RTP) is responsible for packetizing and sending media data, while RTCP connects the sending and receiving sides of the video and voice data streams and carries feedback and time-synchronization information.
3 System hardware block diagram (shown in figure 14); the system adopts an embedded design.
In short, video conferencing is a rapidly growing market, but because the industry standards are not completely unified, western countries have not achieved a monopoly on the core technology, and China faces a great opportunity to grow in this field. At present, some domestically produced video conference network equipment, such as MCUs and gatekeepers, is internationally advanced or even leading in technology; but for video conference terminal equipment, China still lacks competitive products, and that market is almost completely occupied by foreign products. The AVCS-II video conference system developed by the Institute of Applied Physics of Nanjing University is, to a certain extent, a new attempt and breakthrough for our country in the field of video conference terminal products, especially in video codec technology, and is expected to open up the domestic and foreign video conference markets. The frequency-domain sub-pixel motion search algorithm provided by the invention is technically innovative; experiments and practical use by customers have shown that the algorithm has high accuracy and extremely low computational complexity and can rapidly match the optimal motion vector. In addition to its unique video coding system, the system designed by the invention provides a rich set of video conference tools, building a complete video and data interaction platform for users.