HK1140324B - Digital signal coding apparatus, digital signal decoding apparatus, digital signal arithmetic coding method and digital signal arithmetic decoding method - Google Patents
- Publication number: HK1140324B
- Application number: HK10106603.4A
- Authority: HK (Hong Kong)
- Prior art keywords: unit, data, digital signal, arithmetic, coding
Description
This application is a divisional application of Chinese patent application 03800516.6 (PCT/JP03/04578), entitled "Digital signal encoding device, digital signal decoding device, digital signal arithmetic encoding method, and digital signal arithmetic decoding method", filed on April 10, 2003.
Technical field:
The present invention relates to a digital signal encoding device, a digital signal decoding device, a digital signal arithmetic encoding method, and a digital signal arithmetic decoding method used in video image compression encoding techniques, compressed video image data transmission techniques, and the like.
Background art:
In known international standard video image coding methods such as MPEG and ITU-T H.26x, Huffman coding is adopted as the entropy coding. Huffman coding provides optimal coding performance when each information source symbol must be expressed by an independent codeword, but when the characteristics of a signal such as a video signal vary locally, it cannot guarantee optimal coding performance as the occurrence probabilities of the information source symbols vary.
In such cases, arithmetic coding can be adopted: it dynamically adapts to the occurrence probability of each information source symbol, and it can represent a plurality of symbols together as a single codeword.
The idea of arithmetic coding is briefly described here with reference to Mark Nelson, "Arithmetic Coding + Statistical Modeling = Data Compression: Part 1 - Arithmetic Coding", Dr. Dobb's Journal, February 1991. Alphabetic characters are taken as the information source symbols, and the message "BILLGATES" is arithmetically encoded.
The occurrence probability of each character is defined as shown in Fig. 1. As the value ranges in the figure show, each character is assigned a unique region on the probability number line of the interval [0, 1].
Next, the encoding process is performed. First, the character "B" is encoded; it corresponds to the range [0.2, 0.3] on the probability number line, so "B" is represented by the upper limit (High) 0.3 and the lower limit (Low) 0.2 of this value range.
Then, in the encoding of "I", the value range [0.2, 0.3] selected in the encoding of "B" is treated as if it were the interval [0, 1], and within it the subinterval corresponding to [0.5, 0.6] is selected. In short, the arithmetic coding process amounts to progressively narrowing a range on the probability number line.
When this process is repeated for each character, the arithmetic encoding result of "BILLGATES" is expressed by the Low value <0.2572167752> at the time the final character "S" is encoded, as shown in Fig. 2.
The decoding process can be considered the reverse of this.
First, the value range on the probability number line that contains the encoding result <0.2572167752> is examined, yielding "B".
Then, the Low value of "B" is subtracted and the result is divided by the width of its value range, giving <0.572167752>. This value falls in the interval [0.5, 0.6], so the character "I" is decoded. The process is repeated in the same way to decode the rest of "BILLGATES".
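The encode/decode walkthrough above can be sketched in Python. Note that Nelson's original example message is "BILL GATES" including a space character (the space also has an entry in his probability table), and the Low value <0.2572167752> quoted above corresponds to that message; the per-character ranges below follow Nelson's table and are assumptions insofar as Fig. 1 is not reproduced here. Exact rational arithmetic is used to sidestep the finite-precision issues discussed next.

```python
from fractions import Fraction as F

# Per-character value ranges on [0, 1); each range's width is the
# character's occurrence probability (following Nelson's table).
RANGES = {
    ' ': (F('0.0'), F('0.1')), 'A': (F('0.1'), F('0.2')),
    'B': (F('0.2'), F('0.3')), 'E': (F('0.3'), F('0.4')),
    'G': (F('0.4'), F('0.5')), 'I': (F('0.5'), F('0.6')),
    'L': (F('0.6'), F('0.8')), 'S': (F('0.8'), F('0.9')),
    'T': (F('0.9'), F('1.0')),
}

def encode(message):
    low, high = F(0), F(1)
    for ch in message:                       # narrow the interval per symbol
        width = high - low
        lo, hi = RANGES[ch]
        low, high = low + width * lo, low + width * hi
    return low                               # the Low value identifies the message

def decode(value, length):
    out = []
    for _ in range(length):
        for ch, (lo, hi) in RANGES.items():  # find the containing range
            if lo <= value < hi:
                out.append(ch)
                value = (value - lo) / (hi - lo)  # rescale back to [0, 1)
                break
    return ''.join(out)

code = encode('BILL GATES')
print(code == F('0.2572167752'))   # True: matches the Low value in the text
print(decode(code, 10))            # BILL GATES
```

The exhaustive linear search over `RANGES` in `decode` is only for clarity; a real decoder would use the cumulative-probability table directly.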
By the above processing, arithmetic coding ultimately maps even a very long message to a single codeword. In an actual implementation, however, infinite decimal precision cannot be handled, and the multiplications and divisions required in the encoding and decoding procedures increase the computational load. It has therefore been contrived, for example, to perform the fractional arithmetic with fixed-point expressions in integer registers, to approximate the Low value by a power of 2, and to replace multiplication and division with shift operations. With arithmetic coding, the above procedure adapts the entropy coding to the occurrence probabilities of the information source symbols; in particular, when the occurrence probabilities change dynamically, the table of Fig. 1 is updated as appropriate to follow the change, and higher coding efficiency than Huffman coding can be obtained.
Since the known digital signal arithmetic coding method and digital signal arithmetic decoding method are configured as described above, when an entropy-coded video image signal is transmitted, each frame of the video image is usually divided into partial areas and transmitted in units having resynchronization capability (for example, the slice structure of MPEG-2), in order to minimize video image corruption due to transmission errors.
In Huffman coding, each symbol to be coded is mapped to a codeword having an integer bit length, so the set of symbols forming a transmission unit can simply be delimited at a codeword boundary. In arithmetic coding, however, it is necessary not only to interrupt the coding procedure explicitly with a special symbol, but also to reset the learning of the symbol occurrence probabilities accumulated up to that point and to flush enough bits to identify the code when coding is restarted, so coding efficiency may drop before and after the interruption. Conversely, if arithmetic coding is performed over one video frame without such resets, and the code has to be divided into small units such as packets for transmission, the decoding process for a given packet cannot be performed without the data of the preceding packets; there is thus a problem that the quality of the video image deteriorates significantly when a packet is lost due to transmission error, delay, or the like.
Disclosure of Invention
The present invention has been made to solve the above-mentioned problems, and an object of the present invention is to provide a digital signal encoding device and a digital signal arithmetic encoding method capable of improving encoding efficiency of arithmetic encoding while securing error resistance.
The present invention is also directed to a digital signal decoding apparatus and a digital signal arithmetic decoding method that can correctly decode a signal even when the encoding apparatus continues encoding without resetting the arithmetic encoding state or the symbol occurrence probability learning state of the previous transmission unit.
According to the present invention, when a digital signal is compressed by arithmetic coding in predetermined transmission units, information indicating the arithmetic coding state at the time coding of a certain transmission unit is completed is multiplexed as part of the data of the next transmission unit. Alternatively, the occurrence probability of coded symbols is determined based on the dependency relationship with signals included in 1 or more adjacent transmission units and is learned by counting the occurrence frequency of the coded symbols, and information indicating the learning state of the occurrence probability at the time coding of a certain transmission unit is completed is multiplexed as part of the data of the next transmission unit.
Therefore, since encoding can be continued without resetting the previous arithmetic coding state or symbol occurrence probability learning state, the coding efficiency of arithmetic coding can be improved while error resilience is secured.
The digital signal decoding device and the digital signal arithmetic decoding method of the present invention are configured so that, when decoding a compressed digital signal of a predetermined transmission unit, the decoding operation is initialized at the start of decoding of a certain transmission unit based on information indicating the arithmetic coding state multiplexed as part of the transmission unit data, or the occurrence probability for the transmission unit is initialized based on information indicating the symbol occurrence probability learning state multiplexed as part of the transmission unit data; the occurrence probability of decoded symbols is determined based on the dependency relationship with signals included in 1 or more adjacent transmission units and is learned by counting the occurrence frequency of the decoded symbols.
Therefore, even when the encoding apparatus side continues encoding without resetting the arithmetic coding state or the symbol occurrence probability learning state of the previous transmission unit, the signal can be correctly decoded.
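The state carry-over described in this section can be pictured with a hypothetical header structure multiplexed at the start of each transmission unit. The field names and types below are purely illustrative assumptions; the text does not define a concrete layout.

```python
from dataclasses import dataclass, field

@dataclass
class TransmissionUnitHeader:
    """Hypothetical header multiplexed as part of a transmission unit's data.

    Field names are illustrative only: they stand for the arithmetic coding
    state and the symbol occurrence probability learning state at the end of
    the previous transmission unit.
    """
    coder_low: int = 0       # arithmetic-coding register state (illustrative)
    coder_range: int = 0     # arithmetic-coding register state (illustrative)
    prob_counts: dict = field(default_factory=dict)  # per-context 0/1 counts

# A decoder would initialize its arithmetic decoding registers and its
# occurrence probability tables from these fields instead of resetting them.
header = TransmissionUnitHeader(coder_low=123, coder_range=456,
                                prob_counts={'ctx_example': (25, 75)})
print(header.prob_counts['ctx_example'])
```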
Brief description of the drawings
Fig. 1 is an explanatory diagram showing the occurrence probability of each character when the message "BILLGATES" is arithmetically encoded.
Fig. 2 is an explanatory diagram showing the arithmetic encoding result when the message "BILLGATES" is arithmetically encoded.
Fig. 3 is a block diagram showing a video image encoding device (digital signal encoding device) according to a first embodiment of the present invention.
Fig. 4 is a block diagram showing a video image decoding apparatus (digital signal decoding apparatus) according to a first embodiment of the present invention.
Fig. 5 is a block diagram showing an internal configuration of the arithmetic coding unit 6 of fig. 3.
Fig. 6 is a flowchart showing the processing contents of the arithmetic coding unit 6 of fig. 5.
Fig. 7 is an explanatory diagram showing an example of the context model.
Fig. 8 is an explanatory diagram showing an example of a context model for motion vectors.
Fig. 9 is an explanatory diagram illustrating a slice configuration.
Fig. 10 is an explanatory diagram showing an example of the bit stream generated by the arithmetic coding unit 6.
Fig. 11 is an explanatory diagram showing an example of another bit stream generated by the arithmetic coding unit 6.
Fig. 12 is an explanatory diagram showing an example of another bit stream generated by the arithmetic coding unit 6.
Fig. 13 is a block diagram showing an internal configuration of the arithmetic decoding unit 27 of fig. 4.
Fig. 14 is a flowchart showing the processing contents of the arithmetic decoding unit 27 of fig. 13.
Fig. 15 is a diagram showing the internal configuration of the arithmetic coding unit 6 in the second embodiment.
Fig. 16 is a flowchart showing the processing contents of the arithmetic coding unit 6 of fig. 15.
Fig. 17 is an explanatory diagram illustrating a learning state of the context model.
Fig. 18 is an explanatory diagram showing an example of the bit data stream generated by the arithmetic coding unit 6 according to the second embodiment.
Fig. 19 is a diagram showing the internal configuration of the arithmetic decoding unit 27 according to the second embodiment.
Fig. 20 is a flowchart showing the processing contents of the arithmetic decoding unit 27 of fig. 19.
Fig. 21 is an explanatory diagram showing an example of the bit data stream generated by the arithmetic coding unit 6 according to the third embodiment.
Detailed description of the invention
Hereinafter, the best mode for carrying out the present invention will be described in more detail with reference to the accompanying drawings.
Embodiment 1
In the first embodiment, an example of applying arithmetic coding to a video image coding method that performs coding by equally dividing a video image frame into rectangular regions of 16 × 16 pixels (hereinafter referred to as macroblocks) is described, using the example disclosed in D. Marpe et al., "Video Compression Using Context-Based Adaptive Arithmetic Coding".
Fig. 3 is a block diagram showing a video encoding apparatus (digital signal encoding apparatus) according to a first embodiment of the present invention, in which a motion detection unit 2 detects a motion vector 5 in units of macroblocks from an input video signal 1 using a reference picture 4 stored in a frame memory 3 a. The motion compensation unit 7 obtains a temporal prediction image 8 based on the motion vector 5 detected by the motion detection unit 2. The subtractor 51 obtains the difference between the input video image signal 1 and the temporal predicted image 8, and outputs the difference as a temporal prediction residual signal 9.
The spatial prediction unit 10a generates a spatial prediction residual signal 11 by performing prediction from a spatial neighborhood within the same video image frame with reference to the input video image signal 1. The encoding mode determination unit 12 selects from among: a motion prediction mode, in which the temporal prediction residual signal 9 is encoded; a skip mode, for the case where the motion vector 5 is zero and there is no temporal prediction residual signal 9 component; and an intra mode, in which the spatial prediction residual signal 11 is encoded; it selects the mode in which the corresponding macroblock can be encoded most efficiently and outputs the encoding mode information 13.
The orthogonal transform unit 15 performs orthogonal transform on the signal to be encoded selected by the encoding mode determination unit 12, and outputs orthogonal transform coefficient data. The quantization unit 16 quantizes the orthogonal transform coefficient data at a granularity indicated by the quantization step parameter 23 determined by the encoding control unit 22.
The inverse quantization unit 18 performs inverse quantization of the orthogonal transform coefficient data 17 output from the quantization unit 16 with the granularity indicated by the quantization step parameter 23. The inverse orthogonal transform unit 19 performs inverse orthogonal transform on the orthogonal transform coefficient data inverse-quantized by the inverse quantization unit 18. The switching unit 52 selects and outputs the temporal predicted image 8 output from the motion compensation unit 7 or the spatial predicted image 20 output from the spatial prediction unit 10a, based on the encoding mode information 13 output from the encoding mode determination unit 12. The adder 53 adds the output signal of the switching unit 52 and the output signal of the inverse orthogonal transform unit 19 to generate a local decoded picture 21, and stores the local decoded picture 21 as a reference picture 4 in the frame memory 3 a.
The arithmetic coding unit 6 performs entropy coding on the data to be coded, such as the motion vector 5, the coding mode information 13, the spatial prediction mode 14, and the orthogonal transform coefficient data 17, and outputs the result of the coding as video image compressed data 26 to the transmission buffer 24. The encoding control unit 22 controls the encoding mode determination unit 12, the quantization unit 16, the inverse quantization unit 18, and the like.
Fig. 4 is a block diagram showing a video image decoding apparatus (digital signal decoding apparatus) according to the first embodiment of the present invention, in which an arithmetic decoding unit 27 performs entropy decoding processing to decode the motion vector 5, the coding mode information 13, the spatial prediction mode 14, the orthogonal transform coefficient data 17, the quantization step parameter 23, and the like. The inverse quantization unit 18 inversely quantizes the orthogonal transform coefficient data 17 decoded by the arithmetic decoding unit 27, using the decoded quantization step parameter 23. The inverse orthogonal transform unit 19 performs inverse orthogonal transform on the inversely quantized orthogonal transform coefficient data.
The motion compensation unit 7 restores the temporal predicted image 8 using the motion vector 5 decoded by the arithmetic decoding unit 27. The spatial prediction unit 10b restores the spatial prediction image 20 from the spatial prediction mode 14 decoded by the arithmetic decoding unit 27.
The switching unit 54 selects and outputs the temporal predicted image 8 or the spatial predicted image 20 based on the coding mode information 13 decoded by the arithmetic decoding unit 27. The adder 55 adds the prediction residual signal, which is the output signal of the inverse orthogonal transform unit 19, to the output signal of the switching unit 54, and outputs the decoded image 21. The decoded picture 21 is stored in a frame memory 3b used for generating a predicted picture of the following frame.
Next, the operation will be described.
First, an outline of operations of the video encoding apparatus and the video decoding apparatus will be described.
(1) Operation outline of video image encoding device
The input video signal 1 is input in units of macroblocks into which each video image frame is divided, and the motion detection unit 2 of the video encoding apparatus detects a motion vector 5 in units of macroblocks by using a reference image 4 stored in a frame memory 3 a.
When the motion detector 2 detects the motion vector 5, the motion compensator 7 acquires the temporal predicted image 8 based on the motion vector 5.
Upon receiving the temporal predicted image 8 from the motion compensation unit 7, the subtractor 51 obtains the difference between the input video image signal 1 and the temporal predicted image 8, and outputs the difference to the encoding mode determination unit 12 as a temporal prediction residual signal 9.
On the other hand, when an input video image signal 1 is input, the spatial prediction unit 10a generates a spatial prediction residual signal 11 by performing prediction of a spatial neighborhood within the same video image frame with reference to the input video image signal 1.
The encoding mode determination unit 12 selects from among: a motion prediction mode, in which the temporal prediction residual signal 9 is encoded; a skip mode, for the case where the motion vector 5 is zero and there is no temporal prediction residual signal 9 component; and an intra mode, in which the spatial prediction residual signal 11 is encoded; it selects the mode that encodes the corresponding macroblock with the best efficiency and outputs the encoding mode information 13 to the arithmetic encoding unit 6. When the motion prediction mode is selected, the temporal prediction residual signal 9 is output to the orthogonal transform unit 15 as the signal to be encoded, and when the intra mode is selected, the spatial prediction residual signal 11 is output to the orthogonal transform unit 15 as the signal to be encoded.
When the motion prediction mode is selected, the motion vector 5 is output as encoding target information from the motion detection unit 2 to the arithmetic encoding unit 6, and when the intra mode is selected, the spatial prediction mode 14 is output as encoding target information from the spatial prediction unit 10a to the arithmetic encoding unit 6.
Upon receiving the signal to be encoded from the encoding mode decision unit 12, the orthogonal transform unit 15 performs orthogonal transform on the signal to be encoded and outputs the orthogonal transform coefficient data to the quantization unit 16.
Upon receiving the orthogonal transform coefficient data from the orthogonal transform unit 15, the quantization unit 16 quantizes the orthogonal transform coefficient data at a granularity indicated by a quantization step parameter 23 determined by the encoding control unit 22.
The encoding control unit 22 adjusts the quantization step parameter 23 to balance the encoding rate and the quality. Generally, the amount of encoded data stored in the transmission buffer 24 immediately before transmission is checked at regular intervals after arithmetic coding, and the quantization step parameter 23 is adjusted based on the buffer margin 25. For example, when the buffer margin 25 is small, the coding rate is suppressed, and when the buffer margin 25 is ample, the quality is improved by increasing the coding rate.
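The buffer-feedback adjustment described above can be sketched as follows. The thresholds, step size, and parameter bounds are illustrative assumptions and are not values given in this text.

```python
def adjust_quantization_step(qstep, buffer_occupancy, buffer_capacity,
                             step=1, q_min=1, q_max=51):
    """Coarsen quantization when the transmission buffer is nearly full,
    refine it when there is headroom. Thresholds (80%/20%) are hypothetical."""
    if buffer_occupancy > 0.8 * buffer_capacity:
        return min(qstep + step, q_max)   # suppress the coding rate
    if buffer_occupancy < 0.2 * buffer_capacity:
        return max(qstep - step, q_min)   # spend more bits to improve quality
    return qstep                          # otherwise leave the granularity alone

print(adjust_quantization_step(10, 90, 100))  # 11: buffer nearly full
print(adjust_quantization_step(10, 10, 100))  # 9: ample margin
print(adjust_quantization_step(10, 50, 100))  # 10: no change
```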
Upon receiving the orthogonal transform coefficient data 17 from the quantization unit 16, the inverse quantization unit 18 performs inverse quantization of the orthogonal transform coefficient data 17 at the granularity indicated by the quantization step parameter 23.
The inverse orthogonal transform unit 19 performs inverse orthogonal transform on the orthogonal transform coefficient data inverse-quantized by the inverse quantization unit 18.
The switching unit 52 selects and outputs the temporal predicted image 8 output from the motion compensation unit 7 or the spatial predicted image 20 output from the spatial prediction unit 10a, based on the encoding mode information 13 output from the encoding mode determination unit 12. That is, when the encoding mode information 13 indicates the motion prediction mode, the temporal predicted image 8 output from the motion compensation unit 7 is selected and output, and when the encoding mode information 13 indicates the intra mode, the spatial predicted image 20 output from the spatial prediction unit 10a is selected and output.
The adder 53 adds the output signal of the switching unit 52 and the output signal of the inverse orthogonal transform unit 19 to generate the local decoded image 21. The local decoded picture 21 is stored in the frame memory 3a as the reference picture 4 so as to be used for motion prediction of the following frame.
The arithmetic coding unit 6 performs entropy coding of data to be coded, such as the motion vector 5, the coding mode information 13, the spatial prediction mode 14, and the orthogonal transform coefficient data 17, according to a program to be described later, and outputs the coding result as video image compressed data 26 to the transmission buffer 24.
(2) Outline of operation of video image decoding apparatus
Upon receiving the video image compressed data 26 from the video image encoding apparatus, the arithmetic decoding unit 27 performs entropy decoding processing, which will be described later, to decode the motion vector 5, the encoding mode information 13, the spatial prediction mode 14, the orthogonal transform coefficient data 17, the quantization step parameter 23, and the like.
The inverse quantization unit 18 performs inverse quantization on the orthogonal transform coefficient data 17 decoded by the arithmetic decoding unit 27, using the decoded quantization step parameter 23, and the inverse orthogonal transform unit 19 performs inverse orthogonal transform on the inversely quantized orthogonal transform coefficient data to restore the prediction residual signal.
When the coding mode information 13 decoded by the arithmetic decoding unit 27 indicates the motion prediction mode, the motion compensation unit 7 restores the temporal predicted image 8 using the motion vector 5 decoded by the arithmetic decoding unit 27.
When the coding mode information 13 decoded by the arithmetic decoding unit 27 indicates the intra mode, the spatial prediction unit 10b restores the spatial prediction image 20 from the spatial prediction mode 14 decoded by the arithmetic decoding unit 27.
Here, the difference between the spatial prediction unit 10a on the video encoding apparatus side and the spatial prediction unit 10b on the video decoding apparatus side is that the former evaluates all available spatial prediction modes and includes the processing for selecting the most efficient spatial prediction mode 14, whereas the latter only generates the spatial prediction image 20 from the provided spatial prediction mode 14.
The switching unit 54 selects the temporal predicted image 8 restored by the motion compensation unit 7 or the spatial predicted image 20 restored by the spatial prediction unit 10b based on the coding mode information 13 decoded by the arithmetic decoding unit 27, and outputs the selected image as a predicted image to the adder 55.
Upon receiving the predicted image from the switching unit 54, the adder 55 adds the predicted image to the prediction residual signal output from the inverse orthogonal transform unit 19 to obtain the decoded image 21.
The decoded picture 21 is stored in the frame memory 3b so as to be used for generating a predicted picture for the following frame. The difference between the frame memories 3a and 3b is that they are mounted in a video encoding device and a video decoding device, respectively.
(3) Arithmetic encoding/decoding process
The arithmetic encoding and decoding process, which is the gist of the present invention, will be described in detail below. The encoding process is executed in the arithmetic encoding unit 6 of fig. 3, and the decoding process is executed in the arithmetic decoding unit 27 of fig. 4.
Fig. 5 is a block diagram showing the internal configuration of the arithmetic coding unit 6 of Fig. 3. In the figure, the arithmetic coding unit 6 includes: a context model determination unit 28 that determines the context model (described later) defined for each type of data to be coded, such as the motion vector 5, the coding mode information 13, the spatial prediction mode 14, and the orthogonal transform coefficient data 17; a binarization unit 29 that converts multi-valued data into binary data according to a binarization rule determined for each type of data to be encoded; an occurrence probability generating unit 30 that provides the occurrence probability of the value (0 or 1) of each bin of the binarized sequence; an encoding unit 31 that performs arithmetic encoding based on the generated occurrence probability; and a transmission unit generation unit 35 that determines the timing at which arithmetic coding is interrupted and assembles the data up to that timing into a transmission unit.
Fig. 6 is a flowchart showing the processing contents of the arithmetic coding unit 6 of fig. 5.
1) Context model determination processing (step ST1)
A context model is a model of the dependency relationship between the information source (coded) symbol and other information that causes its occurrence probability to vary; by switching the state of the occurrence probability according to this dependency, coding that is better adapted to the actual occurrence probability of the symbol becomes possible.
Fig. 7 is an explanatory diagram illustrating the concept of a context model. Here the information source symbol is a binary bit. The choices 0 to 2 of ctx in Fig. 7 are defined on the assumption that the state of the occurrence probability of the information source symbol using that ctx changes depending on the conditions.
In the video image encoding of the first embodiment, the value of ctx is switched according to the dependency relationship between the encoded data of a certain macroblock and the encoded data of the neighboring macroblocks.
Fig. 8 is an explanatory diagram showing an example of a context model for motion vectors, namely the context model for the motion vectors of macroblocks disclosed in D. Marpe et al., "Video Compression Using Context-Based Adaptive Arithmetic Coding".
In Fig. 8, the motion vector of block C is to be encoded; more precisely, the prediction difference value mvd_k(C), obtained by predicting the motion vector of block C from its neighborhood, is encoded. ctx_mvd(C, k) is the context model.
The motion vector prediction difference value mvd_k(A) of block A and the motion vector prediction difference value mvd_k(B) of block B are used to define the switching evaluation value of the context model as e_k(C) = |mvd_k(A)| + |mvd_k(B)|.
The evaluation value e_k(C) indicates the dispersion of the nearby motion vectors; in general, mvd_k(C) tends to be small when e_k(C) is small and, conversely, tends to be large when e_k(C) is large.
Accordingly, the occurrence probability variation of mvd_k(C) is preferably adapted based on e_k(C). This set of occurrence probability variations is the context model, and in this case it can be said to contain 3 occurrence probability variations.
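The context selection for mvd_k(C) can be sketched as below. The two thresholds follow the values commonly cited for this context model in the CABAC literature (3 and 32); the text here does not state them, so they are assumptions.

```python
def ctx_mvd(mvd_a_k, mvd_b_k, lo_thresh=3, hi_thresh=32):
    """Select one of 3 occurrence probability variations for mvd_k(C)
    from the neighbouring prediction differences mvd_k(A) and mvd_k(B).
    Thresholds are assumed values, not given in this text."""
    e = abs(mvd_a_k) + abs(mvd_b_k)   # evaluation value e_k(C)
    if e < lo_thresh:
        return 0    # quiet neighbourhood: a small mvd_k(C) is expected
    if e > hi_thresh:
        return 1    # large motion nearby: a large mvd_k(C) is expected
    return 2        # intermediate case

print(ctx_mvd(0, 1), ctx_mvd(5, 5), ctx_mvd(20, 20))  # 0 2 1
```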
In addition, context models are defined in advance for each of the encoding target data such as the encoding mode information 13, the spatial prediction mode 14, and the orthogonal transform coefficient data 17, and are shared by the arithmetic encoding unit 6 of the video encoding apparatus and the arithmetic decoding unit 27 of the video decoding apparatus. The context model determining unit 28 of the arithmetic coding unit 6 shown in fig. 5 performs a process of selecting a predetermined model based on the type of the data to be coded.
Note that the processing of selecting one occurrence probability variation from within a context model corresponds to the occurrence probability generation processing of 3) below, and is therefore described there.
2) Binarization processing (step ST2)
By binarizing the data to be encoded in the binarization unit 29, the context model is applied per bin (binary position) of the binary sequence. The binarization rule transforms each value into a variable-length sequence of bins based on the approximate distribution of the values taken by each type of encoded data. Compared with arithmetic coding of the original multi-valued data directly, coding in bin units reduces the number of divisions of the probability number line and has the advantage of keeping the context models simple.
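As an illustration, a simple unary rule, one common binarization choice in such schemes (the text does not specify the actual rule, so this is an assumption), maps small, frequent values to short bin sequences:

```python
def unary_binarize(value):
    """Map a non-negative integer to a variable-length bin string:
    'value' ones terminated by a single zero."""
    return '1' * value + '0'

def unary_debinarize(bins):
    """Inverse mapping: count the ones before the terminating zero."""
    return bins.index('0')

for v in range(4):
    print(v, unary_binarize(v))   # 0 -> '0', 1 -> '10', 2 -> '110', 3 -> '1110'
```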
3) Occurrence probability generation processing (step ST3)
Through the processing of 1) and 2), the binarization of the multi-valued data to be encoded and the assignment of a context model to each bin are complete, and coding is prepared. Each context model contains variations of the occurrence probability of 0/1, so the occurrence probability generating unit 30 refers to the context model determined in step ST1 and generates the 0/1 occurrence probability for each bin.
In the example of Fig. 8, the occurrence probability generating unit 30 determines the evaluation value e_k(C) shown in Fig. 8 and, based on it, decides from the variation branches of the referenced context model which occurrence probability variation to use for the current coding.
4) Encoding processing (Steps ST 3-ST 7)
Since the occurrence probability of each 0/1 value on the probability number line required by the arithmetic coding procedure is obtained in 3), arithmetic coding is performed in the encoding unit 31 according to the procedure illustrated in the conventional example (step ST4).
The actually coded value (0 or 1) 32 is fed back to the occurrence probability generating unit 30, and the 0/1 occurrence frequencies are counted in order to update the occurrence probability variation of the context model that was used (step ST5).
For example, suppose that after the encoding processing of 100 bins has been executed using the occurrence probability variation of a particular context model, the occurrence probabilities of 0 and 1 in that variation are 0.25 and 0.75, respectively. If a 1 is then encoded using the same occurrence probability variation, the occurrence frequency of 1 is updated and the occurrence probabilities of 0/1 change to about 0.247 and 0.752. This mechanism enables efficient encoding that adapts to the actual occurrence probabilities.
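The frequency-counting adaptation described above can be sketched as follows (the class and method names are hypothetical; practical arithmetic coders use finite-state probability tables rather than explicit counting, so this only models the idea). Starting from 25 zeros and 75 ones observed over 100 bins, coding one more 1 moves the estimates to roughly 0.247 and 0.752:

```python
class OccurrenceProbability:
    """Adaptive 0/1 probability estimate for one context model,
    maintained by simple frequency counting."""

    def __init__(self, count0: int, count1: int):
        self.counts = [count0, count1]

    def probability(self, symbol: int) -> float:
        return self.counts[symbol] / sum(self.counts)

    def update(self, coded_symbol: int) -> None:
        # Feed back the actually coded value (0 or 1), as in step ST5.
        self.counts[coded_symbol] += 1

model = OccurrenceProbability(count0=25, count1=75)   # 100 bins seen so far
model.update(1)                                       # one more '1' is coded
p0, p1 = model.probability(0), model.probability(1)   # ~0.2475 and ~0.7525
```

The updated estimates 25/101 and 76/101 match the approximate values 0.247 and 0.752 quoted above.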
The arithmetic code 33 generated by the encoding unit 31 for the newly encoded value (0 or 1) 32 is sent to the transmission unit generating unit 35 and multiplexed as data constituting a transmission unit, as described in 5) below (step ST6).
Then, it is determined whether the encoding processing has finished for all bins of the binary sequence of one piece of data to be encoded (step ST7). If it has not finished, the process returns to step ST3 and the processing from the occurrence probability generation for each bin onward is executed. If it has finished, the process proceeds to the transmission unit generation processing described below.
5) Transmission unit creation processing (steps ST8-ST9)
Arithmetic coding converts a sequence of multiple pieces of data to be encoded into a single codeword. However, since a video signal undergoes inter-frame motion prediction and is displayed in frame units, a decoded image must be generated frame by frame to update the frame memory. It is therefore necessary to clearly delimit frame-unit boundaries in the arithmetically coded compressed data, and furthermore, for multiplexing with other media such as audio, for packet transmission, and so on, it is necessary to transmit the compressed data in finer units within a frame. For this purpose, a slice structure is generally used, that is, a unit in which a plurality of macroblocks are grouped in scan order.
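The grouping of a frame's macroblocks into slice-sized transmission units in scan order can be sketched as follows (the frame size and slice length below are arbitrary illustration values, not taken from the description):

```python
def split_into_slices(num_macroblocks: int, mbs_per_slice: int) -> list:
    """Group macroblock indices (in scan order) into slices, the
    fine-grained transmission units within one frame."""
    return [list(range(start, min(start + mbs_per_slice, num_macroblocks)))
            for start in range(0, num_macroblocks, mbs_per_slice)]

# A QCIF-sized frame has 99 macroblocks (11 x 9); 33 per slice gives 3 slices.
slices = split_into_slices(99, 33)
```

Each resulting group would then be packetized and transmitted as one slice.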
Fig. 9 is an explanatory diagram of the slice structure.
The rectangles enclosed by dotted lines correspond to macroblocks. The slice structure is generally handled as a unit of resynchronization in decoding. As a typical example, slice data is mapped directly into the packet payload for IP transport. RTP (Real-time Transport Protocol) is often used for IP transmission of real-time media such as video, which cannot tolerate transmission delay. In most cases an RTP packet carries a time stamp in its header portion, and the slice data of the video is mapped into the payload portion and transmitted. For example, "RTP Payload Format for MPEG-4 Audio/Visual Streams" (RFC 3016, Kikuchi et al.) defines a method of mapping MPEG-4 video compressed data into RTP packets in units of MPEG-4 slices (video packets).
Since RTP is transmitted over UDP packets, there is generally no retransmission control, and when a packet loss occurs the slice data may fail to reach the decoding apparatus. If the subsequent slice data were encoded using information of the lost slice, the decoding apparatus could not decode it correctly even if it were delivered normally.
Therefore, any slice must be decodable normally from its beginning without depending on other slices. For example, when Slice5 is encoded, encoding that uses information of the macroblocks belonging to Slice3 located above it or Slice4 located to its left is generally not performed.
On the other hand, to improve the efficiency of arithmetic coding, it is preferable to adapt the symbol occurrence probabilities to the surrounding situation and to continue the division process of the probability number line across slices. For example, if Slice5 is to be encoded completely independently of Slice4, the register value expressing the codeword of the arithmetic coding cannot be carried over at the time the arithmetic coding of the last macroblock of Slice4 is completed, and encoding in Slice5 must restart after resetting the register to its initial state. The correlation between the end of Slice4 and the beginning of Slice5 therefore cannot be used, and a reduction in coding efficiency results. In short, the design generally sacrifices coding efficiency in order to improve resilience against unexpected loss of slice data due to transmission errors and the like.
The transmission unit generating unit 35 of the first embodiment provides a mechanism that makes this design trade-off adaptive. That is, when the probability of slice data being lost through transmission errors or the like is extremely low, the dependency between slices related to arithmetic coding can be exploited actively instead of being constantly severed.
On the other hand, when the possibility of slice data loss is high, the dependency between slices can be severed, so that the coding efficiency can be controlled adaptively per transmission unit.
In short, the transmission unit generating unit 35 of the first embodiment receives, as a control signal inside the video encoding apparatus, the transmission unit instruction signal 36 indicating the timing at which transmission units are to be delimited, and generates transmission unit data by dividing the codeword of the arithmetic code 33 input from the encoding unit 31 at the timing given by the transmission unit instruction signal 36.
Specifically, the transmission unit generating unit 35 multiplexes the arithmetic codes 33 of the encoded values 32 one by one as bits constituting the transmission unit (step ST6), determines from the transmission unit instruction signal 36 whether the encoding of the data of the macroblocks contained in the transmission unit has been completed (step ST8), and, when it determines that the encoding within the transmission unit has not all been completed, returns to step ST1 and executes the processing from the context model determination onward.
On the other hand, when determining that all the codes in the transmission unit have been completed, the transmission unit generating unit 35 adds the following 2 pieces of information as header information of the next transmission unit data (step ST 9).
1. A "register reset flag" indicating whether, in the next transmission unit, the register value expressing the state of the arithmetic coding process as a codeword, that is, the division state of the probability number line, is reset. In the first transmission unit generated, the register reset flag is always set to indicate <reset>.
2. Only when the register reset flag of 1 above indicates <not reset>, the register value at that time is added as the "initial register value" to be used at the start of arithmetic encoding and decoding of the next transmission unit. As shown in Fig. 5, this is the initial register value 34 input from the encoding unit 31 to the transmission unit generating unit 35.
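The two header items above can be serialized as in the following sketch (the field widths and byte layout are assumptions for illustration; the description does not fix a bit-level format here):

```python
from typing import Optional

def build_slice_header(reset_register: bool,
                       initial_register_value: Optional[int]) -> bytes:
    """Build the extra slice-header fields of 1. and 2. above:
    a one-byte register reset flag, followed by a four-byte initial
    register value only when the flag indicates <not reset>."""
    if reset_register:
        return bytes([1])
    if initial_register_value is None:
        raise ValueError("<not reset> requires the initial register value")
    return bytes([0]) + initial_register_value.to_bytes(4, "big")

def parse_slice_header(data: bytes):
    """Recover the flag and, when present, the initial register value."""
    reset = data[0] == 1
    value = None if reset else int.from_bytes(data[1:5], "big")
    return reset, value
```

The conditional presence of the second field mirrors the rule that the initial register value is multiplexed only when the flag indicates <not reset>.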
Fig. 10 is an explanatory diagram showing an example of the bit stream generated by the arithmetic coding unit 6.
As shown in Fig. 10, in addition to the slice start code, the slice header data, which is the header of the compressed video data of each slice, carries the register reset flag of 1 above, and the initial register value, which is multiplexed only when the register reset flag of 1 above indicates <not reset>.
As described above, with these 2 pieces of additional information, encoding that maintains the continuity of the arithmetic codeword across slices becomes possible, so that the coding efficiency can be maintained, while even when the immediately preceding slice is lost, decoding can be resumed using the register reset flag contained in the header data of the slice itself and the initial register value with which the register is initialized.
In Fig. 10, the slice header data and the compressed slice video data are multiplexed in the same data stream, but as shown in Fig. 11, the slice header data may be transmitted separately as an independent data stream, with ID information identifying the corresponding slice header data attached to the compressed slice video data. The same figure shows an example of transmitting the data streams over the IP protocol, in which the header data portion is transmitted over TCP/IP with high reliability and the compressed video data portion is transmitted over RTP/UDP/IP with low delay. With the separate transmission of header and transmission unit in the configuration of Fig. 11, the data transmitted over RTP/UDP/IP need not necessarily be divided into data units such as slices.
Within a slice, in principle, all dependencies (context models) on the video signals of neighboring regions must be reset so that decoding can restart from the slice alone, but this causes a decrease in video coding efficiency.
As shown in Fig. 11, if the initial register state can be transmitted over TCP/IP, the video signal itself can be encoded using all the context models within the frame, and the arithmetically coded data can be divided and transmitted at the stage of RTP packetization. With this configuration, since the initial state of the arithmetic coding process can be obtained reliably regardless of line conditions, a bit stream encoded without the restrictions of the slice structure can be transmitted while high error resilience is maintained.
As shown in Fig. 12, whether the syntax of the register reset flag and the initial register value is used may itself be signaled in a higher layer. Fig. 12 shows an example in which a register reset control flag, indicating whether the syntax of the register reset flag and the initial register value is used, is multiplexed into the header information given in units of a video sequence composed of a plurality of video frames.
For example, when it is judged that the line quality has deteriorated and the registers should be reset throughout the video sequence to enable stable video transmission, the register reset control flag is set to a value indicating <the register is always reset at the beginning of every slice in the video sequence>. In this case, the register reset flag and the initial register value that would otherwise be multiplexed at slice level for each slice become unnecessary.
Therefore, if the register reset can be controlled in units of a video sequence while certain transmission conditions (such as the error rate of the line) persist, the overhead information transmitted per slice can be reduced. Needless to say, the register reset control flag may also be attached to the header information of an arbitrary video frame in the video sequence, such as the Nth frame or the (N+1)th frame.
Fig. 13 is a block diagram showing an internal configuration of the arithmetic decoding unit 27 of fig. 4.
The arithmetic decoding unit 27 of the video decoding apparatus comprises: a transmission unit decoding initialization unit 37, which, for each received transmission unit, initializes the arithmetic decoding process based on the additional information about the arithmetic coding process contained in the header; a context model determination unit 28, which identifies the type of the data to be decoded by arithmetic decoding, such as the motion vector 5, the coding mode information 13, the spatial prediction mode 14, and the orthogonal transform coefficient data 17, and determines a context model defined in common with the video encoding apparatus; a binarization unit 29, which generates the binarization rule determined by the type of the data to be decoded; an occurrence probability generating unit 30, which provides the occurrence probability of each bin (0 or 1) based on the binarization rule and the context model; and a decoding unit 38, which performs arithmetic decoding based on the generated occurrence probabilities and decodes, from the resulting binary sequence and the above binarization rule, the motion vector 5, the coding mode information 13, the spatial prediction mode 14, and the orthogonal transform coefficient data 17.
Fig. 14 is a flowchart showing the processing contents of the arithmetic decoding unit 27 of fig. 13.
6) Transmission unit decoding initialization processing (step ST10)
As shown in Fig. 10, the arithmetic decoding start state of the decoding unit 38 is initialized (step ST10) based on the register reset flag, which is multiplexed in each transmission unit such as a slice and indicates whether the register value of the arithmetic coding process is reset, and on the initial register value 34. When the register value is reset, the initial register value 34 is not used.
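This initialization step can be sketched as follows (the register width and reset value are assumptions for illustration; a real arithmetic decoder would also prime its range register and read ahead in the bit stream):

```python
from typing import Optional

ARITHMETIC_REGISTER_RESET_VALUE = 0  # assumed initial state after a reset

def init_decoding_register(register_reset_flag: bool,
                           initial_register_value: Optional[int]) -> int:
    """Initialize the arithmetic-decoding register for one transmission
    unit (step ST10). When the flag indicates a reset, any initial
    register value in the header is simply not used; otherwise the
    multiplexed value is loaded so that decoding continues seamlessly
    from the previous unit."""
    if register_reset_flag:
        return ARITHMETIC_REGISTER_RESET_VALUE
    if initial_register_value is None:
        raise ValueError("<not reset> but no initial register value in header")
    return initial_register_value
```

This mirrors the rule that the initial register value 34 is ignored whenever the register reset flag indicates a reset.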
7) Context model determination processing, binarization processing, and occurrence probability generation processing
These processing procedures are executed by the context model decision unit 28, the binarization unit 29, and the occurrence probability generation unit 30 shown in fig. 13, but are the same as the context model decision processing ST1, the binarization processing ST2, and the occurrence probability generation processing ST3 shown in the processing procedures 1) to 3) on the video image encoding apparatus side, and therefore the same step numbers are given, and the description thereof will be omitted.
8) Arithmetic decoding processing (step ST11)
Since the occurrence probability of the bin to be decoded has been determined by the processing up to 7), the decoding unit 38 restores the value of the bin according to the arithmetic decoding procedure shown in the conventional example (step ST11), updates the occurrence probability by counting the 0/1 occurrence frequencies in the same way as in the processing on the video encoding apparatus side (step ST5), and determines, by matching against the binary sequence patterns defined by the binarization rule, whether the value of the data to be decoded has been determined (step ST12).
If matching the decoded bins against the binary sequence patterns defined by the binarization rule does not yet determine a value, the processing from the 0/1 occurrence probability generation for each bin in step ST3 onward is executed again (steps ST3, ST11, ST5, ST12).
On the other hand, when matching the decoded bins against a binary sequence pattern defined by the binarization rule determines a value, the data value indicated by the matching pattern is output as the decoded data value, and if decoding of the whole transmission unit, such as a slice, has not yet finished (step ST13), the processing from the context model determination in step ST1 onward is executed repeatedly until the whole transmission unit is decoded.
As is apparent from the above description, according to the first embodiment, when compressed video data is divided and transmitted in fine transmission units such as slices, the register reset flag, indicating whether the register value expressing the arithmetic coding process is reset, and the initial register value 34 are added as slice header data. Encoding can therefore be performed without severing the continuity of the arithmetic coding process, and encoding and decoding that maintain the coding efficiency while improving the resilience against transmission errors become possible.
In the first embodiment, although the slice structure is assumed as the transmission unit, the present invention is applicable even to a video image frame as the transmission unit.
Second embodiment
In the second embodiment, another configuration of the arithmetic coding unit 6 and the arithmetic decoding unit 27 will be described. The second embodiment has the following feature: not only the register value expressing the codeword state of the arithmetic coding process, but also the learning state of the occurrence probability variations in the context models, that is, the state resulting from the bin occurrence probability update processing in the occurrence probability generating unit 30, is multiplexed into the slice header.
For example, Fig. 8, described in the first embodiment, uses information such as the motion vector of the block B above block C to determine the occurrence probability variation, in order to improve the efficiency of the arithmetic coding of block C. Consequently, if block C and block B are located in different slices, the use of the information of block B in the occurrence probability determination processing must be prohibited.
This means that the coding efficiency of probability adaptation based on context models is reduced.
The second embodiment therefore provides a mechanism that makes this design trade-off adaptive: when the probability of slice data being lost through transmission errors or the like is extremely low, the dependency between slices related to arithmetic coding can be exploited actively instead of being constantly severed, while when the possibility of slice data loss is high, the dependency between slices can be severed, so that the coding efficiency can be controlled adaptively per transmission unit.
Fig. 15 is a diagram showing an internal configuration of the arithmetic coding unit 6 according to the second embodiment.
The arithmetic coding unit 6 of the second embodiment differs from the arithmetic coding unit 6 of the first embodiment shown in fig. 5 only in that the occurrence probability generating unit 30 passes the context model state 39 to be multiplexed into the slice header to the transmission unit generating unit 35.
Fig. 16 is a flowchart showing the processing contents of the arithmetic coding unit 6 of fig. 15.
As is apparent from comparison with the flowchart of Fig. 6 in the first embodiment, the only difference is that the context model state 39 of the 0/1 occurrence probability generation processing for each bin in step ST3, that is, the learning state of the occurrence probability variations of the context models resulting from the bin occurrence probability update processing in the occurrence probability generating unit 30, is multiplexed into the slice header in the header construction processing of the next transmission unit by the transmission unit generating unit 35 in step ST9, in the same way as the register value of the binary arithmetic coding processing of step ST4.
Fig. 17 is an explanatory diagram illustrating the learning state of a context model. The meaning of the context model state 39 will be described with reference to Fig. 17.
Fig. 17 shows a case where n macroblocks exist in the k-th transmission unit, only one context model ctx is defined and used per macroblock, and the occurrence probability variation of ctx is updated at each macroblock.
Continuing the context model state 39 into the next transmission unit means, as shown in Fig. 17, that the final state ctx_k(n-1) of the k-th transmission unit is used as the initial state of ctx in the (k+1)-th transmission unit; that is, the occurrence probabilities p0 and p1 of the values 0 and 1 at the start of ctx_{k+1} are made equal to the occurrence probabilities p0 and p1 of the values 0 and 1 in ctx_k(n-1). Therefore, the transmission unit generating unit 35 transmits data representing the state of ctx_k(n-1) as part of the header information of the (k+1)-th transmission unit.
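The carry-over of ctx_k(n-1) into the (k+1)-th transmission unit can be sketched as follows (the class and field names are hypothetical, and the probabilities below are arbitrary illustration values):

```python
from dataclasses import dataclass

@dataclass
class ContextState:
    """Learning state of one context model: the occurrence
    probabilities p0 and p1 of the values 0 and 1."""
    p0: float
    p1: float

def carry_over(final_state_of_unit_k: ContextState) -> ContextState:
    """Use the final state ctx_k(n-1) of the k-th transmission unit as
    the initial state of ctx in the (k+1)-th unit: the occurrence
    probabilities are taken over unchanged instead of being reset."""
    return ContextState(final_state_of_unit_k.p0, final_state_of_unit_k.p1)

ctx_k_final = ContextState(p0=0.247, p1=0.753)   # end of unit k
ctx_k1_initial = carry_over(ctx_k_final)         # start of unit k+1
```

Serializing `ctx_k_final` into the header of the (k+1)-th transmission unit is what allows the decoder to resume with the same learned probabilities.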
Fig. 18 is an explanatory diagram showing an example of the bit data stream generated by the arithmetic coding unit 6 according to the second embodiment.
In the second embodiment, the slice header data of the compressed video data of each slice carries, in addition to the slice start code, the register reset flag and the initial register value, which are the same as in the first embodiment shown in Fig. 10, and information on the context model state of the immediately preceding slice.
In the second embodiment, however, the register reset flag indicates not only whether the initial register value is multiplexed but also whether the context model state data is multiplexed.
The information indicating whether the context model state data is multiplexed may also be provided as a flag separate from the register reset flag.
As in the first embodiment, in Fig. 18 the slice header data and the compressed video data of each slice are multiplexed in the same data stream, but the slice header may instead be transmitted separately as another data stream, with ID information identifying the corresponding slice header data attached to the compressed data.
Fig. 19 is a diagram showing the internal configuration of the arithmetic decoding unit 27 of the second embodiment. The arithmetic decoding unit 27 of the second embodiment differs from the arithmetic decoding unit 27 of the first embodiment shown in Fig. 13 only in that the transmission unit decoding initialization unit 37 passes the context model state 39 of the immediately preceding slice, multiplexed in the slice header, to the occurrence probability generating unit 30, so that the context model state is continued from the immediately preceding slice.
Fig. 20 is a flowchart showing the processing contents of the arithmetic decoding unit 27 of fig. 19.
As is apparent from comparison with the flowchart of Fig. 14 in the first embodiment, the only difference is that, in the decoding initialization processing of each transmission unit in step ST10, the context model state 39 decoded from the slice header is passed to the 0/1 occurrence probability generation processing for each bin in step ST3, where it is used by the occurrence probability generating unit 30 together with the context model determined in step ST1.
In addition, since the context model states attached to the slice header become overhead when the number of context models is extremely large, only the context models that contribute significantly to the coding efficiency may be selected and multiplexed.
For example, since the motion vectors and the orthogonal transform coefficient data account for a large proportion of the total amount of symbols, a configuration that continues the state only for these may be considered. Which context models are continued may be signaled explicitly in the bit stream, or the state may be continued selectively only for important context models according to the local characteristics of the video signal.
As is apparent from the above description, according to the second embodiment, when compressed video data is transmitted in fine transmission units, the following can be added as slice header data: the register reset flag indicating whether the register value expressing the arithmetic coding process is reset; the initial register value 34; and information indicating the context model state of the immediately preceding slice. Encoding can therefore be performed without severing the continuity of the arithmetic coding process, and the coding efficiency can be maintained while the resilience against transmission errors is improved.
In the second embodiment, although the slice structure is assumed as the transmission unit, the present invention is applicable even to a video image frame as the transmission unit.
In particular, in the second embodiment, since information indicating the context model state of the immediately preceding slice is added, even if, for example, block C and the block B above it in Fig. 8 fall into different slices, the occurrence probability determination processing of block C can use the context model state of block B, and the coding efficiency of occurrence probability adaptation based on context models can be improved. In other words, when the probability of slice data being lost through transmission errors or the like is extremely low, the dependency between slices related to arithmetic coding need not be constantly severed, and even the context model state of the immediately preceding slice can be exploited actively.
In the second embodiment, as in the bit stream syntax shown in Fig. 18, the case has been described in which information indicating the context model state of each piece of data of the immediately preceding slice is added as slice header data for each slice, in parallel with the register reset flag and the initial register value of the first embodiment. However, the register reset flag and the initial register value of the first embodiment may be omitted, and only the information indicating the context model state of each piece of data of the immediately preceding slice may be added as slice header data. Needless to say, regardless of whether they are added in parallel with the register reset flag and the initial register value of the first embodiment, a context model state reset flag (see Fig. 21) may be provided, and the information indicating the context model state of each piece of data of the immediately preceding slice may be added and used for decoding only when that flag is OFF, that is, when no reset is performed.
Third embodiment
In the third embodiment, a data partitioning format will be described in which transmission units are formed by grouping the data to be encoded by type.
For example, the data partitioning method disclosed in Working Draft Number 2, Revision 3 (JVT-B118r3), studied in the Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, takes the slice structure shown in Fig. 9 as a unit and transmits, as slice data, data units formed by grouping data of a specific type, together with the number of macroblocks contained in the data unit. The data types of the slice data grouped into data units are, for example, types 0 to 7 shown below.
0 TYPE_HEADER: header information of a picture (frame) or slice
1 TYPE_MBHEADER: macroblock header information (coding mode information, etc.)
2 TYPE_MVD: motion vector
3 TYPE_CBP: CBP (distribution of significant orthogonal transform coefficients within a macroblock)
4 TYPE_2x2DC: orthogonal transform coefficient data (1)
5 TYPE_COEFF_Y: orthogonal transform coefficient data (2)
6 TYPE_COEFF_C: orthogonal transform coefficient data (3)
7 TYPE_EOS: end-of-stream identification information
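The data type list above can be transcribed directly as follows (the enum name `PartitionType` is an assumption for illustration; the member names mirror the identifiers in the draft):

```python
from enum import IntEnum

class PartitionType(IntEnum):
    """Data types of data-partitioned slice data (types 0 to 7)."""
    TYPE_HEADER   = 0  # header information of a picture (frame) or slice
    TYPE_MBHEADER = 1  # macroblock header information (coding mode etc.)
    TYPE_MVD      = 2  # motion vector
    TYPE_CBP      = 3  # significant-coefficient distribution in a macroblock
    TYPE_2x2DC    = 4  # orthogonal transform coefficient data (1)
    TYPE_COEFF_Y  = 5  # orthogonal transform coefficient data (2)
    TYPE_COEFF_C  = 6  # orthogonal transform coefficient data (3)
    TYPE_EOS      = 7  # end-of-stream identification information
```

Such a data type ID, carried in the slice header, tells the decoder which grouped data a slice contains.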
For example, in a TYPE_MVD slice of data type 2, only the number of macroblocks and the motion vector information collected within the slice are transmitted as slice data.
Therefore, when the TYPE_MVD data of the (k+1)-th slice is decoded following the TYPE_MVD data of the k-th slice, the context model learning state for the arithmetic coding of motion vectors can be continued if the context model state for motion vectors at the end of the k-th slice is multiplexed in advance in the header of the slice carrying the TYPE_MVD data of the (k+1)-th slice.
Fig. 21 is an explanatory diagram showing an example of the bit stream generated by the arithmetic coding unit 6 of the third embodiment. In Fig. 21, when for example the motion vectors of data type 2, TYPE_MVD, are multiplexed as slice data, the slice header carries the slice start code, a data type ID indicating TYPE_MVD, a context model state reset flag, and information indicating the context model state for the motion vectors of the immediately preceding slice.
Similarly, when for example only the orthogonal transform coefficient data (2) of data type 5, TYPE_COEFF_Y, is multiplexed as slice data, the slice header carries the slice start code, a data type ID indicating TYPE_COEFF_Y, the context model state reset flag, and information indicating the context model state for the orthogonal transform coefficient data of the immediately preceding slice.
In the same figure, the slice header data and the compressed data are multiplexed in the same data stream, but the slice header may instead be transmitted separately as an independent data stream, with ID information identifying the corresponding slice header data attached to the compressed data.
In the arithmetic coding unit 6 of the third embodiment, with the configuration of Fig. 15, the transmission unit generating unit 35 may be configured to rearrange the macroblock data within the slice according to the rules of the data partitioning method, and to multiplex the ID information indicating each data type together with the learning state of the context model corresponding to that data type.
In the arithmetic decoding unit 27 of the third embodiment, with the configuration shown in Fig. 19, the transmission unit decoding initialization unit 37 may notify the context model determination unit 28 of the data type ID multiplexed in the slice header to determine the context model to be used, and notify the occurrence probability generating unit 30 of the learning state of that context model, so that the learning state 39 of the context model is continued across slices and arithmetic decoding is executed.
As is apparent from the above description, according to the third embodiment, even when the video signal is divided into transmission units grouped by a predetermined data type and compression-coded, the arithmetic coding of the data belonging to a transmission unit is continued without resetting the symbol occurrence probability learning state of the preceding transmission unit grouped by the same data type. Therefore, even when the data is grouped by a predetermined data type, encoding that improves the efficiency of arithmetic coding while securing error resilience can be performed.
In the third embodiment, the data types of the slice structure were taken as examples of the transmission unit, but the present invention can also be applied to the transmission of each data type in video frame units.
In the example of the bit stream syntax of the third embodiment shown in Fig. 21, the context model state reset flag, together with the information indicating the context model state of each piece of data of the immediately preceding slice when the flag is OFF, is added as slice header data for each piece of slice data. As in the bit stream syntax of the second embodiment shown in Fig. 18, these may also be added in parallel with the register reset flag and the initial register value. Alternatively, regardless of whether the register reset flag and the initial register value are added in parallel, the context model state reset flag may be omitted, and the information indicating the context model state of each piece of data of the immediately preceding slice may be added constantly and used for decoding.
In addition, although the first through third embodiments above have described video image data as an example of the digital signal, the present invention is not limited to this: it can be applied not only to a digital signal of video image data but also to a digital signal of audio, a digital signal of a still image, a digital signal of text, and a digital signal of multimedia in which these are arbitrarily combined.
Furthermore, although the first and second embodiments above have taken a slice as an example of the transmission unit of the digital signal, and the third embodiment has taken as an example a predetermined transmission unit in which the data types within a slice are separated into distinct data forms, the present invention is not limited to this. One picture, that is, one video image frame unit formed by collecting a plurality of slices, may also be used as the predetermined transmission unit. Moreover, since use in a storage system rather than in communication or the like is also conceivable, a predetermined storage unit, and not only a predetermined transmission unit, may be employed.
As described above, the digital signal encoding device and the like according to the present invention are suitable for use in cases where, when a video image signal is compressed and transmitted, it is necessary to improve the coding efficiency of arithmetic coding while ensuring error resistance.
Claims (2)
1. A digital signal encoding device that outputs a bit stream including encoded data of a digital signal of a predetermined unit, the digital signal encoding device comprising:
an arithmetic coding unit for compressing the digital signal of the predetermined unit by arithmetic coding,
wherein the arithmetic coding unit multiplexes, into the bit stream as an element of header information associated with the predetermined unit, information indicating whether context model state data used for arithmetic decoding of the predetermined unit is multiplexed.
2. A digital signal encoding method for outputting a bit stream including encoded data of a digital signal of a predetermined unit, characterized in that:
the digital signal of the predetermined unit is compressed by arithmetic coding, and
information indicating whether context model state data used for arithmetic decoding of the predetermined unit is multiplexed is multiplexed into the bit stream as an element of header information associated with the predetermined unit.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2002124114A JP3807342B2 (en) | 2002-04-25 | 2002-04-25 | Digital signal encoding apparatus, digital signal decoding apparatus, digital signal arithmetic encoding method, and digital signal arithmetic decoding method |
| JP124114/02 | 2002-04-25 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1140324A1 (en) | 2010-10-08 |
| HK1140324B (en) | 2013-11-22 |
Similar Documents
| Publication | Title |
|---|---|
| CN101626245B (en) | Digital signal decoding apparatus, digital signal decoding method |
| CN1327395C (en) | Method and system for context-based adaptive binary arithmetic coding |
| KR20040098631A (en) | Adaptive universal variable length codeword coding for digital video content |
| JP4211873B2 (en) | Digital signal encoding apparatus, digital signal decoding apparatus, digital signal arithmetic encoding method, and digital signal arithmetic decoding method |
| JP4211780B2 (en) | Digital signal encoding apparatus, digital signal decoding apparatus, digital signal arithmetic encoding method, and digital signal arithmetic decoding method |
| HK1140324B (en) | Digital signal coding apparatus, digital signal decoding apparatus, digital signal arithmetic coding method and digital signal arithmetic decoding method |
| HK1144632B (en) | Digital signal decoding apparatus and digital signal arithmetic decoding method |
| KR19980017805A | Partial Error Control Unit |