
WO2025138170A1 - Encoding method, decoding method, encoder, decoder and storage medium - Google Patents


Info

Publication number
WO2025138170A1
Authority
WO
WIPO (PCT)
Prior art keywords
residual
image
value
reconstructed image
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2023/143439
Other languages
English (en)
Chinese (zh)
Inventor
戴震宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to PCT/CN2023/143439
Publication of WO2025138170A1
Pending legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking

Definitions

  • the present application relates to the technical field of video coding and decoding, and in particular to a coding and decoding method, a codec and a storage medium.
  • neural network-based coding and decoding tools have been introduced into the video coding and decoding framework, such as neural network-based super-resolution (NNSR) and neural network-based post-filter (NNPF).
  • Most of these neural networks are based on residual network structures, so that residual information in images can be predicted through training.
  • the training process of the neural network is not completely consistent with the actual coding test process.
  • the training data of the neural network and the actual test data are often different. This inconsistency will cause certain errors in the output information of the neural network, thereby reducing the coding and decoding performance.
  • the embodiments of the present application provide a coding and decoding method, a codec and a storage medium.
  • the following introduces various aspects involved in the present application.
  • a decoding method is provided, which is applied to a decoder, including: parsing a bit stream to determine a first reconstructed image; determining a first residual image corresponding to the first reconstructed image based on a neural network; adjusting the residual value in the first residual image to determine a second residual image; and determining a second reconstructed image based on the first reconstructed image and the second residual image.
  • a coding method is provided, which is applied to an encoder, including: determining a first reconstructed image; determining a first residual image corresponding to the first reconstructed image based on a neural network; adjusting the residual value in the first residual image to determine a second residual image; determining a second reconstructed image based on the first reconstructed image and the second residual image.
  • a decoder comprising: a memory for storing a computer program; and a processor for executing the method described in the first aspect when running the computer program.
  • an encoder comprising: a first determination unit configured to determine a first reconstructed image; a second determination unit configured to determine a first residual image corresponding to the first reconstructed image based on a neural network; a third determination unit configured to adjust the residual value in the first residual image to determine the second residual image; and a fourth determination unit configured to determine the second reconstructed image based on the first reconstructed image and the second residual image.
  • an encoder comprising: a memory for storing a computer program; and a processor for executing the method described in the second aspect when running the computer program.
  • a non-volatile computer-readable storage medium for storing a bit stream, wherein the bit stream is generated by an encoding method using an encoder, or the bit stream is decoded by a decoding method using a decoder, wherein the decoding method is the method described in the first aspect, and the encoding method is the method described in the second aspect.
  • a computer program product comprising a computer program, wherein when the computer program is executed, the method as described in the first aspect or the second aspect is implemented.
  • the embodiment of the present application does not directly use the residual information determined based on the neural network (i.e., the first residual image mentioned above) to restore and reconstruct the image, but performs residual adjustment on the residual information before using it to restore and reconstruct the image. This helps to correct the output error of the neural network, thereby improving the encoding and decoding performance.
  • FIG. 1 is a diagram showing an example structure of a video encoder to which an embodiment of the present application can be applied.
  • FIG. 2 is a diagram showing an example structure of a video decoder to which an embodiment of the present application can be applied.
  • FIG. 4 is a diagram showing an example of the structure of NNSR.
  • FIG. 5 is a schematic diagram of the residual network structure.
  • FIG. 6 is a schematic diagram of the residual network structure of NNSR.
  • FIG. 7 is a schematic diagram of the residual network structure of the neural network based loop filter (NNLF).
  • FIG. 11 is a flow chart of the encoding method provided in an embodiment of the present application.
  • FIG. 15 is a schematic diagram of the structure of an encoder provided in one embodiment of the present application.
  • FIG. 16 is a schematic diagram of the structure of an encoder provided in another embodiment of the present application.
  • FIG. 1 is a schematic block diagram of a video encoder according to an embodiment of the present application.
  • the video encoder 100 can be used to perform lossy compression on an image, or can be used to perform lossless compression on an image.
  • the lossless compression can be visually lossless compression or mathematically lossless compression.
  • the video encoder 100 can be applied to image data in a luminance and chrominance (YCbCr, YUV) format.
  • the YUV ratio can be 4:2:0, 4:2:2 or 4:4:4; Y represents luminance (Luma), Cb (U) represents blue chrominance, Cr (V) represents red chrominance, and U and V together represent chrominance (Chroma), which describes color and saturation.
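As an illustrative sketch (not part of the application text), the luma/chroma plane dimensions implied by the subsampling ratios listed above can be written out as follows:

```python
# Illustrative sketch: luma and chroma plane dimensions for the
# YUV subsampling ratios mentioned above.
def plane_sizes(width, height, ratio):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h)) for a YUV ratio."""
    if ratio == "4:2:0":   # chroma halved horizontally and vertically
        return (width, height), (width // 2, height // 2)
    if ratio == "4:2:2":   # chroma halved horizontally only
        return (width, height), (width // 2, height)
    if ratio == "4:4:4":   # full-resolution chroma
        return (width, height), (width, height)
    raise ValueError("unsupported ratio: " + ratio)
```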
  • the video encoder 100 reads video data, and for each image in the video data, divides an image into a number of coding tree units (CTUs).
  • a CTU may be referred to as a "tree block", “largest coding unit” (LCU) or “coding tree block” (CTB).
  • Each CTU may be associated with a pixel block of equal size within an image.
  • Each pixel may correspond to a luminance (luminance or luma) sample and two chrominance (chrominance or chroma) samples. Therefore, each CTU may be associated with a luminance sample block and two chrominance sample blocks.
  • the size of a CTU is, for example, 128×128, 64×64, 32×32, etc.
  • the video encoder and video decoder may support various PU sizes. Assuming that the size of a particular CU is 2N×2N, the video encoder and video decoder may support PU sizes of 2N×2N or N×N for intra-frame prediction, and support symmetric PUs of 2N×2N, 2N×N, N×2N, N×N or similar sizes for inter-frame prediction. The video encoder and video decoder may also support asymmetric PUs of 2N×nU, 2N×nD, nL×2N, and nR×2N for inter-frame prediction.
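The partition shapes above can be sketched as (width, height) tuples; this is a hedged illustration in which the asymmetric splits are assumed to use the usual quarter/three-quarter division:

```python
# Hedged sketch: PU partition shapes for a CU of size 2N x 2N, expressed
# as (width, height) tuples. The asymmetric modes are assumed to split the
# CU into one-quarter and three-quarter parts.
def pu_partitions(two_n):
    n, q = two_n // 2, two_n // 4
    return {
        "2Nx2N": [(two_n, two_n)],
        "2NxN":  [(two_n, n)] * 2,
        "Nx2N":  [(n, two_n)] * 2,
        "NxN":   [(n, n)] * 4,
        "2NxnU": [(two_n, q), (two_n, two_n - q)],
        "2NxnD": [(two_n, two_n - q), (two_n, q)],
        "nLx2N": [(q, two_n), (two_n - q, two_n)],
        "nRx2N": [(two_n - q, two_n), (q, two_n)],
    }
```

In every mode the PU areas tile the CU, i.e. they sum to (2N)².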
  • the video encoder 100 may include: a prediction unit 110, a residual unit 120, a transform/quantization unit 130, an inverse transform/quantization unit 140, a reconstruction unit 150, a loop filter unit 160, a decoded image buffer 170, and an entropy coding unit 180. It should be noted that the video encoder 100 may include more, fewer, or different functional components.
  • the current block may be referred to as a current coding unit (CU) or a current prediction unit (PU), etc.
  • a prediction block may also be referred to as a predicted image block or an image prediction block, and a reconstructed image block may also be referred to as a reconstructed block or an image reconstruction block.
  • the prediction unit 110 includes an inter-frame prediction unit 111 and an intra-frame prediction unit 112. Since there is a strong correlation between adjacent pixels in an image of a video, an intra-frame prediction method is used in video coding and decoding technology to eliminate spatial redundancy between adjacent pixels. Since there is a strong similarity between adjacent images in a video, an inter-frame prediction method is used in video coding and decoding technology to eliminate temporal redundancy between adjacent images, thereby improving coding efficiency.
  • the inter-frame prediction unit 111 can be used for inter-frame prediction.
  • Inter-frame prediction may include motion estimation and motion compensation, and may refer to image information of images other than the current image.
  • Inter-frame prediction uses motion information to find a reference block from a reference image, and generates a prediction block based on the reference block to eliminate temporal redundancy.
  • the motion information includes a reference image list where the reference image is located, a reference image index, and a motion vector.
  • the motion vector may be an integer pixel or a sub-pixel. If the motion vector is a sub-pixel, it is necessary to use an interpolation filter in the reference image to make the required sub-pixel block.
  • the integer pixel or sub-pixel block in the reference image found according to the motion vector is called a reference block.
  • Some technologies will directly use the reference block as a prediction block, while some technologies will generate a prediction block based on the reference block.
  • the process of generating a prediction block based on the reference block can also be understood as taking the reference block as the prediction block and then processing the prediction block to generate a new prediction block.
  • the intra-frame prediction unit 112 only refers to information of the same image to predict pixel information within the current image block to be coded, in order to eliminate spatial redundancy.
  • the intra-frame prediction modes used by HEVC are Planar, DC, and 33 angle modes, for a total of 35 prediction modes.
  • the intra-frame modes used by versatile video coding (VVC) are Planar, DC, and 65 angle modes, for a total of 67 prediction modes.
  • Residual unit 120 may generate a residual block of a CU based on the pixel blocks of the CU and the prediction blocks of the PUs of the CU. For example, residual unit 120 may generate a residual block of a CU so that each sample in the residual block has a value equal to the difference between the following two: a sample in the pixel blocks of the CU and a corresponding sample in the prediction blocks of the PUs of the CU.
  • the transform/quantization unit 130 may quantize the transform coefficients.
  • the transform/quantization unit 130 may quantize the transform coefficients associated with the TUs of the CU based on a quantization parameter (QP) value associated with the CU.
  • the video encoder 100 may adjust the degree of quantization applied to the transform coefficients associated with the CU by adjusting the QP value associated with the CU.
  • the inverse transform/quantization unit 140 may apply inverse quantization and inverse transform to the quantized transform coefficients, respectively, to reconstruct a residual block from the quantized transform coefficients.
  • the reconstruction unit 150 may add the samples of the reconstructed residual block to the corresponding samples of one or more prediction blocks generated by the prediction unit 110 to generate a reconstructed image block associated with the TU. By reconstructing the sample blocks of each TU of the CU in this manner, the video encoder 100 may reconstruct the pixel blocks of the CU.
  • the loop filter unit 160 is used to process the inverse transformed and inverse quantized pixels to compensate for distortion information and provide a better reference for subsequent coded pixels. For example, a deblocking filter operation may be performed to reduce the blocking effect of pixel blocks associated with the CU.
  • the intra prediction unit 222 may perform intra prediction to generate a prediction block for the PU.
  • the intra prediction unit 222 may use an intra prediction mode to generate a prediction block for the PU based on pixel blocks of spatially neighboring PUs.
  • the intra prediction unit 222 may also determine the intra prediction mode of the PU according to one or more syntax elements parsed from the code stream.
  • the basic process of video encoding and decoding is as follows: at the encoding end, an image is divided into blocks.
  • the prediction unit 110 uses intra-frame prediction or inter-frame prediction to generate a prediction block of the current block.
  • the residual unit 120 can calculate the residual block based on the prediction block and the original block of the current block, that is, the difference between the original block of the current block and the prediction block.
  • the residual block can also be called residual information.
  • the residual block can remove information that is not sensitive to the human eye through the transformation and quantization process of the transformation/quantization unit 130 to eliminate visual redundancy.
  • the entropy decoding unit 210 can parse the code stream to obtain the prediction information, quantization coefficient matrix, etc. of the current block.
  • the prediction unit 220 uses intra-frame prediction or inter-frame prediction to generate a prediction block of the current block based on the prediction information.
  • the inverse quantization/transformation unit 230 inverse quantizes and inverse transforms the quantization coefficient matrix obtained from the code stream to obtain a residual block.
  • the reconstruction unit 240 adds the prediction block and the residual block to obtain a reconstructed block.
  • the reconstructed blocks constitute a reconstructed image, and the loop filtering unit 250 performs loop filtering on the reconstructed image based on the image or on the block to obtain a decoded image.
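The reconstruction step above (prediction block plus residual block) can be sketched minimally as a sample-by-sample addition with clipping; 8-bit samples are assumed here:

```python
# Minimal sketch of the reconstruction step: prediction block plus
# residual block, sample by sample, clipped to the valid sample range
# (8-bit assumed).
def reconstruct_block(pred, resi, bit_depth=8):
    lo, hi = 0, (1 << bit_depth) - 1
    return [[min(max(p + r, lo), hi) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, resi)]
```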
  • the encoding end also requires similar operations as the decoding end to obtain a decoded image.
  • the decoded image can also be called a reconstructed image, and the reconstructed image can be used as a reference image for inter-frame prediction of subsequent frames.
  • RPR technology is further introduced on the basis of the basic video coding framework.
  • RPR technology can adaptively adjust the resolution of the image within the bitstream according to the network conditions, without inserting an instantaneous decoder refresh (IDR) frame or an intra random access picture (IRAP) frame.
  • when network bandwidth is limited, RPR technology can downsample the original frame and encode the low-resolution (LR) frame; when the network bandwidth improves, RPR technology can encode the high-resolution (HR) original frame.
  • the coding and decoding framework using RPR technology is shown in Figure 3. First, the input image is downsampled through the preprocessing operation.
  • the downsampled image can be coded and decoded.
  • the reconstructed image obtained by decoding is upsampled through the post-processing operation to obtain an image with the same resolution as the original input image, which is used as the reference image for the subsequent frame to be encoded.
  • the traditional methods mainly include nearest-neighbor interpolation, bilinear interpolation, bicubic interpolation and other algorithms.
  • the exploration of NNSR has gradually started.
  • the baseline NNSR adopts the structure of a convolutional network, and takes low-resolution reconstructed images and predicted images, as well as various auxiliary information as network input.
  • the various auxiliary information may include, for example, quantizer parameters (QP, or base QP) of general test conditions, slice-level QP (slice QP), etc.
  • the network outputs the predicted original resolution residual image.
  • the predicted original-resolution residual image is added pixel by pixel to the original-resolution reconstructed image obtained by upsampling based on the RPR technology, yielding the neural-network-upsampled reconstructed image at the original resolution.
  • the baseline NNSR also uses the classic residual network architecture, and its basic architecture is shown in Figure 6.
  • input represents various information input to the network, such as a low-resolution reconstructed image, a low-resolution predicted image, and various auxiliary information.
  • the various auxiliary information may include, for example, the QP (base QP) of the general test condition, the slice-level QP (slice QP), etc.
  • NNSR represents a neural network-based super-resolution network, which is mainly composed of residual blocks (ResBlock). The input passes through multiple cascaded residual blocks, and NNSR finally outputs the predicted original-resolution residual image Res.
  • RPR represents the reconstructed image Rec of the original resolution obtained using the traditional upsampling method. "+" means that the residual image Res and the reconstructed image Rec are added pixel by pixel to obtain the upsampled reconstructed image Cnn of the original resolution output by the neural network.
  • NNSR uses a neural network to predict the residual information between the reconstructed image and the original image. Then, through a simple addition operation, NNSR superimposes the residual information on the reconstructed image to obtain a super-resolution image. The quality of the super-resolution image output by NNSR is closer to the original image. It can be seen that the neural network based on the residual network architecture has the ability to predict residual information. In addition to NNSR, other neural network tools based on the residual network architecture include NNPF and NNLF, which also have the ability to predict residual information.
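The data flow described above can be sketched as follows; the two callables stand in for the actual network and the RPR upsampler, neither of which is specified in this excerpt:

```python
# Sketch of the NNSR data flow in Figure 6. `predict_residual` and
# `rpr_upsample` are placeholders for the neural network and the
# traditional RPR upsampler.
def nnsr_output(lr_rec, predict_residual, rpr_upsample):
    res = predict_residual(lr_rec)   # network-predicted residual image Res
    rec = rpr_upsample(lr_rec)       # traditionally upsampled reconstruction Rec
    # "+" in Figure 6: pixel-by-pixel addition giving the output image Cnn
    return [[a + b for a, b in zip(rrow, srow)]
            for rrow, srow in zip(rec, res)]
```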
  • Rec represents the reconstructed image output by the codec
  • NNLF represents a loop filter based on a neural network.
  • NNLF is designed based on the residual network architecture, so the network prediction output is residual information.
  • NNLF is trained to predict the residual information of the input reconstructed image and the original image, and finally the residual information is superimposed on the input reconstructed image through a simple addition operation, and the output is a filtered image, which makes its quality closer to the original image.
  • Rec represents the reconstructed image output by the codec
  • NNPF represents a neural network post-processing filter, which is designed based on the residual network architecture, so the network prediction output is residual information.
  • the residual information output by NNPF is added to the reconstructed image to obtain the final neural network reconstructed image, which is used as the filtered image obtained by post-processing.
  • an embodiment of the present application proposes an encoding method, including: determining a first reconstructed image; determining a first residual image corresponding to the first reconstructed image based on a neural network; adjusting the residual value in the first residual image to determine a second residual image; and determining a second reconstructed image based on the first reconstructed image and the second residual image.
  • An embodiment of the present application also provides a decoding method, including: parsing a code stream to determine a first reconstructed image; determining a first residual image corresponding to the first reconstructed image based on a neural network; adjusting the residual value in the first residual image to determine a second residual image; and determining a second reconstructed image based on the first reconstructed image and the second residual image.
  • the embodiment of the present application does not directly use the residual information determined by the neural network (corresponding to the first residual image above), but adjusts the residual information output by the neural network before using it.
  • the adjustment of the residual information may correct the error caused by the inconsistency between the training and actual test processes, thereby improving the encoding and decoding performance.
  • Fig. 9 is a flow chart of a decoding method provided in an embodiment of the present application.
  • the method of Fig. 9 can be applied to a decoder.
  • the decoder can be, for example, a decoder that supports decoding based on a neural network tool (such as NNSR or NNPF).
  • In steps S910 to S920, the code stream is parsed to determine a first reconstructed image, and a first residual image corresponding to the first reconstructed image is determined according to a neural network.
  • the image (such as a reconstructed image or a residual image) mentioned in the embodiment of the present application may refer to a frame of image, or may refer to an image block (such as a coding block) in a frame of image.
  • the image mentioned in the embodiment of the present application may refer to an image corresponding to a color component, or may refer to an image corresponding to multiple color components.
  • the image mentioned in the embodiment of the present application may refer to an image corresponding to the Y component, an image corresponding to the Cb component, or an image corresponding to the Cr component.
  • the image mentioned in the embodiment of the present application may refer to an image corresponding to the brightness component, or may refer to an image corresponding to the chroma component.
  • the first residual image may include predicted residual information of the first reconstructed image output by the neural network.
  • the neural network mentioned in the embodiment of the present application may be a neural network based on a residual network structure.
  • the neural network mentioned in the embodiment of the present application may refer to a neural network with the ability to predict residual information, and as an example, the neural network may be NNSR or NNPF.
  • The specific implementation of steps S910 to S920 is related to the type of neural network, and the embodiment of the present application does not specifically limit this.
  • step S910 may include: parsing the code stream to determine the third reconstructed image; upsampling the third reconstructed image to determine the first reconstructed image.
  • the third reconstructed image mentioned here may be a low-resolution reconstructed image.
  • the low-resolution reconstructed image can be obtained by traditional video decoding.
  • the code stream can be parsed to determine the quantization coefficient; the quantization coefficient is inversely quantized to determine the transformation coefficient; the transformation coefficient is inversely transformed to determine the residual value; and the third reconstructed image is determined based on the predicted value and the residual value.
  • the third reconstructed image can be upsampled based on the RPR technology.
  • the third reconstructed image can be upsampled in one or more of the following ways: nearest neighbor interpolation, bilinear interpolation, and bicubic interpolation.
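The simplest of the methods listed above, nearest-neighbor interpolation, can be sketched for an integer upsampling factor:

```python
# Hedged sketch: nearest-neighbor upsampling by an integer factor, the
# simplest of the interpolation options named above.
def nearest_upsample(img, factor):
    h, w = len(img), len(img[0])
    return [[img[y // factor][x // factor] for x in range(w * factor)]
            for y in range(h * factor)]
```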
  • the upsampled third reconstructed image has the original resolution and can therefore be called the original-resolution reconstructed image.
  • step S920 may include: inputting the third reconstructed image into the neural network to determine a first residual image, wherein the resolution of the first residual image is higher than that of the third reconstructed image.
  • That is, a low-resolution reconstructed image can be input into the neural network to determine a residual image of the original resolution.
  • other information may be input into the neural network so that the neural network can better predict the first residual image.
  • a first predicted image may be input into the neural network.
  • the first predicted image may be a predicted image with the same resolution as the third reconstructed image, and therefore, the first predicted image may also be referred to as a low-resolution predicted image.
  • other auxiliary information may also be input into the neural network.
  • the auxiliary information mentioned here may include, for example, one or more of the following: QP of general test conditions, slice-level QP, etc.
  • the first reconstructed image in step S910 can also be obtained based on the traditional video decoding process.
  • step S910 may include: parsing the code stream to determine the quantization coefficient; inverse quantizing the quantization coefficient to determine the transformation coefficient; inverse transforming the transformation coefficient to determine the residual value; and determining the first reconstructed image based on the predicted value and the residual value.
  • Step S920 may include: inputting the first reconstructed image into the neural network to determine the first residual image.
  • the first reconstructed image can be input into the neural network so that the neural network predicts the residual information of the first reconstructed image, thereby obtaining the first residual image.
  • the residual value in the first residual image is adjusted to determine the second residual image.
  • the embodiment of the present application does not specifically limit the adjustment method of the residual value.
  • the residual value in the first residual image can be increased, or the residual value in the first residual image can be decreased.
  • a certain residual adjustment formula can be used to adjust the residual value in the first residual image to obtain a second residual image.
  • one or more fixed offset values (which can be positive numbers) can be used to offset the residual value in the first residual image (i.e., residual offset adjustment (ROA)) to obtain a second residual image.
  • In some cases, the residual value in the first residual image is larger than the actual residual value. Therefore, the residual value in the first residual image can be adjusted with the goal of reducing the residual value of the first residual image.
  • the adjustment of the first bias value can make the first residual value and the second residual value satisfy at least one of the following: if the first residual value is a positive number, the second residual value is equal to the difference between the first residual value and the first bias value; if the first residual value is a negative number, the second residual value is equal to the sum of the first residual value and the first bias value; if the first residual value is 0, the second residual value is equal to the first residual value.
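The three cases above can be written out directly; the bias shrinks each residual value toward zero while zero values are left unchanged:

```python
# The three adjustment cases described above: positive residuals lose the
# bias, negative residuals gain it, and zero stays zero.
def adjust_residual(value, bias):
    if value > 0:
        return value - bias
    if value < 0:
        return value + bias
    return 0
```

For example, with a first bias value of 1 the residuals [3, -2, 0] become [2, -1, 0].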
  • Figure 10 (a) represents the residual image output by the original NNSR
  • Figure 10 (b) represents the adjusted residual image. Comparing the residual values of the two residual images in Figure 10, it can be seen that the positive residual values in the residual image output by the original NNSR are each reduced by 1, the negative residual values are each increased by 1, and residual values of 0 remain unchanged. Through this adjustment, the overall absolute residual magnitude of the residual image output by the original NNSR is reduced. Using a fixed offset value to adjust the residual values is very simple to implement and adds essentially no decoding complexity.
  • the first bias value mentioned in the embodiment of the present application may refer to a bias value, or it may be a group of bias values formed by multiple bias values with different values.
  • the multiple bias values can be used to adjust the residual values within multiple value ranges respectively.
  • the multiple bias values can be understood as multiple bias values of different precisions. For larger residual values, a bias value with lower precision (larger value) can be used to adjust it; for smaller residual values, a bias value with higher precision (smaller value) can be used to adjust it.
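The following sketch is illustrative only: the value ranges and bias values are hypothetical, chosen to show how a group of bias values of different precisions could be applied, with a coarser (larger) bias for larger residual magnitudes and a finer (smaller) bias for smaller ones:

```python
# Hypothetical example: residual magnitudes up to 4 use bias 1, up to 16
# use bias 2, and anything larger uses bias 4. These thresholds are
# illustrative, not from the application.
def select_bias(value, ranges=((4, 1), (16, 2), (float("inf"), 4))):
    mag = abs(value)
    for upper, bias in ranges:
        if mag <= upper:
            return bias

def adjust_with_ranges(value):
    if value == 0:
        return 0
    bias = select_bias(value)
    return value - bias if value > 0 else value + bias
```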
  • the value of ROA_FACTOR can be derived using the following strategy (expressed in pseudo code):
  • {x1, x2, x3} represents positive residual values
  • {y1, y2, y3} represents negative residual values
  • {a1, a2, a3} and {b1, b2, b3} are preset candidate bias values.
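The application's pseudo code itself is not reproduced in this excerpt, so the following is a hypothetical encoder-side search built only from the ingredients named above: predicted positive/negative residuals and the preset candidate bias sets {a1, a2, a3} and {b1, b2, b3}. The selection criterion (squared error against the true residuals) is an assumption:

```python
# Hypothetical derivation sketch: try every candidate bias pair and keep
# the one whose adjusted residuals best match the true residuals.
def derive_roa_factor(pred_residuals, true_residuals,
                      pos_candidates, neg_candidates):
    def cost(pos_bias, neg_bias):
        total = 0
        for p, t in zip(pred_residuals, true_residuals):
            adj = p - pos_bias if p > 0 else p + neg_bias if p < 0 else 0
            total += (adj - t) ** 2
        return total
    # exhaustive search over all (a, b) candidate pairs
    return min(((a, b) for a in pos_candidates for b in neg_candidates),
               key=lambda ab: cost(*ab))
```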
  • a second reconstructed image is determined based on the first reconstructed image and the second residual image.
  • the first reconstructed image and the second residual image may be added pixel by pixel to obtain the second reconstructed image.
  • the second reconstructed image may be a neural network upsampled reconstructed image of the original resolution.
  • the second reconstructed image may be a filtered image obtained by post-processing.
  • the embodiment of the present application does not directly use the residual information determined by the neural network (corresponding to the first residual image above), but adjusts the residual information output by the neural network before using it (i.e., using the second residual image).
  • the adjustment of the residual information may correct the error caused by the inconsistency between the training and actual test processes, thereby improving the decoding performance.
  • the above describes in detail how to adjust the residual image determined based on the neural network.
  • the adjustment of the residual image by the decoding end can be based on auxiliary information (such as high-level syntax elements) in the bitstream.
  • the following is a detailed example of the auxiliary information that may be used to implement the residual image adjustment scheme at the decoding end. It should be understood that the auxiliary information mentioned below is optional information.
  • the codec can adjust the residual image according to the same predefined rules without the need for such auxiliary information.
  • the bitstream may carry the first identification information. Therefore, during the decoding process, the bitstream may be parsed to determine the first identification information.
  • the first identification information can be used to indicate whether to perform residual adjustment on the first residual image (or whether to allow residual adjustment on the first residual image).
  • the first identification information may include a first value and a second value. The first value may be, for example, 1 or true. The second value may be, for example, 0 or false. If the value of the first identification information is the first value, it indicates that the decoding end performs residual adjustment on the first residual image; if the value of the first identification information is the second value, it may indicate that the decoding end does not perform residual adjustment on the first residual image.
  • the above-mentioned first identification information may be a flag bit at the picture level. Therefore, the first identification information may be referred to as a picture-level residual information adjustment enable flag.
  • the first identification information may be represented by picture_roa_enable_flag[comp], where comp represents the current color component.
  • the picture-level residual information adjustment enable flag picture_roa_enable_flag[comp] may be parsed. If picture_roa_enable_flag[comp] is "1", residual adjustment is performed on the current color component; if picture_roa_enable_flag[comp] is "0", residual adjustment is not performed on the current color component.
  • the code stream may carry the second identification information. Therefore, during the decoding process, the code stream may be parsed to determine the second identification information.
  • the second identification information is used to indicate whether to perform residual adjustment on the image sequence (or whether to allow residual adjustment on the image sequence).
  • the second identification information includes a third value and a fourth value.
  • the third value may be, for example, 1 or true.
  • the fourth value may be, for example, 0 or false.
  • the third value is used to indicate that residual adjustment is performed on the image sequence, and the fourth value is used to indicate that residual adjustment is not performed on the image sequence.
  • the second identification information may be represented by roa_enable_flag. It should be understood that the second identification information and the first identification information mentioned above may be used in combination.
  • when the value of the second identification information is the third value (i.e., residual adjustment is allowed on the image sequence), the value of the first identification information may continue to be determined. If the value of the first identification information is the first value (i.e., residual adjustment is allowed on the first residual image), the first residual image is adjusted; if the value of the first identification information is the second value (i.e., residual adjustment is not allowed on the first residual image), the first residual image is not adjusted.
  • when the value of the second identification information is the fourth value (i.e., residual adjustment is not allowed for the image sequence), residual adjustment is not performed on the images in the current sequence. In this case, the decoder does not need to parse the first identification information, and the value of the first identification information of each frame in the current sequence is inferred to be the second value.
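The two-level flag logic above can be sketched as follows; the flag values are passed in directly rather than parsed from a real bitstream:

```python
from typing import Optional

def should_adjust(roa_enable_flag: int,
                  picture_roa_enable_flag: Optional[int] = None) -> bool:
    """Sequence-level gate: when the sequence-level flag is 0, the
    picture-level flag is never parsed and is inferred to be 0
    (no residual adjustment for any frame of the sequence)."""
    if roa_enable_flag == 0:
        return False
    # roa_enable_flag == 1: the picture-level flag is parsed per frame
    # and per color component; here it is supplied directly.
    return picture_roa_enable_flag == 1
```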
  • the residual adjustment method used by the codec may include only one residual adjustment method or multiple residual adjustment methods. If only one residual adjustment method is included, when the decoder determines to adjust the first residual image, the residual adjustment method may be directly used for adjustment. If multiple residual adjustment methods are included, the code stream may carry index information indicating the residual adjustment method currently used (the index information may be carried in supplemental enhancement information (SEI)). Still taking the residual adjustment of the first residual image as an example, the code stream may be parsed to determine the first index information. The first index information may be used to indicate the residual adjustment method of the first residual image. The first index information may indicate which of the preset multiple residual adjustment methods the residual adjustment method of the first residual image is. For example, the preset residual adjustment method includes two residual adjustment methods.
  • the index of the first residual adjustment method is 0, and the index of the second residual adjustment method is 1. If the code stream is parsed and it is determined that the value of the first index information is 1, it can be determined that the second residual adjustment method is used to adjust the residual value of the first residual image.
  • different candidate bias values can represent different residual adjustment methods.
  • two candidate bias values can be pre-set: 1 and 2. Among them, the value of the first index information corresponding to the candidate bias value 1 is 0, and the value of the first index information corresponding to the candidate bias value 2 is 1. If the code stream is parsed and it is determined that the value of the first index information is 1, it can be determined to use the bias value 2 to adjust the residual value of the first residual image.
  • if the first residual value in the first residual image is a positive number, 2 is subtracted from the first residual value; if the first residual value is a negative number, 2 is added to the first residual value; if the first residual value is 0, the first residual value remains unchanged.
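A minimal sketch of the index-driven adjustment described above, assuming the two preset candidate bias values 1 and 2 (index 0 and index 1 respectively):

```python
# preset candidate bias values; first index 0 -> bias 1, index 1 -> bias 2
CANDIDATE_BIASES = [1, 2]

def adjust_by_index(r: int, first_index: int) -> int:
    """Select the bias via the parsed first index information, then apply
    the sign-based adjustment: positive residuals are decreased by the
    bias, negative residuals increased by it, and zero is unchanged."""
    bias = CANDIDATE_BIASES[first_index]
    if r > 0:
        return r - bias
    if r < 0:
        return r + bias
    return 0
```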
  • the first index information may be used in combination with the first identification information mentioned above. For example, if the first identification information indicates that residual adjustment is performed on the first residual image, the first index information may continue to be parsed to determine the residual adjustment method. For another example, if the first identification information indicates that residual adjustment is not performed on the first residual image, the first index information may not be parsed or the parsing of the first index information may be skipped.
  • Figure 11 is a flow chart of the encoding method provided in an embodiment of the present application.
  • the method of Figure 11 can be applied to an encoder.
  • the encoder can be, for example, an encoder that supports encoding based on a neural network tool (such as NNSR or NNPF).
  • a first reconstructed image is determined; and a first residual image corresponding to the first reconstructed image is determined according to a neural network.
  • the image (such as a reconstructed image or a residual image) mentioned in the embodiment of the present application may refer to a frame of image, or may refer to an image block (such as a coding block) in a frame of image.
  • the image mentioned in the embodiment of the present application may refer to an image corresponding to a color component, or may refer to an image corresponding to multiple color components.
  • the image mentioned in the embodiment of the present application may refer to an image corresponding to the Y component, an image corresponding to the Cb component, or an image corresponding to the Cr component.
  • the image mentioned in the embodiment of the present application may refer to an image corresponding to the brightness component, or may refer to an image corresponding to the chroma component.
  • the first residual image may include the prediction residual information of the first reconstructed image output by the neural network.
  • the network may be a neural network based on a residual network structure.
  • the neural network mentioned in the embodiments of the present application may refer to a neural network capable of predicting residual information, and as an example, the neural network may be NNSR or NNPF.
  • the specific implementation of steps S1110 to S1120 is related to the type of neural network used, and the embodiments of the present application do not specifically limit this.
  • step S1110 may include: upsampling the third reconstructed image to determine the first reconstructed image.
  • the third reconstructed image mentioned here may be a low-resolution reconstructed image.
  • the low-resolution reconstructed image may be obtained by a traditional video coding method.
  • the image to be encoded may be first predicted, transformed, quantized, and the like to obtain a quantization coefficient.
  • the quantization coefficient is inversely quantized, inversely transformed, and the like to determine the third reconstructed image.
  • the third reconstructed image may be upsampled based on the RPR technology.
  • the third reconstructed image may be upsampled in one or more of the following ways: nearest neighbor interpolation, bilinear interpolation, and bicubic interpolation.
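As an illustrative stand-in for the interpolation options listed above (RPR in practice uses fixed interpolation filters defined by the codec), nearest-neighbor upsampling of a single-component image can be sketched as:

```python
import numpy as np

def nn_upsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbor upsampling by an integer factor: each pixel is
    repeated `factor` times along both the height and width axes."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)
```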
  • the reconstructed image obtained by upsampling the third reconstructed image may be referred to as the original-resolution reconstructed image.
  • step S1120 may include: inputting the third reconstructed image into the neural network to determine the first residual image, wherein the resolution of the first residual image is higher than the resolution of the third reconstructed image.
  • the first predicted image may be input into the neural network.
  • the first predicted image may be a predicted image with the same resolution as the third reconstructed image, and therefore, the first predicted image may also be referred to as a low-resolution predicted image.
  • other auxiliary information may also be input into the neural network.
  • the auxiliary information mentioned here may, for example, include one or more of the following: QP of general test conditions, slice-level QP, etc.
  • the first reconstructed image in step S1110 can be obtained based on the traditional video encoding process.
  • step S1110 may include: predicting, transforming, quantizing, and other operations on the image to be encoded to determine the first reconstructed image.
  • step S1120 may include: inputting the first reconstructed image into the neural network to determine the first residual image.
  • the first reconstructed image can be input into the neural network so that the neural network predicts the residual information of the first reconstructed image, thereby obtaining the first residual image.
  • the residual value in the first residual image is adjusted to determine the second residual image.
  • the embodiment of the present application does not specifically limit the adjustment method of the residual value.
  • the residual value in the first residual image can be increased, or the residual value in the first residual image can be decreased.
  • a certain residual adjustment formula can be used to adjust the residual value in the first residual image to obtain a second residual image.
  • one or more fixed offset values (which can be positive numbers) can be used to offset the residual value in the first residual image (i.e., residual offset adjustment (ROA)) to obtain a second residual image.
  • the residual values in the first residual image tend to be larger in magnitude than the actual residual values. Therefore, the residual values in the first residual image can be adjusted with the goal of reducing the residual values of the first residual image.
  • the adjustment of the first bias value can make the first residual value and the second residual value satisfy at least one of the following: if the first residual value is a positive number, the second residual value is equal to the difference between the first residual value and the first bias value; if the first residual value is a negative number, the second residual value is equal to the sum of the first residual value and the first bias value; if the first residual value is 0, the second residual value is equal to the first residual value.
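The sign-based rule above can be written in a few lines (an illustrative sketch, not the normative codec implementation; `bias` stands for the first bias value):

```python
import numpy as np

def roa_adjust(residual: np.ndarray, bias: int) -> np.ndarray:
    """Residual offset adjustment (ROA): r > 0 -> r - bias,
    r < 0 -> r + bias, r == 0 -> r. As specified in the text, the bias
    is applied even when |r| < bias."""
    return residual - np.sign(residual) * bias
```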
  • (a) in Figure 10 represents the residual image output by the original NNSR
  • Figure 10 (b) represents the adjusted residual image. Comparing the residual values of the two residual images in Figure 10, it can be seen that 1 is subtracted from each positive residual value in the residual image output by the original NNSR, 1 is added to each negative residual value, and residual values of 0 remain unchanged. Through this adjustment, the overall absolute residual value of the residual image output by the original NNSR is reduced. Using a fixed offset value to adjust the residual values is very simple to implement and basically does not increase the coding complexity.
  • the first bias value mentioned in the embodiment of the present application may refer to a single bias value, or to a group of multiple bias values with different values.
  • the multiple bias values can be used to adjust the residual values within multiple value ranges respectively.
  • the multiple bias values can be understood as multiple bias values of different precisions. For larger residual values, a bias value with lower precision (larger value) can be used to adjust it; for smaller residual values, a bias value with higher precision (smaller value) can be used to adjust it.
  • ROA_FACTOR the value of ROA_FACTOR can be derived using the following strategy (expressed in pseudo code):
  • a second reconstructed image is determined based on the first reconstructed image and the second residual image.
  • the first reconstructed image and the second residual image may be added pixel by pixel to obtain the second reconstructed image.
  • the second reconstructed image may be a neural network upsampled reconstructed image of the original resolution.
  • the second reconstructed image may be a filtered image obtained by post-processing.
  • the embodiment of the present application does not directly use the residual information determined by the neural network (corresponding to the first residual image above), but adjusts the residual information output by the neural network before using it (i.e., using the second residual image).
  • the adjustment of the residual information may correct the error caused by the inconsistency between the training and actual test processes, thereby improving the encoding performance.
  • the above describes in detail how to adjust the residual image determined based on the neural network.
  • the following describes in detail the relevant parameters (such as high-level syntax elements) that may be involved in the residual image adjustment process.
  • the second identification information may be written into the code stream.
  • the second identification information is used to indicate whether residual adjustment is performed on the image sequence (or whether residual adjustment is allowed on the image sequence).
  • the second identification information includes a third value and a fourth value.
  • the third value may be, for example, 1 or true.
  • the fourth value may be, for example, 0 or false.
  • the third value is used to indicate that residual adjustment is performed on the image sequence, and the fourth value is used to indicate that residual adjustment is not performed on the image sequence.
  • the second identification information may be represented by roa_enable_flag. It should be understood that the second identification information and the first identification information mentioned above may be used in combination.
  • if the value of the second identification information is the third value (i.e., residual adjustment is allowed on the image sequence), the first identification information may continue to be encoded.
  • if the value of the second identification information is the fourth value (i.e., residual adjustment is not allowed on the image sequence), the first identification information is not encoded.
  • if the first residual value in the first residual image is a positive number, 2 is subtracted from the first residual value; if the first residual value is a negative number, 2 is added to the first residual value; if the first residual value is 0, the first residual value remains unchanged.
  • the first index information may be used in combination with the first identification information mentioned above. For example, if the first identification information indicates that residual adjustment is performed on the first residual image, the first index information may continue to be encoded to indicate the residual adjustment method. For another example, if the first identification information indicates that residual adjustment is not performed on the first residual image, the encoding of the first index information may be skipped.
  • the position of the adjustment module for outputting residual information in the codec is shown in Figure 12.
  • the NNSR in Figure 12 is a certain NNSR in the related art, and ROA is the residual information adjustment module proposed in this example.
  • This example is implemented in an RPR-based codec, specifically in the upsampling module.
  • Step d) compare the neural network upsampled reconstructed image and the original-resolution reconstructed corrected image generated in this example with the original image, and calculate the rate-distortion cost, here computed as the distortion cost Distortion (D). Compare the two costs: if D_ROA < D_NNSR, the original-resolution reconstructed corrected image output by ROA is used as the final neural network upsampled reconstructed image; otherwise (D_ROA ≥ D_NNSR), the neural network upsampled reconstructed image output by the original NNSR is still used as the final upsampled reconstructed image. Jump to step e).
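Step d) can be sketched as follows, with the sum of squared errors (SSE) used as an assumed distortion measure and ties kept on the NNSR output:

```python
import numpy as np

def pick_output(original: np.ndarray, roa_img: np.ndarray,
                nnsr_img: np.ndarray) -> np.ndarray:
    """Compare the ROA-corrected image and the original NNSR output
    against the original image by distortion, and keep the candidate
    with the smaller cost (NNSR output kept when D_ROA >= D_NNSR)."""
    d_roa = np.sum((original.astype(np.int64) - roa_img) ** 2)
    d_nnsr = np.sum((original.astype(np.int64) - nnsr_img) ** 2)
    return roa_img if d_roa < d_nnsr else nnsr_img
```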
  • Step f) if the current frame has been processed, then load the next frame for processing and jump to step b).
  • Step a) first parse the sequence-level flag roa_enable_flag to determine whether ROA can be used in the current sequence. If roa_enable_flag is "1", the current sequence uses ROA to try to adjust its residual information; jump to step b). If roa_enable_flag is "0", the current sequence does not use ROA, and the process ends;
  • Step c) for the current color component of the current frame of the current sequence, parse the picture-level residual information adjustment enable flag picture_roa_enable_flag[comp]. If picture_roa_enable_flag[comp] is "1", further parse the index of the residual adjustment offset picture_roa_offset_Id[comp], and jump to step d); if picture_roa_enable_flag[comp] is "0", ROA is not used, the neural network upsampled image output by the original NNSR is still used as the final upsampled reconstructed image, and jump to step e);
  • Step d) adjust the residual image according to the index picture_roa_offset_Id[comp] of the resolved residual adjustment offset, and add it pixel by pixel with the original resolution reconstructed image obtained by RPR upsampling to obtain the original resolution reconstructed corrected image. Take the original resolution reconstructed corrected image as the final upsampled reconstructed image, and jump to step e);
  • Step e) if the current frame has been processed, then load the next frame for processing and jump to step b).
  • the image-level residual adjustment offset index picture_roa_offset_Id[N] indicates the serial number of the encoded residual adjustment offset; N can be 3, indicating three color components Y, Cb, and Cr; N can also be 2, indicating two color channels, brightness and chrominance.
  • the basic purpose of residual adjustment is to reduce the output residual.
  • the residual value at each pixel can be positive or negative.
  • a fixed value (a positive number) is subtracted from each positive residual value, and the same fixed value is added to each negative residual value; when the residual value is 0, no adjustment is made, so that the overall absolute residual value becomes smaller.
  • in Figure 10, the fixed value of the adjustment bias is +1.
  • (a) in Figure 10 represents the residual image output by the original NNSR
  • (b) in Figure 10 represents the adjusted residual image.
  • the residual image is added pixel by pixel to the reconstructed image obtained by RPR upsampling to obtain the neural network upsampling reconstructed adjusted image of the original resolution.
  • This example sets multiple candidate offset values for adjusting the residual information.
  • One of the offset values can be selected by calculating the rate-distortion cost, and its index is encoded into the bitstream for the decoder to read and process.
  • the technical solution provided in this example is implemented based on all frame types (including I, P, B), and the performance is tested.
  • the candidate bias values are set to 1 and 2.
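The encoder-side search over the candidate bias values can be sketched as follows; the SSE measure is an assumed stand-in for the rate-distortion cost, and the function name and tie-breaking are illustrative:

```python
import numpy as np

def select_offset(original, rpr_recon, nnsr_residual, candidates=(1, 2)):
    """Try each candidate bias on the NNSR residual, rebuild the image
    on top of the RPR-upsampled reconstruction, measure SSE against the
    original, and return (best_index, best_image). The chosen index is
    what would be written to the bitstream for the decoder to read."""
    best_idx, best_img, best_d = None, None, None
    for idx, bias in enumerate(candidates):
        adjusted = nnsr_residual - np.sign(nnsr_residual) * bias
        candidate = rpr_recon + adjusted
        d = np.sum((original - candidate) ** 2)
        if best_d is None or d < best_d:
            best_idx, best_img, best_d = idx, candidate, d
    return best_idx, best_img
```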
  • This solution proposes an image-level output information adjustment method, which further improves the encoding and decoding performance of the neural network super-resolution device by correcting the output residual image of the neural network super-resolution device. Since the above adjustment method is very simple to implement, there is basically no increase in encoding and decoding complexity.
  • FIG13 is a schematic diagram of the structure of a decoder provided by an embodiment of the present application.
  • the decoder 1300 shown in FIG13 includes a first determination unit 1310, a second determination unit 1320, a third determination unit 1330, and a fourth determination unit 1340.
  • the first determination unit 1310 is configured to parse the code stream and determine a first reconstructed image.
  • the second determination unit 1320 is configured to determine a first residual image corresponding to the first reconstructed image according to a neural network.
  • the third determination unit 1330 is configured to adjust the residual value in the first residual image and determine the second residual image.
  • the fourth determination unit 1340 is configured to determine the second reconstructed image based on the first reconstructed image and the second residual image.
  • the decoder 1300 further includes: a fifth determination unit configured to parse the bitstream and determine first identification information, where the first identification information is used to indicate whether to perform residual adjustment on the first residual image.
  • the first identification information includes a first value and a second value
  • the first value is used to indicate that residual adjustment is performed on the first residual image
  • the second value is used to indicate that residual adjustment is not performed on the first residual image
  • the decoder 1300 further includes: a sixth determination unit configured to parse the bitstream and determine second identification information, where the second identification information is used to indicate whether to perform residual adjustment on the image sequence.
  • the second identification information includes a third value and a fourth value
  • the third value is used to indicate that residual adjustment is performed on the image sequence
  • the fourth value is used to indicate that residual adjustment is not performed on the image sequence.
  • the decoder 1300 further includes: a seventh determination unit configured to parse the bitstream and determine first index information, where the first index information is used to indicate a residual adjustment method of the first residual image.
  • the first index information is used to indicate a first offset value
  • the residual value of the second residual image is determined based on the first offset value and the residual value of the first residual image.
  • the first index information belongs to supplemental enhancement information.
  • the first residual image includes a first residual value
  • the second residual image includes a second residual value corresponding to the first residual value
  • the first residual value and the second residual value satisfy at least one of the following:
  • the second residual value is equal to the difference between the first residual value and the first offset value
  • the second residual value is equal to the sum of the first residual value and the first offset value
  • the second residual value is equal to the first residual value.
  • the first bias value includes a plurality of bias values having different values, and the plurality of bias values are respectively used to adjust residual values within a plurality of value ranges.
  • the neural network is a neural network-based super-resolution device.
  • the first determining unit 1310 is configured to: parse the code stream to determine the third reconstructed image; and upsample the third reconstructed image to determine the first reconstructed image.
  • the second determination unit 1320 is configured to: input the third reconstructed image into the neural network to determine the first residual image, and the resolution of the first residual image is higher than the resolution of the third reconstructed image.
  • the neural network is a neural network based post-filter.
  • the second determination unit 1320 is configured to: input the first reconstructed image into the neural network to determine the first residual image.
  • a "unit” may be a part of a circuit, a part of a processor, a part of a program or software, etc., and of course, it may be a module, or it may be non-modular.
  • the components in the present embodiment may be integrated into a processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware or in the form of a software functional module.
  • if the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment, or the part that contributes to the prior art, or the whole or part of the technical solution, can be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) or a processor to perform all or part of the steps of the method described in this embodiment.
  • the aforementioned storage medium includes: U disk, mobile hard disk, read only memory (ROM), random access memory (RAM), disk or optical disk, etc., which can store program code.
  • an embodiment of the present application provides a computer-readable storage medium, which is applied to the decoder 1300.
  • the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the decoding method in the first embodiment is implemented.
  • the decoder 1400 may include: a communication interface 1410, a memory 1420 and a processor 1430; each component is coupled together through a bus system 1440. It can be understood that the bus system 1440 is used to realize the connection and communication between these components.
  • the bus system 1440 also includes a power bus, a control bus and a status signal bus. However, for the sake of clarity, various buses are marked as bus system 1440 in Figure 14. Among them,
  • the communication interface 1410 is used to receive and send signals during the process of sending and receiving information with other external network elements;
  • Memory 1420 used for storing computer programs
  • the processor 1430 is configured to, when running the computer program, execute:
  • a second reconstructed image is determined according to the first reconstructed image and the second residual image.
  • the memory 1420 in the embodiment of the present application can be a volatile memory or a non-volatile memory, or can include both volatile and non-volatile memories.
  • the non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • the volatile memory can be a random access memory (RAM), which is used as an external cache.
  • by way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate synchronous DRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct RAM bus RAM (DR RAM).
  • the processor 1430 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method can be completed by the hardware integrated logic circuit in the processor 1430 or the instruction in the form of software.
  • the above processor 1430 may be a general processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components.
  • the processor can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application.
  • the general processor can be a microprocessor or the processor can also be any conventional processor, etc.
  • the steps of the method disclosed in the embodiments of the present application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory or an electrically erasable programmable memory, a register, etc.
  • the storage medium is located in the memory 1420, and the processor 1430 reads the information in the memory 1420 and completes the steps of the above method in combination with its hardware.
  • the processing unit can be implemented in one or more application specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processors (Digital Signal Processing, DSP), digital signal processing devices (DSP Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field programmable gate arrays (Field-Programmable Gate Array, FPGA), general processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in this application or a combination thereof.
  • the technology described in this application can be implemented by a module (such as a process, function, etc.) that performs the functions described in this application.
  • the software code can be stored in a memory and executed by a processor.
  • the memory can be implemented in the processor or outside the processor.
  • the processor 1430 is further configured to execute the decoding method described in the above embodiment when running the computer program.
  • FIG. 15 is a schematic diagram of the structure of an encoder provided by an embodiment of the present application.
  • the encoder 1500 shown in FIG. 15 includes a first determination unit 1510, a second determination unit 1520, a third determination unit 1530, and a fourth determination unit 1540.
  • the first determination unit 1510 is configured to determine a first reconstructed image.
  • the second determining unit 1520 is configured to determine a first residual image corresponding to the first reconstructed image according to a neural network.
  • the third determining unit 1530 is configured to adjust the residual value in the first residual image to determine a second residual image.
  • the fourth determining unit 1540 is configured to determine a second reconstructed image according to the first reconstructed image and the second residual image.
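The four units above form a simple pipeline: determine a reconstruction, predict a residual with a neural network, adjust that residual, then add it back. A minimal sketch of that flow, where `predict_residual` and `adjust_residual` are hypothetical placeholders standing in for the neural network and the adjustment step (neither name appears in the application):

```python
import numpy as np

def second_reconstruction(first_recon, predict_residual, adjust_residual):
    """Sketch of units 1510-1540: reconstruct, predict, adjust, combine.

    predict_residual and adjust_residual are placeholder callables for the
    neural network and the residual-adjustment step, respectively.
    """
    first_residual = predict_residual(first_recon)     # first residual image
    second_residual = adjust_residual(first_residual)  # second residual image
    # second reconstructed image: the sum, clipped to an 8-bit sample range
    return np.clip(first_recon + second_residual, 0, 255)

# toy example: the "network" predicts a constant residual of +3 and the
# adjustment subtracts a fixed offset of 1, so every sample gains 2
recon = np.full((2, 2), 100.0)
out = second_reconstruction(
    recon,
    predict_residual=lambda x: np.full_like(x, 3.0),
    adjust_residual=lambda r: r - 1.0,
)
```

The clipping range and the toy predictor are illustrative choices; the application only specifies that the second reconstructed image is determined from the first reconstructed image and the second residual image.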
  • the encoder 1500 further includes: a first writing unit configured to write first identification information into a bitstream, the first identification information including a first value and a second value, the first value being used to indicate that residual adjustment is to be performed on the first residual image, and the second value being used to indicate that residual adjustment is not to be performed on the first residual image.
  • if the rate-distortion cost corresponding to the second reconstructed image is less than the rate-distortion cost corresponding to the first reconstructed image, the value of the first identification information is the first value; if the rate-distortion cost corresponding to the second reconstructed image is greater than or equal to the rate-distortion cost corresponding to the first reconstructed image, the value of the first identification information is the second value.
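The flag decision above reduces to a plain rate-distortion comparison. A hedged sketch (the function name and the 0/1 codings are illustrative, not defined by the application):

```python
def first_identification_value(cost_second_recon, cost_first_recon,
                               first_value=1, second_value=0):
    """Choose the first identification information written to the bitstream.

    The first value (adjustment on) is signalled only when the second
    reconstructed image strictly lowers the rate-distortion cost.
    """
    if cost_second_recon < cost_first_recon:
        return first_value   # residual adjustment is performed
    return second_value      # residual adjustment is skipped

flag = first_identification_value(10.0, 12.0)  # adjustment helps, so first value
```

Note that equal costs fall into the "greater than or equal" branch, so the second value is written when adjustment brings no measurable gain.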
  • the encoder 1500 further includes: a second writing unit configured to write second identification information into the bitstream, where the second identification information is used to indicate whether to perform residual adjustment on the image sequence.
  • the second identification information includes a third value and a fourth value
  • the third value is used to indicate that residual adjustment is performed on the image sequence
  • the fourth value is used to indicate that residual adjustment is not performed on the image sequence.
  • the encoder 1500 further includes: a third writing unit configured to write first index information into a bitstream, wherein the first index information is used to indicate that the first residual image uses a target residual adjustment method among multiple residual adjustment methods.
  • the second residual image includes a plurality of residual images determined based on the plurality of residual adjustment methods, and the target residual adjustment method is determined based on rate-distortion costs corresponding to the plurality of residual images.
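Selecting the target residual adjustment method can be sketched as an exhaustive rate-distortion search over the candidate methods. The helper below is illustrative only; the application does not prescribe these names or a particular cost function:

```python
def select_target_method(first_residual, methods, rd_cost):
    """Apply every candidate residual adjustment method and keep the cheapest.

    methods: callables mapping a residual to an adjusted residual;
    rd_cost: callable giving the rate-distortion cost of a candidate.
    Returns (index to write as first index information, adjusted residual).
    """
    candidates = [m(first_residual) for m in methods]  # plurality of residual images
    costs = [rd_cost(c) for c in candidates]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, candidates[best]

# toy example on a scalar residual, with abs() standing in for the RD cost:
# candidates are -0.6, 0.4 and 1.4, so method 1 (identity) wins
index, adjusted = select_target_method(
    0.4, [lambda r: r - 1.0, lambda r: r, lambda r: r + 1.0], rd_cost=abs)
```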
  • the first index information is used to indicate a first offset value
  • the residual value of the second residual image is determined based on the first offset value and the residual value of the first residual image.
  • the first index information belongs to supplemental enhancement information.
  • the first residual image includes a first residual value
  • the second residual image includes a second residual value corresponding to the first residual value
  • the first residual value and the second residual value satisfy at least one of the following:
  • the second residual value is equal to the difference between the first residual value and the first offset value
  • the second residual value is equal to the sum of the first residual value and the first offset value
  • the second residual value is equal to the first residual value.
  • the first offset value includes a plurality of offset values having different values, and the plurality of offset values are respectively used to adjust residual values within a plurality of value ranges.
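Putting the three relations together with range-dependent offsets, the per-sample adjustment might look like the sketch below. The value ranges, offset values, and mode labels are invented for illustration; the application only states that different offsets may apply to different value ranges:

```python
import numpy as np

def adjust_residual(first_residual, ranges):
    """Apply a first offset value that depends on the residual's magnitude.

    ranges: list of (low, high, offset, mode) tuples; mode is 'add', 'sub',
    or 'keep', mirroring the three relations in the text. Samples matching
    no range are left unchanged.
    """
    out = first_residual.astype(np.float64).copy()
    for low, high, offset, mode in ranges:
        mask = (first_residual >= low) & (first_residual < high)
        if mode == 'add':
            out[mask] = first_residual[mask] + offset
        elif mode == 'sub':
            out[mask] = first_residual[mask] - offset
        # 'keep' leaves the residual value untouched
    return out

r = np.array([-5.0, 0.5, 5.0])
# large negatives gain +1, large positives lose 1, small values are kept
adjusted = adjust_residual(r, [(-10, -2, 1.0, 'add'),
                               (-2, 2, 0.0, 'keep'),
                               (2, 10, 1.0, 'sub')])
# adjusted == [-4.0, 0.5, 4.0]
```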
  • the neural network is a neural network-based super-resolution device.
  • the second determination unit 1520 is configured to input the third reconstructed image into the neural network to determine the first residual image, wherein a resolution of the first residual image is higher than a resolution of the third reconstructed image.
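In the super-resolution variant, the network's residual output already lives at the higher resolution, so the reconstructed input must be upsampled before the residual is added. A sketch using nearest-neighbour upsampling; the 2x factor, the `np.kron` upsampler, and the placeholder predictor are assumptions for illustration, not details from the application:

```python
import numpy as np

def super_resolve(low_res_recon, predict_hr_residual, scale=2):
    """Combine an upsampled reconstruction with a higher-resolution residual.

    predict_hr_residual stands in for the neural-network super-resolution
    module: it maps the low-resolution input to a residual image whose
    resolution is `scale` times higher, as stated in the text.
    """
    residual = predict_hr_residual(low_res_recon)  # higher-resolution residual
    # nearest-neighbour upsampling of the low-resolution reconstruction
    upsampled = np.kron(low_res_recon, np.ones((scale, scale)))
    assert residual.shape == upsampled.shape
    return upsampled + residual

lr = np.array([[10.0, 20.0], [30.0, 40.0]])
hr = super_resolve(lr, lambda x: np.zeros((4, 4)))
# with a zero residual the output is just the 4x4 nearest-neighbour upsample
```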
  • the neural network is a neural network based post-filter.
  • the second determination unit 1520 is configured to: input the first reconstructed image into the neural network to determine the first residual image.
  • a "unit" may be a part of a circuit, a part of a processor, a part of a program or software, etc., and of course, it may be a module, or it may be non-modular.
  • the units in the present embodiment may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware or in the form of a software functional module.
  • if the integrated unit is implemented in the form of a software functional module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • based on this understanding, the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to perform all or part of the steps of the method described in this embodiment.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • an embodiment of the present application provides a computer-readable storage medium, which is applied to the encoder 1500.
  • the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, it implements the encoding method described in any one of the aforementioned embodiments.
  • the encoder 1600 may include: a communication interface 1610, a memory 1620 and a processor 1630; each component is coupled together through a bus system 1640. It can be understood that the bus system 1640 is used to realize the connection and communication between these components.
  • the bus system 1640 also includes a power bus, a control bus and a status signal bus. However, for the sake of clarity, various buses are labeled as bus system 1640 in FIG. 16 .
  • the communication interface 1610 is used to receive and send signals during the process of sending and receiving information with other external network elements;
  • the memory 1620 is used for storing a computer program;
  • the processor 1630 is configured to execute the steps of the aforementioned method when running the computer program.
  • the memory 1620 in the embodiment of the present application can be a volatile memory or a non-volatile memory, or can include both volatile and non-volatile memories.
  • the non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • the volatile memory can be a random access memory (RAM), which is used as an external cache.
  • SRAM: static RAM
  • DRAM: dynamic RAM
  • SDRAM: synchronous DRAM
  • DDR SDRAM: double data rate synchronous DRAM
  • ESDRAM: enhanced SDRAM
  • SLDRAM: synchronous link DRAM
  • DR RAM: direct Rambus RAM
  • the processor 1630 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method can be completed by the hardware integrated logic circuit in the processor 1630 or the instruction in the form of software.
  • the above processor 1630 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the methods, steps and logic block diagrams disclosed in the embodiments of the present application can be implemented or executed by such a processor.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of the hardware and software modules in the decoding processor.
  • the software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory 1620, and the processor 1630 reads the information in the memory 1620 and completes the steps of the above method in combination with its hardware.
  • the processing unit may be implemented in one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in this application, or a combination thereof.
  • the technology described in this application can be implemented by a module (such as a process, function, etc.) that performs the functions described in this application.
  • the software code can be stored in a memory and executed by a processor.
  • the memory can be implemented in the processor or outside the processor.
  • the processor 1630 is further configured to execute the encoding method in the aforementioned embodiment when running the computer program.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the present application relate to an encoding method, a decoding method, an encoder, a decoder, and a storage medium. The decoding method comprises: parsing a bitstream and determining a first reconstructed image; determining, on the basis of a neural network, a first residual image corresponding to the first reconstructed image; adjusting a residual value in the first residual image so as to determine a second residual image; and determining a second reconstructed image on the basis of the first reconstructed image and the second residual image. In the embodiments of the present application, the residual information determined on the basis of a neural network is not used directly to recover a reconstructed image; instead, the residual information undergoes residual adjustment and is then used to recover the reconstructed image, so as to correct the output error of the neural network, thereby improving encoding and decoding performance.
PCT/CN2023/143439 2023-12-29 2023-12-29 Encoding method, decoding method, encoder, decoder and storage medium Pending WO2025138170A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/143439 WO2025138170A1 (fr) 2023-12-29 2023-12-29 Encoding method, decoding method, encoder, decoder and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2023/143439 WO2025138170A1 (fr) 2023-12-29 2023-12-29 Encoding method, decoding method, encoder, decoder and storage medium

Publications (1)

Publication Number Publication Date
WO2025138170A1 true WO2025138170A1 (fr) 2025-07-03

Family

ID=96216649

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/143439 Pending WO2025138170A1 (fr) Encoding method, decoding method, encoder, decoder and storage medium 2023-12-29 2023-12-29

Country Status (1)

Country Link
WO (1) WO2025138170A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110536133A (zh) * 2018-05-24 2019-12-03 华为技术有限公司 Video data decoding method and apparatus
CN114501010A (zh) * 2020-10-28 2022-05-13 Oppo广东移动通信有限公司 Image encoding method, image decoding method and related apparatus
CN115037948A (zh) * 2021-03-04 2022-09-09 脸萌有限公司 Neural-network-based video codec loop filter with residual scaling
US20230052774A1 * 2019-12-05 2023-02-16 Electronics And Telecommunications Research Institute Method, apparatus, and recording medium for region-based differential image encoding/decoding
WO2023225854A1 (fr) * 2022-05-24 2023-11-30 Oppo广东移动通信有限公司 Loop filtering method and apparatus, and video encoding/decoding method, apparatus and system


Similar Documents

Publication Publication Date Title
US20240107015A1 (en) Encoding method, decoding method, code stream, encoder, decoder and storage medium
WO2022227062A1 Encoding and decoding methods, code stream, encoder, decoder and storage medium
JP7753433B2 Prediction method for decoding, apparatus therefor, and computer storage medium
WO2022178686A1 Encoding/decoding method, encoding/decoding device, encoding/decoding system and computer-readable recording medium
WO2023193253A1 Decoding method, encoding method, decoder and encoder
US20250119543A1 Coding method and decoder
JP2025520767A Decoding method, encoding method, decoder and encoder
US12395627B2 Intra prediction method and decoder
WO2023193254A1 Decoding method, encoding method, decoder and encoder
WO2024007120A1 Encoding and decoding method, encoder, decoder and storage medium
CN114424544A Allowing a matrix-based intra prediction block to have multiple transform blocks
WO2025138170A1 Encoding method, decoding method, encoder, decoder and storage medium
WO2025073085A1 Encoding method, decoding method, encoder, decoder and storage medium
WO2025097423A1 Encoding and decoding methods, code stream, encoder, decoder, and storage medium
WO2025156084A1 Encoding method, decoding method, encoder, decoder, code stream and storage medium
WO2025129660A1 Encoding and decoding methods, encoder and decoder, bitstream and storage medium
US20260032260A1 Encoding/decoding method, code stream, encoder, decoder and storage medium
WO2025123197A1 Encoding methods, decoding methods, encoders, decoders, code streams and storage medium
CN116614633A Decoding prediction method and apparatus, and computer storage medium
RU2815738C2 Determination of chroma coding mode based on matrix-based intra prediction
WO2026000233A1 Encoding method, decoding method, encoder, decoder, bitstream and storage medium
WO2025112031A1 Encoding method, decoding method, encoder, decoder, and storage medium
WO2025091378A1 Encoding method, decoding method, encoder, decoder and storage medium
WO2024207136A1 Encoding/decoding method, code stream, encoder, decoder and storage medium
WO2025217770A1 Encoding method, decoding method, encoder, decoder, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23962844

Country of ref document: EP

Kind code of ref document: A1