
WO2026007784A1 - Loss information calculation method and apparatus, and related device - Google Patents

Loss information calculation method and apparatus, and related device

Info

Publication number
WO2026007784A1
WO2026007784A1, PCT/CN2025/103711, CN2025103711W
Authority
WO
WIPO (PCT)
Prior art keywords
target
output image
image
frequency band
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2025/103711
Other languages
French (fr)
Chinese (zh)
Inventor
刘琼
叶杰栋
朱逸清
吕卓逸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Publication of WO2026007784A1

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0495Quantised networks; Sparse networks; Compressed networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • Neural networks primarily learn based on loss information.
  • In the related art, the loss information of an image-related neural network is calculated equally over all pixels of the entire image, so the loss information is poorly targeted and the learning performance suffers.
  • This application provides a loss information calculation method, apparatus, and related device, which can solve the problem that poorly targeted loss information leads to poor neural-network learning performance.
  • a method for calculating loss information, including: acquiring the target frequency band component of the output image of a neural network; acquiring the target frequency band component of the target image corresponding to the output image; and calculating the loss information of the neural network, the loss information including the loss information between the two target frequency band components.
  • a loss information calculation device comprising:
  • the first acquisition module is used to acquire the target frequency band component of the output image of the neural network;
  • the second acquisition module is used to acquire the target frequency band component of the target image corresponding to the output image;
  • the calculation module is used to calculate the loss information of the neural network, the loss information including the loss information between the target frequency band component of the output image and the target frequency band component of the target image.
  • an electronic device including a processor and a memory, the memory storing a program or instructions executable on the processor, the program or instructions being executed by the processor to implement the steps of the loss information calculation method provided in the embodiments of this application.
  • an electronic device including a processor and a communication interface, wherein the processor is configured to acquire target frequency band components of an output image of a neural network; acquire target frequency band components of a target image corresponding to the output image; and calculate loss information of the neural network, the loss information including loss information between the target frequency band components of the output image and the target frequency band components of the target image.
  • an electronic device comprising: a memory configured to store video data, and processing circuitry configured to implement the steps of the loss information calculation method provided in the embodiments of this application.
  • a readable storage medium on which a program or instructions are stored, which, when executed by a processor, implement the steps of the loss information calculation method provided in the embodiments of this application.
  • a chip including a processor and a communication interface, the communication interface being coupled to the processor, the processor being used to run programs or instructions to implement the steps of the loss information calculation method provided in the embodiments of this application.
  • a computer program/program product is provided, which is stored in a storage medium and is executed by at least one processor to implement the steps of the loss information calculation method provided in the embodiments of this application.
  • Figure 1 is a schematic diagram of the encoding and decoding system provided in an embodiment of this application;
  • Figure 2 is a schematic diagram of the encoder provided in an embodiment of this application.
  • Figure 3 is a schematic diagram of the decoder provided in an embodiment of this application.
  • Figure 4 is a flowchart of a loss information calculation method provided in an embodiment of this application;
  • Figure 5 is a schematic diagram of a frequency domain transformation provided in an embodiment of this application.
  • Figure 6 is a schematic diagram of a loss information provided in an embodiment of this application.
  • Figure 7 is a structural diagram of a loss information calculation device provided in an embodiment of this application.
  • Figure 8 is a structural diagram of an electronic device provided in an embodiment of this application.
  • Figure 9 is a structural diagram of a terminal provided in an embodiment of this application.
  • “first” and “second” are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that such terms are interchangeable where appropriate, so that embodiments of this application can be implemented in orders other than those illustrated or described herein. Objects distinguished by “first” and “second” are generally of the same class, and their number is not limited; for example, the first object can be one or more.
  • “or” in this application indicates at least one of the connected objects.
  • the scope of protection of “A or B” covers at least three scenarios: Scenario 1: including A but not B; Scenario 2: including B but not A; Scenario 3: including both A and B.
  • the terms “A and/or B,” “at least one of A and B,” and “at least one of A or B” also cover at least the above three scenarios.
  • the character “/” generally indicates that the preceding and following objects are in an “or” relationship.
  • FIG. 1 is a schematic diagram of the encoding/decoding system 10 provided in an embodiment of this application.
  • the technical solution of this application embodiment relates to encoding and decoding (CODEC) video data (including encoding or decoding).
  • the video data includes original unencoded video, encoded video, decoded (e.g., reconstructed) video, or syntax elements, etc.
  • the encoding/decoding system 10 includes a source device 100, which provides encoded video data to be decoded and displayed by the destination device 110.
  • the source device 100 provides video data to the destination device 110 via a communication medium 120.
  • the source device 100 and the destination device 110 may include any one or more of the following: desktop computer, laptop computer, tablet computer, set-top box, mobile phone, wearable device (e.g., smartwatch or wearable camera), television, camera, display device, in-vehicle device, virtual reality (VR) device, augmented reality (AR) device, mixed reality (MR) device, digital media player, video game console, video conferencing equipment, video streaming equipment, broadcast receiver equipment, broadcast transmitter equipment, spacecraft, aircraft, robot, satellite, etc.
  • source device 100 includes a data source 101, memory 102, encoder 200, and output interface 104.
  • Destination device 110 includes an input interface 111, decoder 300, memory 113, and display device 114.
  • Source device 100 represents an example of a video encoding device
  • destination device 110 represents an example of a video decoding device.
  • source device 100 and destination device 110 may not include some of the components shown in Figure 1, or they may include components other than those shown in Figure 1.
  • source device 100 may receive video data from an external data source (such as an external camera).
  • destination device 110 may interface with an external display device instead of including an integrated display device.
  • memory 102 and memory 113 may be external memories.
  • Figure 1 illustrates the source device 100 and the destination device 110 as separate devices, in some examples, they may be integrated into a single device. In such embodiments, the same hardware or software, separate hardware or software, or any combination thereof may be used to implement the functionality corresponding to the source device 100 and the functionality corresponding to the destination device 110.
  • source device 100 and destination device 110 can perform unidirectional or bidirectional video transmission. If it is bidirectional video transmission, source device 100 and destination device 110 can operate in a substantially symmetrical manner, that is, each of source device 100 and destination device 110 includes an encoder and a decoder.
  • Data source 101 represents the source of video data (i.e., raw, unencoded video data) and provides encoder 200 with a series of images containing video data, which encoder 200 encodes.
  • Data source 101 of source device 100 may include video acquisition devices (such as video cameras), video archives containing previously acquired raw video, or video feed interfaces for receiving video from video content providers.
  • data source 101 may generate computer graphics-based data as source video, or combine live video, archived video, and computer-generated video.
  • encoder 200 encodes the acquired, pre-acquired, or computer-generated video data.
  • Encoder 200 may rearrange the images from the received order (sometimes referred to as the "display order") according to the encoded order.
  • Encoder 200 may generate a bitstream including the encoded video data.
  • Source device 100 may then output the encoded video data to communication medium 120 via output interface 104 for reception or retrieval, for example, by input interface 111 of destination device 110.
  • the memory 102 of the source device 100 and the memory 113 of the destination device 110 represent general-purpose memory.
  • memory 102 may store raw video data from data source 101
  • memory 113 may store decoded video data from decoder 300.
  • memories 102 and 113 may respectively store software instructions executable by, for example, encoder 200 and decoder 300.
  • Although memories 102 and 113 are shown separately from encoder 200 and decoder 300 in this example, it should be understood that encoder 200 and decoder 300 may also include internal memory for functionally similar or equivalent purposes. If encoder 200 and decoder 300 are deployed on the same hardware device, memories 102 and 113 may be the same memory.
  • memories 102 and 113 may store, for example, encoded video data output from encoder 200 and input to decoder 300.
  • portions of memories 102 and 113 may be allocated as one or more video buffers, for example, to store raw, decoded, or encoded video data.
  • Output interface 104 may include any type of medium or device capable of transmitting encoded video data from source device 100 to destination device 110.
  • output interface 104 may include a transmitter or transceiver, such as an antenna, configured to transmit encoded video data directly from source device 100 to destination device 110 in real time.
  • the encoded video data may be modulated according to the communication standards of a wireless communication protocol and transmitted to destination device 110.
  • Communication medium 120 may include transient media, such as wireless broadcasting or wired network transmission.
  • communication medium 120 may include radio frequency (RF) spectrum or one or more physical transmission lines (e.g., cables).
  • Communication medium 120 may form part of a packet-based network (such as a local area network, a wide area network, or a global network such as the Internet).
  • Communication medium 120 may also take the form of a storage medium (e.g., a non-transitory storage medium), such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.
  • the communication medium 120 may include a router, switch, base station, or any other device that can be used to facilitate communication from source device 100 to destination device 110.
  • a server (not shown) may receive encoded video from source device 100 and provide the encoded video data to destination device 110, for example, via network transmission.
  • the server may include, for example, a web server for a website, a server configured to provide file transfer protocol services (such as the File Transfer Protocol (FTP) or the File Delivery over Unidirectional Transport (FLUTE) protocol), a content delivery network (CDN) device, a Hypertext Transfer Protocol (HTTP) server, a Multimedia Broadcast Multicast Services (MBMS) or Evolved Multimedia Broadcast Multicast Service (eMBMS) server, or a network-attached storage (NAS) device, etc.
  • the server can implement one or more HTTP streaming protocols, such as Moving Picture Experts Group (MPEG) Media Transport (MMT), Dynamic Adaptive Streaming over HTTP (DASH), HTTP Live Streaming (HLS), or Real Time Streaming Protocol (RTSP).
  • Destination device 110 can access encoded video data from a server, for example, via a wireless channel (e.g., Wi-Fi connection) or a wired connection (e.g., Digital Subscriber Line (DSL), Cable Modem, etc.) for accessing encoded video data stored on the server.
  • Output interface 104 and input interface 111 may represent a wireless transmitter/receiver, a modem, a wired networking component (e.g., an Ethernet card), a wireless communication component operating according to the Institute of Electrical and Electronics Engineers (IEEE) 802.11 standard or IEEE 802.15 standard (e.g., ZigBeeTM), Bluetooth standard, or other physical components.
  • output interface 104 and input interface 111 may be configured to transmit data, such as encoded video data, according to Wi-Fi, Ethernet, cellular networks (such as 4th Generation Mobile Communication Technology (4G), LTE (Long Term Evolution), Advanced LTE, 5th Generation Mobile Communication Technology (5G), 6th Generation Mobile Communication Technology (6G), etc.).
  • the technology provided in this application can be applied to support video encoding and decoding in one or more multimedia applications such as video conferencing, over-the-air television broadcasting, cable television transmission, satellite television transmission, internet streaming video transmission, digital video encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications.
  • the input interface 111 of the destination device 110 receives an encoded video bitstream from the communication medium 120.
  • the encoded video bitstream may include syntax elements and encoded data units (e.g., sequences, image groups, images, slices, blocks, etc.), where the syntax elements are used to decode the encoded data units to obtain decoded video data.
  • the display device 114 displays the decoded video data to the user.
  • the display device 114 may include a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or other types of display devices.
  • the encoder 200 and decoder 300 can be implemented as one or more of various processing circuits, which may include microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof.
  • the device may store instructions for the software in a suitable non-transitory computer-readable storage medium and use one or more processors to execute the instructions in hardware to perform the technology provided in the embodiments of this application.
  • the encoder 200 and decoder 300 can process based on the following video codec standards: H.263, H.264, H.265 (also known as High Efficiency Video Coding (HEVC)), H.266 (also known as Versatile Video Coding (VVC)), Moving Picture Experts Group 2 (MPEG-2), MPEG-4, VP8, VP9, Alliance for Open Media Video 1 (AV1), Audio Video Coding Standard 1 (AVS1), AVS2, AVS3, or next-generation video standard protocols. This application does not specifically limit the implementation of these protocols.
  • encoder 200 and decoder 300 can perform block-based encoding and decoding of images.
  • the term "block” generally refers to a structure that includes data to be processed (e.g., encoded, decoded, or otherwise used during encoding or decoding).
  • a block can include a two-dimensional matrix of samples of luminance or chrominance data.
  • encoder 200 and decoder 300 can encode and decode video data represented in YUV format.
  • the encoder 200 can be the encoder 200 in Figure 1.
  • the encoder 200 includes a memory 201, an encoding parameter determination unit 210, a residual generation unit 202, a transform processing unit 203, a quantization unit 204, an inverse quantization unit 205, an inverse transform processing unit 206, a reconstruction unit 207, a filter unit 208, a decoded picture buffer (DPB) 209, and an entropy encoding unit 220.
  • the memory 201 can store video data to be encoded.
  • the encoder 200 can receive and store video data from the data source 101 shown in Figure 1.
  • the memory 201 can be on the same chip as other components of the encoder 200 (as shown in Figure 2), or it can be on a separate chip from those components.
  • the coding parameter determination unit 210 includes a mode selection unit 211, an inter-frame prediction unit 212, and an intra-frame prediction unit 213.
  • the inter-frame prediction unit 212 is used to obtain a first prediction block for the current block using an inter-frame prediction mode.
  • the intra-frame prediction unit 213 is used to obtain a second prediction block for the current block using an intra-frame prediction mode.
  • the mode selection unit 211 is used to obtain a target prediction block based on the first and second prediction blocks and determine the final prediction mode.
  • the coding parameter determination unit 210 may also include other functional units, such as functional units for determining the partitioning method of coding units (CUs), functional units for determining the transformation type of the residual data of the CUs, or functional units for determining the quantization parameters of the residual data of the CUs.
  • the CU to be processed in the current image is referred to as the current CU
  • the image block to be processed in the current CU is referred to as the current block or the image block to be processed.
  • during encoding, the current block refers to the block currently being encoded; during decoding, it refers to the block currently being decoded.
  • Inter-frame prediction unit 212 may include a motion estimation unit and a motion compensation unit.
  • the motion estimation unit may perform a motion search to identify one or more matching reference blocks in one or more reference pictures (e.g., one or more previously encoded/decoded pictures stored in DPB 209).
  • the motion estimation unit can generate one or more motion vectors (MVs) representing the position of a reference block in a reference image relative to the position of the current block in the current image.
  • the motion compensation unit can then use interpolation to obtain a predicted value with the precision indicated by the motion vectors.
  • the encoding parameter determination unit 210 can provide the target prediction block to the residual generation unit 202.
  • the residual generation unit 202 receives the raw uncoded video data of the current block from the memory 201 and calculates the residual between the current block and the target prediction block to obtain the residual block.
  • the function of the residual generation unit 202 can be implemented using one or more subtractor circuits that perform binary subtraction.
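This per-sample subtraction can be sketched as follows (an illustrative NumPy sketch, not part of the embodiment; the function name and block contents are hypothetical):

```python
import numpy as np

def residual_block(current_block: np.ndarray, prediction_block: np.ndarray) -> np.ndarray:
    """Per-sample difference between the current block and its prediction.

    Cast to a signed type first so that negative residuals are representable.
    """
    return current_block.astype(np.int16) - prediction_block.astype(np.int16)

cur = np.array([[120, 130], [140, 150]], dtype=np.uint8)
pred = np.array([[118, 133], [141, 150]], dtype=np.uint8)
res = residual_block(cur, pred)  # [[2, -3], [-1, 0]]
```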
  • the encoding parameter determination unit 210 can provide the entropy encoding unit 220 with syntax elements representing encoding parameters for encoding.
  • the encoding parameters include one or more of the following: the partitioning method of the CU, the final prediction mode, the transformation type of the residual data of the CU, or the quantization parameters of the residual data of the CU.
  • the transform processing unit 203 transforms the residual block output by the residual generation unit 202 to obtain a transform coefficient block.
  • This transformation may include Discrete Cosine Transform (DCT), integer transform, direction transform, or Karhunen-Loeve transform, etc.
  • the encoder 200 may not include the transform processing unit 203.
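As an illustration of the DCT mentioned above, a separable 2-D DCT-II can be written as two matrix multiplications (a sketch only; real codecs use integer approximations of this transform rather than the floating-point form below):

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row gets the smaller scale factor
    return c

def dct2(block: np.ndarray) -> np.ndarray:
    # Apply the 1-D transform along rows, then along columns.
    c = dct_matrix(block.shape[0])
    return c @ block.astype(np.float64) @ c.T

def idct2(coeffs: np.ndarray) -> np.ndarray:
    # The basis is orthonormal, so the inverse is the transpose.
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c
```

A constant block transforms to a single DC coefficient, which is why the transform compacts the energy of smooth residuals.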
  • Quantization unit 204 can quantize the transform coefficients in the transform coefficient block according to the quantization parameter (QP) value associated with the current block to produce a quantized transform coefficient block.
  • the inverse quantization unit 205 and the inverse transform processing unit 206 can perform inverse quantization and inverse transform on the transform coefficient block, respectively, to obtain the reconstructed residual block.
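The quantization/inverse-quantization round trip can be illustrated with a uniform scalar quantizer whose step size grows with QP (the step formula below is a hypothetical HEVC-style choice made for illustration, not the exact standard formula):

```python
import numpy as np

def qp_step(qp: int) -> float:
    # Illustrative: step size doubles every 6 QP units.
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeffs: np.ndarray, qp: int) -> np.ndarray:
    # Round each transform coefficient to the nearest multiple of the step.
    return np.round(coeffs / qp_step(qp)).astype(np.int32)

def dequantize(levels: np.ndarray, qp: int) -> np.ndarray:
    # Reconstruct coefficients from the integer levels (lossy).
    return levels.astype(np.float64) * qp_step(qp)
```

A larger QP gives a larger step, fewer distinct levels, and therefore a larger reconstruction error.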
  • the reconstruction unit 207 can generate a reconstructed block corresponding to the current block based on the reconstructed residual block and the target prediction block generated by the coding parameter determination unit 210.
  • Filter unit 208 can perform one or more filter operations on the reconstructed block.
  • filter unit 208 can be a deblocking filter (DBF), an adaptive loop filter (ALF), a sample adaptive offset (SAO) filter, etc.
  • encoder 200 may not include filter unit 208.
  • Encoder 200 stores the reconstructed image obtained from the reconstructed blocks in DPB 209.
  • reconstruction unit 207 can store the reconstructed blocks in DPB 209.
  • filter unit 208 can store the filtered reconstructed blocks in DPB 209.
  • Inter-frame prediction unit 212 retrieves the reconstructed image from DPB 209 to perform inter-frame prediction on blocks of subsequent images to be encoded.
  • DPB 209 can be replaced with other types of memory.
  • Entropy coding unit 220 can entropy code the syntax elements of other components in encoder 200 to output encoded video data. For example, entropy coding unit 220 can entropy code the quantized transform coefficient block from quantization unit 204. As another example, entropy coding unit 220 can entropy code the syntax elements (e.g., motion information for inter-frame prediction or intra-frame mode information for intra-frame prediction) from coding parameter determination unit 210.
  • the composition of the encoder 200 shown in Figure 2 is only illustrative and does not constitute a limitation on the embodiments of this application.
  • Figure 3 is a schematic diagram of the structure of the decoder 300 provided in an embodiment of this application.
  • the decoder 300 can be the decoder 300 described in Figure 1.
  • the decoder 300 includes a coded picture buffer (CPB) 301, an entropy decoding unit 302, a prediction processing unit 310, an inverse quantization unit 303, an inverse transform processing unit 304, a reconstruction unit 305, a filter unit 306, and a DPB 307.
  • the entropy decoding unit 302 can receive encoded video data from the CPB 301 and perform entropy decoding on the video data to obtain syntax elements.
  • the syntax elements indicate encoding parameters, including one or more of the following: CU partitioning method, final prediction mode, transformation type of CU residual data, or quantization parameters of CU residual data.
  • the prediction processing unit 310 obtains the final prediction mode. If the final prediction mode is an inter-frame prediction mode, the prediction block of the current CU can be obtained through the inter-frame prediction unit 311 of the prediction processing unit 310; if the final prediction mode is an intra-frame prediction mode, the prediction block of the current CU can be obtained through the intra-frame prediction unit 312 of the prediction processing unit 310.
  • the prediction processing unit 310 may also include a unit for performing prediction functions according to other prediction modes.
  • CPB 301 can acquire and store encoded video data from the communication medium 120 shown in Figure 1.
  • DPB 307 is used to store decoded images.
  • CPB 301 and DPB 307 can be replaced with other types of memory, which are not specifically limited in this application.
  • CPB 301 can be on the same chip as other components of decoder 300 (as shown in the figure), or it can be on a separate chip from the chip where those components are located.
  • Decoder 300 can perform reconstruction operations on each block individually.
  • Entropy decoding unit 302 can entropy decode the syntax elements and transform information (e.g., QP or transform mode indication) of the quantized transform coefficients to obtain the quantized transform coefficients.
  • Inverse quantization unit 303 dequantizes the quantized transform coefficients to obtain a transform coefficient block including the transform coefficients.
  • Inverse transform processing unit 304 performs an inverse transform on the transform coefficient block to generate a residual block corresponding to the current block; this inverse transform is the reverse operation of the above transform.
  • Reconstruction unit 305 can reconstruct the current block based on the prediction block and the residual block. For example, reconstruction unit 305 can add samples from the residual block to the corresponding samples from the prediction block to reconstruct the current block.
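This sample-wise addition, with clipping back to the valid sample range, can be sketched as follows (illustrative; an 8-bit sample depth is assumed):

```python
import numpy as np

def reconstruct(prediction: np.ndarray, residual: np.ndarray, bit_depth: int = 8) -> np.ndarray:
    # Add per-sample in a wide type, then clip to [0, 2^bit_depth - 1].
    out = prediction.astype(np.int32) + residual.astype(np.int32)
    return np.clip(out, 0, (1 << bit_depth) - 1).astype(np.uint8)

pred = np.array([[250, 10]], dtype=np.uint8)
res = np.array([[10, -20]], dtype=np.int16)
rec = reconstruct(pred, res)  # [[255, 0]] after clipping
```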
  • Filter unit 306 can perform one or more filter operations on the reconstructed block.
  • the type of filter unit 306 can be referenced to the type of filter unit 208, and will not be described again here. In some examples, the operations of filter unit 306 can be skipped.
  • Decoder 300 can store the reconstructed image obtained from the reconstructed blocks in DPB 307. For example, in an example where filter unit 306 is not operated, reconstruction unit 305 can store the reconstructed blocks in DPB 307. In an example where filter unit 306 is operated, filter unit 306 can store the filtered reconstructed blocks in DPB 307. Decoder 300 can output the decoded image (e.g., decoded video) from DPB 307 for subsequent rendering on a display device (such as display device 114 of FIG. 1).
  • the loss information calculation method provided in the embodiments of this application is described below with reference to the accompanying drawings.
  • the loss information calculation method provided in the embodiments of this application can be executed by an encoding end, such as the encoder 200 shown in Figure 1 or Figure 2.
  • the loss information calculation method provided in the embodiments of this application can be executed by a decoding end, such as the decoder 300 shown in Figure 1 or Figure 3.
  • the encoding end and decoding end can be implemented by software, hardware, or a combination thereof. When implemented by hardware, the encoding end can be referred to as an encoding end device or a video encoding device, and the decoding end can be referred to as a decoding end device or a video decoding device.
  • Figure 4 is a flowchart of a loss information calculation method provided in an embodiment of this application. As shown in Figure 4, it includes the following steps:
  • Step 401 Obtain the target frequency band component of the output image of the neural network.
  • the aforementioned neural network can be a convolutional neural network, a deep convolutional neural network, a feedforward neural network, or a recurrent neural network, etc.
  • the aforementioned neural network is an image-related neural network, such as a neural network used to process image-related tasks in the field of video encoding and decoding, such as a neural network used for super-resolution (SR), a neural network used for image enhancement, or a neural network used for loop filtering, etc.
  • the above output image is the image output by the neural network, that is, the image obtained after processing by the neural network.
  • the type or structure of the neural network is not limited in the embodiments of this application, nor is the function of the neural network limited. Specifically, it can be a neural network related to the field of image or video encoding and decoding.
  • the target frequency band component is a component of the target frequency band, which may specifically be a low-frequency component, a high-frequency component, or a mid-frequency component.
  • Step 402: Obtain the target frequency band component of the target image corresponding to the output image.
  • the target image corresponding to the output image can be the original image, that is, the image that the neural network aims to approximate; the neural network learns using this target image as its learning target.
  • the target image can be referred to as a positive image sample or a real image sample during the neural network learning process.
  • the target image and the output image have the same size.
  • the execution order of steps 401 and 402 is not limited in this embodiment. Step 401 can be executed first, followed by step 402; step 402 can be executed first, followed by step 401; or steps 401 and 402 can be executed simultaneously.
  • Step 403: Calculate the loss information of the neural network, the loss information including the loss information between the target frequency band component of the output image and the target frequency band component of the target image.
  • calculating the loss information of the neural network may specifically be calculating the loss information between the target frequency band component of the output image and the target frequency band component of the target image.
  • the loss function used to calculate the loss information can be the L1 loss function (i.e., the absolute error loss function), the L2 loss function (i.e., the mean squared error loss function), or the structural similarity index (SSIM) loss function, etc.
  • loss function used to calculate the loss information is not limited in the embodiments of this application.
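The two loss functions named above (L1 and L2) can be sketched in a few lines. This is an illustrative sketch, not the application's normative implementation; the function names and sample coefficients are our own.

```python
# Illustrative L1 (mean absolute error) and L2 (mean squared error) loss
# functions, applied to flattened image components of equal length.

def l1_loss(target, output):
    """Mean absolute error between two equal-length sequences."""
    n = len(target)
    return sum(abs(y - f) for y, f in zip(target, output)) / n

def l2_loss(target, output):
    """Mean squared error between two equal-length sequences."""
    n = len(target)
    return sum((y - f) ** 2 for y, f in zip(target, output)) / n

# Example: target-band coefficients of a target image vs. an output image.
gt  = [0.0, 1.0, 2.0, 3.0]
out = [0.5, 1.0, 1.5, 3.0]
print(l1_loss(gt, out))  # 0.25
print(l2_loss(gt, out))  # 0.125
```

In practice these would be applied to the extracted target frequency band components rather than raw pixels, as the surrounding text describes.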
  • the loss information includes the loss information between the target frequency band component of the output image and the target frequency band component of the target image, it is possible to obtain loss information for the target frequency band, making the loss information of the neural network more targeted and beneficial to improving the learning effect of the neural network.
  • the above-mentioned loss information calculation method can be executed by the encoding end, that is, the encoding end executes the above steps.
  • the above-mentioned loss information calculation method can also be executed by other electronic devices other than the encoding end, such as by an electronic device that performs neural network learning to execute the steps in the above method.
  • the electronic device can specifically be a computer, server, mobile phone or other electronic devices.
  • steps 401 to 403 may be configured to be executed in the aforementioned neural network, or may be executed using an algorithm or program other than the aforementioned neural network, and there is no limitation on this.
  • the above method further includes the following steps:
  • the neural network is then trained based on the aforementioned loss information.
  • the learning of the neural network can also be referred to as the training of the neural network.
  • the above learning can improve the neural network's attention to the target frequency band, thereby improving the neural network's processing performance for tasks related to the target frequency band. If the target frequency band is high frequency, the neural network can pay more attention to high frequency information and focus more directly on the high frequency components in the image. This can improve the conversion effect from low resolution to high resolution for super-resolution tasks.
  • the embodiments of this application are not limited to performing the above steps.
  • the loss information may be sent to other devices, and the other devices may learn the above neural network based on the loss information.
  • the above method may further include using the learned neural network to perform video encoding/decoding related image tasks such as SR or image enhancement to improve video encoding/decoding performance.
  • the acquisition of the target frequency band components of the output image of the neural network includes:
  • the output image of the neural network is transformed in the frequency domain to obtain the frequency domain information of the output image, and the target frequency band component is obtained from the frequency domain information of the output image.
  • the frequency domain transformation of the neural network output image described above can be achieved by using a spatial-to-frequency domain transformation technique to obtain the frequency domain information of the output image.
  • This frequency domain information includes the components of each frequency band in the output image.
  • the aforementioned target frequency band component can be understood as the target frequency band information in the frequency domain information of the image.
  • frequency domain transformation can be implemented to obtain the target frequency band component of the output image. This allows neural networks that do not include frequency domain information in the output image to obtain the loss information corresponding to the target frequency band, thereby improving the performance of these neural networks.
  • the embodiments of this application are not limited to obtaining the target frequency band component through the above frequency domain transformation.
  • the output image of the above neural network itself carries frequency domain information, so the target frequency band component can be directly extracted from the output image.
  • obtaining the target frequency band component of the target image corresponding to the output image includes:
  • the target image corresponding to the output image is subjected to frequency domain transformation to obtain the frequency domain information of the target image, and the target frequency band component is obtained from the frequency domain information of the target image.
  • the aforementioned frequency domain transformation of the target image can be performed using a spatial-to-frequency domain transformation method to obtain the frequency domain information of the target image.
  • This frequency domain information includes the components of each frequency band in the target image.
  • the above-mentioned method of obtaining the target frequency band component from the frequency domain information of the target image can be achieved by directly extracting the target frequency band component from the frequency domain information of the target image.
  • frequency domain transformation can be used to obtain the target frequency band component of the target image. This allows loss information corresponding to the target frequency band to be obtained even for target images that do not include frequency domain information, thereby reducing the limitations of loss information calculation.
  • the embodiments of this application are not limited to obtaining the target frequency band components of the target image through the above frequency domain transformation.
  • the frequency information of the target image can be obtained directly from other devices.
  • the frequency domain transformation includes the following:
  • Discrete Wavelet Transform (DWT);
  • Discrete Cosine Transform (DCT);
  • Discrete Fourier Transform (DFT).
  • the order of the frequency domain transform can be set according to the complexity requirements. Taking the first-order transform of DWT as an example, as shown in Figure 5, the digital image represents the output image or target image.
  • the high-frequency components and low-frequency components are obtained through row decomposition, column decomposition, column reconstruction, and row reconstruction, specifically H and L in Figure 5.
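As an illustration of the first-order DWT decomposition described above, the following is a minimal pure-Python sketch of a one-level 2-D Haar DWT (one common wavelet choice; the embodiments do not fix a specific wavelet). Rows are decomposed first, then columns, yielding a low-frequency sub-band (LL) and high-frequency sub-bands (LH, HL, HH).

```python
# One-level 2-D Haar DWT sketch: rows decomposed into low/high halves,
# then columns, producing the LL, LH, HL, and HH sub-bands.

def haar_1d(seq):
    """One Haar step on an even-length sequence: pairwise averages (low)
    and pairwise differences (high)."""
    low  = [(seq[i] + seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    high = [(seq[i] - seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    return low, high

def haar_2d(img):
    """One-level 2-D Haar DWT of a matrix with even dimensions."""
    # Row decomposition.
    row_l, row_h = [], []
    for row in img:
        l, h = haar_1d(row)
        row_l.append(l)
        row_h.append(h)
    # Column decomposition of each half.
    def cols(mat):
        t = list(map(list, zip(*mat)))          # transpose to columns
        lo, hi = zip(*(haar_1d(c) for c in t))
        return (list(map(list, zip(*lo))),      # transpose back to rows
                list(map(list, zip(*hi))))
    ll, lh = cols(row_l)
    hl, hh = cols(row_h)
    return ll, lh, hl, hh

img = [[1, 1, 2, 2],
       [1, 1, 2, 2],
       [3, 3, 4, 4],
       [3, 3, 4, 4]]
ll, lh, hl, hh = haar_2d(img)
print(ll)  # low-frequency approximation: [[1.0, 2.0], [3.0, 4.0]]
print(hh)  # diagonal high-frequency detail: all zeros for this smooth image
```

The LL sub-band plays the role of the low-frequency component L and the LH/HL/HH sub-bands the high-frequency component H referred to in Figure 5.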
  • the target frequency band component includes at least one of the following:
  • high-frequency components, mid-frequency components, and low-frequency components.
  • loss information associated with at least one of the high-frequency component, mid-frequency component, and low-frequency component can be calculated, so that the neural network can better focus on the high-frequency component, mid-frequency component, or low-frequency component, thereby improving the processing details of the high-frequency component, mid-frequency component, or low-frequency component in the image processing process and thus improving the processing performance of the neural network.
  • the neural network can focus more directly on the information of high-frequency parts, thereby improving the neural network's ability to learn high-frequency information from data. This enables the network to supplement more high-frequency detail information during high-resolution reconstruction, thereby improving the image's ability to recover high-frequency information.
  • the loss information can include loss information calculated by the following formula:
  • L1_DWT_loss = (1/n) × Σ_{i=1}^{n} |Y_i − f(x_i)|
  • where L1_DWT_loss represents the loss information between the high-frequency components of the output image and the high-frequency components of the target image, n is the number of high-frequency component sample points of the output image and the target image, Y_i is the high-frequency component of the target image, and f(x_i) is the high-frequency component of the output image.
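Read with the symbol definitions above, L1_DWT_loss is a mean absolute error over the n high-frequency sample points. A minimal sketch, assuming the high-frequency coefficients have already been extracted (e.g., by a DWT) and flattened into sequences:

```python
# L1_DWT_loss: (1/n) * sum |Y_i - f(x_i)| over n high-frequency sample
# points, with Y_i from the target image and f(x_i) from the output image.

def l1_dwt_loss(y_high, fx_high):
    """Mean absolute error between high-frequency coefficient sequences."""
    assert len(y_high) == len(fx_high) and y_high
    n = len(y_high)
    return sum(abs(y - fx) for y, fx in zip(y_high, fx_high)) / n

# High-frequency coefficients of a ground-truth (GT) patch and of the
# corresponding SR output patch (illustrative values).
gt_high = [0.5, -0.5, 0.0, 1.0]
sr_high = [0.4, -0.3, 0.1, 0.8]
print(l1_dwt_loss(gt_high, sr_high))  # ≈ 0.15
```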
  • A specific schematic diagram can be shown in Figure 6, taking an SR network as the neural network as an example. In Figure 6, SR represents the above output image and GT represents the above target image.
  • calculating the loss information of the neural network includes at least one of the following: calculating a first loss value between the high-frequency component of the output image and the high-frequency component of the target image; calculating a second loss value between the mid-frequency component of the output image and the mid-frequency component of the target image; and calculating a third loss value between the low-frequency component of the output image and the low-frequency component of the target image.
  • the first loss value, the second loss value, and the third loss value mentioned above can be calculated using the same or different loss functions.
  • a loss value can be obtained for any one of the high-frequency, mid-frequency, and low-frequency components, or multiple loss values. Calculating a single loss value allows the neural network to better focus on the high-frequency, mid-frequency, or low-frequency components, improving the processing detail of these components in image processing and thus enhancing the network's performance. Calculating multiple loss values allows the neural network to selectively focus on multiple frequency bands, improving its ability to learn information from multiple frequency bands in the data and further enhancing its performance.
  • the loss information between the target frequency band component of the output image and the target frequency band component of the target image includes: a weighted average loss value of at least two of the first loss value, the second loss value, and the third loss value.
  • in the weighted average loss value of at least two of the aforementioned first, second, and third loss values, the weight value of each frequency band can be pre-configured or defined by protocol.
  • the loss value of each frequency band is multiplied by its corresponding weight, and the results are then averaged to obtain the aforementioned weighted average loss value. For example, if the weights corresponding to the high-frequency component, mid-frequency component, and low-frequency component are weight 1, weight 2, and weight 3, respectively, the aforementioned weighted average loss value can be equal to one of the following: (first loss value × weight 1 + second loss value × weight 2) / 2; (first loss value × weight 1 + third loss value × weight 3) / 2; (second loss value × weight 2 + third loss value × weight 3) / 2; or (first loss value × weight 1 + second loss value × weight 2 + third loss value × weight 3) / 3.
  • loss information of the neural network can be obtained based on loss values from multiple frequency bands. This allows the neural network to selectively focus on multiple frequency bands, improving its ability to learn information from data across multiple frequency bands and thus enhancing its performance. Furthermore, since it uses a weighted average loss value, the weights of each frequency band are considered, making the neural network's focus on multiple frequency bands more targeted, thereby further improving its ability to learn information from data across multiple frequency bands and enhancing its performance.
  • the embodiments of this application are not limited to obtaining the loss information by weighted averaging.
  • for example, the average of the loss values of at least two components can also be calculated directly, without weighting.
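The weighted averaging described above can be sketched as follows. The weight values and per-band losses here are illustrative placeholders, not values fixed by this application.

```python
# Weighted average loss: each band's loss value is multiplied by its
# corresponding weight, and the weighted values are averaged over the
# number of bands actually used.

def weighted_average_loss(losses, weights):
    """Average of per-band loss values after weighting."""
    assert len(losses) == len(weights) and losses
    return sum(l * w for l, w in zip(losses, weights)) / len(losses)

# Illustrative per-band losses (high, mid, low) and weights.
high_loss, mid_loss, low_loss = 0.30, 0.20, 0.10
w_high, w_mid, w_low = 2.0, 1.0, 1.0   # emphasize the high-frequency band

# All three bands: (0.30*2 + 0.20*1 + 0.10*1) / 3 ≈ 0.3
print(weighted_average_loss([high_loss, mid_loss, low_loss],
                            [w_high, w_mid, w_low]))
# Only two bands (high and low): (0.30*2 + 0.10*1) / 2 ≈ 0.35
print(weighted_average_loss([high_loss, low_loss], [w_high, w_low]))
```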
  • the output image includes:
  • an upsampled image output by the neural network.
  • the output image includes an upsampled image output by the neural network
  • when the neural network is deployed at the decoding end, it can support the encoding end in compressing a downsampled image, saving transmission bandwidth.
  • the decoding end decodes the reconstructed low-resolution image, and then outputs a high-quality upsampled image (i.e., the original resolution image) via the neural network to improve image processing performance.
  • the output image in this application embodiment is not limited to an upsampled image.
  • a non-upsampled image or a downsampled image may also be used.
  • the target frequency band component of the output image of the neural network is obtained; the target frequency band component of the target image corresponding to the output image is obtained; and the loss information of the neural network is calculated, wherein the loss information includes the loss information between the target frequency band component of the output image and the target frequency band component of the target image. Since the loss information includes the loss information between the target frequency band component of the output image and the target frequency band component of the target image, it is possible to obtain loss information specific to the target frequency band, making the loss information of the neural network more targeted and beneficial to improving the learning effect of the neural network.
  • the loss information calculation method provided in this application can be executed by a loss information calculation device.
  • the device can be an electronic device or a component within an electronic device, such as a chip or circuit.
  • This application uses the execution of the loss information calculation method by a loss information calculation device as an example to illustrate the loss information calculation device provided in this application.
  • the loss information calculation device 700 includes:
  • the first acquisition module 701 is used to acquire the target frequency band component of the output image of the neural network.
  • the second acquisition module 702 is used to acquire the target frequency band component of the target image corresponding to the output image
  • the calculation module 703 is used to calculate the loss information of the neural network, the loss information including the loss information between the target frequency band component of the output image and the target frequency band component of the target image.
  • the first acquisition module 701 is used to perform frequency domain transformation on the output image of the neural network to obtain the frequency domain information of the output image, and to obtain the target frequency band component from the frequency domain information of the output image.
  • the second acquisition module 702 is used to perform frequency domain transformation on the target image corresponding to the output image to obtain the frequency domain information of the target image, and to obtain the target frequency band component from the frequency domain information of the target image.
  • the frequency domain transformation includes the following:
  • Discrete Wavelet Transform (DWT);
  • Discrete Cosine Transform (DCT);
  • Discrete Fourier Transform (DFT).
  • the target frequency band component includes at least one of the following:
  • high-frequency components, mid-frequency components, and low-frequency components.
  • the computing module 703 is used for at least one of the following: calculating a first loss value between the high-frequency component of the output image and the high-frequency component of the target image; calculating a second loss value between the mid-frequency component of the output image and the mid-frequency component of the target image; and calculating a third loss value between the low-frequency component of the output image and the low-frequency component of the target image.
  • the loss information between the target frequency band component of the output image and the target frequency band component of the target image includes: a weighted average loss value of at least two of the first loss value, the second loss value, and the third loss value.
  • the output image includes:
  • an upsampled image output by the neural network.
  • the aforementioned loss information calculation device is beneficial for improving the learning effect of neural networks.
  • the loss information calculation device 700 provided in this application embodiment can implement the various processes implemented in the method embodiment of FIG4 and achieve the same technical effect. To avoid repetition, it will not be described again here.
  • this application embodiment also provides an electronic device 800, including a processor 801 and a memory 802.
  • the memory 802 stores programs or instructions that can run on the processor 801.
  • the program or instructions executed by the processor 801 implement the various steps of the above-described loss information calculation method embodiment and achieve the same technical effect.
  • the memory 802 can be the memory 102 or memory 113 in the embodiment shown in Figure 1, and the processor 801 can implement the functions of the encoder 200 or decoder 300 in the embodiments shown in Figures 1-3.
  • This application also provides an electronic device, including: a memory configured to store video data; and a processing circuit configured to implement the various steps of the loss information calculation method embodiments described above.
  • the memory may be memory 102 or memory 113 in the embodiment shown in FIG1, and the processing circuit may implement the functions of encoder 200 or decoder 300 in the embodiments shown in FIG1-3.
  • This application also provides an electronic device, including a processor and a communication interface.
  • the communication interface is coupled to the processor, and the processor is used to run programs or instructions to implement the steps in the method embodiment shown in FIG4.
  • This device embodiment corresponds to the above method embodiment; all implementation processes and implementations of the above method embodiment are applicable to this device embodiment and achieve the same technical effect.
  • the processor or processing circuit in this application embodiment may include general-purpose processors, special-purpose processors, etc., such as central processing units (CPUs), microprocessors, digital signal processors (DSPs), artificial intelligence (AI) processors, graphics processing units (GPUs), application-specific integrated circuits (ASICs), network processors (NPs), field-programmable gate arrays (FPGAs), or other programmable logic devices, gate circuits, transistors, discrete hardware components, etc.
  • the communication interface in this application embodiment may include transceivers, pins, circuits, buses, etc.
  • the aforementioned electronic devices can be terminals or other devices besides terminals, such as servers, network attached storage (NAS), etc.
  • the terminal can be a mobile phone, tablet computer, laptop computer, notebook computer, personal digital assistant (PDA), handheld computer, netbook, ultra-mobile personal computer (UMPC), mobile internet device (MID), augmented reality (AR), virtual reality (VR) device, mixed reality (MR) device, robot, wearable device, flight vehicle, vehicle user equipment (VUE), shipborne equipment, pedestrian user equipment (PUE), smart home (home devices with wireless communication capabilities, such as refrigerators, televisions, washing machines, or furniture), game console, personal computer (PC), ATM or self-service machine, etc.
  • Wearable devices include: smartwatches, smart bracelets, smart earphones, smart glasses, smart jewelry (smart bangles, smart chains, smart rings, smart necklaces, smart anklets, etc.), smart wristbands, smart clothing, etc.
  • in-vehicle devices can also be referred to as in-vehicle terminals, in-vehicle controllers, in-vehicle modules, in-vehicle components, in-vehicle chips, or in-vehicle units, etc. It should be noted that the embodiments in this application do not limit the specific type of terminal.
  • a server can be a standalone physical server, a server cluster or distributed system consisting of multiple physical servers, or a cloud server.
  • a cloud server can provide cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDNs), or cloud computing services based on big data and artificial intelligence platforms.
  • the aforementioned electronic device may include, but is not limited to, the type of source device 100 or destination device 110 shown in FIG1.
  • Figure 9 is a schematic diagram of the hardware structure of a terminal that implements an embodiment of this application.
  • the terminal 900 includes, but is not limited to, at least some of the following components: radio frequency unit 901, network module 902, audio output unit 903, input unit 904, sensor 905, display unit 906, user input unit 907, interface unit 908, memory 909, and processor 910.
  • the terminal 900 may also include a power supply (such as a battery) for powering various components.
  • the power supply can be logically connected to the processor 910 through a power management system, thereby enabling functions such as charging, discharging, and power consumption management through the power management system.
  • the terminal structure shown in Figure 9 does not constitute a limitation on the terminal.
  • the terminal may include more or fewer components than shown, or combine certain components, or have different component arrangements, which will not be elaborated here.
  • the input unit 904 may include a graphics processor 9041 and a microphone 9042.
  • the graphics processor 9041 processes image data of still images or videos obtained by an image acquisition device (such as a camera) in video acquisition mode or image acquisition mode, or it may process the obtained point cloud data.
  • the display unit 906 may include a display panel 9061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, etc.
  • the user input unit 907 includes at least one of a touch panel 9071 and other input devices 9072.
  • the touch panel 9071 is also called a touch screen.
  • the touch panel 9071 may include two parts: a touch detection device and a touch controller.
  • Other input devices 9072 may include, but are not limited to, physical keyboards, function keys (such as volume control buttons, power buttons, etc.), trackballs, mice, and joysticks, which will not be described in detail here.
  • after receiving downlink data from a network-side device, the radio frequency unit 901 can transmit it to the processor 910 for processing; in addition, the radio frequency unit 901 can send uplink data to the network-side device.
  • the radio frequency unit 901 includes, but is not limited to, antennas, amplifiers, transceivers, couplers, low-noise amplifiers, duplexers, etc.
  • the memory 909 can be used to store software programs or instructions, as well as various data.
  • the memory 909 may primarily include a first storage area for storing programs or instructions and a second storage area for storing data.
  • the first storage area may store the operating system, application programs or instructions required for at least one function (such as sound playback, image playback, etc.).
  • the memory 909 may include volatile memory or non-volatile memory.
  • the non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • Volatile memory can be random access memory (RAM), static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct memory bus RAM (DRRAM).
  • Processor 910 may include one or more processing units; optionally, processor 910 integrates an application processor and a modem processor, wherein the application processor mainly handles operations involving the operating system, user interface, and applications, and the modem processor mainly handles wireless communication signals, such as a baseband processor. It is understood that the aforementioned modem processor may also not be integrated into processor 910.
  • the processor 910 is configured to acquire the target frequency band component of the output image of the neural network; acquire the target frequency band component of the target image corresponding to the output image; and calculate the loss information of the neural network, wherein the loss information includes the loss information between the target frequency band component of the output image and the target frequency band component of the target image.
  • obtaining the components of the target frequency band of the output image of the neural network includes:
  • the output image of the neural network is transformed in the frequency domain to obtain the frequency domain information of the output image, and the target frequency band component is obtained from the frequency domain information of the output image.
  • obtaining the target frequency band component of the target image corresponding to the output image includes:
  • the target image corresponding to the output image is subjected to frequency domain transformation to obtain the frequency domain information of the target image, and the target frequency band component is obtained from the frequency domain information of the target image.
  • the frequency domain transformation includes the following:
  • Discrete Wavelet Transform (DWT);
  • Discrete Cosine Transform (DCT);
  • Discrete Fourier Transform (DFT).
  • the target frequency band component includes at least one of the following:
  • high-frequency components, mid-frequency components, and low-frequency components.
  • calculating the loss information of the neural network includes at least one of the following: calculating a first loss value between the high-frequency component of the output image and the high-frequency component of the target image; calculating a second loss value between the mid-frequency component of the output image and the mid-frequency component of the target image; and calculating a third loss value between the low-frequency component of the output image and the low-frequency component of the target image.
  • the loss information between the target frequency band component of the output image and the target frequency band component of the target image includes: a weighted average loss value of at least two of the first loss value, the second loss value, and the third loss value.
  • the output image includes:
  • an upsampled image output by the neural network.
  • the aforementioned terminals are beneficial for improving the learning performance of neural networks.
  • This application also provides a readable storage medium storing a program or instructions.
  • When the program or instructions are executed by a processor, they implement the various processes of the above-described loss information calculation method embodiments and achieve the same technical effect. To avoid repetition, they will not be described again here.
  • the processor mentioned above is the processor in the terminal described in the above embodiments.
  • the readable storage medium includes computer-readable storage media, such as ROM, RAM, magnetic disk, or optical disk. In some examples, the readable storage medium may be a non-transient readable storage medium.
  • This application embodiment also provides a chip, which includes a processor and a communication interface.
  • the communication interface is coupled to the processor.
  • the processor is used to run programs or instructions to implement the various processes of the above-described loss information calculation method embodiment and can achieve the same technical effect. To avoid repetition, it will not be described again here.
  • chips mentioned in the embodiments of this application may include system-on-a-chip (also known as system chip, chip system, or system-on-a-chip) or discrete display chips, etc.
  • This application also provides a computer program/program product, which is stored in a storage medium and executed by at least one processor to implement the various processes of the above-described loss information calculation method embodiments, and can achieve the same technical effect. To avoid repetition, it will not be described again here.


Abstract

The present application relates to the technical field of computers, and discloses a loss information calculation method and apparatus, and a related device. The loss information calculation method in embodiments of the present application comprises: acquiring a target frequency band component of an output image of a neural network; acquiring a target frequency band component of a target image corresponding to the output image; and calculating loss information of the neural network, the loss information comprising loss information between the target frequency band component of the output image and the target frequency band component of the target image.

Description

Loss Information Calculation Method and Apparatus, and Related Device

Cross-Reference to Related Applications

This application claims priority to Chinese Patent Application No. 202410886308.4, filed in China on July 3, 2024, the entire contents of which are incorporated herein by reference.

Technical Field

This application belongs to the field of computer technology, and specifically relates to a loss information calculation method, apparatus, and related device.

Background

Neural networks learn primarily from loss information. In some related technologies, the loss information of an image-related neural network is computed equally over all image information of all pixels in the entire image. As a result, the loss information of the neural network is poorly targeted, which leads to a poor learning effect for the neural network.

Summary of the Invention

Embodiments of this application provide a loss information calculation method, apparatus, and related device, which can solve the problem of a poor learning effect of a neural network caused by poorly targeted loss information.

In a first aspect, a loss information calculation method is provided, including:

acquiring a target frequency band component of an output image of a neural network;

acquiring a target frequency band component of a target image corresponding to the output image; and

calculating loss information of the neural network, the loss information including loss information between the target frequency band component of the output image and the target frequency band component of the target image.
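The three steps above can be sketched in a minimal form. This is an illustration only, not the application's implementation: it assumes the target frequency band component is extracted with a radial mask in the FFT domain and that the loss between the two components is an L1 distance; the function names `band_component` and `band_loss` and the normalized cutoffs `low`/`high` are hypothetical.

```python
import numpy as np

def band_component(image, low, high):
    """Extract a frequency-band component of a grayscale image via FFT masking.

    `low` and `high` are normalized radial frequencies; spectrum bins whose
    radius falls outside [low, high] are zeroed before the inverse transform.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    radius = np.sqrt(((yy - cy) / (h / 2.0)) ** 2 + ((xx - cx) / (w / 2.0)) ** 2)
    mask = (radius >= low) & (radius <= high)
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))

def band_loss(output_image, target_image, low=0.5, high=1.0):
    """L1 loss between the target-frequency-band components of the two images."""
    out_band = band_component(output_image, low, high)
    tgt_band = band_component(target_image, low, high)
    return np.mean(np.abs(out_band - tgt_band))
```

A property worth noting in this sketch: a loss restricted to a high band ignores differences that live only in the excluded frequencies (for example, a constant brightness offset, which affects only the DC bin).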

In a second aspect, a loss information calculation apparatus is provided, including:

a first acquisition module, configured to acquire a target frequency band component of an output image of a neural network;

a second acquisition module, configured to acquire a target frequency band component of a target image corresponding to the output image; and

a calculation module, configured to calculate loss information of the neural network, the loss information including loss information between the target frequency band component of the output image and the target frequency band component of the target image.

In a third aspect, a loss information calculation apparatus is provided, where the apparatus is configured to perform the steps of the loss information calculation method provided in the embodiments of this application.

In a fourth aspect, an electronic device is provided, where the electronic device includes a processor and a memory, the memory stores a program or instructions executable on the processor, and the program or instructions, when executed by the processor, implement the steps of the loss information calculation method provided in the embodiments of this application.

In a fifth aspect, an electronic device is provided, including a processor and a communication interface, where the processor is configured to: acquire a target frequency band component of an output image of a neural network; acquire a target frequency band component of a target image corresponding to the output image; and calculate loss information of the neural network, the loss information including loss information between the target frequency band component of the output image and the target frequency band component of the target image.

In a sixth aspect, an electronic device is provided, including: a memory configured to store video data; and a processing circuit configured to implement the steps of the loss information calculation method provided in the embodiments of this application.

In a seventh aspect, a readable storage medium is provided, where a program or instructions are stored on the readable storage medium, and the program or instructions, when executed by a processor, implement the steps of the loss information calculation method provided in the embodiments of this application.

In an eighth aspect, a chip is provided, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or instructions to implement the steps of the loss information calculation method provided in the embodiments of this application.

In a ninth aspect, a computer program/program product is provided, where the computer program/program product is stored in a storage medium and is executed by at least one processor to implement the steps of the loss information calculation method provided in the embodiments of this application.

In the embodiments of this application, a target frequency band component of an output image of a neural network is acquired; a target frequency band component of a target image corresponding to the output image is acquired; and loss information of the neural network is calculated, the loss information including loss information between the target frequency band component of the output image and the target frequency band component of the target image. Since the loss information includes the loss information between the target frequency band component of the output image and that of the target image, loss information specific to the target frequency band can be obtained, making the loss information of the neural network more targeted, which helps improve the learning effect of the neural network.

Brief Description of the Drawings

Figure 1 is a schematic diagram of an encoding and decoding system provided in an embodiment of this application;

Figure 2 is a schematic structural diagram of an encoder provided in an embodiment of this application;

Figure 3 is a schematic structural diagram of a decoder provided in an embodiment of this application;

Figure 4 is a flowchart of a loss information calculation method provided in an embodiment of this application;

Figure 5 is a schematic diagram of a frequency domain transformation provided in an embodiment of this application;

Figure 6 is a schematic diagram of loss information provided in an embodiment of this application;

Figure 7 is a structural diagram of a loss information calculation apparatus provided in an embodiment of this application;

Figure 8 is a structural diagram of an electronic device provided in an embodiment of this application;

Figure 9 is a structural diagram of a terminal provided in an embodiment of this application.

Detailed Description

The technical solutions in the embodiments of this application will be clearly described below with reference to the accompanying drawings in the embodiments of this application. Obviously, the described embodiments are only some, rather than all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application fall within the protection scope of this application.

The terms "first", "second", and the like in this application are used to distinguish between similar objects, and are not used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate, so that the embodiments of this application can be implemented in orders other than those illustrated or described herein. In addition, the objects distinguished by "first" and "second" are generally of one class, and the number of objects is not limited; for example, there may be one first object, or there may be multiple first objects. Furthermore, "or" in this application indicates at least one of the connected objects. For example, the protection scope of "A or B" covers at least three solutions: solution 1, including A but not B; solution 2, including B but not A; and solution 3, including both A and B. In addition, the terms "A and/or B", "at least one of A and B", and "at least one of A or B" each also cover at least the above three solutions. The character "/" generally indicates an "or" relationship between the associated objects.

Figure 1 is a schematic diagram of an encoding and decoding system 10 provided in an embodiment of this application. The technical solutions of the embodiments of this application relate to encoding and decoding (CODEC) of video data, where the video data includes raw unencoded video, encoded video, decoded (e.g., reconstructed) video, syntax elements, and the like.

As shown in Figure 1, the encoding and decoding system 10 includes a source device 100, which provides encoded video data to be decoded and displayed by a destination device 110. Specifically, the source device 100 provides the video data to the destination device 110 via a communication medium 120. The source device 100 and the destination device 110 may include any one or more of: desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, mobile phones, wearable devices (e.g., smartwatches or wearable cameras), televisions, cameras, display devices, in-vehicle devices, virtual reality (VR) devices, augmented reality (AR) devices, mixed reality (MR) devices, digital media players, video game consoles, video conferencing devices, video streaming devices, broadcast receiver devices, broadcast transmitter devices, spacecraft, aircraft, robots, satellites, and the like.

In the example of Figure 1, the source device 100 includes a data source 101, a memory 102, an encoder 200, and an output interface 104. The destination device 110 includes an input interface 111, a decoder 300, a memory 113, and a display device 114. The source device 100 represents an example of a video encoding device, while the destination device 110 represents an example of a video decoding device. In other examples, the source device 100 and the destination device 110 may not include some of the components shown in Figure 1, or may include components other than those shown in Figure 1. For example, the source device 100 may receive video data from an external data source (such as an external camera). Similarly, the destination device 110 may interface with an external display device instead of including an integrated display device. As another example, the memory 102 and the memory 113 may be external memories.

Although Figure 1 illustrates the source device 100 and the destination device 110 as separate devices, in some examples the two may also be integrated into a single device. In such embodiments, the functions corresponding to the source device 100 and the functions corresponding to the destination device 110 may be implemented using the same hardware or software, separate hardware or software, or any combination thereof.

In some examples, the source device 100 and the destination device 110 may perform unidirectional or bidirectional video transmission. In the case of bidirectional video transmission, the source device 100 and the destination device 110 may operate in a substantially symmetric manner; that is, each of the source device 100 and the destination device 110 includes an encoder and a decoder.

The data source 101 represents a source of video data (i.e., raw, unencoded video data) and provides the encoder 200 with a sequence of consecutive pictures containing the video data; the encoder 200 encodes the data of the pictures. The data source 101 of the source device 100 may include a video capture device (such as a video camera), a video archive containing previously captured raw video, or a video feed interface for receiving video from a video content provider. Alternatively, the data source 101 may generate computer-graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In these cases, the encoder 200 encodes the captured, pre-captured, or computer-generated video data. The encoder 200 may rearrange the pictures from the received order (sometimes referred to as the "display order") into the coding order. The encoder 200 may generate a bitstream including the encoded video data. The source device 100 may then output the encoded video data via the output interface 104 onto the communication medium 120 for reception or retrieval by, for example, the input interface 111 of the destination device 110.

The memory 102 of the source device 100 and the memory 113 of the destination device 110 represent general-purpose memories. In some examples, the memory 102 may store raw video data from the data source 101, and the memory 113 may store decoded video data from the decoder 300. Additionally or alternatively, the memories 102 and 113 may respectively store software instructions executable by, for example, the encoder 200 and the decoder 300. Although the memory 102 and the memory 113 are shown separately from the encoder 200 and the decoder 300 in this example, it should be understood that the encoder 200 and the decoder 300 may also include internal memories for functionally similar or equivalent purposes. If the encoder 200 and the decoder 300 are deployed on the same hardware device, the memory 102 and the memory 113 may be the same memory. Furthermore, the memories 102 and 113 may store, for example, encoded video data output from the encoder 200 and input to the decoder 300. In some examples, portions of the memories 102 and 113 may be allocated as one or more video buffers, for example, to store raw, decoded, or encoded video data.

In some examples, the source device 100 may output encoded data from the output interface 104 to the memory 113. Similarly, the destination device 110 may access encoded data from the memory 113 via the input interface 111. The memory 113 or the memory 102 may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, a Blu-ray disc, a digital versatile disc (DVD), a compact disc read-only memory (CD-ROM), a flash memory, a volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.

The output interface 104 may include any type of medium or device capable of transmitting the encoded video data from the source device 100 to the destination device 110. For example, the output interface 104 may include a transmitter or a transceiver, such as an antenna, configured to transmit the encoded video data directly to the destination device 110 in real time. The encoded video data may be modulated according to a communication standard of a wireless communication protocol and transmitted to the destination device 110.

The communication medium 120 may include a transient medium, such as wireless broadcast or wired network transmission. For example, the communication medium 120 may include a radio frequency (RF) spectrum or one or more physical transmission lines (e.g., cables). The communication medium 120 may form part of a packet-based network (such as a local area network, a wide area network, or a global network such as the Internet). The communication medium 120 may also take the form of a storage medium (e.g., a non-transitory storage medium), such as a hard disk, a flash drive, a compact disc, a digital video disc, a Blu-ray disc, a volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.

In some implementations, the communication medium 120 may include a router, a switch, a base station, or any other device that can be used to facilitate communication from the source device 100 to the destination device 110. For example, a server (not shown) may receive encoded video data from the source device 100 and provide the encoded video data to the destination device 110, for example, via a network transmission. The server may include a web server (e.g., for a website), a server configured to provide a file transfer protocol service (such as the File Transfer Protocol (FTP) or the File Delivery over Unidirectional Transport (FLUTE) protocol), a content delivery network (CDN) device, a Hypertext Transfer Protocol (HTTP) server, a Multimedia Broadcast Multicast Services (MBMS) or evolved MBMS (eMBMS) server, a network-attached storage (NAS) device, or the like. The server may implement one or more HTTP streaming protocols, such as the Moving Picture Experts Group (MPEG) Media Transport (MMT) protocol, the Dynamic Adaptive Streaming over HTTP (DASH) protocol, the HTTP Live Streaming (HLS) protocol, or the Real Time Streaming Protocol (RTSP).

The destination device 110 may access the encoded video data from the server, for example, via a wireless channel (e.g., a Wi-Fi connection) or a wired connection (e.g., a digital subscriber line (DSL), a cable modem, etc.) for accessing the encoded video data stored on the server.

The output interface 104 and the input interface 111 may represent wireless transmitters/receivers, modems, wired networking components (e.g., Ethernet cards), wireless communication components operating according to the Institute of Electrical and Electronics Engineers (IEEE) 802.11 standard, the IEEE 802.15 standard (e.g., ZigBee™), the Bluetooth standard, or the like, or other physical components. In examples where the output interface 104 and the input interface 111 include wireless components, the output interface 104 and the input interface 111 may be configured to transfer data, such as encoded video data, according to Wi-Fi, Ethernet, or a cellular network (such as the 4th Generation Mobile Communication Technology (4G), Long Term Evolution (LTE), LTE-Advanced, the 5th Generation Mobile Communication Technology (5G), or the 6th Generation Mobile Communication Technology (6G)).

The technology provided in the embodiments of this application can be applied to support video encoding and decoding in one or more multimedia applications such as video conferencing, over-the-air television broadcasting, cable television transmission, satellite television transmission, Internet streaming video transmission, digital video encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications.

The input interface 111 of the destination device 110 receives an encoded video bitstream from the communication medium 120. The encoded video bitstream may include syntax elements and encoded data units (e.g., sequences, groups of pictures, pictures, slices, blocks, etc.), where the syntax elements are used to decode the encoded data units to obtain decoded video data. The display device 114 displays the decoded video data to a user. The display device 114 may include a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.

The encoder 200 and the decoder 300 may be implemented as one or more of various processing circuits, which may include microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. When the technology is implemented wholly or partially in software, a device may store instructions for the software in a suitable non-transitory computer-readable storage medium and use one or more processors to execute the instructions in hardware to perform the technology provided in the embodiments of this application.

The encoder 200 and the decoder 300 may perform processing on the basis of the following video codec standards: H.263, H.264, H.265 (also known as High Efficiency Video Coding (HEVC)), H.266 (also known as Versatile Video Coding (VVC)), Moving Picture Experts Group 2 (MPEG-2), MPEG-4, VP8, VP9, Alliance for Open Media Video 1 (AV1), Audio Video coding Standard 1 (AVS1), AVS2, AVS3, or a next-generation video standard protocol, which is not specifically limited in the embodiments of this application.

Typically, the encoder 200 and the decoder 300 may perform block-based encoding and decoding of pictures. The term "block" generally refers to a structure that includes data to be processed (e.g., encoded, decoded, or otherwise used during encoding or decoding). For example, a block may include a two-dimensional matrix of samples of luminance or chrominance data. For example, the encoder 200 and the decoder 300 may encode and decode video data represented in the YUV format.

Referring to Figure 2, which is a schematic structural diagram of the encoder 200 provided in an embodiment of this application, the encoder 200 may be the encoder 200 in Figure 1. In the example of Figure 2, the encoder 200 includes a memory 201, an encoding parameter determination unit 210, a residual generation unit 202, a transform processing unit 203, a quantization unit 204, an inverse quantization unit 205, an inverse transform processing unit 206, a reconstruction unit 207, a filter unit 208, a decoded picture buffer (DPB) 209, and an entropy encoding unit 220.

The memory 201 may store video data to be encoded. For example, the encoder 200 may receive and store video data from the data source 101 shown in Figure 1. In some examples, the memory 201 may be on the same chip as the other components of the encoder 200 (as shown in Figure 2), or it may be on a separate chip from those components.

The encoding parameter determination unit 210 includes a mode selection unit 211, an inter-frame prediction unit 212, and an intra-frame prediction unit 213. The inter-frame prediction unit 212 is configured to obtain a first prediction block for the current block using an inter-frame prediction mode; the intra-frame prediction unit 213 is configured to obtain a second prediction block for the current block using an intra-frame prediction mode; and the mode selection unit 211 is configured to obtain a target prediction block based on the first prediction block and the second prediction block and determine a final prediction mode. Furthermore, the encoding parameter determination unit 210 may also include other functional units, such as a functional unit for determining the partitioning manner of a coding unit (CU), and a functional unit for determining the transform type of the residual data of a CU or the quantization parameter of the residual data of a CU.

For ease of description and understanding, in the embodiments of this application, the CU to be processed in the current picture is referred to as the current CU, and the image block to be processed in the current CU is referred to as the current block or the to-be-processed image block; for example, in encoding, it refers to the block currently being encoded, and in decoding, it refers to the block currently being decoded.

The inter-frame prediction unit 212 may include a motion estimation unit and a motion compensation unit. For inter-frame prediction of the current block, the motion estimation unit may perform a motion search to identify one or more matching reference blocks in one or more reference pictures (e.g., one or more previously encoded/decoded pictures stored in the DPB 209).

The motion estimation unit may form one or more motion vectors (MVs) that indicate the position of a reference block in a reference picture relative to the position of the current block in the current picture. The motion compensation unit may obtain, through interpolation, a prediction value at the precision indicated by the motion vector.
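As an illustration of what a motion search does, the following is a minimal full-search block-matching sketch. It is not the application's implementation: the sum-of-absolute-differences (SAD) criterion, integer-pixel precision, and the `search_range` parameter are assumptions for the example.

```python
import numpy as np

def full_search_mv(cur_block, ref_frame, block_pos, search_range=4):
    """Find the motion vector (dy, dx) within +/- search_range pixels of the
    collocated position that minimizes the SAD between the current block and
    the candidate reference block."""
    by, bx = block_pos                      # top-left corner of the current block
    bh, bw = cur_block.shape
    best_sad, best_mv = float("inf"), (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = by + dy, bx + dx
            # Skip candidates that fall outside the reference picture.
            if y < 0 or x < 0 or y + bh > ref_frame.shape[0] or x + bw > ref_frame.shape[1]:
                continue
            cand = ref_frame[y:y + bh, x:x + bw]
            sad = int(np.abs(cur_block.astype(np.int64) - cand.astype(np.int64)).sum())
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad
```

If the content of the current block appears shifted by (2, 2) in the reference picture, the search returns the MV (2, 2) with SAD 0; fractional-pel precision would then be obtained by interpolating the reference samples, as described above.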

The encoding parameter determination unit 210 may provide the target prediction block to the residual generation unit 202. The residual generation unit 202 receives the raw, unencoded video data of the current block from the memory 201 and calculates the residual between the current block and the target prediction block to obtain a residual block. In some examples, the function of the residual generation unit 202 may be implemented using one or more subtractor circuits that perform binary subtraction.

As an example, the encoding parameter determination unit 210 may provide syntax elements representing the encoding parameters to the entropy encoding unit 220 for encoding. The encoding parameters include one or more of the partitioning manner of the CU, the final prediction mode, the transform type of the residual data of the CU, the quantization parameter of the residual data of the CU, and the like.

The transform processing unit 203 transforms the residual block output by the residual generation unit 202 to obtain a transform coefficient block. The transform may include a discrete cosine transform (DCT), an integer transform, a directional transform, a Karhunen-Loeve transform, or the like. In some examples, the encoder 200 may not include the transform processing unit 203.

The quantization unit 204 may quantize the transform coefficients in the transform coefficient block according to a quantization parameter (QP) value associated with the current block to produce a quantized transform coefficient block.

The inverse quantization unit 205 and the inverse transform processing unit 206 may apply inverse quantization and an inverse transform, respectively, to the transform coefficient block to obtain a reconstructed residual block. The reconstruction unit 207 may generate a reconstructed block corresponding to the current block based on the reconstructed residual block and the target prediction block generated by the encoding parameter determination unit 210.

The filter unit 208 may perform one or more filtering operations on the reconstructed block. For example, the filter unit 208 may be a deblocking filter (DBF), an adaptive loop filter (ALF), or a sample adaptive offset (SAO) filter. In some examples, the encoder 200 may not include the filter unit 208.

The encoder 200 stores the reconstructed picture obtained from the reconstructed blocks in the DPB 209. For example, when the operation of the filter unit 208 is not required, the reconstruction unit 207 may store the reconstructed blocks in the DPB 209; when the operation of the filter unit 208 is required, the filter unit 208 may store the filtered reconstructed blocks in the DPB 209. The inter prediction unit 212 retrieves reconstructed pictures from the DPB 209 to perform inter prediction on blocks of subsequent pictures to be encoded. In some examples, the DPB 209 may be replaced with another type of memory.

The entropy encoding unit 220 may entropy-encode syntax elements from other components of the encoder 200 and output the encoded video data. For example, the entropy encoding unit 220 may entropy-encode the quantized transform coefficient block from the quantization unit 204. As another example, the entropy encoding unit 220 may entropy-encode syntax elements from the encoding parameter determination unit 210 (e.g., motion information for inter prediction or intra mode information for intra prediction).

It should be understood that the composition of the encoder 200 shown in Figure 2 is merely illustrative and does not constitute a limitation on the embodiments of this application.

Figure 3 is a schematic structural diagram of a decoder 300 provided in an embodiment of this application; the decoder 300 may be the decoder 300 described in Figure 1. In the example of Figure 3, the decoder 300 includes a coded picture buffer (CPB) 301, an entropy decoding unit 302, a prediction processing unit 310, an inverse quantization unit 303, an inverse transform processing unit 304, a reconstruction unit 305, a filter unit 306, and a DPB 307.

The entropy decoding unit 302 may receive encoded video data from the CPB 301 and entropy-decode it to obtain syntax elements. The syntax elements indicate the encoding parameters, which include one or more of: the partitioning mode of the CU, the final prediction mode, the transform type of the CU's residual data, and the quantization parameter of the CU's residual data.

When the syntax elements include the final prediction mode, the prediction processing unit 310 obtains the final prediction mode. If the final prediction mode is an inter prediction mode, the prediction block of the current CU may be obtained through the inter prediction unit 311 of the prediction processing unit 310; if the final prediction mode is an intra prediction mode, the prediction block of the current CU may be obtained through the intra prediction unit 312 of the prediction processing unit 310. In some examples, the prediction processing unit 310 may also include units that perform prediction according to other prediction modes.

The CPB 301 may acquire encoded video data from the communication medium 120 shown in Figure 1 and store it. The DPB 307 is used to store decoded pictures. Optionally, the CPB 301 and the DPB 307 may also be replaced with other types of memory, which is not specifically limited in this application. In some examples, the CPB 301 may be on the same chip as the other components of the decoder 300 (as shown in the figure), or on a chip separate from the one on which those components reside.

The decoder 300 may perform the reconstruction operation on each block individually. The entropy decoding unit 302 may entropy-decode the syntax elements of the quantized transform coefficients together with transform information (e.g., the QP or a transform mode indication) to obtain the quantized transform coefficients. The inverse quantization unit 303 inverse-quantizes the quantized transform coefficients to obtain a transform coefficient block containing the transform coefficients. The inverse transform processing unit 304 applies an inverse transform to the transform coefficient block to generate a residual block corresponding to the current block; this inverse transform is the reverse of the transform described above.

The reconstruction unit 305 may reconstruct the current block from the prediction block and the residual block. For example, the reconstruction unit 305 may add the samples of the residual block to the corresponding samples of the prediction block to reconstruct the current block.
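The sample-wise addition described above can be sketched as follows; the clipping to the valid sample range and the 8-bit depth are assumptions added for completeness, not details stated in this application.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add each residual sample to the co-located prediction sample and
    clip to the valid sample range [0, 2**bit_depth - 1].

    The clipping step and default bit depth are illustrative assumptions."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]
```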

The filter unit 306 may perform one or more filtering operations on the reconstructed block. For example, the types of the filter unit 306 may refer to those of the filter unit 208 and are not repeated here. In some examples, the operation of the filter unit 306 may be skipped.

The decoder 300 may store the reconstructed picture obtained from the reconstructed blocks in the DPB 307. For example, when the operation of the filter unit 306 is not performed, the reconstruction unit 305 may store the reconstructed blocks in the DPB 307; when the operation of the filter unit 306 is performed, the filter unit 306 may store the filtered reconstructed blocks in the DPB 307. The decoder 300 may output decoded pictures (e.g., decoded video) from the DPB 307 for subsequent presentation on a display device (such as the display device 114 of Figure 1).

The loss information calculation method provided in the embodiments of this application is described below with reference to the accompanying drawings. The method may be executed by an encoding end, such as the encoder 200 shown in Figure 1 or Figure 2, or by a decoding end, such as the decoder 300 shown in Figure 1 or Figure 3. The encoding end and the decoding end may be implemented by software, hardware, or a combination thereof; when implemented by hardware, the encoding end may be referred to as an encoding-end device or a video encoding device, and the decoding end may be referred to as a decoding-end device or a video decoding device.

Please refer to Figure 4, which is a flowchart of a loss information calculation method provided in an embodiment of this application. As shown in Figure 4, the method includes the following steps:

Step 401: Obtain the target frequency band component of the output image of a neural network.

The aforementioned neural network may be a convolutional neural network, a deep convolutional neural network, a feedforward neural network, a recurrent neural network, or the like.

The aforementioned neural network is an image-related neural network, for example, a neural network used to process image-related tasks in the field of video encoding and decoding, such as a neural network for super-resolution (SR), a neural network for image enhancement, or a neural network for in-loop filtering.

The output image is the image output by the neural network, that is, the image obtained after processing by the neural network.

It should be noted that the embodiments of this application do not limit the type, structure, or function of the neural network; specifically, it may be any neural network related to the field of image or video encoding and decoding.

In the embodiments of this application, the target frequency band component is the component of a target frequency band, which may specifically be a low-frequency, mid-frequency, or high-frequency component.

Step 402: Obtain the target frequency band component of the target image corresponding to the output image.

The target image corresponding to the output image may be the original image; specifically, it is the image the neural network aims to approximate, that is, the neural network learns with this target image as its goal. The target image may also be called a positive image sample or a real (ground-truth) image sample in the learning process of the neural network. The target image has the same size as the output image.

It should be noted that the embodiments of this application do not limit the execution order of steps 401 and 402: step 401 may be executed before step 402, step 402 may be executed before step 401, or the two steps may be executed simultaneously.

Step 403: Calculate the loss information of the neural network, where the loss information includes the loss information between the target frequency band component of the output image and the target frequency band component of the target image.

Calculating the loss information of the neural network may consist of calculating the loss information between the target frequency band component of the output image and the target frequency band component of the target image.

In some implementations, the loss function used to calculate the loss information may be the L1 loss function (i.e., the absolute error loss function), the L2 loss function (i.e., the mean squared error loss function), or the structural similarity index measure (SSIM) loss function, among others.
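As a minimal sketch of the first two options, the L1 and L2 losses over flattened, equally sized sample lists can be written as:

```python
def l1_loss(output, target):
    """Mean absolute error between two equally sized sample lists."""
    assert len(output) == len(target) and output
    return sum(abs(o - t) for o, t in zip(output, target)) / len(output)

def l2_loss(output, target):
    """Mean squared error between two equally sized sample lists."""
    assert len(output) == len(target) and output
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(output)
```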

It should be noted that the loss function used to calculate the loss information is not limited in the embodiments of this application.

In the embodiments of this application, since the loss information includes the loss information between the target frequency band component of the output image and that of the target image, loss information specific to the target frequency band can be obtained, making the loss information of the neural network more targeted and helping improve the learning effect of the neural network.

In the embodiments of this application, the above loss information calculation method may be executed by the encoding end, i.e., the encoding end executes the above steps. In some implementations, the method may also be executed by an electronic device other than the encoding end, for example, by the electronic device that performs the learning of the neural network; this electronic device may specifically be a computer, a server, a mobile phone, or another electronic device.

In the embodiments of this application, steps 401 to 403 may be configured to be executed within the aforementioned neural network, or may be executed by an algorithm or program outside the neural network; this is not limited here.

As an optional implementation, the above method further includes the following step:

Training the neural network based on the above loss information.

In the embodiments of this application, the learning of the neural network may also be referred to as the training of the neural network.

Through this learning, the neural network's attention to the target frequency band can be increased, thereby improving its performance on tasks associated with that band. For example, if the target frequency band is the high-frequency band, the neural network can be made to pay more attention to high-frequency information and focus more directly on the high-frequency components of the image; for a super-resolution task, this can improve the quality of the low-resolution-to-high-resolution conversion.

It should be noted that the embodiments of this application do not require the above step to be executed. For example, in some implementations, after the loss information is obtained, it may be sent to another device, which then trains the neural network based on the loss information.

In some implementations, the method may further include using the trained neural network to perform video-codec-related image tasks such as SR or image enhancement, so as to improve video encoding and decoding performance.

As an optional implementation, obtaining the target frequency band component of the output image of the neural network includes:

performing a frequency domain transform on the output image of the neural network to obtain the frequency domain information of the output image, and obtaining the target frequency band component from the frequency domain information of the output image.

The frequency domain transform of the output image may be performed using a spatial-to-frequency-domain transform to obtain the frequency domain information of the output image. This frequency domain information includes the components of each frequency band of the output image.

Obtaining the target frequency band component from the frequency domain information of the output image may mean extracting the target frequency band component directly from that frequency domain information.

The target frequency band component can be understood as the information of the target frequency band within the frequency domain information of the image.

In this implementation, the target frequency band component of the output image can be obtained through a frequency domain transform. In this way, the loss information corresponding to the target frequency band can be obtained even for neural networks whose output images do not include frequency domain information, improving the performance of these networks.

It should be noted that the embodiments of this application do not require the target frequency band component to be obtained through the above frequency domain transform. For example, in some implementations, the output image of the neural network itself carries frequency domain information, in which case the target frequency band component can be extracted directly from the output image.

As an optional implementation, obtaining the target frequency band component of the target image corresponding to the output image includes:

performing a frequency domain transform on the target image corresponding to the output image to obtain the frequency domain information of the target image, and obtaining the target frequency band component from the frequency domain information of the target image.

The frequency domain transform of the target image may be performed using a spatial-to-frequency-domain transform to obtain the frequency domain information of the target image. This frequency domain information includes the components of each frequency band of the target image.

Obtaining the target frequency band component from the frequency domain information of the target image may mean extracting the target frequency band component directly from that frequency domain information.

In this implementation, the target frequency band component of the target image can be obtained through a frequency domain transform, so that the loss information corresponding to the target frequency band can also be obtained for target images that do not include frequency domain information, reducing the limitations of loss information calculation.

It should be noted that the embodiments of this application do not require the target frequency band component of the target image to be obtained through the above frequency domain transform. For example, in some implementations, the frequency information of the target image may be obtained directly from another device.

Optionally, the frequency domain transform includes one of the following:

discrete wavelet transform (DWT), discrete cosine transform (DCT), discrete Fourier transform (DFT).

In this implementation, multiple frequency domain transforms are supported, improving the flexibility of loss calculation.

In some implementations, the order of the frequency domain transform can be set according to complexity requirements. Taking a first-order DWT as an example, as shown in Figure 5, the digital image represents the output image or the target image, and the high-frequency and low-frequency components are obtained through row decomposition, column decomposition, column reconstruction, and row reconstruction, specifically H and L in Figure 5.
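A minimal first-order DWT sketch, assuming the Haar wavelet (the simplest separable wavelet; the choice of wavelet is not specified in this application), decomposes the rows and then the columns to produce the low- and high-frequency sub-bands:

```python
def haar_1d(signal):
    """One level of a 1-D Haar transform: pairwise averages form the
    low band and pairwise differences form the high band.

    Assumes an even-length signal; the Haar wavelet is an illustrative
    choice standing in for the generic DWT described above."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high

def haar_2d(image):
    """First-order 2-D DWT: decompose each row, then each column of the
    two halves, yielding the four sub-bands (low-low ... high-high)."""
    # Row decomposition: split every row into its low and high halves.
    row_lo, row_hi = zip(*(haar_1d(row) for row in image))

    def columns(block):
        # Column decomposition: transpose, filter each column, transpose back.
        cols = list(zip(*block))
        lo, hi = zip(*(haar_1d(list(c)) for c in cols))
        return [list(r) for r in zip(*lo)], [list(r) for r in zip(*hi)]

    ll, lh = columns(row_lo)   # low-low and low-high sub-bands
    hl, hh = columns(row_hi)   # high-low and high-high sub-bands
    return ll, lh, hl, hh
```

Higher-order transforms simply re-apply the same decomposition to the low-low sub-band.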

As an optional implementation, the target frequency band component includes at least one of the following:

a high-frequency component, a mid-frequency component, a low-frequency component.

In this implementation, the loss information associated with at least one of the high-frequency, mid-frequency, and low-frequency components can be calculated, so that the neural network can better attend to the high-frequency, mid-frequency, or low-frequency component. This improves the detail with which the network processes that component during image processing and thereby improves the network's processing performance.

For example, taking the high-frequency component, calculating the loss information of the high-frequency component allows the neural network to attend more directly to the high-frequency information, improving its ability to learn high-frequency information from the data. This enables the network to restore more high-frequency detail during high-resolution reconstruction, improving the recovery of high-frequency image information.

In some implementations, when the target frequency band component includes one of the high-frequency, mid-frequency, and low-frequency components, more targeted loss information can be obtained. Taking the high-frequency component with the L1 loss function as an example, the loss information may include the loss information calculated by the following formula:

L1_DWT_loss = (1/n) × Σ |Yi − f(xi)|, summed over i = 1, …, n

where L1_DWT_loss denotes the loss information between the high-frequency component of the output image and that of the target image, n is the number of high-frequency component sample points of the output image and the target image, Yi is the high-frequency component of the target image, and f(xi) is the high-frequency component of the output image.
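The L1 loss over high-frequency components can be sketched as follows; the pairwise Haar difference used to extract the high band is an assumption standing in for a generic first-order DWT high band:

```python
def high_band(signal):
    """Crude high-frequency component via pairwise Haar differences
    (an illustrative stand-in for a generic first-order DWT high band)."""
    return [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]

def l1_dwt_loss(output, target):
    """L1_DWT_loss = (1/n) * sum |Yi - f(xi)| over the n high-band samples."""
    f_x = high_band(output)   # f(xi): high-frequency component of the output image
    y = high_band(target)     # Yi: high-frequency component of the target image
    n = len(y)
    return sum(abs(yi - fi) for yi, fi in zip(y, f_x)) / n
```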

A specific schematic is shown in Figure 6, which takes an SR network as the example neural network; SR in Figure 6 denotes the output image, and GT denotes the target image.

Optionally, calculating the loss information of the neural network includes at least one of the following:

calculating a first loss value between the high-frequency component of the output image and the high-frequency component of the target image;

calculating a second loss value between the mid-frequency component of the output image and the mid-frequency component of the target image;

calculating a third loss value between the low-frequency component of the output image and the low-frequency component of the target image.

The first, second, and third loss values may be calculated using the same or different loss functions.

In this implementation, the loss value of any one of the high-frequency, mid-frequency, and low-frequency components, or the loss values of several of them, can be obtained. When a single loss value is calculated, the neural network can better attend to the corresponding component, improving the processing detail of that component during image processing and thus the network's processing performance. When multiple loss values are calculated, the neural network can attend to multiple frequency bands in a targeted manner, improving its ability to learn information from multiple frequency bands in the data and thereby improving its performance.

Optionally, when the target frequency band component includes at least two of the high-frequency, mid-frequency, and low-frequency components, the loss information between the target frequency band component of the output image and that of the target image includes: a weighted average loss value of at least two of the first loss value, the second loss value, and the third loss value.

The weighted average loss value of at least two of the first, second, and third loss values may be obtained by preconfiguring, or agreeing by protocol on, a weight for each frequency band, multiplying the loss value of each band by its corresponding weight, and then averaging. For example, if the weights corresponding to the high-frequency, mid-frequency, and low-frequency components are weight 1, weight 2, and weight 3, respectively, the weighted average loss value may be equal to one of the following:

(first loss value × weight 1 + second loss value × weight 2) / 2;

(first loss value × weight 1 + third loss value × weight 3) / 2;

(second loss value × weight 2 + third loss value × weight 3) / 2;

(first loss value × weight 1 + second loss value × weight 2 + third loss value × weight 3) / 3.

In this implementation, the loss information of the neural network can be obtained from the loss values of multiple frequency bands, so that the network can attend to multiple bands in a targeted manner, improving its ability to learn multi-band information from the data and thereby its performance. Moreover, because a weighted average loss value is used, the weight of each band is taken into account, making the network's attention to the multiple bands more targeted and further improving its ability to learn multi-band information and its performance.

It should be noted that, when the target frequency band component includes at least two of the high-frequency, mid-frequency, and low-frequency components, the embodiments of this application do not require the loss information to be obtained by weighted averaging; for example, the plain mean of the losses of the at least two components may also be calculated directly.
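A minimal sketch of the weighted average described above (the per-band weights are hypothetical, preconfigured values):

```python
def weighted_average_loss(losses, weights):
    """Weighted average of per-band loss values: multiply each band's loss
    by its preconfigured weight, then divide by the number of bands,
    mirroring the enumerated combinations above."""
    assert len(losses) == len(weights) and losses
    return sum(l * w for l, w in zip(losses, weights)) / len(losses)
```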

作为上述一种可选的实施方式,所述输出图像包括:As one optional implementation of the above, the output image includes:

所述神经网络输出的上采样图像。The upsampled image output by the neural network.

该实施方式中,由于输出图像包括神经网络输出的上采样图像,这样上述神经网络部署在解码端的情况下,可以支持编码端压缩的是下采样之后的图像,节省传输带宽。另外,对于SR,解码端解码获得重建的低分辨率图像,经神经网络输出高质量的上采样图像,即原始分辨率图像,以提高图像处理效果。In this embodiment, since the output image includes an upsampled image output by the neural network, when the neural network is deployed at the decoding end, it can support the encoding end to compress the downsampled image, saving transmission bandwidth. Furthermore, for SR, the decoding end decodes the reconstructed low-resolution image, and then outputs a high-quality upsampled image (i.e., the original resolution image) via the neural network to improve image processing performance.

需要说明的是,本申请实施例中并不限定输出图像为上采样图像,例如:在一些实施方式中,也可以未经过采样或者下采样图像。It should be noted that the output image in this application embodiment is not limited to an upsampled image. For example, in some implementations, an unsampled or downsampled image may also be used.

在本申请实施例中,获取神经网络的输出图像的目标频段分量;获取所述输出图像对应的目标图像的目标频段分量;计算所述神经网络的损失信息,所述损失信息包括所述输出图像的目标频段分量和所述目标图像的目标频段分量之间的损失信息。由于损失信息包括所述输出图像的目标频段分量和所述目标图像的目标频段分量之间的损失信息,这样可以实现获取针对目标频段的损失信息,使得神经网络的损失信息更加有针对性,有利于提升神经网络的学习效果。In this embodiment, the target frequency band component of the output image of the neural network is obtained; the target frequency band component of the target image corresponding to the output image is obtained; and the loss information of the neural network is calculated, wherein the loss information includes the loss information between the target frequency band component of the output image and the target frequency band component of the target image. Since the loss information includes the loss information between the target frequency band component of the output image and the target frequency band component of the target image, it is possible to obtain loss information specific to the target frequency band, making the loss information of the neural network more targeted and beneficial to improving the learning effect of the neural network.

The loss information calculation method provided in the embodiments of this application may be executed by a loss information calculation apparatus. As an example, the apparatus may be an electronic device, or a component within an electronic device, such as a chip or a circuit. In the embodiments of this application, the loss information calculation apparatus executing the loss information calculation method is taken as an example to describe the apparatus provided herein.

Referring to FIG. 7, FIG. 7 is a structural diagram of a loss information calculation apparatus provided in an embodiment of this application. As shown in FIG. 7, the loss information calculation apparatus 700 includes:

a first acquisition module 701, configured to acquire a target frequency band component of an output image of a neural network;

a second acquisition module 702, configured to acquire a target frequency band component of a target image corresponding to the output image; and

a calculation module 703, configured to calculate loss information of the neural network, the loss information including loss information between the target frequency band component of the output image and the target frequency band component of the target image.

Optionally, the first acquisition module 701 is configured to perform a frequency-domain transform on the output image of the neural network to obtain frequency-domain information of the output image, and to obtain the target frequency band component from the frequency-domain information of the output image.

Optionally, the second acquisition module 702 is configured to perform a frequency-domain transform on the target image corresponding to the output image to obtain frequency-domain information of the target image, and to obtain the target frequency band component from the frequency-domain information of the target image.

Optionally, the frequency-domain transform includes one of the following:

discrete wavelet transform (DWT), discrete cosine transform (DCT), and discrete Fourier transform (DFT).
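For instance, a DFT-based band split can be sketched with NumPy by masking radial frequencies; the cutoff radii below are hypothetical values chosen only for illustration, not values from this application:

```python
import numpy as np

def dft_band_component(img, r_lo, r_hi):
    """Extract the spatial-domain component of `img` whose spatial
    frequencies have radius in [r_lo, r_hi) in the centred spectrum."""
    f = np.fft.fftshift(np.fft.fft2(img))    # centred frequency-domain information
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)     # radial frequency of each bin
    mask = (r >= r_lo) & (r < r_hi)          # keep only the target frequency band
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))
```

With disjoint radii covering the whole spectrum, the low-, mid-, and high-frequency components sum back to the original image, so a per-band loss partitions the ordinary pixel-domain error by frequency band.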

Optionally, the target frequency band component includes at least one of the following:

a high-frequency component, a mid-frequency component, and a low-frequency component.

Optionally, the calculation module 703 is configured to perform at least one of the following:

calculating a first loss value between the high-frequency component of the output image and the high-frequency component of the target image;

calculating a second loss value between the mid-frequency component of the output image and the mid-frequency component of the target image; and

calculating a third loss value between the low-frequency component of the output image and the low-frequency component of the target image.

Optionally, where the target frequency band component includes at least two of the high-frequency component, the mid-frequency component, and the low-frequency component, the loss information between the target frequency band component of the output image and the target frequency band component of the target image includes: a weighted average loss value of at least two of the first loss value, the second loss value, and the third loss value.
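The weighted averaging described above can be sketched as follows; the weights are hypothetical and chosen only for illustration (e.g. emphasizing the high-frequency band):

```python
def weighted_average_loss(losses, weights):
    """Weighted average of per-band loss values (e.g. high/mid/low)."""
    assert len(losses) == len(weights) and sum(weights) > 0
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)

# First, second, and third loss values with double weight on high frequency:
total = weighted_average_loss([0.9, 0.6, 0.3], [2.0, 1.0, 1.0])  # -> 0.675
```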

Optionally, the output image includes:

an upsampled image output by the neural network.

The above loss information calculation apparatus helps improve the learning performance of the neural network.

The loss information calculation apparatus 700 provided in this embodiment of this application can implement each process of the method embodiment of FIG. 4 and achieve the same technical effect; to avoid repetition, details are not repeated here.

As shown in FIG. 8, an embodiment of this application further provides an electronic device 800, including a processor 801 and a memory 802, the memory 802 storing a program or instructions executable on the processor 801. For example, when the electronic device 800 is an encoding-side device, the program or instructions, when executed by the processor 801, implement each step of the loss information calculation method embodiment above and achieve the same technical effect. When the electronic device 800 is a decoding-side device, the program or instructions, when executed by the processor 801, likewise implement each step of the loss information calculation method embodiment above and achieve the same technical effect; to avoid repetition, details are not repeated here. Optionally, the memory 802 may be the memory 102 or the memory 113 in the embodiment shown in FIG. 1, and the processor 801 may implement the functions of the encoder 200 or the decoder 300 in the embodiments shown in FIGS. 1-3.

An embodiment of this application further provides an electronic device, including: a memory configured to store video data; and a processing circuit configured to implement each step of the loss information calculation method embodiments above. Optionally, the memory may be the memory 102 or the memory 113 in the embodiment shown in FIG. 1, and the processing circuit may implement the functions of the encoder 200 or the decoder 300 in the embodiments shown in FIGS. 1-3.

An embodiment of this application further provides an electronic device, including a processor and a communication interface coupled to the processor, the processor being configured to run a program or instructions to implement the steps of the method embodiment shown in FIG. 4. This device embodiment corresponds to the method embodiment above; each implementation process and implementation of the method embodiment is applicable to this device embodiment and achieves the same technical effect.

The processor or processing circuit in the embodiments of this application may include a general-purpose processor, a special-purpose processor, or the like, for example a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), an artificial intelligence (AI) processor, a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), a network processor (NP), a field-programmable gate array (FPGA) or other programmable logic device, a gate circuit, a transistor, a discrete hardware component, or the like. The communication interface in the embodiments of this application may include a transceiver, a pin, a circuit, a bus, or the like.

The above electronic device may be a terminal, or a device other than a terminal, such as a server or network attached storage (NAS).

The terminal may be a mobile phone, a tablet personal computer, a laptop computer, a notebook computer, a personal digital assistant (PDA), a palmtop computer, a netbook, an ultra-mobile personal computer (UMPC), a mobile Internet device (MID), an augmented reality (AR) or virtual reality (VR) device, a mixed reality (MR) device, a robot, a wearable device, a flight vehicle, vehicle user equipment (VUE), shipborne equipment, pedestrian user equipment (PUE), a smart home device (a home device with wireless communication capability, such as a refrigerator, a television, a washing machine, or furniture), a game console, a personal computer (PC), a teller machine, a self-service machine, or another terminal-side device. Wearable devices include smart watches, smart wristbands, smart earphones, smart glasses, smart jewelry (smart bangles, smart bracelets, smart rings, smart necklaces, smart anklets, smart ankle chains, and the like), smart straps, smart clothing, and the like. An in-vehicle device may also be referred to as an in-vehicle terminal, an in-vehicle controller, an in-vehicle module, an in-vehicle component, an in-vehicle chip, or an in-vehicle unit. It should be noted that the embodiments of this application do not limit the specific type of the terminal.

The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server, where the cloud server may provide cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), or cloud computing services based on big data and artificial intelligence platforms.

For example, the above electronic device may include, but is not limited to, the types of the source device 100 or the destination device 110 shown in FIG. 1.

Taking the electronic device being a terminal as an example, FIG. 9 is a schematic diagram of the hardware structure of a terminal implementing an embodiment of this application.

The terminal 900 includes, but is not limited to, at least some of the following components: a radio frequency unit 901, a network module 902, an audio output unit 903, an input unit 904, a sensor 905, a display unit 906, a user input unit 907, an interface unit 908, a memory 909, and a processor 910.

Those skilled in the art will understand that the terminal 900 may further include a power supply (such as a battery) for powering the components; the power supply may be logically connected to the processor 910 through a power management system, thereby implementing functions such as charge management, discharge management, and power consumption management. The terminal structure shown in FIG. 9 does not constitute a limitation on the terminal; the terminal may include more or fewer components than shown, combine certain components, or use a different component arrangement, which is not repeated here.

It should be understood that, in the embodiments of this application, the input unit 904 may include a graphics processing unit (GPU) 9041 and a microphone 9042. The GPU 9041 processes image data of still pictures or video obtained by an image capture apparatus (such as a camera) in a video capture mode or an image capture mode, or may process obtained point cloud data. The display unit 906 may include a display panel 9061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 907 includes at least one of a touch panel 9071 and other input devices 9072. The touch panel 9071, also called a touch screen, may include two parts: a touch detection apparatus and a touch controller. The other input devices 9072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and on/off keys), a trackball, a mouse, and a joystick, which are not repeated here.

In the embodiments of this application, after receiving downlink data from a network-side device, the radio frequency unit 901 may transmit it to the processor 910 for processing; in addition, the radio frequency unit 901 may send uplink data to the network-side device. Typically, the radio frequency unit 901 includes, but is not limited to, an antenna, an amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, and the like.

The memory 909 may be used to store software programs or instructions as well as various data. The memory 909 may mainly include a first storage area storing programs or instructions and a second storage area storing data, where the first storage area may store an operating system and application programs or instructions required for at least one function (such as a sound playback function or an image playback function). In addition, the memory 909 may include a volatile memory or a nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically EPROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synch-link DRAM (SLDRAM), or a direct Rambus RAM (DRRAM). The memory 909 in the embodiments of this application includes, but is not limited to, these and any other suitable types of memory.

The processor 910 may include one or more processing units; optionally, the processor 910 integrates an application processor and a modem processor, where the application processor mainly handles operations involving the operating system, the user interface, applications, and the like, and the modem processor, such as a baseband processor, mainly handles wireless communication signals. It can be understood that the modem processor may alternatively not be integrated into the processor 910.

The processor 910 is configured to acquire a target frequency band component of an output image of a neural network; acquire a target frequency band component of a target image corresponding to the output image; and calculate loss information of the neural network, the loss information including loss information between the target frequency band component of the output image and the target frequency band component of the target image.

Optionally, the acquiring a target frequency band component of an output image of a neural network includes:

performing a frequency-domain transform on the output image of the neural network to obtain frequency-domain information of the output image, and obtaining the target frequency band component from the frequency-domain information of the output image.

Optionally, the acquiring a target frequency band component of a target image corresponding to the output image includes:

performing a frequency-domain transform on the target image corresponding to the output image to obtain frequency-domain information of the target image, and obtaining the target frequency band component from the frequency-domain information of the target image.

Optionally, the frequency-domain transform includes one of the following:

discrete wavelet transform (DWT), discrete cosine transform (DCT), and discrete Fourier transform (DFT).

Optionally, the target frequency band component includes at least one of the following:

a high-frequency component, a mid-frequency component, and a low-frequency component.

Optionally, the calculating loss information of the neural network includes at least one of the following:

calculating a first loss value between the high-frequency component of the output image and the high-frequency component of the target image;

calculating a second loss value between the mid-frequency component of the output image and the mid-frequency component of the target image; and

calculating a third loss value between the low-frequency component of the output image and the low-frequency component of the target image.

Optionally, where the target frequency band component includes at least two of the high-frequency component, the mid-frequency component, and the low-frequency component, the loss information between the target frequency band component of the output image and the target frequency band component of the target image includes: a weighted average loss value of at least two of the first loss value, the second loss value, and the third loss value.

Optionally, the output image includes:

an upsampled image output by the neural network.

The above terminal helps improve the learning performance of the neural network.

It can be understood that, for the implementation processes of the implementations mentioned in this embodiment, reference may be made to the related description of the loss information calculation method in the method embodiment, with the same or corresponding technical effects achieved; to avoid repetition, details are not repeated here.

An embodiment of this application further provides a readable storage medium storing a program or instructions that, when executed by a processor, implement each process of the loss information calculation method embodiment above and achieve the same technical effect; to avoid repetition, details are not repeated here.

The processor is the processor in the terminal described in the embodiments above. The readable storage medium includes computer-readable storage media, such as a ROM, a RAM, a magnetic disk, or an optical disc. In some examples, the readable storage medium may be a non-transitory readable storage medium.

An embodiment of this application further provides a chip, including a processor and a communication interface coupled to the processor, the processor being configured to run a program or instructions to implement each process of the loss information calculation method embodiment above and achieve the same technical effect; to avoid repetition, details are not repeated here.

It should be understood that the chip mentioned in the embodiments of this application may include a system-on-chip (also referred to as a system chip, a chip system, or a system-on-a-chip), or a discrete display chip, among others.

An embodiment of this application further provides a computer program/program product stored in a storage medium, the computer program/program product being executed by at least one processor to implement each process of the loss information calculation method embodiment above and achieve the same technical effect; to avoid repetition, details are not repeated here.

It should be noted that, in this document, the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus including a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus including that element. Furthermore, it should be noted that the scope of the methods and apparatuses in the embodiments of this application is not limited to performing functions in the order shown or discussed, and may include performing functions substantially simultaneously or in a reverse order depending on the functions involved; for example, the described methods may be performed in an order different from that described, and steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.

From the description of the embodiments above, those skilled in the art will clearly understand that the methods of the embodiments may be implemented by means of a computer software product plus a necessary general-purpose hardware platform, or of course by hardware. The computer software product is stored in a storage medium (such as a ROM, a RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal or a network-side device to execute the methods described in the embodiments of this application.

The embodiments of this application have been described above with reference to the accompanying drawings, but this application is not limited to the specific implementations described, which are merely illustrative rather than restrictive. Under the teaching of this application, those of ordinary skill in the art may devise many other implementations without departing from the spirit of this application and the scope of protection of the claims, all of which fall within the protection of this application.

Claims (17)

1. A loss information calculation method, comprising: acquiring a target frequency band component of an output image of a neural network; acquiring a target frequency band component of a target image corresponding to the output image; and calculating loss information of the neural network, the loss information comprising loss information between the target frequency band component of the output image and the target frequency band component of the target image.

2. The method according to claim 1, wherein the acquiring a target frequency band component of an output image of a neural network comprises: performing a frequency-domain transform on the output image of the neural network to obtain frequency-domain information of the output image, and obtaining the target frequency band component from the frequency-domain information of the output image.

3. The method according to claim 1 or 2, wherein the acquiring a target frequency band component of a target image corresponding to the output image comprises: performing a frequency-domain transform on the target image corresponding to the output image to obtain frequency-domain information of the target image, and obtaining the target frequency band component from the frequency-domain information of the target image.

4. The method according to claim 2 or 3, wherein the frequency-domain transform comprises one of the following: discrete wavelet transform (DWT), discrete cosine transform (DCT), and discrete Fourier transform (DFT).

5. The method according to any one of claims 1 to 4, characterized in that the target frequency band component comprises at least one of the following: a high-frequency component, a mid-frequency component, and a low-frequency component.

6. The method according to claim 5, wherein the calculating loss information of the neural network comprises at least one of the following: calculating a first loss value between the high-frequency component of the output image and the high-frequency component of the target image; calculating a second loss value between the mid-frequency component of the output image and the mid-frequency component of the target image; and calculating a third loss value between the low-frequency component of the output image and the low-frequency component of the target image.

7. The method according to claim 6, wherein, where the target frequency band component comprises at least two of the high-frequency component, the mid-frequency component, and the low-frequency component, the loss information between the target frequency band component of the output image and the target frequency band component of the target image comprises: a weighted average loss value of at least two of the first loss value, the second loss value, and the third loss value.

8. The method according to any one of claims 1 to 7, wherein the output image comprises: an upsampled image output by the neural network.

9. A loss information calculation apparatus, comprising: a first acquisition module, configured to acquire a target frequency band component of an output image of a neural network; a second acquisition module, configured to acquire a target frequency band component of a target image corresponding to the output image; and a calculation module, configured to calculate loss information of the neural network, the loss information comprising loss information between the target frequency band component of the output image and the target frequency band component of the target image.

10. The apparatus according to claim 9, wherein the first acquisition module is configured to perform a frequency-domain transform on the output image of the neural network to obtain frequency-domain information of the output image, and to obtain the target frequency band component from the frequency-domain information of the output image.

11. The apparatus according to claim 9 or 10, wherein the second acquisition module is configured to perform a frequency-domain transform on the target image corresponding to the output image to obtain frequency-domain information of the target image, and to obtain the target frequency band component from the frequency-domain information of the target image.

12. The apparatus according to any one of claims 9 to 11, wherein the target frequency band component comprises at least one of the following: a high-frequency component, a mid-frequency component, and a low-frequency component.

13. The apparatus according to claim 12, wherein the calculation module is configured to perform at least one of the following: calculating a first loss value between the high-frequency component of the output image and the high-frequency component of the target image; calculating a second loss value between the mid-frequency component of the output image and the mid-frequency component of the target image; and calculating a third loss value between the low-frequency component of the output image and the low-frequency component of the target image.

14. An electronic device, comprising a processor and a memory, the memory storing a program or instructions executable on the processor, the program or instructions, when executed by the processor, implementing the steps of the loss information calculation method according to any one of claims 1 to 8.

15. A readable storage medium, storing a program or instructions that, when executed by a processor, implement the steps of the loss information calculation method according to any one of claims 1 to 8.

16. A computer program product, stored in a storage medium, the computer program product being executed by at least one processor to implement the steps of the loss information calculation method according to any one of claims 1 to 8.

17. A chip, comprising a processor and a communication interface coupled to the processor, the processor being configured to run a program or instructions to implement the steps of the loss information calculation method according to any one of claims 1 to 8.
PCT/CN2025/103711 2024-07-03 2025-06-26 Loss information calculation method and apparatus, and related device Pending WO2026007784A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202410886308.4A CN121279381A (en) 2024-07-03 2024-07-03 Loss information calculation method and device and related equipment
CN202410886308.4 2024-07-03

Publications (1)

Publication Number Publication Date
WO2026007784A1 true WO2026007784A1 (en) 2026-01-08

Family

ID=98237408

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2025/103711 Pending WO2026007784A1 (en) 2024-07-03 2025-06-26 Loss information calculation method and apparatus, and related device

Country Status (2)

Country Link
CN (1) CN121279381A (en)
WO (1) WO2026007784A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116797461A (en) * 2023-07-12 2023-09-22 福州大学 Binocular image super-resolution reconstruction method based on multi-level enhanced attention mechanism
CN116993852A (en) * 2023-09-26 2023-11-03 阿尔玻科技有限公司 Training method, main control device and image reconstruction method of image reconstruction model
CN117218149A (en) * 2023-11-08 2023-12-12 南通度陌信息科技有限公司 An image reconstruction method and system based on autoencoding neural network
US20240212252A1 (en) * 2022-10-13 2024-06-27 Tencent Technology (Shenzhen) Company Limited Method and apparatus for training video generation model, storage medium, and computer device

Also Published As

Publication number Publication date
CN121279381A (en) 2026-01-06

Similar Documents

Publication Publication Date Title
US11102477B2 (en) DC coefficient sign coding scheme
US9407915B2 (en) Lossless video coding with sub-frame level optimal quantization values
US20190020888A1 (en) Compound intra prediction for video coding
US10469841B2 (en) Motion vector prediction using prior frame residual
CN109905714A (en) Inter-frame prediction method, device and terminal device
WO2020013735A1 (en) Method and apparatus for aspect-ratio dependent filtering for intra-prediction
AU2023202986B2 (en) Method and apparatus for intra prediction
CN109756737B (en) Image prediction method and device
US10448013B2 (en) Multi-layer-multi-reference prediction using adaptive temporal filtering
WO2025130738A1 (en) Image block compression processing method and apparatus, and electronic device
WO2026007784A1 (en) Loss information calculation method and apparatus, and related device
EP3841558B1 (en) Method and apparatus for intra reference sample interpolation filter switching
WO2025185566A1 (en) Decoding methods, encoding methods, and related device
CN121486580A (en) Video coding processing method, video decoding processing method, device and related equipment
CN119583805A (en) Video compression method, neural network model training method and related equipment
CN120614456A (en) Decoding method, encoding method, device and related equipment
WO2025140072A1 (en) Video compression processing method and apparatus, and electronic device
WO2026007833A1 (en) Method and apparatus for processing haptic signal, and related device
CN120302058A (en) Intra-frame prediction method, device and related equipment
CN120302047A (en) Pixel value prediction method and related device
CN121284233A (en) Image prediction method, device and equipment
WO2026026679A1 (en) Spatial tactile signal processing method and apparatus, and device
CN120835150A (en) Coding method, compression network training method and related equipment
WO2025067195A1 (en) Image block prediction method and apparatus, and electronic device
CN120915966A (en) Encoding and decoding method and device and electronic equipment