
AU2024203901A1 - Method, apparatus and system for encoding and decoding tensors - Google Patents

Method, apparatus and system for encoding and decoding tensors

Info

Publication number
AU2024203901A1
Authority
AU
Australia
Prior art keywords
codec
nal
bitstream
unit
tensors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
AU2024203901A
Inventor
Thi Hong Nhung NGUYEN
Christopher James ROSEWARNE
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2024203901A priority Critical patent/AU2024203901A1/en
Priority to PCT/AU2025/050437 priority patent/WO2025251103A1/en
Publication of AU2024203901A1 publication Critical patent/AU2024203901A1/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/188Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a video data packet, e.g. a network abstraction layer [NAL] unit
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/12Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/02Conversion to or from weighted codes, i.e. the weight given to a digit depending on the position of the digit within the block or code word
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/26Conversion to or from stochastic codes
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/4006Conversion to or from arithmetic code
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96Tree coding, e.g. quad-tree coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Discrete Mathematics (AREA)
  • Neurology (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A system and method of decoding a bitstream to produce tensors for use by a neural network second portion. The method comprises decoding a network abstraction layer (NAL) unit from the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a network abstraction layer (NAL) unit format of one inner codec of a plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; selecting an inner codec from the plurality of inner codecs based on the decoded NAL unit of the predetermined length; and decoding the bitstream using the selected inner codec to produce the tensors.
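As an illustrative aside (not part of the patent disclosure), the selection step summarised in the abstract, choosing an inner codec based on the length of a decoded marker NAL unit, could be sketched as follows. The codec names and marker lengths below are invented for the example; the patent does not specify them.

```python
# Hypothetical mapping: each inner codec is signalled by a marker NAL unit
# of a distinct, predetermined length (in bytes). Lengths are illustrative.
CODEC_BY_NAL_LENGTH = {
    2: "VVC",
    3: "HEVC",
    5: "AVC",
}


def select_inner_codec(marker_nal_unit: bytes) -> str:
    """Return the inner codec indicated by the marker NAL unit's length."""
    try:
        return CODEC_BY_NAL_LENGTH[len(marker_nal_unit)]
    except KeyError:
        raise ValueError(
            f"unrecognised marker NAL unit length {len(marker_nal_unit)}")


# The decoder would then route the remainder of the bitstream to the
# selected codec to reconstruct the tensors.
print(select_inner_codec(b"\x00\x01"))  # -> VVC
```

The point of keying on length alone is that the demultiplexer can select a codec before it understands that codec's NAL unit syntax.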

Description

[Fig. 12 (sheet 13/20): block diagram of a tensor decoding pipeline, including a NAL unit demultiplexer, picture decoder, tensor unpacker, inverse quantiser (Q⁻¹), tensor decompressors, temporal upsampler, metadata parser, storage, and a tensor weight repository (reference numerals 1200–1270).]
METHOD, APPARATUS AND SYSTEM FOR ENCODING AND DECODING TENSORS

TECHNICAL FIELD
[0001] The present invention relates generally to digital video signal processing and, in particular, to a method, apparatus and system for encoding and decoding tensors for a convolutional neural network. The present invention also relates to a computer program product including a computer readable medium having recorded thereon a computer program for encoding and decoding tensors for a convolutional neural network using video compression technology.
BACKGROUND
[0002] Convolutional neural networks (CNNs) are an emerging technology addressing, among other things, use cases involving machine vision such as object detection, instance segmentation, object tracking, human pose estimation, and action recognition. Applications for CNNs can involve use of 'edge devices', with sensors and some processing capability, coupled to application servers as part of a 'cloud'. CNNs can require relatively high computational complexity, more than can typically be afforded either in computing capacity or power consumption by an edge device. Executing a CNN in a distributed manner has emerged as one solution to running leading-edge networks using limited capability edge devices without requiring all computational complexity to be incurred within cloud servers whilst edge devices have potentially under-utilised inferencing resources. In other words, distributed processing allows legacy edge devices to still provide the capability of leading-edge CNNs by distributing processing between the edge device and other processing means, such as cloud servers. Such a distributed network architecture may be referred to as 'collaborative intelligence' (CI) and offers benefits such as re-using a partial result from a first portion of the network with several different second portions, perhaps with each portion being optimised for a different task. CI architectures introduce a need for efficient compression of tensor data, for transmission over a network such as a WAN.
[0003] CNNs typically include many layers, such as convolution layers and fully connected layers, with data passing from one layer to the next in the form of 'tensors'. Splitting a network across different devices introduces a need to compress the intermediate multi-dimensional tensor data that passes from one layer to the next within a CNN in order to facilitate transmission over a network having bandwidth limitations or costs. Compression of such tensors may be referred to as 'feature compression' and the intermediate tensor data is often referred to as 'features' or 'feature maps'. Features or feature maps are generally a collection of two-dimensional (2D) arrays of values which, when combined into a 3D (or 4D) data structure, form a tensor, with each feature map corresponding to one 'channel' of the tensor. Intermediate tensor data represents a partially processed form of input, such as an image frame or video frame, encountered within a neural network. Although a unit of data to be processed in a neural network is typically a tensor, operations performed may be described in relation to a feature map, in which case it is understood that the operation is to be performed on each feature map in the tensor. International Organisation for Standardisation / International Electrotechnical Commission Joint Technical Committee 1 / Subcommittee 29 / Working Group 4 (ISO/IEC JTC1/SC29/WG4), also known as 'Moving Picture Experts Group' (MPEG) Video Coding, is tasked with developing a standard for feature compression, known as the 'feature coding for machines' (FCM) standard. Previously WG 2 'MPEG Technical Requirements' had completed a 'Call for Proposals' which received responses that demonstrated significant outperformance over feature compression results achieved using state-of-the-art standardised video compression technology directly applied to the tensors.
[0004] CNNs typically require weights for each of the layers to be predetermined in a training stage, where a very large amount of training data is passed through the CNN and a result determined by the network undergoing training is compared to ground truth associated with the training data. Discrepancy between the obtained and desired result is expressed as a 'loss' and measured with a 'loss function'. Using the determined loss, a process for updating network weights, such as stochastic gradient descent (SGD), is performed. Network weight update typically involves a process of back-propagation of 'gradients' that begins at the output layer of the network and proceeds backward to terminate when the input layer to the network is updated, propagating through intermediate, or 'hidden', layers of the network. Gradients are indicative of deltas to be applied to network weights and are themselves updated as part of the back-propagation process. The rate of weight update is set by a 'learning rate' hyperparameter, typically set to facilitate the training process in finding a global minimum in terms of loss (i.e., highest possible task performance for the network architecture and training data) while avoiding the training process becoming 'stuck' in a local minimum. Becoming stuck in a local minimum corresponds to obtaining sub-optimal task performance for the network architecture and being incapable of finding new weight values that could lead to higher task performance. Network weights are repeatedly updated by supplying input data and ground truth data organised into 'batches' to iteratively refine the network performance until further improvement in accuracy is no longer achievable. An iteration through the entire training dataset forms one 'epoch' of training, and training typically requires performing multiple epochs to achieve a high level of performance for the task. Weights for a trained network are then available for deployment, and the network operates in a mode where weights are fixed and gradients for weight update are omitted. The process of executing a pretrained CNN with an input and progressively transforming the input into an output according to a topology of the CNN is commonly referred to as 'inferencing'.
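As an editorial illustration only (not from the patent), the loss, gradient, learning rate and epoch concepts described above can be reduced to a minimal gradient-descent sketch on a single weight; the function and data here are invented for the example.

```python
# Minimal sketch of weight update by gradient descent: fit y = w * x
# using a squared loss. One pass over the samples is one 'epoch'.
def train(samples, epochs=200, learning_rate=0.05):
    w = 0.0
    for _ in range(epochs):
        for x, y_true in samples:
            y_pred = w * x
            # Gradient of (y_pred - y_true)^2 with respect to w.
            loss_grad = 2.0 * (y_pred - y_true) * x
            # The learning rate scales the delta applied to the weight.
            w -= learning_rate * loss_grad
    return w


# Ground truth relation is y = 3x, so training should recover w close to 3.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w = train(data)
print(round(w, 3))  # -> 3.0
```

A real CNN applies the same update rule to millions of weights, with the per-weight gradients obtained by back-propagation rather than by a closed-form derivative as above.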
[0005] Generally, a tensor is an array of elements having four dimensions, namely: batch, channels, height and width. The first dimension, 'batch', is typically of size one when inferencing on video data and indicates that one frame is passed through a CNN as one batch. When training a network, the value of the batch dimension may be increased so that multiple frames are passed through the network in each batch before the network weights are updated, according to a predetermined 'batch size'. A multi-frame video may be passed through as a single tensor with the batch dimension increased in size according to the number of frames of a given video. However, for practical considerations relating to memory consumption and access, inferencing on video data is typically performed on a frame-wise basis. The 'channels' dimension indicates the number of concurrent 'feature maps' for a given tensor and the height and width dimensions indicate the size of the feature maps at the particular stage of the CNN. Channel count varies through the layers of a CNN according to the network architecture. Feature map size also varies, depending on subsampling or upsampling occurring in specific network layers.
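To make the four-dimensional layout described above concrete, the following editorial sketch (not part of the patent text) builds a (batch, channels, height, width) structure from plain nested lists; the sizes are arbitrary examples.

```python
# Stand-in for a 4-D tensor ordered (batch, channels, height, width),
# using nested lists so no external library is required.
def make_tensor(batch, channels, height, width, fill=0.0):
    return [[[[fill] * width for _ in range(height)]
             for _ in range(channels)]
            for _ in range(batch)]


# Frame-wise inferencing: batch size 1; here 8 feature maps (channels),
# each 4 samples high and 6 wide.
tensor = make_tensor(batch=1, channels=8, height=4, width=6)
shape = (len(tensor), len(tensor[0]),
         len(tensor[0][0]), len(tensor[0][0][0]))
print(shape)  # -> (1, 8, 4, 6)
```

Each `tensor[0][c]` is one 2-D feature map, matching the statement that each feature map corresponds to one channel of the tensor.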
[0006] The overall complexity of the CNN tends to be relatively high, with relatively large numbers of multiply-accumulate (MAC) operations being performed and numerous intermediate tensors being written to and read from memory, along with reading weights for performance of each layer of the CNN. As such, dividing a neural network into portions allows implementation of more complex networks even in systems containing less capable edge devices, without requiring cloud servers to bear the full burden of performing the network.
[0007] Feature compression may benefit from existing video compression standards, such as ISO/IEC 23090-3 'Versatile Video Coding' (VVC)/H.266, developed by the Joint Video Experts Team (JVET), a joint activity by ISO/IEC and ITU-T. VVC is anticipated to address ongoing demand for ever-higher compression performance, especially as video formats increase in capability (for example, with higher resolution and higher frame rate) and to address increasing market demand for service delivery over WANs, where bandwidth costs are relatively high. VVC is implementable in contemporary silicon processes and offers an acceptable trade-off between achieved performance versus implementation cost. The implementation cost may be considered, for example, in terms of one or more of silicon area, CPU processor load, memory utilisation and bandwidth. Other video compression standards, such as ISO/IEC 23008-2 'High Efficiency Video Coding' (HEVC)/H.265 or ISO/IEC 14496-10 'Advanced Video Coding' may also be used for feature compression applications. Other standards such as AV-1, developed by the Alliance for Open Media (AOMedia), may also be used.
[0008] Video data includes a sequence of frames of image data, each frame including one or more colour channels. Where feature map data is to be represented in a packed frame, generally a monochrome frame having luminance only and no chroma channels is adequate. When only luma samples are present, the resulting monochrome frames are said to use a '4:0:0 chroma format'.
[0009] The VVC standard specifies a ‘block based’ architecture, in which frames are firstly divided into an array of square regions known as ‘coding tree units’ (CTUs). In VVC, CTUs generally occupy 128×128 luma samples. Other possible CTU sizes when using the VVC standard are 32×32 and 64×64. However, CTUs at the right and bottom edge of each frame may be smaller in area, with implicit splitting occurring to ensure coding blocks remain in the frame. Associated with each CTU is a ‘coding tree’ defining a decomposition of the area of the CTU into a set of blocks, also referred to as ‘coding units’ (CUs). Blocks applicable to only the luma channel or only the chroma channels are referred to as ‘coding blocks’ (CBs). A prediction of the contents of a coding block is held in a ‘prediction block’ (PB) or ‘prediction unit’ (PU) and a residual block defining an array of sample values to be additively combined with the PB or PU is referred to as a ‘transform block’ (TB) or ‘transform unit’ (TU), owing to the typical use of a transformation process in the generation of the TB or TU. In the case of HEVC, the CTU size may be 64×64, 32×32, or 16×16 luma samples. In the case of advanced video coding (AVC), a “Macroblock” is the analogue of a CTU and has a size of 16×16 luma samples.
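The implicit splitting at frame edges reduces to simple ceiling arithmetic. The following sketch is illustrative only; the function name and return shape are not taken from any standard:

```python
def ctu_grid(frame_w, frame_h, ctu=128):
    # Number of CTU columns/rows: ceiling division of the frame dimensions.
    cols = -(-frame_w // ctu)
    rows = -(-frame_h // ctu)
    # Edge CTUs are clipped so that coding blocks remain inside the frame.
    right_w = frame_w - (cols - 1) * ctu
    bottom_h = frame_h - (rows - 1) * ctu
    return cols, rows, right_w, bottom_h

# A 1920x1080 frame with 128x128 CTUs: a 15x9 grid, where the bottom row
# of CTUs covers only 56 luma rows.
print(ctu_grid(1920, 1080))  # (15, 9, 128, 56)
```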
44204385_1
[00010] Notwithstanding the above distinction between ‘units’ and ‘blocks’, the term ‘block’ may be used as a general term to refer to areas or regions of a frame for which operations are applied to all colour channels.
[00011] For each CU, a prediction unit (PU) of the contents (sample values) of the corresponding area of frame data is generated. Further, a representation of the difference (or ‘spatial domain’ residual) between the prediction and the contents of the area as seen at input to the encoder is formed. The difference in each colour channel may be transformed and coded as a sequence of residual coefficients, forming one or more TUs for a given CU. The applied transform may be a Discrete Cosine Transform (DCT) or other transform, applied to each block of residual values. The transform is applied separably (i.e., the two-dimensional transform is performed in two passes, one horizontally and one vertically). The block is firstly transformed by applying a one-dimensional transform to each row of samples in the block. Then, the partial result is transformed by applying a one-dimensional transform to each column of the partial result to produce a final block of transform coefficients that substantially decorrelates the residual samples. Transforms of various sizes are supported by the VVC standard, including transforms of rectangular-shaped blocks, with each side dimension being a power of two. Transform coefficients are quantised for entropy encoding into a bitstream.
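The row-then-column application of the one-dimensional transform can be sketched as below. A floating-point orthonormal DCT-II is used purely for illustration; the standards specify integer-approximated transforms:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis: row k holds the k-th cosine basis vector.
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

def separable_transform(block):
    # Pass 1: 1-D transform of each row (horizontal).
    # Pass 2: 1-D transform of each column of the partial result (vertical).
    h, w = block.shape
    partial = block @ dct_matrix(w).T
    return dct_matrix(h) @ partial

residual = np.arange(16.0).reshape(4, 4)  # toy 4x4 residual block
coeffs = separable_transform(residual)    # coeffs[0, 0] is the DC term
```

Because the basis matrix is orthonormal, `dct_matrix(4).T @ coeffs @ dct_matrix(4)` recovers the residual, mirroring the inverse transform performed in a decoder.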
[00012] PBs or PUs in VVC may be generated using either an intra-frame prediction or an inter-frame prediction process. Intra-frame prediction uses previously processed samples in a frame to generate a prediction of a current block of data samples in the frame. Inter-frame prediction involves generating a prediction of a current block of samples in a frame using a block of samples obtained from one or two previously decoded frames. The block of samples obtained from a previously decoded frame is offset from the spatial location of the current block according to a motion vector, which often has filtering applied. Intra-frame prediction blocks can be (i) a uniform sample value (“DC intra prediction”), (ii) a plane having an offset and horizontal and vertical gradient (“planar intra prediction”), (iii) a population of the block with neighbouring samples applied in a particular direction (“angular intra prediction”), or (iv) the result of a matrix multiplication using neighbouring samples and selected matrix coefficients.
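Two of the listed intra modes can be illustrated in a few lines. This is a simplified sketch; real codecs add reference-sample filtering, fractional angles, and availability rules:

```python
import numpy as np

def dc_intra(above, left):
    # (i) DC intra prediction: every sample in the block takes the mean
    # of the neighbouring reference samples.
    n = above.size
    dc = int(round(float(above.sum() + left.sum()) / (2 * n)))
    return np.full((n, n), dc)

def horizontal_intra(left):
    # (iii) One angular direction: propagate each left neighbour
    # horizontally across its row of the block.
    n = left.size
    return np.tile(left[:, None], (1, n))

above = np.array([10, 10, 12, 12])
left = np.array([10, 11, 12, 13])
pred = dc_intra(above, left)  # 4x4 block filled with the mean value
```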
[00013] Encoders and decoders conforming to different video encoding standards may be used to compress intermediate feature maps from a first portion (a ‘backbone’) of a neural network separated into two portions. In compression, the feature maps from the backbone are arranged into a frame and quantised from a floating-point domain to a sample domain suitable for compression as video data. Neural network layers, such as convolutions, batch normalisations, and activation functions, may be applied to reduce the dimensionality of the tensors prior to compression using a video compression standard such as VVC. Dimensionality reduction of tensors reduces the volume of data to be compressed, improving compression efficiency and reducing the runtime of the VVC encoding and decoding stages. Dimensionality reduction introduces complexity, offsetting the reduction in runtime seen in the VVC encoding. A need exists to support the use of encoders and decoders conforming to various video encoding standards to improve flexibility and multi-encoder compatibility of FCM implementations.
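The floating-point-to-sample-domain quantisation mentioned above can be sketched as a linear mapping into the range of a 10-bit frame. The min/max normalisation and the helper name are illustrative assumptions; FCM pipelines may define other mappings and signal the range differently:

```python
import numpy as np

def quantise_feature_maps(t, bit_depth=10):
    # Map floating-point tensor values linearly onto integer samples in
    # [0, 2^bit_depth - 1] so that they can be coded as video data.
    lo, hi = float(t.min()), float(t.max())
    scale = ((1 << bit_depth) - 1) / (hi - lo) if hi > lo else 0.0
    samples = np.round((t - lo) * scale).astype(np.uint16)
    return samples, (lo, hi)  # (lo, hi) must accompany the bitstream

t = np.linspace(-1.0, 1.0, 24, dtype=np.float32).reshape(2, 3, 4)
samples, (lo, hi) = quantise_feature_maps(t)
```

The `(lo, hi)` range is returned because the decoder needs it to map decoded samples back to the floating-point domain.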
SUMMARY
[00014] It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
[00015] One aspect of the present disclosure provides a method of decoding a bitstream to produce tensors for use by a neural network second portion, the method comprising: decoding a network abstraction layer (NAL) unit from the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of one inner codec of a plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; selecting an inner codec from the plurality of inner codecs based on the decoded NAL unit of the predetermined length; and decoding the bitstream using the selected inner codec to produce the tensors.
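A minimal sketch of the selection step follows. The byte lengths, codec identifiers, and the convention that the identifying NAL unit's final byte carries a codec ID are all assumptions made for illustration; they are not taken from the disclosure or from any standard:

```python
PREDETERMINED_LENGTH = 3                        # assumed identifying length
INNER_CODECS = {0: "AVC", 1: "HEVC", 2: "VVC"}  # assumed codec IDs

def select_inner_codec(nal_units):
    # Scan the bitstream's NAL unit payloads; only the identifying unit
    # has the predetermined length (other codecs' units differ in length).
    for payload in nal_units:
        if len(payload) == PREDETERMINED_LENGTH:
            return INNER_CODECS[payload[-1]]    # final byte: codec ID
    raise ValueError("identifying NAL unit not found")

units = [b"\x00\x00\x02", b"\x40\x01\x0c\x05"]  # toy bitstream
codec = select_inner_codec(units)               # selects "VVC"
```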
[00016] Another aspect of the present disclosure provides a method of encoding tensors to a bitstream for use by a neural network second portion, the method comprising: selecting an inner codec from a plurality of inner codecs for use in encoding tensors to the bitstream; encoding a network abstraction layer (NAL) unit to the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of the selected inner codec of the plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; and encoding the tensors to the bitstream using the selected inner codec.
[00017] Another aspect of the present disclosure provides a decoder for decoding a bitstream to produce tensors for use by a neural network second portion, the decoder configured to: decode a network abstraction layer (NAL) unit from the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of one inner codec of a plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; select an inner codec from the plurality of inner codecs based on the decoded NAL unit of the predetermined length; and decode the bitstream using the selected inner codec to produce the tensors.
[00018] Another aspect of the present disclosure provides a non-transitory computer-readable storage medium which stores a program for executing a method of decoding a bitstream to produce tensors for use by a neural network second portion, the method comprising: decoding a network abstraction layer (NAL) unit from the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of one inner codec of a plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; selecting an inner codec from the plurality of inner codecs based on the decoded NAL unit of the predetermined length; and decoding the bitstream using the selected inner codec to produce the tensors.
[00019] Another aspect of the present disclosure provides an encoder for encoding tensors to a bitstream for use by a neural network second portion, the encoder configured to: select an inner codec from a plurality of inner codecs for use in encoding tensors to the bitstream; encode a network abstraction layer (NAL) unit to the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of the selected inner codec of the plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; and encode the tensors to the bitstream using the selected inner codec.
[00020] Another aspect of the present disclosure provides a non-transitory computer-readable storage medium which stores a program for executing a method of encoding tensors to a bitstream for use by a neural network second portion, the method comprising: selecting an inner codec from a plurality of inner codecs for use in encoding tensors to the bitstream; encoding a network abstraction layer (NAL) unit to the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of the selected inner codec of the plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; and encoding the tensors to the bitstream using the selected inner codec.
[00021] Other aspects are also disclosed.
BRIEF DESCRIPTION OF THE DRAWINGS
[00022] At least one embodiment of the present invention will now be described with reference to the following drawings and appendices, in which:
[00023] Fig. 1 is a schematic block diagram showing a distributed machine task system;

[00024] Figs. 2A and 2B form a schematic block diagram of a general-purpose computer system upon which the distributed machine task system of Fig. 1 may be practiced;

[00025] Fig. 3A is a schematic block diagram showing functional modules of a backbone portion of a CNN;

[00026] Fig. 3B is a schematic block diagram showing a residual block of Fig. 3A;

[00027] Fig. 3C is a schematic block diagram showing a residual unit of Fig. 3A;

[00028] Fig. 3D is a schematic block diagram showing a CBL module of Fig. 3A;

[00029] Fig. 4 is a schematic block diagram showing functional modules of an alternative backbone portion of a CNN;

[00030] Fig. 5 is a schematic block diagram of a tensor encoder using a configurable tensor compressor stage;

[00031] Fig. 6 is a schematic block diagram showing a multi-scale feature fusion stage for a tensor compressor;

[00032] Fig. 7 shows a picture structure with low delay and one level of temporal interpolation;

[00033] Fig. 8 is a schematic block diagram showing functional modules of a video encoder;

[00034] Figs. 9A and 9B are schematic block diagrams showing an arrangement of regions or subpictures for holding compressed feature map data from compressed tensor data;

[00035] Fig. 10 is a schematic block diagram showing the structure of a network abstraction layer (NAL) unit;
[00036] Fig. 11 is a schematic block diagram showing a bitstream holding NAL units for various compression standards;

[00037] Fig. 12 is a schematic block diagram showing a tensor decoder with a configurable tensor decompressor;

[00038] Fig. 13 is a schematic block diagram showing functional modules of a video decoder;

[00039] Fig. 14 is a schematic block diagram showing an implementation of a configurable feature reconstruction module performing a decoder network topology;

[00040] Fig. 15 is a schematic block diagram showing an embodiment of a multi-scale feature reconstruction stage;

[00041] Fig. 16A is a schematic block diagram showing a head portion of a CNN;

[00042] Fig. 16B is a schematic block diagram showing an upscaler module of Fig. 16A;

[00043] Fig. 16C is a schematic block diagram showing a detection module of Fig. 16A;

[00044] Fig. 17 is a schematic block diagram showing an alternative head portion of a CNN;

[00045] Fig. 18 shows a method for performing a first portion of a CNN, selecting a feature map compression standard, compressing tensors using the selected feature map compressor, and encoding resulting compressed tensors into a bitstream;

[00046] Fig. 19 shows a method for decoding a bitstream, determining a selected feature map compression standard, reconstructing tensors according to the selected feature map compression standard, and performing a second portion of the CNN; and

[00047] Appendix A shows a syntax table for NAL units conforming to the AVC standard;

[00048] Appendix B shows a syntax table for NAL units conforming to the HEVC standard;

[00049] Appendix C shows a syntax table for NAL units conforming to the VVC standard;

[00050] Appendix D shows a syntax structure for identifying one manner of feature map compression, such as a video compression standard, out of a plurality of manners of feature map compression; and
[00051] Appendix E shows syntax structures for signalling a feature coding for machines vision model parameter set (FCM VMPS), sequence parameter set (FCM SPS), and picture parameter set (FCM PPS).
DETAILED DESCRIPTION INCLUDING BEST MODE
[00052] Where reference is made in any one or more of the accompanying drawings to steps and/or features which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
[00053] A distributed machine task system may include an edge device, such as a network camera or smartphone, producing intermediate compressed data. The distributed machine task system may also include a final device, such as a server farm based (‘cloud’) application, operating on the intermediate compressed data to produce a task result. Additionally, the edge device functionality may be embodied in the cloud, and the intermediate compressed data may be stored for later processing, potentially for multiple different tasks depending on need.
[00054] A convenient form of intermediate compressed data is a compressed video bitstream, owing to the availability of high-performing compression standards and implementations thereof. Video compression standards typically operate on integer samples of some given bit depth, such as 10 bits, arranged in planar arrays. Colour video has three planar arrays, corresponding, for example, to colour components Y, Cb, Cr, or R, G, B, depending on application. CNNs typically operate on floating point data in the form of tensors but may also operate on integer data, also forming tensors. Tensors generally have a relatively smaller spatial dimensionality compared to incoming video data upon which the CNN operates, while having more channels than the three channels typical of colour video data, for example 128, 256, or 512 channels.
[00055] Tensors typically have the following dimensions: frames, channels, height, and width. For example, a tensor of dimensions [1, 256, 76, 136] would be said to contain floating-point or integer values for one frame comprising an array of two-hundred and fifty-six (256) feature maps (channels), each of size 136×76. For video data, inferencing is typically performed one frame at a time (frame or ‘batch’ value of 1), rather than using tensors containing multiple frames. VVC, HEVC, and AVC support a division of a picture into ‘slices’, or contiguous sequences of coded CTUs, or Macroblocks in the case of AVC. In VVC and HEVC, a ‘tile’ mechanism is also available to divide a picture into a number of independently decodeable regions.
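For instance, the 256 feature maps of a [1, 256, 76, 136] tensor can be tiled into a single monochrome frame before video coding. The 16-channels-per-row layout below is an assumed arrangement for illustration, not one mandated by the description:

```python
import numpy as np

def pack_tensor(t, cols=16):
    # Tile the channels of a [C, H, W] tensor into one monochrome frame,
    # placing 'cols' channel tiles per row of tiles.
    c, h, w = t.shape
    rows = -(-c // cols)  # ceiling division
    frame = np.zeros((rows * h, cols * w), dtype=t.dtype)
    for i in range(c):
        r, q = divmod(i, cols)
        frame[r * h:(r + 1) * h, q * w:(q + 1) * w] = t[i]
    return frame

t = np.zeros((256, 76, 136), dtype=np.uint16)  # one frame's feature maps
frame = pack_tensor(t)  # 16x16 grid of 136x76 tiles -> a 1216x2176 frame
```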
[00056] Fig. 1 is a schematic block diagram showing functional modules of a distributed machine task system 100, capable of performing a machine task network in a distributed manner. The division of a particular neural network into two portions requires specifying a 'split point' in the network. Layers in the network from the input layer up to the split point are performed in a first device and the resulting intermediate tensor(s) are compressed. Layers from the split point up to the last layer in the network are performed using decompressed tensor(s) from the first device as input to the layer(s) immediately following the split point. At the split point there may be one or more tensors that need to be compressed for conveyance over a communication channel with limited bandwidth compared to the bandwidth requirement for transmission of uncompressed tensors. Where a 'feature pyramid network' (FPN) is in use, it is common for layers in the FPN to be related in width and height such that a given layer is half the width and half the height of an adjacent layer among the layers. FPN architectures may also involve the width and height halving alternately from one layer to the next layer. In some architectures, multiple tensors of the same width and height are produced within the FPN. An FPN may occur relatively early in the neural network topology, resulting in a necessity for a split point to occur within the FPN in order for a useful division of the network workload across the edge device and the cloud to be achieved. When a split occurs within the FPN of the machine task network, performance of a variety of machine task networks where layers up to the split point are common among the machine task networks ('shared backbone' architecture) may be achieved. Where a split point occurs within the FPN, tensor compression methods may exploit redundancies across the FPN layers to improve compression performance. Compression methods applicable to the various network topologies used in contemporary CNNs are therefore beneficial for application in a wide range of scenarios.
[00057] The system 100 may be used for implementing methods for decorrelating, packing and quantising feature maps into planar frames for encoding, and decoding feature maps from encoded data, for various neural networks. Various neural networks may be split at different points and may result in intermediate tensors of various number and dimensionality. A feature compression scheme capable of adapting to different types of intermediate data and capable of providing different quality reconstruction results in advantageous flexibility. Moreover, the system 100 provides flexibility to interface neural networks of various architectures and for various applications subjected to splitting into portions (e.g., for distributed execution).
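The notion of packing feature maps into a planar frame may be sketched as follows; this is a minimal illustration over nested lists, and the grid layout and function name are assumptions rather than the specific packing of the described arrangements:

```python
def pack_channels(tensor, cols):
    """Tile the C channels of a C x H x W tensor (nested lists) into a single
    planar frame laid out as a grid with `cols` channels per grid row."""
    c = len(tensor)
    h, w = len(tensor[0]), len(tensor[0][0])
    rows = (c + cols - 1) // cols            # grid rows needed for all channels
    frame = [[0] * (cols * w) for _ in range(rows * h)]
    for ch in range(c):
        gy, gx = divmod(ch, cols)            # grid cell for this channel
        for y in range(h):
            for x in range(w):
                frame[gy * h + y][gx * w + x] = tensor[ch][y][x]
    return frame

# Four 2x2 channels packed two-wide give a single 4x4 planar frame.
t = [[[ch * 10 + y * 2 + x for x in range(2)] for y in range(2)] for ch in range(4)]
frame = pack_channels(t, cols=2)
```

Packing the channels into one planar frame allows a conventional picture-oriented codec to carry the feature maps.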
[00058] The system 100 includes a source device 110 for generating frame data 113. The frame data 113 is passed to a CNN backbone 114 to produce tensors 115. The tensors 115 are passed to a tensor encoder 116, which produces an encoded bitstream 121. The system 100 also includes a destination device 140 for decoding tensor data in the form of a received bitstream 143. The destination device 140 may be used for decoding the tensor data (or tensors) for content (e.g., of audio data, video data, image data, and textual data) of the bitstream 143.
[00059] A communication channel 130 is used to communicate the encoded bitstream 121 from the source device 110 to the destination device 140. In some arrangements, the source device 110 and destination device 140 may either or both comprise respective mobile telephone handsets (e.g., "smartphones") or network cameras and cloud applications. The communication channel 130 may be a wired connection, such as Ethernet, or a wireless connection, such as WiFi or 5G, including connections across a Wide Area Network (WAN). The communication channel 130 may also be implemented across ad-hoc connections. Moreover, the source device 110 and the destination device 140 may comprise applications where encoded video data is captured on some computer-readable storage medium, such as a hard disk drive in a file server or memory. Although the system 100 is described as including the video source 112, which would provide the frame data 113 for a neural network targeting a computer vision application, other types of source data, such as audio or text, may be input to a suitable neural network implemented in the CNN backbone 114 and a CNN head 150. The CNN backbone 114 may also be referred to as a neural network first portion or NN part 1. The CNN head 150 may also be referred to as a neural network second portion or NN part 2.
[00060] As shown in Fig. 1, the source device 110 includes a video source 112, the CNN backbone 114, the tensor encoder 116, and a transmitter 122. The video source 112 typically comprises a source of captured video frame data (shown as 113), such as an image capture sensor, a previously captured video sequence stored on a non-transitory recording medium, or a video feed from a remote image capture sensor. The video source 112 may also be an output of a computer graphics card, for example, displaying the video output of an operating system and various applications executing upon a computing device (e.g., a tablet computer). Examples of source devices 110 that may include an image capture sensor as the video source 112 include smart-phones, video camcorders, professional video cameras, and network video cameras. The video source 112 may produce independent images or may produce temporally sequential images, i.e., a video.
[00061] The neural network implemented in the CNN backbone 114 and the CNN head 150 may depend on the application. For example, a 'YOLOv3' network may be used as one part of an object tracking system and a 'FasterRCNN' network may be used as an object detection system. The number and dimensionality of tensors 115 depends on a particular network performed in the system 100 and the split point of the particular network.
[00062] The CNN backbone 114 receives the video frame data 113 and performs specific layers of an overall CNN, such as layers corresponding to the 'backbone' of the CNN, outputting tensors 115. The backbone layers of the CNN may produce multiple tensors as output, for example, corresponding to different spatial scales of an input image represented by the video frame data 113 when splitting the network within the FPN. An FPN may result in three tensors, corresponding to three layers, output from the backbone 114 as the tensors 115 (e.g., if a 'YOLOv3' network is performed by the system 100), with varying spatial resolution and channel count. When the system 100 is performing networks such as 'Faster RCNN X101-FPN' or 'Mask RCNN X101-FPN' the tensors 115 may include tensors for four layers (P2-P5). Use of an FPN results in a plurality of tensors forming a hierarchical representation for a single frame to be encoded to (and decoded from) the bitstream when the split point of the network occurs within the FPN, as described hereafter. The tensor encoder 116 produces the encoded bitstream 121 from the tensors 115.
[00063] The bitstream 121 is supplied to the transmitter 122 for transmission over the communications channel 130, or the bitstream 121 is written to storage 132 for later use.
[00064] The source device 110 supports a particular network for the CNN backbone 114. However, the destination device 140 may use one of several networks for the head CNN 150. In using one of several networks for the head CNN 150, partially processed data in the form of packed feature maps may be stored for later use in performing various tasks without needing to again perform the operation of the CNN backbone 114.
[00065] The bitstream 121 is transmitted by the transmitter 122 over the communication channel 130 as encoded data. The bitstream 121 can in some implementations be stored in a storage memory 132, where the storage 132 is a non-transitory storage device such as a "Flash" memory or a hard disk drive, until later being transmitted over the communication channel 130 (or in-lieu of transmission over the communication channel 130). For example, encoded video data may be served upon demand to customers over a wide area network (WAN) for a video analytics application.
[00066] The destination device 140 includes a receiver 142, a tensor decoder 146, the CNN head 150, and a CNN task result buffer 152. The receiver 142 receives encoded video data from the communication channel 130 and passes the bitstream 143 to the tensor decoder 146. The tensor decoder 146 outputs decoded tensors 149, which are supplied to the CNN head 150. The CNN head 150 receives the tensors 149 and performs the later layers of the neural network that began with the CNN backbone 114 to produce a task result 151. The task result 151 is stored in the task result buffer 152. The contents of the task result buffer 152 may be presented to the user (e.g., via a graphical user interface), or provided to an analytics application where some action is decided based on the task result, which may include summary level presentation of aggregated task results to a user. It is also possible for the functionality of each of the source device 110 and the destination device 140 to be embodied in a single device, examples of which include mobile telephone handsets and tablet computers and cloud applications.
[00067] As seen in Fig. 1, the system 100 also comprises a tensor codec repository 180. The codec repository 180 may include network topologies covering a variety of neural networks and associated split points, and reconstruction fidelity levels. The network topologies may be stored in the tensor codec repository 180 for future reference (or use). The tensor codec repository 180 may be accessed 'out of band' or separately stored in each of the source device 110 and the destination device 140. In other words, the tensor codec repository 180 may be accessed over a network by the source device 110 and the destination device 140 rather than via the bitstream 143. A network topology identifier 174 and 176 may be sent by the tensor encoder 116 and the tensor decoder 146, respectively, to the tensor codec repository 180. The network topology identifiers 174 and 176 may be used for determining a given network topology from the bitstream 143.
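The out-of-band repository access described above may be sketched as a simple keyed lookup; the identifiers and entries below are illustrative placeholders only, not actual catalogue contents of the repository 180:

```python
# Illustrative out-of-band repository keyed by network topology identifier.
# The identifiers and entries are placeholders for the purpose of this sketch.
CODEC_REPOSITORY = {
    1: {"network": "yolov3-split-fpn", "layers": 3},
    2: {"network": "faster-rcnn-x101-fpn", "layers": 4},  # P2-P5
}

def fetch_topology(identifier):
    """Return a network topology entry for `identifier`, in the manner that
    the repository 180 responds to identifiers sent by the encoder or decoder."""
    try:
        return CODEC_REPOSITORY[identifier]
    except KeyError:
        raise ValueError("unknown network topology identifier: %d" % identifier)
```

Because both the encoder and decoder resolve the same identifier against the same repository, only the identifier, and not the topology itself, need be associated with the bitstream.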
[00068] As a result of a request for a given network topology, a network topology 172 and 178 may be returned by the tensor codec repository 180 to the tensor encoder 116 and the tensor decoder 146, respectively. As described in detail below, the information including the network topology identifier 178 may be decoded and used by the tensor decoder 146 for producing decoded tensors using the determined network topology. The tensor codec repository 180 may be accessible via a public file repository or within a private network accessible to the source device 110 and the destination device 140. A given network topology defines the composition and interconnection of a set of machine learning primitive operations, including convolutions, batch normalisations, activation functions, and concatenations. A network topology may be available for a split point but with a mismatch in the supported dimensionality; in particular, the spatial dimensions of the feature maps may differ from those provided by the CNN backbone 114. Moreover, the data type provided from the CNN backbone 114 and supplied to the CNN head 150 may differ from that used internally by the network topology. For example, integer inferencing is commonly used due to its reduced complexity compared to floating-point inferencing. Where a network topology is available but configured to use floating-point inferencing, an adaptation between integer and floating-point domains is needed to couple the network topology implemented in the tensor encoder 116 and the tensor decoder 146 with the CNN backbone 114 and the CNN head 150.
[00069] The video source 112 can provide vision model parameters 113a to the tensor encoder 116, as described hereafter. The vision model parameters 113a include the spatial resolution of the frame data 113, used for bounding boxes (an example of the task result 151) to be scaled to correspond to the resolution of the frame data 113.
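The scaling of bounding boxes to the resolution of the frame data 113 may be illustrated as follows (a sketch only; the corner-coordinate box format and the specific resolutions are assumptions):

```python
def scale_box(box, net_size, frame_size):
    """Scale a bounding box (x0, y0, x1, y1) from the network input
    resolution back to the original frame resolution."""
    net_w, net_h = net_size
    frame_w, frame_h = frame_size
    sx, sy = frame_w / net_w, frame_h / net_h   # per-axis scale factors
    x0, y0, x1, y1 = box
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)

# A box detected on a 640x384 network input, mapped to a 1920x1152 frame:
scaled = scale_box((10, 10, 20, 20), (640, 384), (1920, 1152))
```

Conveying the spatial resolution as a vision model parameter is what makes this mapping possible at the destination.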
[00070] The arrangements described allow a different 'inner codec' to be selected and used based on implementation requirements. In the context of the arrangements described, the 'inner codec' relates to the functionality for encoding tensors reduced in dimensionality compared to the tensors 115 from the CNN backbone 114 (or a feature pyramid) for transmission between the source device 110 and the destination device 140, and correspondingly decoding a bitstream to produce compressed tensors after reception at the destination device 140, where the compressed tensors will be further processed to produce tensors 149, restored in dimensionality to correspond to the tensors 115. The 'inner codec' generates and decodes a bitstream in the examples described. In other implementations, a different encoded output, for example a packed frame, may be used.
[00071] Notwithstanding the example devices mentioned above, each of the source device 110 and destination device 140 may be configured within a general-purpose computing system, typically through a combination of hardware and software components. Fig. 2A illustrates such a computer system 200, which includes: a computer module 201; input devices such as a keyboard 202, a mouse pointer device 203, a scanner 226, a camera 227, which may be configured as the video source 112, and a microphone 280; and output devices including a printer 215, a display device 214 and loudspeakers 217. An external Modulator-Demodulator (Modem) transceiver device 216 may be used by the computer module 201 for communicating to and from a communications network 220 via a connection 221. The communications network 220, which may represent the communication channel 130, may be a wide area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 221 is a telephone line, the modem 216 may be a traditional "dial-up" modem. Alternatively, where the connection 221 is a high capacity (e.g., cable or optical) connection, the modem 216 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 220. The transceiver device 216 may provide the functionality of the transmitter 122 and the receiver 142, and the communication channel 130 may be embodied in the connection 221.
[00072] The computer module 201 typically includes at least one processor unit 205, and a memory unit 206. For example, the memory unit 206 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 201 also includes a number of input/output (I/O) interfaces including: an audio-video interface 207 that couples to the video display 214, loudspeakers 217 and microphone 280; an I/O interface 213 that couples to the keyboard 202, mouse 203, scanner 226, camera 227 and optionally a joystick or other human interface device (not illustrated); and an interface 208 for the external modem 216 and printer 215. The signal from the audio-video interface 207 to the computer monitor 214 is generally the output of a computer graphics card. In some implementations, the modem 216 may be incorporated within the computer module 201, for example within the interface 208. The computer module 201 also has a local network interface 211, which permits coupling of the computer system 200 via a connection 223 to a local-area communications network 222, known as a Local Area Network (LAN). As illustrated in Fig. 2A, the local communications network 222 may also couple to the wide network 220 via a connection 224, which would typically include a so-called "firewall" device or device of similar functionality. The local network interface 211 may comprise an Ethernet™ circuit card, a Bluetooth™ wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practised for the interface 211. The local network interface 211 may also provide the functionality of the transmitter 122 and the receiver 142, and the communication channel 130 may also be embodied in the local communications network 222.
[00073] The I/O interfaces 208 and 213 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 209 are provided and typically include a hard disk drive (HDD) 210. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 212 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g. CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the computer system 200. Typically, any of the HDD 210, optical drive 212, and networks 220 and 222 may also be configured to operate as the video source 112, or as a destination for decoded video data to be stored for reproduction via the display 214. The source device 110 and the destination device 140 of the system 100 may be embodied in the computer system 200.
[00074] The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 and in a manner that results in a conventional mode of operation of the computer system 200 known to those in the relevant art. For example, the processor 205 is coupled to the system bus 204 using a connection 218. Likewise, the memory 206 and optical disk drive 212 are coupled to the system bus 204 by connections 219. Examples of computers on which the described arrangements can be practised include IBM-PC's and compatibles, Sun SPARCstations, Apple Mac or alike computer systems.
[00075] The tensor encoder 116, the tensor decoder 146 and methods to be described, may be implemented as one or more software application programs 233 executable within the computer system 200. In particular, the tensor encoder 116, the tensor decoder 146 and the steps of the described methods are effected by instructions 231 (see Fig. 2B) in the software 233 that are carried out within the computer system 200. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the described methods and a second part and the corresponding code modules manage a user interface between the first part and the user.
[00076] The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 200 from the computer readable medium, and then executed by the computer system 200. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program
product in the computer system 200 preferably effects an advantageous apparatus for implementing the source device 110 and the destination device 140 and the described methods.
[00077] The software 233 is typically stored in the HDD 210 or the memory 206. The software is loaded into the computer system 200 from a computer readable medium, and executed by the computer system 200. Thus, for example, the software 233 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 225 that is read by the optical disk drive 212.
[00078] In some instances, the application programs 233 may be supplied to the user encoded on one or more CD-ROMs 225 and read via the corresponding drive 212, or alternatively may be read by the user from the networks 220 or 222. Still further, the software can also be loaded into the computer system 200 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 200 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray Disc, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 201. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of the software, application programs, instructions and/or video data or encoded video data to the computer module 201 include radio or infra-red transmission channels, as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
[00079] The second part of the application program 233 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 214. Through manipulation of typically the keyboard 202 and the mouse 203, a user of the computer system 200 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 217 and user voice commands input via the microphone 280.
[00080] Fig. 2B is a detailed schematic block diagram of the processor 205 and a "memory" 234. The memory 234 represents a logical aggregation of all the memory modules (including the storage devices 209 and semiconductor memory 206) that can be accessed by the computer module 201 in Fig. 2A.
[00081] When the computer module 201 is initially powered up, a power-on self-test (POST) program 250 executes. The POST program 250 is typically stored in a ROM 249 of the semiconductor memory 206 of Fig. 2A. A hardware device such as the ROM 249 storing software is sometimes referred to as firmware. The POST program 250 examines hardware within the computer module 201 to ensure proper functioning and typically checks the processor 205, the memory 234 (209, 206), and a basic input-output systems software (BIOS) module 251, also typically stored in the ROM 249, for correct operation. Once the POST program 250 has run successfully, the BIOS 251 activates the hard disk drive 210 of Fig. 2A. Activation of the hard disk drive 210 causes a bootstrap loader program 252 that is resident on the hard disk drive 210 to execute via the processor 205. This loads an operating system 253 into the RAM memory 206, upon which the operating system 253 commences operation. The operating system 253 is a system level application, executable by the processor 205, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.
[00082] The operating system 253 manages the memory 234 (209, 206) to ensure that each process or application running on the computer module 201 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the computer system 200 of Fig. 2A need to be used properly so that each process can run effectively. Accordingly, the aggregated memory 234 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 200 and how such memory is used.
[00083] As shown in Fig. 2B, the processor 205 includes a number of functional modules including a control unit 239, an arithmetic logic unit (ALU) 240, and a local or internal memory 248, sometimes called a cache memory. The cache memory 248 typically includes a number of storage registers 244-246 in a register section. One or more internal busses 241 functionally interconnect these functional modules. The processor 205 typically also has one or
more interfaces 242 for communicating with external devices via the system bus 204, using the connection 218. The memory 234 is coupled to the bus 204 using the connection 219.
[00084] The application program 233 includes a sequence of instructions 231 that may include conditional branch and loop instructions. The program 233 may also include data 232 which is used in execution of the program 233. The instructions 231 and the data 232 are stored in memory locations 228, 229, 230 and 235, 236, 237, respectively. Depending upon the relative size of the instructions 231 and the memory locations 228-230, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 230. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 228 and 229.
[00085] In general, the processor 205 is given a set of instructions which are executed therein. The processor 205 waits for a subsequent input, to which the processor 205 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 202, 203, data received from an external source across one of the networks 220, 222, data retrieved from one of the storage devices 206, 209 or data retrieved from a storage medium 225 inserted into the corresponding reader 212, all depicted in Fig. 2A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 234.
[00086] The tensor encoder 116, the tensor decoder 146 and the described methods may use input variables 254, which are stored in the memory 234 in corresponding memory locations 255, 256, 257. The tensor encoder 116, the tensor decoder 146 and the described methods produce output variables 261, which are stored in the memory 234 in corresponding memory locations 262, 263, 264. Intermediate variables 258 may be stored in memory locations 259, 260, 266 and 267.
[00087] Referring to the processor 205 of Fig. 2B, the registers 244, 245, 246, the arithmetic logic unit (ALU) 240, and the control unit 239 work together to perform sequences of micro-operations needed to perform "fetch, decode, and execute" cycles for every instruction in the instruction set making up the program 233. Each fetch, decode, and execute cycle comprises:
a fetch operation, which fetches or reads an instruction 231 from a memory location 228, 229, 230;

a decode operation in which the control unit 239 determines which instruction has been fetched; and

an execute operation in which the control unit 239 and/or the ALU 240 execute the instruction.
[00088] Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 239 stores or writes a value to a memory location 232.
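The repeated cycle described above can be illustrated with a minimal sketch. All opcodes, memory locations and values below are hypothetical and purely illustrative; a real processor fetches and decodes binary machine instructions rather than Python tuples:

```python
# Toy memory: locations hold (opcode, operand) instruction tuples, and
# location 232 is a data cell that a store cycle writes to.
memory = {228: ("LOAD", 5), 229: ("ADD", 3), 230: ("STORE", 232), 232: None}
accumulator = 0

for location in (228, 229, 230):
    opcode, operand = memory[location]   # fetch: read the instruction
    if opcode == "LOAD":                 # decode + execute
        accumulator = operand
    elif opcode == "ADD":
        accumulator += operand
    elif opcode == "STORE":              # store cycle: write a value
        memory[operand] = accumulator    # back to a memory location
```

After the three cycles the accumulator holds 5 + 3 and the store cycle has written that value to location 232.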
[00089] Each step or sub-process in the methods of Figs. 18 and 19, to be described, is associated with one or more segments of the program 233 and is typically performed by the register section 244, 245, 246, the ALU 240, and the control unit 239 in the processor 205 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 233.
[00090] Fig. 3A is a schematic block diagram 300 showing functional modules of a backbone portion 310 of a CNN. The diagram 300 may serve as an implementation of the CNN backbone 114 when the system 100 is configured to perform a 'YOLOv3' network. The backbone portion 114 is sometimes referred to as 'DarkNet-53', although different backbones are also possible, resulting in a different number of and dimensionality of layers of the tensors 115 for each frame. In one implementation, the backbone portion 310 may be used as a person detector for the purpose of object tracking.
[00091] As shown in Fig. 3A, the video data 113 is passed to a resizer module 304. The resizer module 304 resizes each frame of the video data 113 to a resolution suitable for processing by the CNN backbone 310, producing resized frame data 312. If the resolution of the video data 113 is already suitable for the CNN backbone 310, operation of the resizer module 304 is not needed. The resized frame data 312 is passed to a convolutional batch normalisation leaky rectified linear (CBL) module 314 to produce tensors 316. The CBL module 314 contains modules as described with reference to a CBL module 360 as shown in Fig. 3D.
[00092] The CBL module 360 takes as input a tensor 361 of the resized frame data 312. The tensor 361 is passed to a convolutional layer 362 to produce tensor 363. If the convolutional layer 362 has a stride of one, the tensor 363 has the same spatial dimensions as the tensor 361. If the convolution layer 362 has a larger stride, such as two, the tensor 363 has smaller spatial dimensions compared to the tensor 361, for example, halved in width and height for the stride of two. Regardless of the stride, the size of the channel dimension of the tensor 363 may vary compared to the channel dimension of the tensor 361 for a particular CBL block. The tensor 363 is passed to a batch normalisation module 364, which outputs a tensor 365. The batch normalisation module 364 normalises the input tensor 363 and applies a scaling factor and an offset value to produce the output tensor 365. The scaling factor and offset value are derived from a training process. The tensor 365 is passed to a leaky rectified linear activation ("LeakyReLU") module 366 to produce a tensor 367. The module 366 provides a 'leaky' activation function whereby positive values in the tensor are passed through and negative values are severely reduced in magnitude, for example, to 0.1x their former value.
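The batch normalisation and leaky activation stages of a CBL module can be sketched as follows. This is a NumPy illustration only: the function names are hypothetical, the convolution stage (362) is omitted for brevity, and the per-channel mean, variance, scaling factor and offset would in practice come from a training process rather than being identity values as here:

```python
import numpy as np

def batch_norm_infer(x, mean, var, scale, offset, eps=1e-5):
    """Inference-time batch normalisation (cf. module 364): normalise per
    channel, then apply a trained scaling factor and offset value."""
    return scale * (x - mean) / np.sqrt(var + eps) + offset

def leaky_relu(x, slope=0.1):
    """'Leaky' activation (cf. module 366): positive values pass through,
    negative values are reduced to 0.1x their former value."""
    return np.where(x > 0, x, slope * x)

# Toy tensor in NCHW layout: 1 batch, 2 channels, 2x2 spatial samples.
x = np.array([[[[1.0, -1.0], [2.0, -2.0]],
               [[0.5, -0.5], [1.5, -1.5]]]])
mean = np.zeros((1, 2, 1, 1)); var = np.ones((1, 2, 1, 1))
scale = np.ones((1, 2, 1, 1)); offset = np.zeros((1, 2, 1, 1))

y = leaky_relu(batch_norm_infer(x, mean, var, scale, offset))
# Output keeps the input shape; negative entries end up at about 0.1x
# their former magnitude while positive entries pass through.
```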
[00093] Returning to Fig. 3A, the tensor 316 is passed from the CBL block 314 to a residual block module 320, such as a 'res1+2+8' module (also referred to as a res11 module) containing a concatenation of three residual blocks, the residual blocks containing one (1) residual unit, two (2) residual units, and eight (8) residual units, respectively. The spatial resolution of the tensors is halved horizontally and halved vertically in each of the residual blocks (see Fig. 3B) by a convolution with stride equal to two in a CBL block 344.
[00094] A residual block is described with reference to a ResBlock 340 as shown in Fig. 3B. The ResBlock 340 receives a tensor 341. The tensor 341 is zero-padded by a zero-padding module 342 to produce a tensor 343. The tensor 343 is passed to the CBL module 344 to produce a tensor 345. The CBL module 344 contains a convolution (for example 362) with a stride parameter set to two, resulting in the tensor 345 having half the width and half the height of the tensor 343. The tensor 345 is passed to a residual unit 346. The residual unit 346 contains a series of concatenated residual units, based on the number of the residual block (for example, eleven (11) units for the block 320). The last residual unit of the residual units 346 outputs a tensor 347.
[00095] A residual unit is described with reference to a ResUnit 350 as shown in Fig. 3C. The ResUnit 350 takes a tensor 351 as input. The tensor 351 is passed to a CBL module 352 to produce a tensor 353. The tensor 353 is passed to a second CBL unit 354 to produce a tensor 355. An add module 356 sums the tensor 355 with the tensor 351 to produce a tensor 357. The
add module 356 may also be referred to as a 'shortcut' as the input tensor 351 substantially influences the output tensor 357. For an untrained network, ResUnit 350 acts to pass-through tensors. As training is performed, the CBL modules 352 and 354 act to deviate the tensor 357 away from the tensor 351 in accordance with training data and ground truth data.
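The shortcut structure of a residual unit can be sketched as follows. This is a NumPy illustration with hypothetical names: the CBL branch is reduced to a per-element weighting plus the leaky activation rather than a full convolution with batch normalisation:

```python
import numpy as np

def leaky_relu(x, slope=0.1):
    return np.where(x > 0, x, slope * x)

def cbl(x, weight):
    """Hypothetical stand-in for a CBL module (cf. 352, 354): a simple
    weighting followed by the leaky activation."""
    return leaky_relu(weight * x)

def res_unit(x, w1, w2):
    """Residual unit (cf. ResUnit 350): two CBL modules in series, then a
    'shortcut' add (module 356) sums the branch output with the input."""
    return cbl(cbl(x, w1), w2) + x

x = np.array([[1.0, -2.0], [3.0, -4.0]])
# With zero branch weights the CBL branch contributes nothing, so the
# unit passes the input through unchanged -- illustrating the
# pass-through behaviour of an untrained network.
y = res_unit(x, w1=0.0, w2=0.0)
```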
[00096] Returning to Fig. 3A, the Res11 module 320 outputs a tensor 322. The tensor 322 is output from the backbone module 310 as one of the layers and also provided to a Res8 module 324. The Res8 module 324 is a residual block (i.e., 340), which includes eight residual units (i.e., 350). The Res8 module 324 produces a tensor 326. The tensor 326 is passed to a Res4 module 328 and output from the backbone module 310 as one of the layers. The Res4 module is a residual block (i.e., 340), which includes four residual units (i.e., 350). The Res4 module 328 produces a tensor 329. The tensor 329 is output from the backbone module 310 as one of the layers. Collectively, the layer tensors 322, 326, and 329 are output as the tensors 115 and may be referred to as layers 0-2 or L0, L1, and L2, respectively. The backbone CNN 310 may take as input a video frame of resolution 1088×608 and produce three tensors, corresponding to three layers, with the following dimensions: [1, 256, 76, 136], [1, 512, 38, 68], [1, 1024, 19, 34]. Another example of the three tensors 115 corresponding to three layers may be [1, 512, 34, 19], [1, 256, 68, 38], [1, 128, 136, 76], which are respectively separated at layer index 75, 90, and 105 when the layers are enumerated according to the YOLOv3 software implementation of the backbone 300 and a head 1200.
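The spatial dimensions quoted above can be reproduced from the input resolution, assuming the usual YOLOv3 layer strides of 8, 16, and 32 (the strides are an assumption here; the channel counts 256/512/1024 are taken from the text):

```python
# Spatial dimensions of the three layer tensors for a 1088x608 input frame.
width, height = 1088, 608
channels = [256, 512, 1024]
strides = [8, 16, 32]          # assumed per-layer downsampling factors
shapes = [[1, c, height // s, width // s] for c, s in zip(channels, strides)]
print(shapes)
# Reproduces [1, 256, 76, 136], [1, 512, 38, 68], [1, 1024, 19, 34]
```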
[00097] Each of the Res11 320, Res8 324 and Res4 328 operates in a similar manner to ResBlock 340. Each of the CBL 314, the CBL 344 and the CBL 354 operates in a similar manner to the CBL 360.
[00098] Fig. 4 is a schematic block diagram showing functional modules of an alternative backbone portion 400 of a CNN, which may serve as an implementation of the CNN backbone 114 when the system 100 is configured to perform a "FasterRCNN" or "MaskRCNN" ResNet 101 network. Frame data 113 is input and passes through a stem network 408, a res2 module 412, a res3 module 416, a res4 module 420, and a res5 module 424 via tensors 409, 413, 417, 421, 425 respectively. The backbone portion 400 may be used as part of a general object detector or for instance segmentation, with various classes of object supported.
[00099] The stem network 408 includes a convolution with a kernel size of 7x7 and a stride of two (2) and a max pooling operation. The res2 module 412, the res3 module 416, the res4 module 420 and the res5 module 424 perform convolution operations, such as LeakyReLU activations. Each module 412, 416, 420 and 424 also performs one halving of the width and height of the processed tensors via a stride setting of two. Each of the tensors 413, 417, 421 and 425 are passed to one of 1x1 lateral convolution modules 446, 444, 442 and 440 respectively. The modules 446, 444, 442, and 440 produce tensors 447, 445, 443 and 441 respectively. The tensor 441 is passed to a 3x3 output convolution module 470, which produces an output tensor P5 471.
[000100] The tensor 441 is also passed to upsampler module 450 to produce an upsampled tensor 451. A summation module 460 sums the tensors 443 and 451 to produce a tensor 461. The tensor 461 is passed to an upsampler module 452 and a 3x3 lateral convolution module 472. The module 472 outputs a P4 tensor 473. The upsampler module 452 produces an upsampled tensor 453. A summation module 462 sums tensors 445 and 453 to produce a tensor 463. The tensor 463 is passed to a 3x3 lateral convolution module 474 and an upsampler module 454. The module 474 outputs a P3 tensor 475. The upsampler module 454 outputs an upsampled tensor 455. A summation module 464 sums the tensors 447 and 455 to produce tensor 465, which is passed to a 3x3 lateral convolution module 476. The module 476 outputs a P2 tensor 477. The upsampler modules 450, 452, and 454 use nearest neighbour interpolation for low computational complexity. The tensors 471, 473, 475, and 477 form the output tensors 115 of the CNN backbone 400. Although Fig. 4 shows a particular backbone portion of the Faster RCNN network architecture (a 'P-layer' split point), different divisions into backbone and head are possible. Splitting the network at tensor 409 is termed a 'stem' split point. Splitting the network at tensors 447, 445, 443, and 441 is termed a 'C-layer' split point.
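A minimal sketch of one step of the top-down pathway of Fig. 4, assuming 2× nearest-neighbour upsampling as described; the toy channel counts here are illustrative, as the real counts depend on the network configuration:

```python
import numpy as np

def upsample_nearest_2x(t):
    # 2x nearest-neighbour upsampling (cf. upsampler modules 450, 452, 454):
    # every spatial sample is repeated along both height and width.
    return np.repeat(np.repeat(t, 2, axis=-2), 2, axis=-1)

# Toy tensors standing in for lateral convolution outputs
# (batch 1, 4 channels; real channel counts are configuration-dependent).
c5 = np.random.randn(1, 4, 2, 2)   # analogous to tensor 441
c4 = np.random.randn(1, 4, 4, 4)   # analogous to tensor 443

# Summation module 460: upsample the coarser tensor, then element-wise sum.
p4_pre = c4 + upsample_nearest_2x(c5)   # analogous to tensor 461
assert p4_pre.shape == (1, 4, 4, 4)
```

The same upsample-and-sum step repeats at each resolution (modules 462 and 464), before the 3x3 output convolutions produce the P-layer tensors.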
[000101] The bitstream includes a plurality of network abstraction layer (NAL) units. Fig. 10 is a schematic block diagram showing the structure of a NAL unit 1000. Each NAL unit is prefixed with a start code 1010, consisting of three contiguous bytes having values of 0x00, 0x00, and 0x01. The start code 1010 is followed by a NAL unit header 1012. The NAL unit header 1012 is of a format as described with reference to Appendices A, B, and C, for AVC, HEVC, and VVC, respectively. For HEVC and VVC the NAL unit header 1012 is a predetermined length that is always two bytes. For AVC, the NAL unit header 1012 is a predetermined length of either one, three or four bytes, i.e., different to the predetermined length for the other codecs. The NAL unit header 1012 includes a nal_unit_type, used to identify the parsing process to be applied to parse a raw byte sequence payload (RBSP) 1020. The NAL unit type is either a five- or six-bit fixed length code and includes 'reserved' values, which may be defined as part of a future revision of the respective specification (AVC, HEVC, or VVC), and 'unspecified' values, which will not be used in future versions of the AVC, HEVC, or VVC standards and are instead available for use by other bodies wishing to extend to encoding methods other than AVC, HEVC, or VVC. To avoid detection of false start codes that may be present in the RBSP by chance, the RBSP 1020 is encapsulated into a NAL unit payload 1014 with a process of insertion of 'emulation prevention bytes'. In forming the NAL unit payload 1014, whenever a two-byte sequence 0x00 0x00 is encountered in the RBSP 1020, such as zero bytes 1016, an 'emulation_prevention_three_byte', such as byte 1018, having value 0x03, is inserted immediately following the zero bytes 1016. The process of 'emulation_prevention_three_byte' insertion to form the NAL unit payload 1014 from the RBSP 1020 ensures an absence of any false start codes that would trigger erroneous parsing of the bitstream 121. The start of the RBSP 1020 is the earliest position at which detection of two zero bytes for the purpose of emulation_prevention_three_byte insertion is possible, which would take place between the second and third byte of the RBSP 1020. To prevent false start code detection in the early bytes (first or second) of the RBSP 1020, the last byte of the NAL unit header 1012 needs to be nonzero.
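The insertion process can be sketched as follows. Note this follows the simplified rule stated above (insert 0x03 after every 0x00 0x00 pair); the AVC/HEVC/VVC standards insert the byte only when the byte following the pair is in the range 0x00 to 0x03, so this sketch is slightly more aggressive than the standards' rule:

```python
def insert_emulation_prevention(rbsp: bytes) -> bytes:
    # Form a NAL unit payload from an RBSP: after every run of two
    # consecutive zero bytes, insert an emulation_prevention_three_byte
    # (0x03) so no start-code pattern (0x00 0x00 0x01) can appear.
    out = bytearray()
    zeros = 0
    for b in rbsp:
        out.append(b)
        if b == 0:
            zeros += 1
            if zeros == 2:
                out.append(0x03)   # emulation_prevention_three_byte
                zeros = 0
        else:
            zeros = 0
    return bytes(out)

# The would-be start code 00 00 01 inside the RBSP is broken up:
assert insert_emulation_prevention(b"\x00\x00\x00\x01") == b"\x00\x00\x03\x00\x01"
```

Removing the 0x03 bytes at the decoder recovers the original RBSP, which is why the process is lossless despite modifying the byte stream.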
[000102] For HEVC and VVC, the last syntax element of the NAL unit header 1012 is 'nuh_temporal_id_plus1', coded with a three-bit fixed-length codeword and prohibited from using the bit string "000". For HEVC and VVC the NAL unit header 1012 is always a predetermined length of two bytes. For AVC, when using a one-byte NAL unit header (that is, the predetermined length of the NAL unit header is one byte rather than three bytes or four bytes), a nal_unit_type of 0 needs to be avoided. For AVC, nal_unit_type equal to 0 may be marked as 'reserved' or 'prohibited', indicating that the nal_unit_type is not available for use by bodies other than the one responsible for issuing new versions of the AVC specification. In the implementation described, the nal_unit_type of 0 shall not be used, preventing the possibility of a NAL unit header for AVC consisting of a zero byte.
[000103] A bit 'forbidden_zero_bit', a single bit always set to value zero, is coded as the first bit to be parsed (in bit position 7, as bit consumption in each byte progresses from bit 7 down to bit 0) in the NAL unit header 1012, regardless of the usage of AVC, HEVC, or VVC as "inner codec". The forbidden_zero_bit may be used to signal an alternative NAL unit header format, containing a different space for NAL unit types, such as the FCM_VMPS, FCM_SPS, and FCM_PPS NAL units. Standards such as RFC6184 include functionality to set the forbidden_zero_bit in a NAL unit header when transmission errors are detected in received data that forms a NAL unit payload associated with the NAL unit header.
[000104] A decoder conforming to AVC, HEVC, or VVC is not required to decode NAL units with the forbidden_zero_bit set. In some implementations, the decoder may attempt to decode NAL units with the forbidden_zero_bit set anyway to provide output frame data that is likely corrupted but may be more desirable than discarding the NAL unit and not providing any associated output frame data. In the context of FCM, tensor compression involves a reduction into a smaller dimensionality space, quantisation, and generally packing into a video frame and coding the video frame with a conventional video codec. Coding of the video frame is known as an 'inner codec' as this is performed as a stage in the FCM 'outer' codec. Reuse of a conventional video codec (such as VVC, HEVC or AVC) permits deployment of the FCM standard onto existing system-on-chips by reusing codec implementations already provided by ASIC vendors.
[000105] Fig. 5 is a schematic block diagram 500 of an example implementation of the tensor encoder 116 using a configurable tensor compressor stage, a tensor data type adaptation stage, a tensor dimensionality adaptation stage, and a selectable video encoder 542, also referred to as the "inner codec". The video encoder 542 implements one of several video standards, such as AVC, HEVC, or VVC. The video encoder 542 may also implement a customised compression method, such as lossless CABAC encoding of each quantised value in the input tensors 115, using an algorithm such as ISO/IEC 15938-17, also known as "deepCABAC".
[000106] Fig. 11 is a schematic block diagram showing the bitstream 121 or 143 holding encoded packed feature maps, parameter sets for the FCM codec and parameter sets for the inner codec.
[000107] Fig. 18 shows a method 1800 for performing a first portion of a CNN, selecting a tensor compressor, compressing tensors using the selected tensor compressor, and encoding resulting compressed tensors using a video encoder conforming to a selected video compression standard. The tensor encoder 116 (and the example implementation 500) and the method 1800 may be implemented as one or more software application programs 233 executable within the computer system 200. The tensor encoder 116 and the method 1800 may be effected by instructions 231 (see Fig. 2B) in the software 233 that are carried out within the computer system 200. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks. The method 1800 commences at a select inner codec step 1801.
[000108] At the step 1801, the processor 205 selects one video compression standard out of a plurality of video compression standards. AVC, HEVC, and VVC may be options for selection at step 1801 and the video compression standard selected may be dependent on considerations such as the capabilities of the source device 110 and the destination device 140. A capabilities negotiation may take place between the source device 110 and the destination device 140 whereby a selection is made such that the same compression standard is used in each device, prioritising VVC over HEVC and HEVC over AVC. Selection at step 1801 may be constrained based on a profile of the FCM standard, such that a smaller set of compression standards is available, such as HEVC and AVC, from which one is selected based on the aforementioned capabilities negotiation step. Rather than selecting a repurposed video codec for compressing features, specific FCM profiles may select a customised approach such as compressing quantized values of features using deepCABAC, with or without prediction of values within a feature map or from one feature map to another. Such customised approaches may be targeted at applications where achieving low bitrate is a secondary consideration compared to achieving very low complexity, for example. In typical use, the selection of the step 1801 is performed one time and thus does not change during the course of encoding one bitstream. Arrangements may select a different inner codec during coding of the bitstream 121 provided that the switch from one inner codec to a different inner codec is made prior to encoding a new group-of-pictures (GOP), i.e., prior to a new "intra random access picture" (IRAP) or "instantaneous decoder refresh" (IDR) picture. Control in the processor 205 progresses from the step 1801 to an encode inner codec identifier step 1802.
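The capabilities negotiation described above can be sketched as a priority-ordered intersection of capability sets (the function name and set representation are illustrative, not part of the standard):

```python
def negotiate_inner_codec(source_caps, destination_caps):
    # Pick a compression standard both devices support, prioritising
    # VVC over HEVC and HEVC over AVC, as described above.
    for codec in ("VVC", "HEVC", "AVC"):
        if codec in source_caps and codec in destination_caps:
            return codec
    raise ValueError("no common inner codec between devices")

# A source supporting AVC+HEVC and a destination supporting HEVC+VVC
# agree on HEVC, the highest-priority common option.
assert negotiate_inner_codec({"AVC", "HEVC"}, {"HEVC", "VVC"}) == "HEVC"
```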
[000109] At the step 1802, a metadata encoder 544, under execution of the processor 205, encodes an identifier for the selection of which inner codec to use from step 1801 to FCM metadata 548. The identifier is encoded using a NAL unit, in particular as an "inner codec identifier (ICI) NAL unit" 1110 (see Fig. 11). A NAL unit multiplexor 550 multiplexes the NAL unit 1110 from the FCM NAL units bitstream 548 into the bitstream 121. The syntax described with reference to Appendix D is used to encode the selected inner codec. As described in Appendix D, an inner codec identifier NAL unit 1110 has a fixed length of one byte, i.e., having a one-byte header and no RBSP, distinguishing such NAL units from all HEVC and VVC NAL units. A five-bit code of "11111" (decimal 31), which occupies the bit positions corresponding to "nal_unit_type" in the AVC standard, distinguishes the NAL unit from an extension to the AVC standard. Notably, the "forbidden_zero_bit", common to AVC, HEVC, and VVC, is retained as a bit always set to the value of zero. The "forbidden_zero_bit" remains free to be used for other purposes, such as indication of errors at the transport layer, which is outside of the scope of the AVC, HEVC, and VVC standards. A two-bit codeword "inner_codec_identifier", occupying the same bit positions as the "nal_ref_idc" syntax element of the AVC standard, signals which one of AVC, HEVC, VVC, or a custom codec is to be used. An instance of the inner codec identifier NAL unit is required at each random-access entry point into the bitstream, that is, at the start of the bitstream and prior to each IRAP or IDR picture 1122 and associated parameter sets, such as an SPS (sequence parameter set) 1118 and a PPS (picture parameter set) 1120, as shown in Fig. 11. Each of the SPS 1118 and the PPS 1120 contains a NAL unit header of the format indicated by the inner codec identifier 1110. As a result of the inner codec identifier NAL unit 1110, the format of NAL unit headers of subsequent NAL units is known. Thus, the NAL unit headers are parseable when any one of AVC, HEVC, or VVC is used as the inner codec in the bitstream 121. The step 1802 can operate to encode the inner codec identifier NAL unit 1110 and associated codewords forbidden_zero_bit, inner_codec_identifier and constant_value_31 (as per the example of Appendix D) for the selected inner encoder to the bitstream 121. The step 1802 operates to encode a NAL unit to the bitstream having a predetermined length, wherein the NAL unit of the predetermined length corresponds to possible NAL units of one of the selectable inner codecs (AVC) but the bit field (at bit positions four down to zero) that would indicate nal_unit_type in the case of AVC indicates a reserved or prohibited codeword (such as 0x1f or 31 or 0b11111). Instead, the presence of binary value 0b11111 at bit positions four down to zero indicates that this NAL unit selects one inner codec out of a plurality of inner codecs (AVC, VVC, HEVC, or custom, for example). The other potential inner codecs (HEVC, VVC, custom) have NAL unit lengths different to the predetermined length possible for AVC (i.e., one byte), as described above. Since a NAL unit with one-byte length (excluding the start code) only appears for AVC and for the inner codec identifier, any such NAL unit can be unambiguously parsed, based on bits four down to bit zero, to determine whether the NAL unit is intended for parsing by an AVC inner codec or for an inner codec identification purpose. A bitstream must include an inner codec identifier NAL unit, to select an inner codec, prior to any NAL units intended to be parsed by the selected inner codec.
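Under the AVC one-byte header layout (forbidden_zero_bit at bit 7, nal_ref_idc at bits 6-5, nal_unit_type at bits 4-0), the inner codec identifier byte can be packed and recognised as follows. The mapping of the two-bit inner_codec_identifier values to particular codecs is assumed for illustration, as the actual assignment is given by Appendix D:

```python
ICI_CODECS = {0: "AVC", 1: "HEVC", 2: "VVC", 3: "custom"}  # assumed mapping

def make_ici_header(codec_id):
    # forbidden_zero_bit = 0 at bit 7, inner_codec_identifier in the two
    # bits AVC uses for nal_ref_idc (bits 6-5), and the constant 0b11111
    # (decimal 31) in the five nal_unit_type bits (bits 4-0).
    assert 0 <= codec_id <= 3
    return (codec_id << 5) | 0b11111

def is_ici_header(byte):
    # Bits four down to zero equal 0b11111, a reserved/prohibited value
    # for a genuine AVC NAL unit, so a one-byte NAL unit with this
    # pattern is unambiguously an inner codec identifier.
    return byte & 0b11111 == 0b11111

header = make_ici_header(2)
assert header == 0x5F and is_ici_header(header)
assert ICI_CODECS[(header >> 5) & 0b11] == "VVC"
```

Because the five low bits are a value an AVC encoder never emits, a parser seeing a one-byte NAL unit can branch on this pattern alone, as described above.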
[000110] Additional NAL units conveying parameters for modules aside from the inner codec, such as an FCM VMPS (vision model parameter set) 1112, an FCM SPS 1114, and an FCM PPS 1116, use the same NAL unit header format as the inner codec and thus must also follow the inner codec identifier NAL unit 1110. As indicated in Appendix D, a custom inner codec may also be selected at step 1801 and encoded at step 1802. When a custom inner codec is selected, a custom NAL unit header format is used (and encoded at step 1802), which may duplicate the bit fields of an existing standard such as VVC. A custom codec typically requires a custom enumeration of NAL unit types and support for the selection of one out of a plurality of inner codecs, to provide an extensibility mechanism. Selections can include direct deepCABAC coding of tensor values, intra-predictive deepCABAC coding of tensor values, and tensor encoding using an end-to-end learned codec such as the approach described in the paper "Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules" by Cheng et al. NAL units (such as the FCM VMPS 1112, the FCM SPS 1114, and the FCM PPS 1116) defining parameters within the FCM standard scope but outside of the inner codec scope are referred to as 'FCM NAL units', and the enumerated nal_unit_type of FCM NAL units is dependent on the selected inner codec, since each standard of AVC, HEVC, and VVC has different enumerations of nal_unit_type and different 'unspecified' values, available for use such as by the FCM standard. In one example, the nal_unit_type of the FCM VMPS 1112 is described with reference to Appendices A-C for AVC, HEVC and VVC inner codecs respectively (the nal_unit_type of the FCM VMPS is 24 for AVC, 48 for HEVC, and 28 for VVC). Also, the nal_unit_type of the FCM_SPS and the nal_unit_type of the FCM_PPS are described with reference to Appendices A-C for AVC, HEVC and VVC inner codecs respectively. As with the HLS (high level syntax) design of the inner codec, the syntax of the FCM_PPS and the
such as such as by the FCM by the standard.InInone FCM standard. one example, example, nal_unit_type nal_unit_type of the of the FCMFCM VMPS VMPS 1112 is1112 is described with described with reference reference to to Appendices A-C Appendices A-C forfor AVC, AVC, HEVCHEVC and and VVC VVCcodecs inner inner codecs respectively (nal_unit_type respectively (nal_unit_type of of FCM VMPS FCM VMPS is for is 24 24 for AVC, AVC, 48HEVC, 48 for for HEVC, and 28 and for 28 for VVC). VVC). Also, nal_unit_type Also, nal_unit_type of of the the FCM_SPS FCM_SPS and and nal_unit_type nal_unit_type of the of the FCM_PPS FCM_PPS are described are described with with reference to reference to Appendices A-C Appendices A-C forAVC, for AVC, HEVC HEVC andinner and VVC VVCcodecs inner respectively. codecs respectively. As with As thewith the HLS(high HLS (highlevel levelsyntax) syntax)design designofofthe the inner inner codec, codec, the the syntax syntax of of the the FCM_PPS FCM_PPS andand thethe
FCM_SPS FCM_SPS are are intended intended to avoid to avoid parsing parsing dependencies, dependencies, i.e., i.e., anan FCM_PPS FCM PPS can becan be parsed parsed
regardless of the loss of the FCM_SPS for the bitstream. It should be noted that due to the regardless of the loss of the FCM_SPS for the bitstream. It should be noted that due to the
differing NAL differing unitheader NAL unit headerformat formatamong amongthethe inner inner codecs, codecs, lossofofthe loss theICI ICINAL NAL unit unit prevents prevents
parsing of parsing of any any other other NAL unitsininthe NAL units the bitstream. bitstream. Also, Also, as as with with the the inner innercodec codec NAL format, NAL format,
emulationprevention emulation preventionbytes bytesare are inserted inserted as as needed into the needed into the FCM NAL FCM NAL units units to to avoid avoid possible possible
false start false startcode codedetection. detection.The Theinner innercodec codecidentifier identifierNAL NAL unit unit1110 1110 needs needs to to precede precede the the FCM FCM
NALunits NAL unitsininorder orderfor for the the tensor tensor decoder 146to decoder 146 to parse parse the the bitstream bitstream 121. TheNAL 121. The NAL unit unit 1110 1110
is therefore usually encoded at the start of the bitstream by the step 1802. However, multiple is therefore usually encoded at the start of the bitstream by the step 1802. However, multiple
instances of instances of the the NAL unit 1110 NAL unit 1110including includingthe theNAL NAL unit unit header header cancan be be present present in in thebitstream the bitstream 121, as encoded 121, as encodedby by stepstep 1802. 1802. The multiple The multiple instances instances can, for can, forbeexample example bethe present at present start at of the start of
the bitstream the bitstream before before other other NAL unitsand NAL units andone oneorormore moreinstances instancesmay maybe be priortotoany prior any'random ‘random access’ (entry) point into the bitstream, such as prior to periodic intra random access pictures access' (entry) point into the bitstream, such as prior to periodic intra random access pictures
(IRAPs) that may be coded in the bitstream 121. If a different inner codec is selected at step (IRAPs) that may be coded in the bitstream 121. If a different inner codec is selected at step
1801 duringcoding 1801 during codingofofthe the bitstream bitstream 121 121as as describe describe above, above,i.e., i.e., aaswitch switchfrom from one one inner inner codec codec
to a different inner codec, a plurality of inner codecs are used in the bitstream and NAL units to a different inner codec, a plurality of inner codecs are used in the bitstream and NAL units
1110 are encoded 1110 are encodedtotothe the bitstream bitstream correspondingly correspondinglyatat step step 1802. 1802. Also, Also,regardless regardlessofof the the inner inner codec in codec in use use and and the the assigned nal_unit_type, the assigned nal_unit_type, the syntax syntax structure structure for foreach each parameter parameter set set (FCM (FCM
VMPS112, VMPS112, thethe FCMFCM SPS1114, SPS1114, and and the thePPS FCM FCM PPS 1116) is 1116) is unchanged. unchanged. Control inControl in the processor the processor
44204385_1 44204385_1
30
progresses from progresses fromthe the step step 1802 1802toto an an encode encodeFCM FCM Vision Vision model model parameter parameter set (VMPS) set (VMPS) 07 Jun 2024
step 1803. step 1803.
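The emulation prevention insertion mentioned above can be sketched as follows. This is a minimal illustration of the byte-stuffing rule shared by AVC, HEVC and VVC — after two consecutive zero bytes, an escape byte 0x03 is inserted whenever the next payload byte is 0x03 or less, so the payload can never mimic a start code (0x000001). The function name is illustrative and not part of the FCM syntax.

```python
def insert_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert emulation prevention bytes (0x03) so that the payload cannot
    reproduce the 0x000000/0x000001/0x000002 patterns reserved by the
    AVC/HEVC/VVC NAL unit byte stream, avoiding false start code detection."""
    out = bytearray()
    zero_run = 0
    for b in rbsp:
        if zero_run >= 2 and b <= 0x03:
            out.append(0x03)  # escape byte breaks the run of zeros
            zero_run = 0
        out.append(b)
        zero_run = zero_run + 1 if b == 0x00 else 0
    return bytes(out)
```

A decoder reverses the process by deleting any 0x03 byte that follows two zero bytes before parsing the RBSP.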
[000111] At the step 1803, the metadata encoder 544, under execution of the processor 205, encodes vision model parameters 113a, used for the operation of the CNN head 150, as the FCM VMPS 1112 as FCM metadata 548. In one implementation, the FCM_VMPS 1112 may include output_picture_width and output_picture_height (width and height of output pictures) for the vision model parameters 113a, as shown in the example of Appendix E. The NAL unit type (nal_unit_type) of the FCM VMPS 1112 is dependent on the inner codec selected at step 1801. NAL unit types of the FCM VMPS are described with reference to Appendices A-C for AVC, HEVC and VVC inner codecs respectively. The vision model parameters 113a include the spatial resolution of the frame data 113, needed for bounding boxes (an example of the task result 151) to be scaled to correspond to the resolution of the frame data 113, which is not otherwise known by the destination device 140. Control in the processor 205 progresses from the step 1803 to a select set of tensor compressors/decompressors step 1805.
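As a hedged illustration of why the VMPS carries output_picture_width and output_picture_height, the sketch below rescales bounding boxes from the coded grid back to the original frame resolution signalled in the FCM VMPS. The helper name and the (x0, y0, x1, y1) box layout are assumptions for illustration, not part of the described syntax.

```python
def scale_boxes(boxes, coded_w, coded_h, output_picture_width, output_picture_height):
    """Rescale (x0, y0, x1, y1) boxes from the coded width/height to the
    original frame resolution carried in the (hypothetical) VMPS fields."""
    sx = output_picture_width / coded_w
    sy = output_picture_height / coded_h
    return [(x0 * sx, y0 * sy, x1 * sx, y1 * sy) for x0, y0, x1, y1 in boxes]
```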
[000112] At the step 1805, a tensor compressor selector 510, under execution of the processor 205, selects a set of mechanisms that may be used for compressing and decompressing intermediate tensors. Each mechanism of the set forms a 'bottleneck' and corresponds to an encoder network topology coupled to a decoder network topology. The interface between the encoder network topology and the decoder network topology is the narrowest and hence the layer with the most reduced dimensionality. The interface between the encoder network topology and the decoder network topology also includes one or more tensors that may be referred to as 'compressed tensors'. The one or more tensors may be produced from operations such as convolutions, batch normalisations, activation functions, or matrix multiplications, tensor additions and/or subtractions. The dimensionality of tensors at the interface between the encoder network topology and the decoder network topology may vary from one invocation of the method 1800 to a next invocation of the method 1800 (e.g., the channel count may vary). Support for a plurality of mechanisms in one bitstream enables adapting to changing network conditions and application requirements by switching from one mechanism to another dynamically. For example, an object segmentation network may operate using a mechanism providing low bitrate at the expense of lower quality output (resulting in lower mAP) of the task result 151 from the CNN head 150.
[000113] Each mechanism selected at the step 1805 needs to match the dimensionality of the tensors 115 at the input to the encoder network topology and output of the decoder network topology in order to be compatible with the neural network formed by the backbone 114 and the head 150. Tensors in compressed form (i.e., at the 'bottleneck' point or output from the encoder network topology and input to the decoder network topology) may have varying number and dimensionality. Where a mechanism involves the use of trainable elements, such as convolutions, the tensor compressor selector 510 also determines selected weights 516 to be used by the encoder network topology and the decoder network topology. Multiple weights may be available for a given encoder network topology and decoder network topology, such as different weights targeting different quality operating points. Control in the processor 205 progresses from the step 1805 to a select tensor compressor/decompressor step 1810.
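The dimensionality-matching constraint can be sketched as a filter over candidate mechanisms. The record layout ('io_shapes' listing the bottleneck interface shapes) is hypothetical; the point is that only mechanisms whose interface matches the backbone's tensor shapes remain selectable at step 1810.

```python
def compatible_mechanisms(mechanisms, tensor_shapes):
    """Keep only bottleneck mechanisms whose encoder-input (and decoder-
    output) shapes match the tensors produced by the CNN backbone.
    'mechanisms' is a hypothetical list of dicts with an 'io_shapes' key."""
    return [m for m in mechanisms if m["io_shapes"] == tensor_shapes]
```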
[000114] At the step 1810, the tensor compressor selector 510, under execution of the processor 205, selects a mechanism to be used for compression and decompression of the tensors 115. The tensor compressor selector 510 outputs a selected tensor decompressor 512 and associated metadata 520. The selection made at the step 1810 is from the set determined at the step 1805. The selection may be the result of a request by the destination device via an out-of-band signalling mechanism to increase or decrease the decoded quality (and hence the bit-rate) of the bitstream 121. Where a mechanism is parameterizable (e.g., the channel count of one or more of the compressed tensors may be varied dynamically), a suitable value is selected at the step 1810.
[000115] The system 100 performs a given neural network which is divided into a first portion, performed by the CNN backbone 114, and a second portion, performed by the CNN head 150. The first portion of the neural network may be a Darknet-53 backbone as described with reference to Figs. 3A-3D, a backbone of a FasterRCNN or MaskRCNN network, as described with reference to Fig. 4, or a first portion of some other neural network. The number and dimensionality of the tensors 115 depends on the network being implemented in the system 100 and the division of the network into a first portion, executed in the CNN backbone 114, and a second portion, executed in the CNN head 150. Compression and decompression mechanisms involve an encoder network topology, to be performed in the source device 110, and a decoder network topology, to be performed in the destination device 140.
[000116] The encoder network topology and decoder network topology may involve the use of trained layers, such as convolutions, in which case weights are also needed. The encoder network topology and the decoder network topology form a 'bottleneck' between the first network portion (i.e., the CNN backbone 114) and the second network portion (i.e., the CNN head 150), with the interface between the encoder network topology and the decoder network topology forming the 'narrowest' part of the bottleneck and thus the lowest bitrate when compressed in the form of packed video frames. Different mechanisms (encoder network topologies and corresponding decoder network topologies) may be selected, signalled, and activated dynamically to adapt the bitrate of the bitstream 121 to network conditions or to adjust to meet application requirements for quality. Mechanisms or topologies providing higher quality generally have larger dimensions of the compressed tensors and hence require a larger packed frame area, resulting in a higher bitrate for the bitstream 121. Control in the processor 205 progresses from the step 1810 to an instantiate tensor compressor step 1815.
[000117] At the step 1815, the source device 110, under execution of the processor 205, obtains a tensor structural description 522 from a tensor codec repository 514 based on the selected tensor decompressor 512. The source device 110 instantiates the tensor structural description 522 into a form suitable for execution by the tensor compressor 530. The instantiating step 1815 may involve declaring required memory and initialising data structures in the memory 205 associated with the tensor structural description 522, or allocating resources in a reconfigurable hardware device such as a field programmable gate array (FPGA). Operations defined in the tensor structural description 522 may be converted to a form more amenable for execution by the processor 205 as part of the instantiation step 1815. 'Just-in-time compilation' is one approach for obtaining a representation such as a 'byte code' that may be executed more rapidly by the processor 205 than interpreting the tensor structural description 522 directly to perform each tensor compression operation. Where the instantiated tensor compressor 530 contains trainable elements, such as convolutions, a tensor weight repository 518 is accessed to obtain necessary weights 524 for use by the trainable elements, with the weights selected based on the weight selection 516. The tensor codec repository 514 and the tensor weight repository 518 may be populated from the tensor codec repository 180. Control in the processor 205 progresses from the step 1815 to a determine complexity indication step 1830.
[000118] At the step 1830, the source device 110, under execution of the processor 205, determines an indication representative of a worst-case complexity for any decoder network topology that could be signalled for the bitstream 121. Where the source device 110 may select one topology from multiple possible decoder network topologies at the step 1810, it is desirable for the destination device 140 to know, at the beginning of decoding the bitstream 121, whether the destination device 140 will be able to decode the entirety of the bitstream. The first signalled decoder network topology in the bitstream 121 may not be the most complex topology used for decoding that bitstream 121. For example, the system 100 may commence operation in a low bitrate mode, later increasing bitrate (and required decoder network topology) based on some criteria. Aspects of decoder network complexity include the number of multiply-and-accumulate (MAC) operations and the number of weights required to execute the decoder network topology. The decoder network complexity indication is configured to indicate the highest complexity of all possible decoder network topologies that the source device 110 may instruct the destination device 140 to perform. The decoder network complexity indication may be based on a decoded capability indication. In one arrangement, the decoder network complexity indication may be a scalar value mapped onto each aspect of the network complexity. For example, the network complexity indication may be a scalar value that relates to aspects such as MAC count and weight count by reference to look-up tables, with the network complexity indication set to accommodate the worst-case aspect of each aspect of the set of decompressors determined at the step 1805. Control in the processor 205 progresses from the step 1830 to a perform neural network first portion step 1840.
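One possible reading of the scalar complexity indication is sketched below, assuming hypothetical look-up tables of per-level MAC and weight limits: the signalled level is the smallest one whose limits cover the worst case over every decoder topology that might be selected for the bitstream. The table values and record layout are illustrative assumptions only.

```python
# Hypothetical look-up tables: for each complexity "level", the maximum
# MAC count and weight count a conforming decoder must support.
MAC_LIMITS = {1: 1e9, 2: 1e10, 3: 1e11}
WEIGHT_LIMITS = {1: 1e6, 2: 1e7, 3: 1e8}

def complexity_indication(decoders):
    """Return the smallest scalar level whose per-aspect limits accommodate
    the worst-case MAC and weight counts over all selectable topologies."""
    worst_macs = max(d["macs"] for d in decoders)
    worst_weights = max(d["weights"] for d in decoders)
    for level in sorted(MAC_LIMITS):
        if MAC_LIMITS[level] >= worst_macs and WEIGHT_LIMITS[level] >= worst_weights:
            return level
    raise ValueError("no level accommodates the decoder set")
```

Note that the level is driven by the worst case of each aspect independently, so a set mixing a MAC-heavy topology with a weight-heavy one may need a higher level than any single topology alone.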
[000119] At the step 1840, the CNN backbone 114, under execution of the processor 205, performs the first portion of a neural network using frame data 113 from the video source 112 as input. The step 1840 outputs the tensors 115. Control in the processor 205 progresses from the step 1840 to a perform tensor downsampling step 1850.
[000120] At the step 1850, a tensor downsampler 520 performs a temporal decimation operation on the tensors 115 to produce temporally downsampled tensors 524. When a downsampling ratio of two is selected, tensors of every alternate frame (e.g., frames with an odd picture order count) are dropped, resulting in a halving of the frame rate for the tensors 524 compared to the frame rate of the frame data 113. Other downsampling ratios, such as three to one or four to one, are possible, with signalling to support any integer ratio. However, a maximum limit, such as an example maximum ratio of four, is needed to prevent the need for excessive tensor buffering. The downsampling ratio is signalled in the FCM PPS 1116 when fcm_pps_temporal_upsampling_enabled_flag is equal to one, allowing the ratio to be altered during the course of one bitstream. The tensor downsampler 520 may be configured into an active state, where tensor downsampling is performed, or into a bypass state, where the tensors 115 are propagated to the tensors 524 with no alteration. Configuration of the tensor downsampler 520 into the active or bypassed state may be predetermined, e.g., by user configuration, or may be altered during operation of the source device 110, such as in response to available bandwidth of the communications channel 130 or the level of detected activity such as the number of bounding boxes in the task result 151. The fcm_pps_temporal_upsampling_enabled_flag in the FCM PPS 1116 is encoded (see 18110 below) regardless of the value of the fcm_sps_temporal_upsampling_enabled_flag in the FCM SPS 1114 to avoid a parsing dependency of the FCM PPS 1116 on the FCM SPS 1114. However, the fcm_pps_temporal_upsampling_enabled_flag is not permitted to be enabled (1) when the fcm_sps_temporal_upsampling_enabled_flag is set to disabled (0). Control in the processor 205 progresses from the step 1850 to a perform tensor compression step 1860.
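The temporal decimation rule can be sketched as follows, keying tensors by picture order count (POC); the dictionary-based interface and function name are illustrative assumptions, not the described syntax.

```python
def temporal_downsample(tensors_by_poc, ratio, active=True):
    """Temporal decimation: keep only tensors whose picture order count is
    a multiple of the downsampling ratio. In the bypass state (or at a
    ratio of one) the tensors pass through unchanged."""
    if not active or ratio == 1:
        return dict(tensors_by_poc)
    # The text gives four as an example maximum ratio, bounding buffering.
    assert 1 <= ratio <= 4
    return {poc: t for poc, t in tensors_by_poc.items() if poc % ratio == 0}
```

With a ratio of two, odd-POC tensors are dropped and the tensor frame rate halves, matching the behaviour described for step 1850.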
[000121] At the step 1860, a tensor compressor 530, under execution of the processor 205, compresses the tensors 524 to produce compressed tensors 532. The compressed tensors 532 are fewer in number than the tensors 524 and reduced in dimensionality (i.e., reduced in either or both of channel count and feature map width and height). The compressed tensors 532 form a representation of the tensors 524 that may be referred to as the 'reduced domain' or 'feature reduced domain' and the operation of the tensor compressor 530 may be referred to as 'feature reduction'. The tensor compressor 530 may implement the instantiated tensor compressor 512 in the form of precompiled 'byte code' or machine code or other form more amenable to direct execution by the processor 205, including by an inferencing engine as part of or associated with the processor 205, such as a graphics processing unit (GPU). The step 1860 operates to produce the tensors 532 from the tensors produced at step 1840. The tensor compressor 530 may be configured into an 'active' state where the instantiated tensor compressor 512 is used to produce the tensors 532 or into a 'bypass' state where the tensors 524 are passed along as the tensors 532 without modification. When in the active state, the tensors 532 have at least a smaller tensor count, a smaller channel count, or a smaller spatial size compared to the tensors 524. Control in the processor 205 progresses from the step 1860 to a quantise tensors step 1870.
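A sketch of the dimensionality constraint on the active state: a compressed tensor must shrink in channel count and/or spatial size relative to its input, and never grow in any dimension. The (C, H, W) shape-tuple convention and function name are assumptions for illustration.

```python
def is_feature_reduced(in_shape, out_shape):
    """True when a compressed tensor is reduced in channel count and/or
    spatial size relative to the input; shapes are (C, H, W) tuples."""
    c_in, h_in, w_in = in_shape
    c_out, h_out, w_out = out_shape
    reduced = c_out < c_in or (h_out < h_in and w_out < w_in)
    not_larger = c_out <= c_in and h_out <= h_in and w_out <= w_in
    return reduced and not_larger
```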
[000122] At the step 1870, a quantiser module 534, under execution of the processor 205, when configured into an 'active' state quantises floating-point values in each tensor of the compressed tensors 532 to produce quantised compressed tensors 536. The quantised compressed tensors 536 have integer values and occupy a range within a sample range as defined by the operational bit depth of the video encoder 542. For example, when encoding video using 8-bit or 10-bit samples, integer values in the interval [0, 255] or [0, 1023], respectively, are permitted. Quantisation firstly normalises elements from the tensor 532 into a [0.0, 1.0] floating-point range, resulting in one minimum and one maximum floating-point value for the tensor 532. A tensor normalised into the [0.0, 1.0] range is then converted and rescaled into an integer sample range, such as [0, 1023] or [0, 255]. For each tensor, the minimum and maximum floating-point values form a quantisation range 526 for the first normalisation and the range for the second normalisation (to integer sample range) is dependent
on the bit-depth of the video encoder 542. The normalisation to integer range may operate on a minimum and a maximum value that is updated from one frame to the next such that the minimum value is either decreased based on the current tensors 532 or retains the same value as derived for the previous tensors 532 (i.e., the tensors from the previous performance of the step 1860). The maximum value of the integer range is either increased based on the current tensors 532 or retains the same value as computed for the previous tensors 532. The quantiser module 534 can be configured into a 'bypass' state where the tensors 532 are passed along as the tensors 536. Configuration into the 'bypass' state may be applied when the tensors 532 already contain integer values or when the selected video encoder 542 is capable of encoding tensor values in floating-point format. Control in the processor 205 progresses from the step 1870 to a pack tensors step 1880.
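The two-stage quantisation described above may be sketched as follows. This is a minimal illustration, not the reference implementation; the function name and the `prev_range` argument (carrying the running minimum and maximum forward between frames) are assumptions.

```python
import numpy as np

def quantise_tensor(tensor, bit_depth=10, prev_range=None):
    """Sketch of two-stage quantisation: normalise to [0.0, 1.0] using the
    tensor's min/max (the quantisation range), then rescale into the integer
    sample range implied by the codec bit depth."""
    lo, hi = float(tensor.min()), float(tensor.max())
    if prev_range is not None:
        # The running range only widens frame to frame: the minimum may
        # decrease and the maximum may increase, otherwise each retains
        # its previously derived value.
        lo = min(lo, prev_range[0])
        hi = max(hi, prev_range[1])
    max_sample = (1 << bit_depth) - 1           # 255 for 8-bit, 1023 for 10-bit
    normalised = (tensor - lo) / (hi - lo)      # first normalisation -> [0.0, 1.0]
    samples = np.round(normalised * max_sample).astype(np.int32)
    return samples, (lo, hi)                    # (lo, hi) is the quantisation range

# 10-bit example: values in [-2, 6] map to samples in [0, 1023].
t = np.array([[-2.0, 0.0], [1.0, 6.0]], dtype=np.float32)
q, qrange = quantise_tensor(t, bit_depth=10)
```

The returned range plays the role of the quantisation range 526, which must be signalled so the decoder can invert the mapping.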
[000123] At the step 1880, a packer module 538, under execution of the processor 205, packs the feature maps of the tensor 536 into a frame, forming a packed feature frame 540. Operation of the packer module 538 generally results in placement of the two-dimensional feature maps into an arrangement as described with reference to Fig. 9B. When multiple tensors are present in the tensors 536, a frame 900b is of sufficient size to hold feature maps for all the tensors of the tensors 536. Control in the processor 205 progresses from the step 1880 to an encode frame step 1890.
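A simplified row-major tiling of feature maps into a frame can be sketched as below. The helper name and the fixed column count are illustrative assumptions; the actual arrangement follows Fig. 9B.

```python
import numpy as np

def pack_feature_maps(tensor, cols):
    """Tile the 2D feature maps of a (channels, h, w) tensor into one frame,
    row-major, with `cols` maps per row."""
    c, h, w = tensor.shape
    rows = (c + cols - 1) // cols               # enough rows to hold every map
    frame = np.zeros((rows * h, cols * w), dtype=tensor.dtype)
    for i in range(c):
        r, col = divmod(i, cols)                # grid position of map i
        frame[r * h:(r + 1) * h, col * w:(col + 1) * w] = tensor[i]
    return frame

maps = np.arange(6 * 2 * 2).reshape(6, 2, 2)    # six 2x2 feature maps
frame = pack_feature_maps(maps, cols=3)         # 3 maps per row -> 4x6 frame
```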
[000124] At the step 1890, the video encoder 542 (selected at operation of the step 1801 and having a corresponding identifier encoded at step 1802), under execution of the processor 205, compresses the video frame 540 to produce a compressed video bitstream 546. The encoder 542 is selected to embody one compression approach out of multiple compression approaches in accordance with the selection of step 1801. In the case of the use of H.266/VVC, operation of the video encoder 542 is described with reference to Fig. 8. In the case of H.265/HEVC or H.264/AVC, operation generally involves subsets of the functional modules as described with reference to Fig. 8. The first packed frame 540 to be coded results in the SPS 1118 and the PPS 1120, followed by the IRAP picture 1122 (referred to as an 'instantaneous decoder refresh' picture in H.264/AVC) as shown in Fig. 11. When using a low-delay coding configuration a subsequent picture would be coded as inter-picture 1124. In the case of a customised compression approach, a method such as directly compressing each value in the tensors 536 using an arithmetic coder such as DeepCABAC or variable-length coding such as exponential Golomb coding may be applied, with the frame packing step 1880 omitted. Control in the processor 205 progresses from the step 1890 to an encode FCM SPS step 18100.
[000125] At the step 18100, the metadata encoder 544 encodes sequence-level parameters needed for the FCM decoder into the FCM sequence parameter set 1114 as part of the FCM metadata 548. The FCM SPS 1114 includes tensor information specifying the dimensionality of the compressed tensors 532 and the placement of feature maps, as packing information, for each tensor among the compressed tensors 532 into a video frame 540. The tensor information includes, for each tensor, a maximum channel count and a used channel count. The frame area for a region must be sufficient for the tensors within the region to be packed up to the maximum channel count, i.e., the maximum number of feature maps. Flags signalling the application or bypass of the inner decoding, corresponding to the encoding at step 1890, inverse quantisation, corresponding to quantisation performed at step 1870, feature restoration, corresponding to feature compression performed at step 1860, and temporal upsampling, corresponding to downsampling performed at step 1850, are also included in the FCM SPS 1114.
[000126] The metadata encoder 544, under execution of the processor 205, at step 18100 encodes the selected tensor decompressor 520 into the bitstream 121 as a decoder network topology in the FCM SPS 1114, as described with reference to Fig. 11. The selected tensor decompressor 520 may be signalled as an explicit network topology, using a textual representation (or syntax) such as Open Neural Network Exchange ('ONNX') format or Neural Network Exchange Format ('NNEF'), from the Khronos Group, or using other formats including a short code fragment such as a PyTorch function. Compression of textual representations of the tensor decompressor using techniques such as a 'DEFLATE' or 'LZMA' algorithm may be applied to reduce the overhead of the metadata when stored in the bitstream 121. As such, information (such as the textual representation or the syntax described above) representing the network topology is signalled in compressed form.
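As an illustration of the reduction DEFLATE offers on a repetitive textual topology, a sketch using Python's zlib follows. The topology fragment is hypothetical, not a normative ONNX or NNEF serialisation.

```python
import zlib

# Hypothetical textual topology fragment; the repetition mimics the
# structural redundancy of real textual network descriptions.
topology_text = b"""
conv2d(in=1024, out=256, k=1)
batch_norm(256)
tanh()
""" * 20

compressed = zlib.compress(topology_text, level=9)   # DEFLATE
restored = zlib.decompress(compressed)               # lossless round-trip
```

The compressed form is what would be carried in the bitstream, with the decoder inflating it before parsing the topology.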
[000127] The decoder network topology information representing the decoder network topology is to be decoded from the bitstream 143 on the destination device 140 to determine the network topology to be used on the destination device 140. The decoder network topology specifies operations to be performed in the destination device 140 to convert tensors from a compressed representation to their original number and dimensionality, such that the uncompressed tensors may be supplied to the CNN head 150. When the network topology is selected by reference to a collection of network topologies, such as available in the tensor codec repository 514, and associated weights as available in the tensor weight repository 518, a registered_decoder_idc syntax element is used. The registered_decoder_idc syntax element may provide a direct index into a look-up table, a string or universally unique identifier (UUID)
to perform an associative look-up to obtain the selected decoder network topology and weights. The network weights may also be signalled in the FCM SPS 1114 using a format such as ISO/IEC 15938-17 "Compression of Neural Networks for Multimedia Description and Analysis". When signalling information representing a given network topology, the given network topology may be registered with the destination device 140 for reference and activation in subsequent bitstreams, avoiding the need to signal information representing the network topology with every bitstream. In one arrangement, the system 100 may provide predetermined network topologies that do not need to be explicitly signalled in the bitstream. Predetermined network topologies may be activated in the destination device 140 via a reference to an identifier. Predetermined network topologies may be made available to the destination device 140 via external means, such as download from a repository or registry of network topologies. Repositories or registries of network topologies may be accessible publicly or may be accessible within some private scope, for example, obtained via a private network or a secure network (e.g., a VPN) available to instances of the destination device 140 but not the general public. Where the destination device 140 is known to have access (either already registered or available for download from an external server) to the desired network topology, the source device 110 may encode the reference to identify and activate the specific network topology required to be used when decoding the bitstream 121. Appendix E shows an example syntax structure for information encoded at operation of the step 18100. Control in the processor 205 progresses from the step 18100 to an encode FCM PPS step 18110.
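The associative look-up driven by registered_decoder_idc might resemble the following sketch; the registry contents, topology names, and weight identifiers are all hypothetical.

```python
# Hypothetical registry mapping registered_decoder_idc values (direct
# indices or UUID strings) to decoder network topologies and weight sets.
DECODER_REGISTRY = {
    0: ("msfc_decoder_4layer_256ch", "weights_v1"),
    1: ("msfc_decoder_4layer_64ch", "weights_v2"),
    "6fa459ea-ee8a-3ca4-894e-db77e160355e": ("custom_decoder", "weights_site"),
}

def resolve_decoder(registered_decoder_idc):
    """Associative look-up of a decoder topology and its weights from a
    direct index or a UUID string, as the syntax element permits."""
    try:
        return DECODER_REGISTRY[registered_decoder_idc]
    except KeyError:
        raise ValueError(f"unknown registered_decoder_idc: {registered_decoder_idc!r}")

topology, weights = resolve_decoder(1)
```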
[000128] At the step 18110, the metadata encoder 544, under execution of the processor 205, encodes the quantisation range 526 for each tensor of the compressed tensors 532 into an FCM PPS 1116, as additional FCM metadata 548. Appendix E shows an example syntax structure for information encoded at operation of the step 18110. Quantisation ranges are used in the bitstream 121 to enable inverse quantisation to the correct range by the destination device 140. The quantisation range signalled in the FCM PPS 1116 is effective from the picture that the FCM PPS 1116 precedes onwards, in output order from a picture decoder 1204, i.e., the IRAP 1122, which has a picture order count (POC) of 0. An FCM PPS 1123 precedes inter picture 1124, which has a POC of 1, and thus a quantisation range coded in the FCM PPS 1123 applies from POC 1 onwards (until a subsequent FCM PPS is encountered preceding another picture with higher POC, and the subsequent FCM PPS signals another quantisation range update). Accordingly, when there is no change in the quantisation range for a given frame, the FCM PPS 1116 need not be encoded for that frame. The FCM PPS 1116 may explicitly encode the picture order count from which the FCM PPS 1116 parameters apply, i.e., a POC of 0,
(from that POC until another FCM PPS with a higher POC is decoded, such as the FCM PPS 1123). A fixed number of least significant bits of the POC, such as 8 or 12 bits, may be coded to avoid coding the entire 32-bit POC with each FCM PPS. The MSBs of the POC may be inferred based on the pattern that POC is increasing over time (with localised exception when a random-access configuration is used, due to localised difference in coding order vs output order). In one arrangement FCM PPSs (e.g., 1116 and 1123) include a picture parameter set ID syntax element, which corresponds to the picture parameter set ID of any PPSs present in the bitstream, such as the PPS 1120. The slice header or picture header of each picture includes a picture parameter set ID ("ph_pic_parameter_set_id" in VVC) which activates one of the previously signalled PPSs and FCM PPSs, i.e., parameters in the FCM PPS identified by a particular picture parameter set ID are selected by the ph_pic_parameter_set_id decoded from the slice header or picture header of a picture or slice. The NAL unit multiplexor 550 operates to combine the NAL units of the FCM metadata 548 and the compressed video bitstream 546 to produce the bitstream 121, such that the inner coded identifier 1110 is coded firstly, followed by the FCM VMPS 1112, the FCM SPS 1114, and the FCM PPS 1116. Following the FCM PPS 1116 the NAL units produced by the video encoder 542 are present, such as the SPS 1118, the PPS 1120, an IRAP picture 1122, and an inter picture such as the inter picture 1124. The FCM SPS 1114 needs to be coded with the IRAP picture 1122 as the decoder needs to know tensor dimensionality information and decoder network topology information to proceed. One instance of the FCM PPS 1116 is needed with the IRAP picture 1122 in order for inverse quantisation to operate; subsequent instances are needed only when there is a change in the quantisation range 526 to be used with a given picture. An instance of the FCM PPS 1116 is effective from the next coded picture. The method 1800 terminates and processing progresses to the next instance of the source data 113 (e.g., the next frame from the video source 112).
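The POC LSB signalling described above can be sketched as follows. The helper is illustrative and assumes strictly increasing POC; a conforming decoder follows the full derivation in the relevant codec specification (e.g., the HEVC POC decoding process), which tolerates reordering around random-access points.

```python
def decode_poc(poc_lsb, prev_poc, lsb_bits=8):
    """Recover a full POC from its coded LSBs, assuming POC increases over
    time: take the MSB part of the previously decoded POC and add the new
    LSBs, moving up one wrap if the result would go backwards."""
    max_lsb = 1 << lsb_bits
    msb = prev_poc - (prev_poc % max_lsb)   # MSB portion of the previous POC
    poc = msb + poc_lsb
    if poc < prev_poc:                      # the LSBs wrapped around
        poc += max_lsb
    return poc

# With 8 LSB bits, coding LSB value 2 after POC 255 recovers full POC 258.
```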
[000129] Fig. 6 is a schematic block diagram showing one type of multi-scale feature fusion (MSFF) module 600, which may serve as the tensor compressor 530. The MSFF module 600 takes the tensors 115 and produces a compressed tensor 532, having reduced dimensionality compared to the tensors 115 and thus resulting in a reduction in bitrate when encoded as part of a packed frame. The MSFF module 600 uses trained network layers and requires a corresponding module in the tensor decoder 146 to restore tensor dimensionality so the tensors 149 may be supplied to the CNN head 150. The MSFF module 600 takes four tensors as input and requires each one of the tensors to have two-hundred and fifty-six (256) channels, so that the MSFF module 600 is compatible with the P-layers of the FasterRCNN or
MaskRCNN networks. However, variants of the MSFF module 600 compatible with different numbers of layers and different channel counts are possible.
[000130] The MSFF module 600 produces one tensor as output with sixty-four (64) channels and a feature map size corresponding to the P5 layer seen at the input; however, variants with different channel counts are also possible. Each variant of the MSFF module 600 requires different weights to be used for proper operation. Where several variants of the MSFF module 600 are able to be used in the system 100 for a given network, the packing format may be set to a worst-case feature map count of the compressed tensors of the currently used decoder network topology, and the actual used channel count may be updated at runtime as part of the tensor information.
[000131] The MSFC module 600 includes an MSFF block 610 shown in Fig. 6, which produces a single tensor from the plurality of tensors 115 using one or more downsampling filters. The MSFF block 610, under execution of the processor 205, combines each tensor of a first set of tensors (i.e., 602, 603, 604, 605) to produce a combined tensor 629. The combined tensor 629 forms a representation of the FPN layer tensors. Downsample modules 622a, 622b, and 622c operate on the tensors having larger spatial scale, i.e., P4 604 at (2h, 2w, 256), P3 603 at (4h, 4w, 256), and P2 602 at (8h, 8w, 256), respectively. Modules 622a, 622b, and 622c perform downsampling to match the spatial scale of the smallest tensor, i.e., P5 605 at (h, w, 256), producing tensors 623a, 623b, and 623c downscaled to the P5 scale, respectively. A concatenation module 624 performs a channel-wise concatenation of the tensors 605, 623a, 623b, and 623c to produce concatenated tensor 625, of dimensions (h, w, 1024). The concatenated tensor 625 is passed to a squeeze and excitation (SE) module 626 to produce a tensor 627. The SE module 626 sequentially performs a global pooling, a fully-connected layer with reduction in channel count, a rectified linear unit (ReLU) activation, a second fully-connected layer restoring the channel count, and a sigmoid activation function to produce a scaling tensor. The tensor 625 is scaled according to the scaling tensor to produce the output as the tensor 627. The SE block 626 is capable of being trained to adaptively alter the weighting of different channels in the tensor passed through, based on the first fully-connected layer output.
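A squeeze-and-excitation block of the kind performed by the SE module 626 can be sketched in plain NumPy. The weights and shapes below are illustrative stand-ins, not trained values from the described system.

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-excitation sketch for an (h, w, c) tensor: global average
    pool per channel, channel-reducing FC + ReLU, channel-restoring FC +
    sigmoid, then scale the input channels by the resulting weights.
    w1: (c, c_reduced), w2: (c_reduced, c)."""
    pooled = x.mean(axis=(0, 1))                     # squeeze: one value per channel
    hidden = np.maximum(w1.T @ pooled, 0.0)          # FC with channel reduction + ReLU
    scale = 1.0 / (1.0 + np.exp(-(w2.T @ hidden)))   # FC restoring count + sigmoid
    return x * scale                                 # excite: reweight each channel

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4, 8))                   # toy (h, w, c) tensor
y = se_block(x, rng.standard_normal((8, 2)) * 0.1, rng.standard_normal((2, 8)) * 0.1)
```

Because the sigmoid output lies in (0, 1), each output channel is an attenuated copy of the corresponding input channel.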
[000132] The first fully-connected layer output reduces each feature map for each channel to a single value. Each single value is passed through a non-linear activation unit (ReLU) to create a conditional representation of the single value, suitable for weighting of other channels, with restoration to the full channel count performed by the second fully-connected layer. The SE block 626 is thus capable of extracting non-linear inter-channel correlation in producing the
tensor 627 from the tensor 625, to a greater extent than is possible purely with convolutional (linear) layers. The tensor 627 is passed to a convolutional layer 628. The convolutional layer 628 implements one or more convolutional layers to produce the combined tensor 629, with channel count reduced to F channels, typically 256 channels (i.e., F = 256). Further reduction in the channel count is achieved by a single-scale feature compression (SSFC) module 650.
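The squeeze-and-excitation weighting described in the preceding paragraphs (global pooling of each feature map to one value per channel, a reducing fully-connected layer with ReLU, and a restoring fully-connected layer that yields per-channel weights) can be sketched in plain Python. The function name `se_block`, the sigmoid on the restored outputs, and the toy layer sizes are illustrative assumptions, not details of the SE block 626:

```python
import math

def se_block(feature_maps, w1, w2):
    """feature_maps: list of C channels, each a flat list of samples.
    w1: reducing fully-connected weights; w2: restoring weights (C rows)."""
    # Squeeze: reduce each channel's feature map to a single value.
    squeezed = [sum(ch) / len(ch) for ch in feature_maps]
    # First fully-connected layer followed by a ReLU activation.
    hidden = [max(0.0, sum(s * w for s, w in zip(squeezed, row))) for row in w1]
    # Second fully-connected layer restores the full channel count; a
    # sigmoid maps each output to a per-channel weight (an assumption here).
    weights = [1.0 / (1.0 + math.exp(-sum(h * w for h, w in zip(hidden, row))))
               for row in w2]
    # Excite: scale every channel of the tensor by its learned weight.
    return [[x * wc for x in ch] for ch, wc in zip(feature_maps, weights)]
```

With zero weights the sigmoid outputs 0.5, so every channel is halved, illustrating how the learned weights modulate inter-channel emphasis.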
[000133] The SSFC module 650 receives the tensor 629 and applies a convolution 652 to reduce the channel count from F (256) down to C' (nominally set to 64 channels) to produce tensor 653. The tensor 653 is then passed to a batch normalisation module 654 to produce batch normalised tensor 655, which is passed to a hyperbolic tangent activation layer 656 to produce the compressed tensor 532. The output of the MSFC module 600 is one tensor per frame with a fixed feature map size and fixed channel count.
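A minimal sketch of this channel reduction, batch normalisation and tanh activation follows. The function name `ssfc_compress`, the 1×1-convolution-as-weighted-sum formulation, and the use of per-channel statistics for normalisation are simplifying assumptions rather than details of the SSFC module 650:

```python
import math

def ssfc_compress(tensor, weights, eps=1e-5):
    """tensor: F input channels (flat lists); weights: C' rows of F mixing weights."""
    # 1x1 convolution: each output channel is a weighted sum over input channels.
    reduced = [[sum(w * ch[i] for w, ch in zip(row, tensor))
                for i in range(len(tensor[0]))] for row in weights]
    out = []
    for ch in reduced:
        # Normalisation step (per-channel statistics assumed for the sketch).
        mean = sum(ch) / len(ch)
        var = sum((x - mean) ** 2 for x in ch) / len(ch)
        normed = [(x - mean) / math.sqrt(var + eps) for x in ch]
        # Hyperbolic tangent bounds the compressed values to (-1, 1).
        out.append([math.tanh(x) for x in normed])
    return out
```

The tanh stage ensures the compressed tensor occupies a bounded range, which suits subsequent packing into fixed-bit-depth video samples.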
[000134] Fig. 7 is a schematic block diagram showing an example picture structure 700 with one level of temporal interpolation added to a low-delay bi-predicted coding structure. The video encoder 542 may be configured to implement the picture structure 700, providing an alternative to the use of the temporal downsampler 520 and a temporal upsampler 1260, described with reference to Fig. 12. The picture structure 700 operates such that odd-numbered pictures, by picture order count (POC), refer to the immediately preceding and following pictures for inter prediction, via list 0 (L0) and list 1 (L1), respectively. For example, POC #1 refers to POC #0 and POC #2. This requires POC #2 to be decoded prior to POC #1, resulting in one frame of structural delay implicit in the picture structure 700. Then, each even-numbered POC includes a reference to the previous picture with an even-numbered POC, for example POC #2 refers to POC #0. To assist with coding of relatively stable scenes, past references to pictures with POC modulus 8 equal to 0 are also kept, up to a limit, such as the last two or three such pictures. For example, POC #6 also refers to POC #0, and POC #14 (i.e., belonging to the subsequent GOP) refers to POC #8 (the first picture of the second GOP) and POC #0 (the first picture of the first GOP). Each picture with an even-numbered POC references the previous even-numbered POC and pictures with POC modulus 8 equal to 0 from the current and as many previous GOPs as possible, up to a limit, such as the decoded picture buffer size limitation of six pictures (in HEVC) or eight pictures (in VVC), with one picture slot reserved for the current picture, resulting in a maximum of five or seven reference pictures, respectively. The GOP structure shown in Fig. 7 repeats every eight frames, so where prior references with negative numbers are shown (e.g., -8 or -16), these are to be interpreted as references to preceding GOPs. In the case of pictures with even-numbered POCs, both reference lists include the same set of preceding pictures with the same ordering.
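The reference selection just described can be approximated as follows. The function name `reference_pocs` and the retention limit of two anchor pictures are assumptions for illustration, and details such as reference list reordering and buffer eviction are omitted:

```python
def reference_pocs(poc, max_anchors=2):
    """Return reference lists L0/L1 for the given picture order count (POC)."""
    if poc % 2 == 1:
        # Odd pictures: nearest preceding (L0) and following (L1) pictures.
        return {"L0": [poc - 1], "L1": [poc + 1]}
    refs = [poc - 2] if poc >= 2 else []
    # Retain up to max_anchors past anchor pictures (POC modulus 8 equal to 0),
    # most recent first, as an assumed retention policy.
    anchors = [a for a in range(0, poc - 1, 8)][::-1][:max_anchors]
    refs += [a for a in anchors if a not in refs]
    # Even pictures use the same references, in the same order, in both lists.
    return {"L0": refs, "L1": list(refs)}
```

For example, POC #1 yields L0 = [0] and L1 = [2], matching the one frame of structural delay, while POC #6 yields [4, 0] in both lists.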
[000135] Fig. 8 is a schematic block diagram showing functional modules of a video encoder 800 which may be implemented as the video encoder 542. The video encoder 542 may be implemented using a general-purpose computer system 200, as shown in Figs. 2A and 2B, where the various functional modules may be implemented by dedicated hardware within the computer system 200, or by software executable within the computer system 200, such as one or more software code modules of the software application program 233 resident on the hard disk drive 205 and controlled in its execution by the processor 205. Alternatively, the video encoder 542 may be implemented by a combination of dedicated hardware and software executable within the computer system 200. The video encoder 542 and the described methods may alternatively be implemented in dedicated hardware, such as one or more integrated circuits performing the functions or sub-functions of the described methods. Such dedicated hardware may include graphics processing units (GPUs), digital signal processors (DSPs), application-specific standard products (ASSPs), application-specific integrated circuits (ASICs), FPGAs or one or more microprocessors and associated memories. In particular, the video encoder 542 comprises modules 810-890 which may each be implemented as one or more software code modules of the software application program 233.
[000136] Although the video encoder 542 of Fig. 8 is an example of a versatile video coding (VVC) video encoder, other video codecs may also be used to perform the processing stages described herein. For example, HEVC or AVC or other types of encoders may be used. The examples described generate a bitstream of encoded data. If other codecs were used, some implementations may pack data into a different format, such as a frame format or the like. The video encoder 800 receives frame data 712, each frame including one or more colour channels. The frame data 712 corresponds to the tensors 540 in packed form, as implemented at the step 1890. The frame data 712 may be in any chroma format and bit depth supported by the profile in use, for example 4:0:0 or 4:2:0 for the 'Main 10' profile of the VVC standard, at eight (8) to ten (10) bits in sample precision.
[000137] As seen in Fig. 8, a block partitioner 810 firstly divides the frame data 712 into CTUs, generally square in shape and configured such that a particular size for the CTUs is used. The maximum enabled size of the CTUs may be 32×32, 64×64, or 128×128 luma samples for example, configured by a 'sps_log2_ctu_size_minus5' syntax element present in the 'sequence parameter set' (i.e., the SPS 1118). The 'sps_log2_ctu_size_minus5' syntax element uses values 0, 1, and 2 to correspond to CTU sizes of 32×32, 64×64, and 128×128, respectively. The CTU size also provides a maximum CU size, as a CTU with no further splitting will contain one CU. Ternary splitting is prohibited when a CU has one or more dimensions of length 128 luma samples. As a consequence, processing may fully handle each 64×64 quadrant of a 128×128 CTU before progressing from one quadrant to the next quadrant. Large CUs such as 64×128 are processed as a pair of 64×64 regions. As a result of quadrant-based processing (sometimes referred to as 'virtual pipeline data units' or 'VPDUs'), internal storage in the video encoder 800, and in a corresponding video decoder 1204 (also referred to as a picture decoder), is only needed for 64×64 samples even when the CTU size is configured as 128×128.
Feature maps are typically smaller than video frame size, due to use of intervening pooling operations or convolution operations with stride parameter greater than one. Feature maps do not require the large CU sizes provided by VVC. Use of a 32×32 CTU size provides sufficient flexibility in block structure to efficiently encode structural detail found in feature maps with a smaller amount of memory required for intermediate storage in the memory 206, i.e., storage for partially decoded data from a bitstream 1206 but prior to a frame buffer 1396, to be described. Use of a smaller CTU size reduces the variety of CU sizes that are able to be tested in the block partitioner 810, reducing runtime. Constraining the CTU size to 32×32 indicates a reduced memory consumption in the video decoder 1204 required for decoding the bitstream 1206; however, the worst case of 128×128 would need to be supported should such a bitstream be encountered. A collection of syntax elements forming a 'general_constraints_info' syntax structure may be present in the SPS 1118 that constrains allowed values of other syntax elements in the SPS 1118 and indicates a compatibility point other than a profile defined in the H.266/VVC specification. Such compatibility points are known as 'subprofiles' and enable application-specific definition of a subset of the tools of a given H.266/VVC profile. A gci_three_minus_max_log2_ctu_size_constraint_idc syntax element with values 0, 1, 2 constrains the maximum allowed CTU size in the SPS 1118 to 128×128, 64×64, or 32×32, respectively. A general constraint restricting the maximum CTU size to 32×32 may form a subprofile (or part of a subprofile), enabling the worst-case complexity requirement of the video decoder 1204 to be reduced compared to the case where the full H.266/VVC profile were required to be supported. One instance of the SPS 1118 is needed prior to the first picture in the bitstream 121 and also at any subsequent entry points (or 'random access points') into the bitstream 121 from which decoding can commence. The block partitioner 810 further divides each CTU into one or more CBs according to a luma coding tree and a chroma coding tree. The luma channel may also be referred to as a primary colour channel. Each chroma channel may also be referred to as a secondary colour channel. The CBs have a variety of sizes, and may include both square and non-square aspect ratios. However, in the VVC standard, CBs, CUs, PUs, and TUs always have side lengths that are powers of two. Thus, a current CB, represented
as 812, is output from the block partitioner 810, progressing in accordance with an iteration over the one or more blocks of the CTU, in accordance with the luma coding tree and the chroma coding tree of the CTU.
[000138] The CTUs resulting from the first division of the frame data 712 may be scanned in raster scan order and may be grouped into one or more 'slices'. A slice may be an 'intra' (or 'I') slice. An intra slice (I slice) indicates that every CU in the slice is intra predicted. Generally, the first picture in a coded layer video sequence (CLVS) contains only I slices, and is referred to as an 'intra picture'. The CLVS may contain periodic intra pictures, forming 'random access points' (i.e., intermediate frames in a video sequence upon which decoding can commence). Alternatively, a slice may be uni- or bi-predicted (a 'P' or 'B' slice, respectively), indicating additional availability of uni- and bi-prediction in the slice, respectively.
[000139] The video encoder 542 encodes sequences of pictures according to a picture structure. One picture structure is 'low delay', in which case pictures using inter-prediction may only reference pictures occurring previously in the sequence. Low delay enables each picture to be output as soon as the picture is decoded, in addition to being stored for possible reference by a subsequent picture. Another picture structure is 'random access', whereby the coding order of pictures differs from the display order. Random access allows inter-predicted pictures to reference other pictures that, although decoded, have not yet been output. A degree of picture buffering is needed so that reference pictures in the future in terms of display order are present in the decoded picture buffer, resulting in a latency of multiple frames.
[000140] When a chroma format other than 4:0:0 is in use, in an I slice, the coding tree of each CTU may diverge below the 64×64 level into two separate coding trees, one for luma and another for chroma. Use of separate trees allows different block structure to exist between luma and chroma within a luma 64×64 area of a CTU. For example, a large chroma CB may be collocated with numerous smaller luma CBs and vice versa. In a P or B slice, a single coding tree of a CTU defines a block structure common to luma and chroma. The resulting blocks of the single tree may be intra predicted or inter predicted.
[000141] In addition to a division of pictures into slices, pictures may also be divided into 'tiles'. A tile is a sequence of CTUs covering a rectangular region of a picture. CTU scanning occurs in a raster-scan manner within each tile and progresses from one tile to the next. A slice can be either an integer number of tiles, or an integer number of consecutive rows of CTUs within a given tile.
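The tile-ordered CTU scan described above can be sketched as follows, with tiles visited in raster order and CTUs scanned in raster order within each tile; the boundary-list representation of tile columns and rows is an assumed simplification:

```python
def ctu_scan_order(tile_col_bounds, tile_row_bounds):
    """tile_*_bounds: CTU-unit tile boundaries, e.g. [0, 2, 4] for two tiles.
    Returns (x, y) CTU positions in decoding order."""
    order = []
    # Tiles themselves are visited in raster order across the picture.
    for r0, r1 in zip(tile_row_bounds, tile_row_bounds[1:]):
        for c0, c1 in zip(tile_col_bounds, tile_col_bounds[1:]):
            # Raster scan of the CTUs inside this tile.
            order += [(x, y) for y in range(r0, r1) for x in range(c0, c1)]
    return order
```

For a 4×2-CTU picture split into two 2×2-CTU tiles, all CTUs of the left tile are scanned before any CTU of the right tile.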
[000142] For each CTU, the video encoder 542 as shown in Fig. 8 operates in two stages. In the first stage (referred to as a 'search' stage), the block partitioner 810 tests various potential configurations of a coding tree. Each potential configuration of a coding tree has associated 'candidate' CBs. The first stage involves testing various candidate CBs to select CBs providing relatively high compression efficiency with relatively low distortion. The testing generally involves a Lagrangian optimisation whereby a candidate CB is evaluated based on a weighted combination of rate (i.e., coding cost) and distortion (i.e., error with respect to the input frame data 712). 'Best' candidate CBs (i.e., the CBs with the lowest evaluated rate/distortion) are selected for subsequent encoding into the bitstream portion 121. Included in the evaluation of candidate CBs is an option to use a CB for a given area or to further split the area according to various splitting options and code each of the smaller resulting areas with further CBs, or split the areas even further. As a consequence, both the coding tree and the CBs themselves are selected in the search stage.
[000143] The video encoder 542 produces a prediction block (PB), indicated by an arrow 820, for each CB, for example, CB 812. The PB 820 is a prediction of the contents of the associated CB 812. A subtracter module 822 produces a difference, indicated as 824 (or 'residual', referring to the difference being in the spatial domain), between the PB 820 and the CB 812. The difference 824 is a block-size difference between corresponding samples in the PB 820 and the CB 812. The difference 824 is transformed, quantised and represented as a transform block (TB), indicated by an arrow 836. The PB 820 and associated TB 836 are typically chosen from one of many possible candidate CBs, for example, based on evaluated cost or distortion.
[000144] A candidate coding block (CB) is a CB resulting from one of the prediction modes available to the video encoder 542 for the associated PB and the resulting residual. When combined with the predicted PB in the video encoder 542, the TB 836 reduces the difference between a decoded CB and the original CB 812 at the expense of additional signalling in a bitstream.
[000145] Each candidate coding block (CB) (i.e., a prediction block (PB) in combination with a transform block (TB)) has an associated coding cost (or 'rate') and an associated difference (or 'distortion'). The distortion of the CB is typically estimated as a difference in sample values, such as a sum of absolute differences (SAD), a sum of squared differences (SSD) or a Hadamard transform applied to the differences. The estimate resulting from each candidate PB may be determined by a mode selector 886 using the difference 824 to determine a prediction mode 887. The prediction mode 887 indicates the decision to use a particular prediction mode for the current CB, for example, intra-frame prediction or inter-frame prediction. Estimation of the coding costs associated with each candidate prediction mode and corresponding residual coding may be performed at significantly lower cost than entropy coding of the residual. Accordingly, a number of candidate modes may be evaluated to determine an optimum mode in a rate-distortion sense, even in a real-time video encoder.
[000146] Determining a preferred mode in terms of rate-distortion is typically achieved using a variation of Lagrangian optimisation. Lagrangian or similar optimisation processing can be employed both to select a preferred partitioning of a CTU into CBs (by the block partitioner 810) and to select a prediction mode from a plurality of possibilities. Through application of a Lagrangian optimisation process to the candidate modes in the mode selector module 886, the intra prediction mode with the lowest cost measurement is selected as a 'best' mode. The lowest cost mode includes a selected secondary transform index 888, which is encoded in the bitstream 121 by an entropy encoder 838.
[000147] In the second stage of operation of the video encoder 542 (referred to as a 'coding' stage), an iteration over the determined coding tree(s) of each CTU is performed in the video encoder 542. For a CTU using separate trees, for each 64×64 luma region of the CTU, a luma coding tree is firstly encoded followed by a chroma coding tree. Within the luma coding tree, only luma CBs are encoded and within the chroma coding tree only chroma CBs are encoded. For a CTU using a shared tree, a single tree describes the CUs (i.e., the luma CBs and the chroma CBs) according to the common block structure of the shared tree.
[000148] The entropy encoder 838 supports bitwise coding of syntax elements using variable-length and fixed-length codewords, and an arithmetic coding mode for syntax elements. Portions of the bitstream such as 'parameter sets', for example the SPS, the picture parameter set (PPS), and the picture header (PH), use a combination of fixed-length codewords and variable-length codewords. Slices, also referred to as contiguous portions, have a slice header that uses variable-length coding followed by slice data, which uses arithmetic coding. The slice header defines parameters specific to the current slice, such as slice-level quantisation parameter offsets, and may include an instance of the PH. The slice data includes the syntax
44204385_1
46
elements of each CTU in the slice. Use of variable-length coding and arithmetic coding requires sequential parsing within each portion of the bitstream. The portions may be delineated with a start code to form 'network abstraction layer units' or 'NAL units'. Arithmetic coding is supported using a context-adaptive binary arithmetic coding process.
[000149] Arithmetically coded syntax elements consist of sequences of one or more 'bins'. Bins, like bits, have a value of '0' or '1'. However, bins are not encoded in a bitstream portion 716 (corresponding to the bitstream 546) as discrete bits. Bins have an associated predicted (or 'likely' or 'most probable') value and an associated probability, known as a 'context'. When the actual bin to be coded matches the predicted value, a 'most probable symbol' (MPS) is coded. Coding a most probable symbol is relatively inexpensive in terms of consumed bits in the bitstream portion 121, including costs that amount to less than one discrete bit. When the actual bin to be coded mismatches the likely value, a 'least probable symbol' (LPS) is coded. Coding a least probable symbol has a relatively high cost in terms of consumed bits. The bin coding techniques enable efficient coding of bins where the probability of a '0' versus a '1' is skewed. For a syntax element with two possible values (i.e., a 'flag'), a single bin is adequate. For syntax elements with many possible values, a sequence of bins is needed. The convention for converting values of a syntax element into a sequence of bins is termed 'binarisation'. Where the values '0' and '1' for a bin are equally (or near equally) likely, it is possible to omit use of a context and assume an equiprobable distribution. Bins with a context are termed 'context-coded bins' and bins omitting a context are termed 'bypass-coded bins'. The binarisation of a syntax element into one or more bins may result in a combination of context-coded and bypass-coded bins. Unlike directly coding one bit into the bitstream, a bypass-coded bin uses the arithmetic coding engine, which facilitates mixing context-coded and bypass-coded bins into syntax element binarisations.
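The cost asymmetry between MPS and LPS bins described above can be illustrated with an idealised information-theoretic sketch (a simplification for illustration only, not the actual CABAC engine, which uses a table-driven approximation):

```python
import math

def bin_cost_bits(p_mps: float, is_mps: bool) -> float:
    """Ideal cost, in bits, of arithmetically coding one bin against a
    context whose most-probable-symbol probability is p_mps."""
    p = p_mps if is_mps else 1.0 - p_mps
    return -math.log2(p)

# With a skewed context (MPS probability 0.9), coding the MPS consumes
# well under one bit, while coding the LPS consumes several bits.
mps_cost = bin_cost_bits(0.9, True)
lps_cost = bin_cost_bits(0.9, False)
```

Under this model a bypass-coded bin corresponds to p = 0.5 and always consumes exactly one bit, which is why contexts are reserved for bins with skewed statistics.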
[000150] For a given binarisation, the presence of later bins in the sequence may be determined based on the value of earlier bins in the sequence, resulting in variable-length binarisations. Additionally, each bin may be associated with more than one context, with one context selected for use in coding a specific instance of the bin. The selection of a particular context may be dependent on earlier bins in the syntax element, the decoded values of neighbouring syntax elements (i.e., those from neighbouring blocks), and the like. Each time a context-coded bin is encoded, the context that was selected for that bin (if any) is updated in a manner reflective of the new bin value. As such, the binary arithmetic coding scheme is said to be adaptive.
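As an example of a variable-length binarisation in which the presence of later bins depends on the values of earlier bins, a truncated-unary binarisation (one common scheme in VVC-style coders, shown here as an illustrative sketch) terminates as soon as a '0' bin is seen:

```python
def truncated_unary_bins(value: int, cmax: int) -> list:
    """Binarise 'value' as 'value' one-bins followed by a terminating
    zero-bin; the terminator is omitted when value == cmax, since the
    decoder can infer it."""
    bins = [1] * value
    if value < cmax:
        bins.append(0)
    return bins

def decode_truncated_unary(bins: list, cmax: int) -> int:
    """Count one-bins until a zero-bin, or cmax bins, are consumed."""
    value = 0
    for b in bins:
        if value == cmax or b == 0:
            break
        value += 1
    return value
```

For example, with cmax = 4 the value 2 maps to the bins [1, 1, 0], while the value 4 maps to [1, 1, 1, 1] with no terminator needed.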
[000151] The absence of a context for bypass-coded bins saves memory and reduces complexity, and thus bypass bins are used where the distribution of values for the particular bin is not skewed. One example of an entropy coder employing context and adaptation is known in the art as CABAC (context adaptive binary arithmetic coder) and many variants of this coder have been employed in video coding.
[000152] A QP controller 890 determines a quantisation parameter 892, used to establish a quantisation step size for use by a quantiser 834 and a dequantiser 840. A larger quantisation step size results in primary transform coefficients 828 being quantised into smaller values, reducing the bitrate of the bitstream portion 716 at the expense of a reduction in the fidelity of inverse transform coefficients 846.
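The mapping from quantisation parameter to step size is not detailed here; assuming the HEVC/VVC-style convention that the step size approximately doubles for every increase of six in QP, the relationship can be sketched as:

```python
def quantisation_step(qp: int) -> float:
    """Approximate quantisation step size for a given QP, assuming the
    HEVC/VVC-style convention: step = 2 ** ((qp - 4) / 6)."""
    return 2.0 ** ((qp - 4) / 6.0)
```

Under this assumption, raising QP by six doubles the step size, roughly halving the magnitudes of the quantised coefficients and trading bitrate for fidelity.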
[000153] The entropy encoder 838 encodes the quantisation parameter 892 and, if in use for the current CB, the LFNST index 888, using a combination of context-coded and bypass-coded bins. The quantisation parameter 892 is encoded at the beginning of each slice, and changes in the quantisation parameter 892 within a slice are coded using a 'delta QP' syntax element. The delta QP syntax element is signalled at most once in each area known as a 'quantisation group'. The quantisation parameter 892 is applied to residual coefficients of the luma CB. An adjusted quantisation parameter is applied to the residual coefficients of collocated chroma CBs. The adjusted quantisation parameter may include mapping from the luma quantisation parameter 892 according to a mapping table and a CU-level offset, selected from a list of offsets. The secondary transform index 888 is signalled when the residual associated with the transform block includes significant residual coefficients only in those coefficient positions subject to transforming into primary coefficients by application of a secondary transform.
[000154] Residual coefficients of each TB associated with a CB are coded using a residual syntax. The residual syntax is designed to efficiently encode coefficients with low magnitudes, using mainly arithmetically coded bins to indicate significance of coefficients, along with lower-valued magnitudes, and reserving bypass bins for higher-magnitude residual coefficients. Accordingly, residual blocks comprising very low magnitude values and sparse placement of significant coefficients are efficiently compressed. Moreover, two residual coding schemes are present. A regular residual coding scheme is optimised for TBs with significant coefficients predominantly located in the upper-left corner of the TB, as is seen when a transform is applied. A transform-skip residual coding scheme is available for TBs where a transform is not
performed and is able to efficiently encode residual coefficients regardless of their distribution throughout the TB.
[000155] A multiplexer module 884 outputs the PB 820 from an intra-frame prediction module 864 according to the determined best intra prediction mode, selected from the tested prediction modes of each candidate CB. The candidate prediction modes need not include every conceivable prediction mode supported by the video encoder 542. Intra prediction falls into three types: first, 'DC intra prediction', which involves populating a PB with a single value representing the average of nearby reconstructed samples; second, 'planar intra prediction', which involves populating a PB with samples according to a plane, with a DC offset and a vertical and horizontal gradient being derived from nearby reconstructed neighbouring samples (the nearby reconstructed samples typically include a row of reconstructed samples above the current PB, extending to the right of the PB to an extent, and a column of reconstructed samples to the left of the current PB, extending downwards beyond the PB to an extent); and, third, 'angular intra prediction', which involves populating a PB with reconstructed neighbouring samples filtered and propagated across the PB in a particular direction (or 'angle'). In VVC, sixty-five (65) angles are supported, with rectangular blocks able to utilise additional angles, not available to square blocks, to produce a total of eighty-seven (87) angles.
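Of the three types, DC intra prediction is the simplest. A minimal sketch, ignoring the refinements an actual VVC encoder applies for non-square blocks, is:

```python
def dc_intra_prediction(above, left, width, height):
    """'DC intra prediction': populate a width x height PB with a single
    value, the rounded average of the reconstructed samples in the row
    above and the column to the left of the block."""
    refs = list(above[:width]) + list(left[:height])
    dc = (sum(refs) + len(refs) // 2) // len(refs)  # integer rounding
    return [[dc] * width for _ in range(height)]
```

For example, with an above row of value 100 and a left column of value 200, every sample of the PB is predicted as their rounded average, 150.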
[000156] A fourth type of intra prediction is available to chroma PBs, whereby the PB is generated from collocated luma reconstructed samples according to a 'cross-component linear model' (CCLM) mode. Three different CCLM modes are available, each mode using a different model derived from the neighbouring luma and chroma samples. The derived model is used to generate a block of samples for the chroma PB from the collocated luma samples. Luma blocks may be intra predicted using a matrix multiplication of the reference samples, using one matrix selected from a predefined set of matrices. This matrix intra prediction (MIP) achieves gain by using matrices trained on a large set of video data, with the matrices representing relationships between reference samples and a predicted block that are not easily captured in angular, planar, or DC intra prediction modes.
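A CCLM mode can be sketched as a per-sample linear mapping from collocated luma to chroma. The min/max parameter derivation below is a simplified stand-in for the normative derivation, assuming the neighbouring luma samples are not all equal:

```python
def derive_cclm_params(neigh_luma, neigh_chroma):
    """Fit chroma = alpha * luma + beta through the neighbouring sample
    pairs with the smallest and largest luma values (simplified)."""
    lo = min(range(len(neigh_luma)), key=lambda i: neigh_luma[i])
    hi = max(range(len(neigh_luma)), key=lambda i: neigh_luma[i])
    alpha = (neigh_chroma[hi] - neigh_chroma[lo]) / (neigh_luma[hi] - neigh_luma[lo])
    beta = neigh_chroma[lo] - alpha * neigh_luma[lo]
    return alpha, beta

def cclm_predict(luma_block, alpha, beta):
    """Predict each chroma sample from its collocated luma sample."""
    return [[alpha * l + beta for l in row] for row in luma_block]
```

The three CCLM modes mentioned above differ in which neighbouring samples contribute to the model; the prediction step itself is the same linear mapping.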
[000157] The module 864 may also produce a prediction unit by copying a block from nearby the current frame using an 'intra block copy' (IBC) method. The location of the reference block is constrained to an area equivalent to one CTU, divided into 64×64 regions known as VPDUs, with the area covering the processed VPDUs of the current CTU and VPDUs of the previous CTU(s) within each row of CTUs and within each slice or tile, up to the area limit
corresponding to 128×128 luma samples, regardless of the configured CTU size for the bitstream. This area is known as an 'IBC virtual buffer' and limits the IBC reference area, thus limiting the required storage. The IBC buffer is populated with reconstructed samples 854 (i.e., prior to loop filtering), and so a separate buffer to the frame buffer 872 is needed. When the CTU size is 128×128 the virtual buffer includes samples only from the CTU adjacent to the left of the current CTU. When the CTU size is 32×32 or 64×64 the virtual buffer includes CTUs from up to the sixteen or four CTUs, respectively, to the left of the current CTU. Regardless of the CTU size, access to neighbouring CTUs for obtaining samples for IBC reference blocks is constrained by boundaries such as edges of pictures, slices, or tiles. Particularly for feature maps of FPN layers having smaller dimensions, use of a CTU size such as 32×32 or 64×64 results in a reference area more aligned to cover a set of previous feature maps. Where feature map placement is ordered based on SAD, SSE, or another difference metric, access to similar feature maps for IBC prediction offers a coding efficiency advantage.
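The CTU counts above follow from dividing the fixed 128×128-sample buffer area by the CTU area; the arithmetic can be sketched as follows (illustrative only, as the normative buffer tracking is more involved):

```python
def ibc_buffer_ctus(ctu_size: int, buffer_area: int = 128 * 128) -> int:
    """Number of whole CTUs of the given size that fit within the IBC
    virtual buffer area of 128x128 luma samples."""
    return buffer_area // (ctu_size * ctu_size)
```

A 64×64 CTU size therefore yields a buffer spanning four CTUs and a 32×32 CTU size yields sixteen, which is why smaller CTU sizes give IBC a reference area covering more previously coded feature maps.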
[000158] The residual for a predicted block when encoding feature map data is different to the residual seen for natural video. Natural video is typically captured by an image sensor, or is screen content, as generally seen in operating system user interfaces and the like. Feature map residuals tend to contain much detail. The level of detail in feature map residuals makes them more amenable to transform-skip coding than to transformation into the predominantly low-frequency coefficients of the various transforms. An intra-predicted luma coding block may be partitioned into a set of equal-sized prediction blocks, either vertically or horizontally, each block having a minimum area of sixteen (16) luma samples.
[000159] Where previously reconstructed neighbouring samples are unavailable, for example at the edge of the frame, a default half-tone value of one half the range of the samples is used. For example, for 10-bit video a value of five hundred and twelve (512) is used. As no previous samples are available for a CB located at the top-left position of a frame, angular and planar intra-prediction modes produce the same output as the DC prediction mode (i.e., a flat plane of samples having the half-tone value as magnitude).
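The default half-tone value is simply the midpoint of the sample range for the configured bit depth:

```python
def default_reference_value(bit_depth: int) -> int:
    """Half of the sample range, i.e. 2 ** (bit_depth - 1)."""
    return 1 << (bit_depth - 1)
```

For 10-bit video this gives 512, matching the example above; for 8-bit video it gives 128.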
[000160] For inter-frame prediction, a prediction block 882 is produced using samples from one or two frames preceding the current frame in the coding order of frames in the bitstream, by a motion compensation module 880, and output as the PB 820 by the multiplexer module 884. Moreover, for inter-frame prediction, a single coding tree is typically used for both the luma channel and the chroma channels. The order of coding frames in the bitstream may differ from
the order of the frames when captured or displayed. When one frame is used for prediction, the block is said to be 'uni-predicted' and has one associated motion vector. When two frames are used for prediction, the block is said to be 'bi-predicted' and has two associated motion vectors. For a P slice, each CU may be intra predicted or uni-predicted. For a B slice, each CU may be intra predicted, uni-predicted, or bi-predicted.
[000161] Frames are typically coded using a 'group of pictures' structure, enabling a temporal hierarchy of frames. Frames may be divided into multiple slices, each of which encodes a portion of the frame. A temporal hierarchy of frames allows a frame to reference a preceding and a subsequent picture in the order of displaying the frames. The images are coded in the order necessary to ensure the dependencies for decoding each frame are met. An affine inter prediction mode is available where, instead of using one or two motion vectors to select and filter reference sample blocks for a prediction unit, the prediction unit is divided into multiple smaller blocks and a motion field is produced so each smaller block has a distinct motion vector. The motion field uses the motion vectors of points nearby to the prediction unit as 'control points'. Affine prediction allows coding of motion different to translation with less need to use deeply split coding trees. A bi-prediction mode available to VVC performs a geometric blend of the two reference blocks along a selected axis, with angle and offset from the centre of the block signalled. This geometric partitioning mode ('GPM') allows larger coding units to be used along the boundary between two objects, with the geometry of the boundary coded for the coding unit as an angle and centre offset. Motion vector differences, instead of using a cartesian (x, y) offset, may be coded as a direction (up/down/left/right) and a distance, with a set of power-of-two distances supported. The motion vector predictor is obtained from a neighbouring block ('merge mode') as if no offset is applied. The current block will share the same motion vector as the selected neighbouring block.
[000162] The samples are selected according to a motion vector 878 and reference picture index. The motion vector 878 and reference picture index apply to all colour channels and thus inter prediction is described primarily in terms of operation upon PUs rather than PBs. The decomposition of each CTU into one or more inter-predicted blocks is described with a single coding tree. Inter prediction methods may vary in the number of motion parameters and their precision. Motion parameters typically comprise a reference frame index, indicating which reference frame(s) from lists of reference frames are to be used, plus a spatial translation for each of the reference frames, but may include more frames, special frames, or complex affine
parameters such as scaling and rotation. In addition, a pre-determined motion refinement process may be applied to generate dense motion estimates based on referenced sample blocks.
[000163] Havingdetermined
[000163] Having determined andand selected selected thethe PB PB 820820 and and subtracted subtracted the the PB 820 PB 820 from from the the
original sample block at the subtractor 822, a residual with lowest coding cost, represented original sample block at the subtractor 822, a residual with lowest coding cost, represented
as 824, is obtained and subjected to lossy compression. The lossy compression process comprises the steps of transformation, quantisation and entropy coding. A forward primary transform module 826 applies a forward transform to the difference 824, converting the difference 824 from the spatial domain to the frequency domain, and producing primary transform coefficients represented by an arrow 828. The largest primary transform size in one dimension is either a 32-point DCT-2 or a 64-point DCT-2 transform, configured by a 'sps_max_luma_transform_size_64_flag' in the sequence parameter set. If the CB being encoded is larger than the largest supported primary transform size expressed as a block size (e.g., 64×64 or 32×32), the primary transform 826 is applied in a tiled manner to transform all samples of the difference 824. Where a non-square CB is used, tiling is also performed using the largest available transform size in each dimension of the CB. For example, when a maximum transform size of thirty-two (32) is used, a 64×16 CB uses two 32×16 primary transforms arranged in a tiled manner. When a CB is larger in size than the maximum supported transform size, the CB is filled with TBs in a tiled manner. For example, a 128×128 CB with 64-pt transform maximum size is filled with four 64×64 TBs in a 2×2 arrangement. A 64×128 CB with a 32-pt transform maximum size is filled with eight 32×32 TBs in a 2×4 arrangement.
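The tiling examples above can be reproduced with a small helper. The function below is a sketch written for this description (the name and interface are illustrative, not part of the VVC standard) and assumes the CB dimensions are multiples of the resulting TB size, as in the examples.

```python
def tile_transform_blocks(cb_w, cb_h, max_tb):
    """Split a coding block (CB) into transform blocks (TBs) no larger
    than the maximum supported transform size in each dimension.
    Illustrative helper for the tiling rule described above."""
    tb_w = min(cb_w, max_tb)
    tb_h = min(cb_h, max_tb)
    cols = cb_w // tb_w   # TBs across the CB width
    rows = cb_h // tb_h   # TBs down the CB height
    return tb_w, tb_h, cols, rows

# A 128x128 CB with a 64-pt maximum is covered by four 64x64 TBs (2x2).
print(tile_transform_blocks(128, 128, 64))   # (64, 64, 2, 2)
# A 64x128 CB with a 32-pt maximum is covered by eight 32x32 TBs (2x4).
print(tile_transform_blocks(64, 128, 32))    # (32, 32, 2, 4)
# A 64x16 CB with a 32-pt maximum uses two 32x16 primary transforms.
print(tile_transform_blocks(64, 16, 32))     # (32, 16, 2, 1)
```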
[000164] Application of the transform 826 results in multiple TBs for the CB. Where each application of the transform operates on a TB of the difference 824 larger than 32×32, e.g., 64×64, all resulting primary transform coefficients 828 outside of the upper-left 32×32 area of the TB are set to zero (i.e., discarded). The remaining primary transform coefficients 828 are passed to the quantiser module 834. The primary transform coefficients 828 are quantised according to the quantisation parameter 892 associated with the CB to produce primary transform coefficients 832. In addition to the quantisation parameter 892, the quantiser module 834 may also apply a 'scaling list' to allow non-uniform quantisation within the TB by further scaling residual coefficients according to their spatial position within the TB. The quantisation parameter 892 may differ for a luma CB versus each chroma CB. The primary transform coefficients 832 are passed to a forward secondary transform module 830 to produce transform coefficients represented by the arrow 836 by performing either a non-separable secondary transform (NSST) operation or bypassing the secondary transform. The forward primary transform 826 is typically separable, transforming a set of rows and then a set of columns of each TB. The forward primary transform module 826 uses either a type-II discrete cosine transform (DCT-2) in the horizontal and vertical directions, or bypass of the transform horizontally and vertically, or combinations of a type-VII discrete sine transform (DST-7) and a type-VIII discrete cosine transform (DCT-8) in either horizontal or vertical directions for luma TBs not exceeding 16 samples in width and height. Use of combinations of a DST-7 and DCT-8 is referred to as 'multi transform selection set' (MTS) in the VVC standard.
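The zero-out of high-frequency coefficients for large TBs can be illustrated as follows; the helper is hypothetical and operates on a plain list-of-rows representation rather than actual codec data structures.

```python
def zero_high_freq(coeffs, keep=32):
    """Zero all primary transform coefficients outside the upper-left
    keep x keep area of a TB, as applied to TBs larger than 32x32.
    Sketch of the behaviour described above, not VVC code."""
    h = len(coeffs)
    w = len(coeffs[0]) if h else 0
    return [[coeffs[y][x] if (x < keep and y < keep) else 0
             for x in range(w)] for y in range(h)]

# For a 64x64 TB, only the upper-left 32x32 coefficients survive.
tb = [[1] * 64 for _ in range(64)]
out = zero_high_freq(tb)
print(sum(map(sum, out)))   # 1024 coefficients retained (32*32)
```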
[000165] The forward secondary transform of the module 830 is generally a non-separable transform, which is only applied for the residual of intra-predicted CUs and may nonetheless also be bypassed. The forward secondary transform operates either on sixteen (16) samples (arranged as the upper-left 4×4 sub-block of the primary transform coefficients 828) or forty-eight (48) samples (arranged as three 4×4 sub-blocks in the upper-left 8×8 coefficients of the primary transform coefficients 828) to produce a set of secondary transform coefficients. The set of secondary transform coefficients may be fewer in number than the set of primary transform coefficients from which they are derived. Due to application of the secondary transform to only a set of coefficients adjacent to each other and including the DC coefficient, the secondary transform is referred to as a 'low frequency non-separable secondary transform' (LFNST). Such secondary transforms may be obtained through a training process and, due to their non-separable nature and trained origin, exploit additional redundancy in the residual signal not able to be captured by separable transforms such as variants of DCT and DST. Moreover, when the LFNST is applied, all remaining coefficients in the TB are zero, both in the primary transform domain and the secondary transform domain.
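The choice between the sixteen- and forty-eight-sample LFNST input regions can be sketched as below. The selection condition shown is a simplification assumed for illustration; the precise rule in the VVC standard depends on further details of the TB geometry.

```python
def lfnst_input_size(tb_w, tb_h):
    """Number of primary coefficients fed to the LFNST: sixteen when a
    TB dimension is 4 (a single upper-left 4x4 sub-block), forty-eight
    otherwise (three 4x4 sub-blocks within the upper-left 8x8).
    Assumed simplified condition, for illustration only."""
    if tb_w == 4 or tb_h == 4:
        return 16
    return 48

print(lfnst_input_size(4, 8), lfnst_input_size(8, 8))   # 16 48
```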
[000166] The quantisation parameter 892 is constant for a given TB and thus results in a uniform scaling for producing residual coefficients in the primary transform domain for a TB. The quantisation parameter 892 may vary periodically with a signalled 'delta quantisation parameter'. The delta quantisation parameter (delta QP) is signalled once for CUs contained within a given area, referred to as a 'quantisation group'. If a CU is larger than the quantisation group size, delta QP is signalled once with one of the TBs of the CU. That is, the delta QP is signalled by the entropy encoder 838 once for the first quantisation group of the CU and not signalled for any subsequent quantisation groups of the CU. A non-uniform scaling is also possible by application of a 'quantisation matrix', whereby the scaling factor applied for each residual coefficient is derived from a combination of the quantisation parameter 892 and the corresponding entry in a scaling matrix. The scaling matrix may have a size that is smaller than the size of the TB, and when applied to the TB a nearest-neighbour approach is used to provide scaling values for each residual coefficient from a scaling matrix smaller in size than the TB size. The residual coefficients 836 are supplied to the entropy encoder 838 for encoding in the bitstream portion 716. Typically, the residual coefficients of each TB with at least one significant residual coefficient of the TU are scanned to produce an ordered list of values, according to a scan pattern. The scan pattern generally scans the TB as a sequence of 4×4 'sub-blocks', providing a regular scanning operation at the granularity of 4×4 sets of residual coefficients, with the arrangement of sub-blocks dependent on the size of the TB. The scan within each sub-block and the progression from one sub-block to the next typically follow a backward diagonal scan pattern. Additionally, the quantisation parameter 892 is encoded into the bitstream portion 716 using a delta QP syntax element and a slice QP for the initial value in a given slice or subpicture, and the secondary transform index 888 is encoded in the bitstream portion 716.
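The backward diagonal pattern can be sketched for a single 4×4 sub-block as below; the full TB scan applies the same pattern at sub-block granularity, and the exact VVC ordering differs in detail, so this is illustrative only.

```python
def diag_scan_positions(n=4):
    """Backward diagonal scan positions for an n x n sub-block: the
    scan starts at the bottom-right coefficient and ends at the DC
    (top-left) position. Simplified sketch of the pattern described."""
    positions = []
    for d in range(2 * n - 2, -1, -1):   # diagonals, last diagonal first
        for y in range(n - 1, -1, -1):   # traverse each diagonal up-right
            x = d - y
            if 0 <= x < n:
                positions.append((x, y))
    return positions

scan = diag_scan_positions()
print(scan[0], scan[-1])   # (3, 3) (0, 0)
```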
[000167] As described above, the video encoder 542 needs access to a frame representation corresponding to the decoded frame representation seen in the video decoder. Thus, the residual coefficients 836 are passed through an inverse secondary transform module 844, operating in accordance with the secondary transform index 888 to produce intermediate inverse transform coefficients, represented by an arrow 842. The intermediate inverse transform coefficients 842 are inverse quantised by the dequantiser module 840 according to the quantisation parameter 892 to produce the inverse transform coefficients, represented by an arrow 846. The dequantiser module 840 may also perform an inverse non-uniform scaling of residual coefficients using a scaling list, corresponding to the forward scaling performed in the quantiser module 834. The inverse transform coefficients 846 are passed to an inverse primary transform module 848 to produce residual samples, represented by an arrow 850, of the TU. The inverse primary transform module 848 applies DCT-2 transforms horizontally and vertically, constrained by the maximum available transform size as described with reference to the forward primary transform module 826. The types of inverse transform performed by the inverse secondary transform module 844 correspond with the types of forward transform performed by the forward secondary transform module 830. The types of inverse transform performed by the inverse primary transform module 848 correspond with the types of primary transform performed by the primary transform module 826. A summation module 852 adds the residual samples 850 and the PB 820 to produce reconstructed samples (indicated by an arrow 854) of the CU.
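The uniform scaling performed between the quantiser 834 and the dequantiser 840 can be illustrated with a minimal round trip. The step-size mapping below follows the common convention of doubling every six QP values; it is an illustrative approximation, not the exact integer arithmetic of the standard.

```python
def qp_to_step(qp):
    """Map a quantisation parameter to a step size that doubles every
    six QP values, mirroring the HEVC/VVC convention. Illustrative."""
    return 2 ** ((qp - 4) / 6)

def quantise(coeffs, qp):
    """Forward uniform quantisation of transform coefficients."""
    step = qp_to_step(qp)
    return [int(round(c / step)) for c in coeffs]

def dequantise(levels, qp):
    """Inverse quantisation: scale levels back to coefficient values."""
    step = qp_to_step(qp)
    return [l * step for l in levels]

levels = quantise([100.0, -40.0, 3.0], 22)   # step size is 8 at QP 22
print(levels)                 # [12, -5, 0]
print(dequantise(levels, 22))  # [96.0, -40.0, 0.0] -- lossy round trip
```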
[000168] The reconstructed samples 854 are passed to a reference sample cache 856 and an in-loop filters module 868. The reference sample cache 856, typically implemented using static RAM on an ASIC to avoid costly off-chip memory access, provides minimal sample storage needed to satisfy the dependencies for generating intra-frame PBs for subsequent CUs in the frame. The minimal dependencies typically include a 'line buffer' of samples along the bottom of a row of CTUs, for use by the next row of CTUs, and column buffering, the extent of which is set by the height of the CTU. The reference sample cache 856 supplies reference samples (represented by an arrow 858) to a reference sample filter 860. The sample filter 860 applies a smoothing operation to produce filtered reference samples (indicated by an arrow 862). The filtered reference samples 862 are used by the intra-frame prediction module 864 to produce an intra-predicted block of samples, represented by an arrow 866. For each candidate intra prediction mode, the intra-frame prediction module 864 produces a block of samples, that is, 866. The block of samples 866 is generated by the module 864 using techniques such as DC, planar or angular intra prediction. The block of samples 866 may also be produced using a matrix-multiplication approach with neighbouring reference samples as input and a matrix selected from a set of matrices by the video encoder 800, with the selected matrix signalled in the bitstream 121 using an index to identify which matrix of the set of matrices is to be used by the video decoder.
[000169] The in-loop filters module 868 applies several filtering stages to the reconstructed samples 854. The filtering stages include a 'deblocking filter' (DBF), which applies smoothing aligned to the CU boundaries to reduce artefacts resulting from discontinuities. Another filtering stage present in the in-loop filters module 868 is an 'adaptive loop filter' (ALF), which applies a Wiener-based adaptive filter to further reduce distortion. A further available filtering stage in the in-loop filters module 868 is a 'sample adaptive offset' (SAO) filter. The SAO filter operates by firstly classifying reconstructed samples into one or multiple categories and, according to the allocated category, applying an offset at the sample level.
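The classify-then-offset structure of the SAO filter can be sketched with its band-offset mode. The helper below is hypothetical, assuming 8-bit samples divided into 32 bands by their upper bits; signalled offsets apply only to their band.

```python
def sao_band_offset(samples, offsets, shift=3):
    """Classify each 8-bit sample into one of 32 bands by its upper
    bits, then add the offset signalled for that band (zero if none),
    clipping to the valid sample range. Illustrative sketch only."""
    return [min(255, max(0, s + offsets.get(s >> shift, 0)))
            for s in samples]

# Samples 16 and 17 fall in band 2 and receive the +3 offset;
# sample 200 falls in band 25, which has no signalled offset.
print(sao_band_offset([16, 17, 200], {2: 3}))   # [19, 20, 200]
```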
[000170] Filtered samples, represented by an arrow 870, are output from the in-loop filters module 868. The filtered samples 870 are stored in a frame buffer 872. The frame buffer 872 typically has the capacity to store several (e.g., up to sixteen (16)) pictures and thus is stored in the memory 206. The frame buffer 872 is not typically stored using on-chip memory due to the large memory consumption required. As such, access to the frame buffer 872 is costly in terms of memory bandwidth. The frame buffer 872 provides reference frames (represented by an arrow 874) to a motion estimation module 876 and the motion compensation module 880. The reference frames 874 are output as a reconstructed frame 718 of the encoder module 542. In the example of Fig. 8, the reconstructed frame is a result of operation of lossy VVC encoding, that is, due to operation of the modules 810 to 890.
[000171] The motion estimation module 876 estimates a number of 'motion vectors' (indicated as 878), each being a Cartesian spatial offset from the location of the present CB, referencing a block in one of the reference frames in the frame buffer 872. A filtered block of reference samples (represented as 882) is produced for each motion vector. The filtered reference samples 882 form further candidate modes available for potential selection by the mode selector 886. Moreover, for a given CU, the PU 820 may be formed using one reference block ('uni-predicted') or may be formed using two reference blocks ('bi-predicted'). For the selected motion vector, the motion compensation module 880 produces the PB 820 in accordance with a filtering process supportive of sub-pixel accuracy in the motion vectors. As such, the motion estimation module 876 (which operates on many candidate motion vectors) may perform a simplified filtering process compared to that of the motion compensation module 880 (which operates on the selected candidate only) to achieve reduced computational complexity. When the video encoder 542 selects inter prediction for a CU, the motion vector 878 is encoded into the bitstream portion 121.
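The search performed by a motion estimator such as the module 876 can be sketched as an exhaustive integer-pel minimisation of the sum of absolute differences (SAD). Real encoders use fast search strategies and sub-pixel refinement, so the helper below is illustrative only.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-size blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def best_motion_vector(cur, ref, bx, by, bs, search):
    """Exhaustive search for the (dx, dy) offset within +/-search that
    minimises SAD between the current block at (bx, by) of size bs and
    the reference frame. Hypothetical helper sketching what a motion
    estimator evaluates. Returns (cost, dx, dy)."""
    cur_blk = [row[bx:bx + bs] for row in cur[by:by + bs]]
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if 0 <= x <= len(ref[0]) - bs and 0 <= y <= len(ref) - bs:
                cand = [row[x:x + bs] for row in ref[y:y + bs]]
                cost = sad(cur_blk, cand)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best

# Reference frame with distinct values; current frame shifted by (1, 1).
ref = [[x + y * 10 for x in range(8)] for y in range(8)]
cur = [row[1:] + [0] for row in ref[1:]] + [[0] * 8]
print(best_motion_vector(cur, ref, 0, 0, 4, 2))   # (0, 1, 1)
```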
[000172] Although the video encoder 542 of Fig. 8 is described with reference to versatile video coding (VVC), other video coding standards or implementations may also employ the processing stages of modules 810-890. The frame data 712 (and bitstream 716) may also be read from (or written to) memory 206, the hard disk drive 210, a CD-ROM, a Blu-ray™ disk or other computer readable storage medium. Additionally, the frame data 712 (and bitstream 716) may be received from (or transmitted to) an external source, such as a server connected to the communications network 220 or a radio-frequency receiver. The communications network 220 may provide limited bandwidth, necessitating the use of rate control in the video encoder 120 to avoid saturating the network at times when the frame data 712 is difficult to compress.
[000173] The bitstream 716 may be constructed from one or more slices, representing spatial sections (collections of CTUs) of the frame data 712, produced by one or more instances of the video encoder 542, each producing the bitstream portion 716 and operating in a co-ordinated manner under control of the processor 205. The bitstream portion 716 may also contain one slice that corresponds to one region to be output as a collection of subpictures forming one picture, each being independently encodable and independently decodable with respect to any of the other slices or subpictures in the picture.
[000174] Figs. 9A & 9B are schematic block diagrams showing an arrangement for holding or packing compressed feature map data from compressed tensor data. A feature map, corresponding to one channel of a tensor, is packed or stored in a rectangular area of the frame. The feature maps of each channel are packed typically in a left-to-right manner firstly and a top-to-bottom manner secondly, filling the frame width in the order of incrementing channel count. Fig. 9A shows a frame 900 that contains a region 910 in which feature maps of a tensor are to be packed. Frames containing feature maps packed onto the area of the frame may be referred to as "feature frames". The size of the frame 900 may be specified in terms of width and height in units of samples, smallest CU width/height, or CTU width/height. Fig. 9B shows the frame 900b which corresponds to the frame 900 once feature maps, i.e., feature maps obtained from the tensor 532, are packed. Where the tensor compressor 530 was configured to perform the feature reduction network topology described with reference to Fig. 6, the tensor 532 contains feature maps corresponding to the P5 layer, such as a feature map 930.
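The left-to-right, top-to-bottom packing of feature maps into a frame can be sketched as follows. The helper is a simplification assuming equal-size feature maps whose width divides the frame width; quantisation of feature values and padding are omitted.

```python
def pack_feature_maps(maps, frame_w, frame_h):
    """Pack equal-size feature maps (one per channel) into a frame,
    left-to-right then top-to-bottom in channel order, as described
    for Figs. 9A & 9B. Illustrative sketch only."""
    fm_h, fm_w = len(maps[0]), len(maps[0][0])
    per_row = frame_w // fm_w          # feature maps per frame row
    frame = [[0] * frame_w for _ in range(frame_h)]
    for c, fm in enumerate(maps):
        ox = (c % per_row) * fm_w      # horizontal placement of channel c
        oy = (c // per_row) * fm_h     # vertical placement of channel c
        for y in range(fm_h):
            for x in range(fm_w):
                frame[oy + y][ox + x] = fm[y][x]
    return frame

# Three 2x2 channels packed into a 4x4 frame: channels 0 and 1 fill the
# top row of the frame; channel 2 starts the next row of feature maps.
maps = [[[c + 1] * 2 for _ in range(2)] for c in range(3)]
for row in pack_feature_maps(maps, 4, 4):
    print(row)
```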
[000175] Fig. 12 is a schematic block diagram 1200 showing an example implementation of the tensor decoder 146. In the example of Fig. 12, the tensor decoder 146 includes a configurable tensor decompressor 1250 and a selectable picture decoder 1204. Fig. 19 shows a method 1900 for decoding a bitstream, including reconstructing tensors according to an indicated tensor decompressor, and performing a second portion of the CNN. In the example described, the method 1900 is configured for decoding an FCM bitstream where the inner coding is performed using one of several compression standards, each of which has a different NAL unit format, affecting signalling of metadata outside the scope of the inner coding stage.
[000176] The tensor decoder 1200 (146) and the method 1900 may be implemented as one or more software application programs 233 executable within the computer system 200. The tensor decoder 146 and the method 1900 may be effected by instructions 231 (see Fig. 2B) in the software 233 that are carried out within the computer system 200. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks. The method 1900 begins at a decode codec identifier NAL unit step 1902.
[000177] At the step 1902, a NAL unit demultiplexor 1202 passes NAL units 1207 received in the bitstream 143 to a metadata parser 1208, under execution of the processor 205, to decode received NAL units. At this stage, the specific inner codec is not known and so the only NAL unit format that can be unambiguously decoded is the inner codec identifier NAL unit as described with reference to Appendix D. In particular, an inner_codec_identifier syntax element is decoded from the inner codec identifier (ICI) 1110 at step 1902. For example, if AVC was selected as the inner codec at step 1802, step 1902 operates to decode a NAL unit from the bitstream having a predetermined length. As described in relation to Appendix D, the NAL unit of the predetermined length indicates a NAL unit format of one inner codec of a plurality of inner codecs. Each other inner codec (such as HEVC, VVC or 'custom') has a NAL unit length different to the predetermined length. The bitstream includes a plurality of NAL units and the decoded NAL unit of predetermined length for identifying a particular inner codec is the NAL unit header 1012.
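By way of illustration, the length-based identification performed at step 1902 can be sketched as follows. The two-byte ICI length, the codec enumeration values, and the payload layout below are illustrative assumptions for this sketch only; the normative definitions are those of Appendix D.

```python
# Hypothetical sketch of step 1902: the inner-codec identifier (ICI) NAL unit
# has a fixed, predetermined length, so it can be parsed before the inner
# codec itself is known.  All constants here are assumptions, not normative.
INNER_CODEC_NAMES = {0: "AVC", 1: "HEVC", 2: "VVC", 3: "custom"}
ICI_NAL_LENGTH = 2  # assumed predetermined length in bytes

def decode_inner_codec_identifier(bitstream: bytes) -> str:
    """Decode the inner_codec_identifier syntax element from the first
    (assumed fixed-length) NAL unit of the bitstream."""
    if len(bitstream) < ICI_NAL_LENGTH:
        raise ValueError("bitstream too short for ICI NAL unit")
    nal_unit = bitstream[:ICI_NAL_LENGTH]
    inner_codec_identifier = nal_unit[1]  # payload byte after the header byte
    try:
        return INNER_CODEC_NAMES[inner_codec_identifier]
    except KeyError:
        raise ValueError(f"unknown inner codec id {inner_codec_identifier}")
```

The returned codec name then drives the selection of the NAL unit header format used to parse the remainder of the bitstream.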
[000178] Control in the processor 205 progresses from the step 1902 to a select inner codec step 1904.
[000179] At the step 1904, the tensor decoder 146, under execution of the processor 205, selects one inner codec from a plurality of inner codecs based on the NAL unit of the predetermined length decoded at step 1902. The inner codec, i.e., the compression standard to be performed by the picture decoder 1204, is determined from the inner_codec_identifier syntax element decoded at the step 1902. Control in the processor 205 progresses from the step 1904 to a decode FCM VMPS step 1906.
[000180] At the step 1906, the NAL unit demultiplexor 1202, configured to parse NAL unit headers in accordance with the selected inner codec, passes the FCM VMPS 1112 to the metadata parser 1208. The demultiplexor 1202 is able to distinguish NAL units for the metadata parser (FCM VMPS, FCM SPS, and FCM PPS) from NAL units for the picture decoder 1204 based on the nal_unit_type enumerations described with reference to Appendices A-C. The metadata parser 1208 decodes the FCM VMPS 1112 in accordance with the syntax structure shown in Appendix E to produce vision model parameters (output_picture_width and output_picture_height in the example of Appendix E), which are passed to the CNN head 150. The vision model parameters produced at step 1906 correspond to the parameters 113a of Fig. 1. Vision model parameters may include items such as the dimensions of the frame data 113, needed for bounding boxes to be scaled correctly. Control in the processor 205 progresses from the step 1906 to a decode FCM SPS step 1907.
[000181] At the step 1907, the metadata parser 1208 parses the FCM SPS 1114 received from the bitstream 143 via the demultiplexor 1202 to obtain tensor information relating to dimensionality of compressed tensors and placement of feature maps as packing information for each tensor in the bitstream 143. The FCM SPS 1114 is parsed to obtain the information encoded at step 18100. The FCM SPS 1114 is parsed at step 1907 in accordance with the syntax structure and semantics described with reference to Appendix E, for example. Control in the processor 205 progresses from the step 1907 to a decode FCM PPS step 1908.
[000182] At the step 1908, the metadata parser 1208 parses the FCM PPS 1116 if received from the bitstream 143 via the demultiplexor 1202. The FCM PPS 1116 is parsed to obtain information encoded at step 18110. The step 1908 operates to decode and parse information in accordance with the syntax structure and semantics described with reference to Appendix E, for example. For example, the FCM PPS 1116 includes information relating to quantisation ranges in elements qr_min_exp, qr_min_exp_sign, qr_min_mantissa, qr_min_mantissa_sign, qr_max_exp, qr_max_exp_sign, qr_max_mantissa, and qr_max_mantissa_sign in the example of Appendix E. Control in the processor 205 progresses from the step 1908 to a determine tensor decompressor step 1910.
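To illustrate how a quantisation bound might be assembled from the sign, exponent, and mantissa syntax elements listed above, the following sketch uses a value = sign × mantissa × 2^(±exponent) interpretation. That interpretation is an assumption made for illustration; the normative semantics of the qr_* elements are given in Appendix E.

```python
# Hedged sketch: combining qr_*_exp / qr_*_mantissa syntax elements from the
# FCM PPS into a floating-point quantisation bound.  The sign/exponent/
# mantissa composition below is an illustrative assumption, not normative.
def decode_quantisation_bound(exp: int, exp_sign: int,
                              mantissa: int, mantissa_sign: int) -> float:
    """Return sign * mantissa * 2**(+/-exp) from the decoded elements."""
    signed_exp = -exp if exp_sign else exp
    signed_mantissa = -mantissa if mantissa_sign else mantissa
    return signed_mantissa * (2.0 ** signed_exp)

# A quantisation range is then the pair (minimum, maximum):
qr_min = decode_quantisation_bound(exp=3, exp_sign=1, mantissa=5, mantissa_sign=1)
qr_max = decode_quantisation_bound(exp=2, exp_sign=0, mantissa=3, mantissa_sign=0)
# qr_min = -5 * 2**-3 = -0.625 ; qr_max = 3 * 2**2 = 12.0
```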
[000183] At the step 1910, the tensor decoder 146 determines a decoder network topology to be used for restoring dimensionality of compressed tensors to a dimensionality compatible for use as input to the CNN head 150. The metadata parser 1208, under execution of the processor 205, decodes information representing the decoder topology from the FCM SPS 1114 as a full decoder network topology. Alternatively, the metadata parser 1208 decodes a reference to a description of the decoder network topology, which may have been previously included in the bitstream 143 or may have been obtained via external means (e.g., downloaded from an internet connection). The decoder network topology may be encoded using formats such as ONNX, NNEX, or Pytorch code. The decoded network topology may include a format indication, signalling which format is in use. For a given format, multiple versions may be defined (or new versions may be created in future) and so a format version indicator may also be included in the decoded network topology. Formats may be textual in nature and thus use of an optional compression stage, using a technique such as ZIP or LZMA, may be signalled to minimise the storage overhead of the decoder network topology in the bitstream 143.
[000184] As seen in Fig. 12, tensor decompressor information 1230 is output by the parser 1208. The tensor decompressor information 1230 specifies a decoder network topology either by reference or by structure. If the tensor decompressor information 1230 specifies the decoder network topology by reference, the structure of the decoder network topology is obtained from a tensor decompressor repository 1232 (if available) or the repository 1232 obtains the topology from the tensor codec repository 180. The decoder network topology may be retained in the repository 1232 for future use even after a different decoder network topology is used. The repository 1232 outputs information 1238 corresponding to the decoder topology to the tensor decompressor 1250. The decoder network topology may be retained in the repository 1232 for future use even after a subsequent ICI (e.g., 1110) is received, indicating the commencement of a new bitstream. If the decoder network topology requires weights, weight information 1234 is provided from the metadata parser 1208 to a tensor weights repository 1236 to specify weights either by reference or by value. The repository 1236 outputs information 1240 corresponding to the weights information. A given feature restoration network topology may allow specific dimensions of the input and output tensors to be changed at runtime, sometimes referred to as 'dynamic axes'. Such dynamic axes may correspond to the width and height of feature maps of the tensors being compressed or restored. Control in the processor 205 progresses from the step 1910 to a decode complexity indication step 1920.
[000185] At the step 1920, the metadata parser 1208, under execution of the processor 205, decodes one or more syntax elements indicating the worst-case complexity of any decoder network topology that will be implemented in the tensor decompressor 1250 as the complexity indication from the FCM SPS 1114. Worst-case complexity includes one or more of the worst case in terms of storage of intermediate tensors within the tensor decompressor 1250, the worst-case number of MAC operations to be performed by the tensor decompressor 1250, and the worst-case floating-point operations of any kind to be performed in the tensor decompressor 1250. Control in the processor 205 progresses from the step 1920 to a determine tensor decompressor complexity step 1930.
[000186] At the step 1930, the tensor decoder 146, under execution of the processor 205, determines the required complexity to perform the determined decoder network topology. An operation count is produced by performing a traversal of the stages defined in the determined decoder network topology and counting operations implied by each stage without performing the stage. Dimensionality of any persistent tensor data (i.e., tensors retained from one invocation of the tensor decompressor 1250 to the next invocation) is recorded. The volume of intermediate tensor data is retained, such that the maximum amount of intermediate tensor data concurrently retained in the memory 206 is determined. Control in the processor 205 progresses from the step 1930 to a complexity test step 1950.
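The traversal described above, counting operations without executing any stage, can be sketched as follows. The per-stage schema (a MAC count and an output element count per stage) is a hypothetical simplification; a real topology walk would derive these from each stage's type and tensor shapes.

```python
# Hedged sketch of step 1930: walk the stages of a (hypothetical) decoder
# network topology, summing the MAC count each stage implies and tracking the
# peak intermediate-tensor storage, without executing any stage.
def estimate_complexity(stages):
    """Each stage is a dict with 'macs' and 'output_elems' (assumed schema).
    Returns (total MACs, peak concurrently-live intermediate elements)."""
    total_macs = 0
    live_elems = 0        # elements of the tensor flowing between stages
    peak_elems = 0
    for stage in stages:
        total_macs += stage["macs"]
        # a stage's input and its output are live at the same time
        peak_elems = max(peak_elems, live_elems + stage["output_elems"])
        live_elems = stage["output_elems"]
    return total_macs, peak_elems

macs, peak = estimate_complexity([
    {"macs": 1000, "output_elems": 64},
    {"macs": 4000, "output_elems": 256},
    {"macs": 2000, "output_elems": 128},
])
# macs == 7000; peak == 256 + 128 == 384 (stage 3's input plus its output)
```

The resulting pair is what the step 1950 test would compare against the complexity indication decoded from the FCM SPS.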
[000187] At the step 1950, the tensor decoder 146, under execution of the processor 205, compares the decoder network topology complexity determined at the step 1930 with the complexity indication decoded at the step 1920. If the determined complexity is less than or equal to the complexity indication ("OK" at step 1950), control in the processor 205 progresses from the step 1950 to an instantiate tensor decompressor step 1970. If the determined complexity is greater than the complexity indication ("NOT OK" at step 1950), control in the processor 205 progresses from the step 1950 to an error condition step 1960.
[000188] At the step 1960, the tensor decoder 146 enters an error state and decoding cannot continue due to the possibility of a signalled decoder network topology exceeding the capabilities of the tensor decompressor 1250. The method 1900 terminates.
[000189] At the step 1970, the tensor decompressor 1250 is initialized in accordance with the decoder network topology as determined at the step 1910. The step 1970 is performed only when a new decoder network topology was determined at the step 1910, i.e., subsequent invocations of the method 1900 for which no new decoder network topology is determined may reuse resources allocated at the step 1970. Sufficient storage memory is allocated to hold any persistent tensors (i.e., tensors retained from one invocation of the method 1900 to the next invocation of the method 1900), along with memory to hold the maximum concurrently used intermediate tensors in performing the decoder network topology. In the case where hardware acceleration is applied for the decoder network topology, reservation of sufficient execution units, such as MACs or DSP blocks, e.g., in an FPGA, may also take place. In the case where the decoder network topology is to be performed in software, sufficient execution time on available resources such as CPU or GPU is reserved to enable real-time operation of the method 1900 (i.e., sufficient to allow repeated invocations of the method 1900 as incoming packed frames are decoded without accumulated stalling, jitter, or buffering delay). Control in the processor 205 progresses from the step 1970 to a decode packed frame step 1980.
[000190] At the step 1980, the picture decoder 1204, under execution of the processor 205, decodes one packed frame from the bitstream 1206 to produce a decoded frame 1210, provided fcm_sps_inner_decoding_bypass_flag was set to disabled (i.e., zero, or do not bypass the inner decoding step). The bitstream 1206 includes NAL units from the bitstream 143 having NAL unit types that are not allocated as FCM VMPS, FCM SPS, or FCM PPS. Due to operation of step 1890 at the encoding stage, the step 1980 executes to produce one or more integer tensors from the bitstream. The step 1980 operates to decode the bitstream using the selected inner codec of step 1904 to produce the tensors to be provided to the neural network head (second portion) 150. Operation of the picture decoder 1204 is described with reference to Fig. 13. Control in the processor 205 progresses from the step 1980 to an unpack tensors step 1990.
[000191] At the step 1990, an unpacker 1214, under execution of the processor 205, reads feature maps from the decoded frame 1210 in accordance with the packing format as determined at the step 1906 in decoding the FCM VMPS 1112 and described with reference to Figs. 9A & 9B. For each tensor, a number of feature maps are decoded, the number corresponding to the number of used channels in the tensor as signalled in the tensor information. Aspects of the tensor information used for packing and unpacking are reduced-domain tensor dimensionality and placement of each reduced-domain tensor in the decoded frame 1210. The channels for each tensor are unpacked as two-dimensional feature maps. The number of feature maps or channels to decode for a given tensor is decoded from the bitstream 143 as a 'channel count'. The unpacker 1214 outputs integer tensors 1216, where the tensors 1216 have been decoded using the decoder topology for the tensor decoder 146. Due to use of the video decoder 1204, the tensors 1216 contain integer elements in the range afforded by the bit depth in use in the video decoder 1204. Control in the processor 205 progresses from the step 1990 to an inverse quantise tensors step 19100.
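The unpacking of two-dimensional feature maps out of a packed frame can be sketched as array slicing over signalled placements. The placement schema (a top-left corner per channel, with one common feature-map size) is a hypothetical simplification of the packing information of Figs. 9A & 9B.

```python
import numpy as np

# Illustrative sketch of step 1990: copy each tensor's feature maps out of the
# packed decoded frame.  The placement schema (per-channel top-left corner plus
# a common feature-map width/height) is an assumption for illustration.
def unpack_tensor(frame: np.ndarray, placements, fm_h: int, fm_w: int):
    """frame: 2-D packed frame; placements: (y, x) top-left corner of each
    decoded feature map.  Returns an integer tensor of shape (C, fm_h, fm_w),
    where C is the signalled channel count (len(placements))."""
    channels = [frame[y:y + fm_h, x:x + fm_w] for (y, x) in placements]
    return np.stack(channels, axis=0)

frame = np.arange(8 * 8, dtype=np.int32).reshape(8, 8)
tensor = unpack_tensor(frame, placements=[(0, 0), (0, 4), (4, 0)], fm_h=4, fm_w=4)
# three 4x4 feature maps unpacked into one (3, 4, 4) integer tensor
```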
[000192] At the step 19100, an inverse quantiser 1218, under execution of the processor 205, performs inverse quantisation on the integer tensors 1216 to produce inverse quantised tensors 1220, provided fcm_sps_quantisation_bypass_flag is set to zero (i.e., do not bypass inverse quantisation). The inverse quantiser 1218 applies quantisation ranges decoded from the bitstream 143, to the determined channel count of each tensor, also converting the resultant tensor to a floating-point data format. The quantisation ranges indicate a maximum and minimum value (or lower and upper bound) used in the floating-point domain at the output from the feature reduction network or tensor compressor 530, i.e., 532. To perform inverse quantisation, the quantisation range is decoded from the FCM PPS 1116 by the metadata parser 1208 (as 1270) at the step 1908. The step 19100 therefore operates to decode the quantisation range and produce one or more floating-point tensors (1220) from the integer tensor(s) produced at the step 1990 using the range information. The quantisation range indicates a range of values, and the tensors 1220 are produced so that each element of each feature map of each tensor has a value within the indicated range. The quantisation range may be obtained by decoding syntax elements qr_min_exp, qr_min_exp_sign, qr_min_mantissa, qr_min_mantissa_sign, qr_max_exp, qr_max_exp_sign, qr_max_mantissa, and qr_max_mantissa_sign, as described with reference to Appendix E. Control in the processor 205 progresses from the step 19100 to a buffer quantised tensors step 19110.
[000193] At the step 19110, a tensor storage module 1222, under execution of the processor 205, provides inter-frame storage of the inverse quantised tensors 1220. Of the tensors for each region of the packing format, each tensor with at least one channel or feature map decoded is stored in the tensor storage module 1222. The tensor storage module 1222 produces output tensors 1224, including the most recent tensor for each tensor where at least one feature map was decoded. In other words, where a tensor was not decoded for a current frame (i.e., a channel count of zero was determined), the most recent value for the tensor where a nonzero channel count was decoded is used. Control in the processor 205 progresses from the step 19110 to a perform tensor decompression step 19120.
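The fallback behaviour described above, reusing the most recent decoded tensor when a frame carries a channel count of zero for a region, can be sketched as follows. The class name and interface are hypothetical.

```python
# Illustrative sketch of the tensor storage module (1222): keep the most
# recently decoded tensor per packing region, and fall back to it when the
# current frame signals a channel count of zero for that region.
class TensorStorage:
    def __init__(self):
        self._latest = {}                  # region id -> most recent tensor

    def update(self, region: int, tensor, channel_count: int):
        if channel_count > 0:              # tensor was decoded for this frame
            self._latest[region] = tensor
        return self._latest[region]        # otherwise reuse the stored tensor

store = TensorStorage()
out0 = store.update(0, "tensor_frame0", channel_count=4)   # stored and returned
out1 = store.update(0, None, channel_count=0)              # falls back to frame 0
```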
[000194] At the step 19120, the tensor decompressor 1250, under execution of the processor 205, performs the steps specified by the decoder network topology using the tensors 1224 as input to produce decoded tensors 1254 when fcm_sps_feature_restoration_bypass_flag is set to zero (i.e., do not bypass the tensor decompression or feature restoration step). Operation of the step 19120 is described further with reference to Fig. 14. By virtue of the check performed at the step 1950, performance of the decoder network topology will not consume resources beyond those already allocated by the destination device 140 for the purpose of tensor decompression and hence will succeed in producing an output. Control in the processor 205 progresses from the step 19120 to a perform upsampling step 19130.
[000195]AtAtthe
[000195] thestep step 19130, 19130,aatemporal temporalupsampler upsampler 1260, 1260, under under execution execution of the of the processor processor 205, 205,
performsaa temporal performs temporalupsampling upsampling (interpolation)totoproduce (interpolation) producethe thetensors tensors149 149from fromthethe tensors 1254. tensors Thetemporal 1254. The temporalupsampler upsampler is is activewhen active when fcm_pps_temporal_upsampling_enabled_flag is to fcm_pps_temporal_upsampling_enabled_flag is set set indicate to indicate application application of of temporal temporal
upsampling,inin accordance upsampling, accordancewith withthe theratio ratio indicated indicated by by temporal_upsampling_ratio_minus2. temporal_upsampling_ratio_minus2. Eachtemporal Each temporalupsampling upsampling operation operation takes takes twotwo consecutive consecutive sets sets of of tensorsfrom tensors from thethe
tensors 1254 tensors andproduces 1254 and producesone oneorormore more intermediate intermediate tensors,output tensors, outputalong alongwith withthe thetensors tensors1254 1254 to produce to tensors 149. produce tensors 149. Due Duetotouse useofofthe the second secondset set of of tensors tensors from 1254toto produce from 1254 produce
44204385_1
63
07 Jun 2024
intermediate tensors, structural delay is introduced when temporal upsampling is enabled, hence temporal upsampling is suited to applications that can tolerate a degree of latency. Control in the processor 205 progresses from the step 19130 to a perform neural network second portion step 19140.
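The interpolation performed by the temporal upsampler 1260 can be sketched as follows. The linear blend and the list-based tensor representation are illustrative assumptions only; the actual interpolation filter, and the mapping of temporal_upsampling_ratio_minus2 to an upsampling ratio, are defined by the codec rather than by this sketch.

```python
def temporal_upsample(tensor_sets, ratio):
    """Insert (ratio - 1) interpolated tensor sets between each pair of
    consecutive decoded tensor sets.

    `tensor_sets` is a list of flat lists of floats, one list per time
    instant. Linear interpolation is an assumption here; the real
    temporal upsampler 1260 may use a different filter.
    """
    out = []
    for a, b in zip(tensor_sets, tensor_sets[1:]):
        out.append(a)
        for k in range(1, ratio):
            t = k / ratio
            out.append([x * (1 - t) + y * t for x, y in zip(a, b)])
    out.append(tensor_sets[-1])
    return out

# Doubling the temporal rate (ratio 2) inserts one midpoint per pair:
dense = temporal_upsample([[0.0, 4.0], [2.0, 8.0]], 2)
# dense == [[0.0, 4.0], [1.0, 6.0], [2.0, 8.0]]
```

Because each interpolated set depends on the following decoded set, the one-set structural delay described above falls directly out of the loop over consecutive pairs.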
[000196] At the step 19140, the CNN head 150, under execution of the processor 205, performs the remaining layers of the neural network implemented by the system 100, using the tensors 149 as input. The method 1900 terminates and the processor 205 may reinvoke the method 1900 upon receiving the next packed frame in the bitstream 143.
2024203901
[000197] Fig. 13 is a schematic block diagram 1300 showing functional modules of an example implementation of the video decoder 1204. The video decoder 1204 may be implemented as one or more software application programs 233 executable within the computer system 200. The video decoder 1204 may be effected by instructions 231 (see Fig. 2B) in the software 233 that are carried out within the computer system 200. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks.
[000198] The bitstream 1206 is input to an entropy decoder module 1320. The entropy decoder module 1320 extracts syntax elements from the bitstream 143 by decoding sequences of 'bins' and passes the values of the syntax elements to other modules in the video decoder 1204. The entropy decoder module 1320 uses variable-length and fixed-length decoding to decode the SPS, PPS or slice header, using an arithmetic decoding engine to decode syntax elements of the slice data as a sequence of one or more bins. Each bin may use one or more 'contexts', with a context describing probability levels to be used for coding a 'one' and a 'zero' value for the bin. Where multiple contexts are available for a given bin, a 'context modelling' or 'context selection' step is performed to choose one of the available contexts for decoding the bin. The process of decoding bins forms a sequential feedback loop, where each slice may be decoded in entirety by a given entropy decoder 1320 instance.
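The per-bin probability adaptation that makes the context models effective can be illustrated with a toy context class. This is an illustrative stand-in, not the standardised CABAC rule: a real CABAC engine stores quantised probability states and decodes bins through a range-based arithmetic coder.

```python
class BinContext:
    """Simplified context model for one bin position. Tracks an estimate of
    the probability that the next bin decoded with this context is 'one'.
    The exponential update below is an illustrative stand-in for CABAC's
    quantised state-transition tables."""

    def __init__(self, p_one=0.5, rate=1 / 32):
        self.p_one = p_one   # current estimate of P(bin == 1)
        self.rate = rate     # adaptation speed

    def update(self, bin_value):
        # Move the estimate toward the bin value just decoded.
        target = 1.0 if bin_value else 0.0
        self.p_one += self.rate * (target - self.p_one)

ctx = BinContext()
for b in [1, 1, 1, 0, 1]:        # bins decoded with this context
    ctx.update(b)
# After mostly 'one' bins the estimate skews above 0.5.
```

The sequential feedback loop described above arises because each `update` depends on the bin just decoded, which itself was decoded using the current state of the context.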
[000199] The entropy decoder module 1320 applies an arithmetic coding algorithm, for example 'context adaptive binary arithmetic coding' (CABAC), to decode syntax elements from the bitstream 143. The decoded syntax elements are used to reconstruct parameters within the video decoder 1204. Parameters include residual coefficients (represented by an arrow 1324), a quantisation parameter 1374, a secondary transform index 1370, and mode selection information such as an intra prediction mode (represented by an arrow 1358). The mode selection information also includes information such as motion vectors, and the partitioning of each CTU into one or more CBs. Parameters are used to generate PBs, typically in combination with sample data from previously decoded CBs.
[000200] The residual coefficients 1324 are passed to an inverse secondary transform module 1336 where either a secondary transform is applied or no operation is performed (bypass) according to a secondary transform index. The inverse secondary transform module 1336 produces reconstructed transform coefficients 1332. That is, the module 1336 produces primary transform domain coefficients from secondary transform domain coefficients. The reconstructed transform coefficients 1332 are input to a dequantiser module 1328. The dequantiser module 1328 performs inverse quantisation (or 'scaling') on the residual coefficients 1332, that is, in the primary transform coefficient domain, to create reconstructed intermediate transform coefficients, represented by an arrow 1340, according to the quantisation parameter 1374. The dequantiser module 1328 may also apply a scaling matrix to provide non-uniform dequantisation within the TB, corresponding to operation of the dequantiser module 840. Should use of a non-uniform inverse quantisation matrix be indicated in the bitstream 1206, the video decoder 1204 reads a quantisation matrix from the bitstream 143 as a sequence of scaling factors and arranges the scaling factors into a matrix. The inverse scaling uses the quantisation matrix in combination with the quantisation parameter to create the reconstructed intermediate transform coefficients 1340.
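The combination of quantisation parameter and scaling matrix can be sketched as follows. The floating-point arithmetic and the exact step-size formula are simplifying assumptions; real codecs perform inverse quantisation in integer arithmetic with per-block-size normalisation.

```python
def dequantise(levels, qp, scaling_matrix=None):
    """Scale decoded residual levels back to transform coefficients.

    Uses an HEVC/VVC-style step size of 2^((qp - 4) / 6). When a
    non-uniform scaling matrix is present, each coefficient position gets
    its own extra weight; a weight of 16 is the neutral value.
    """
    step = 2.0 ** ((qp - 4) / 6.0)
    if scaling_matrix is None:
        scaling_matrix = [[16] * len(row) for row in levels]
    return [[lvl * step * (w / 16.0) for lvl, w in zip(lrow, wrow)]
            for lrow, wrow in zip(levels, scaling_matrix)]

coeffs = dequantise([[4, 0], [0, -2]], qp=4)   # qp 4 gives step size 1.0
# coeffs == [[4.0, 0.0], [0.0, -2.0]]
```

With a non-uniform matrix, larger weights in high-frequency positions coarsen those coefficients, which is the non-uniform dequantisation within the TB described above.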
[000201] The reconstructed transform coefficients 1340 are passed to an inverse primary transform module 1344. The module 1344 transforms the coefficients 1340 from the frequency domain back to the spatial domain. The inverse primary transform module 1344 applies inverse DCT-2 transforms horizontally and vertically, constrained by the maximum available transform size as described with reference to the forward primary transform module 826. The result of operation of the module 1344 is a block of residual samples, represented by an arrow 1348. The block of residual samples 1348 is equal in size to the corresponding CB. The residual samples 1348 are supplied to a summation module 1350.
[000202] At the summation module 1350, the residual samples 1348 are added to a decoded PB (represented as 1352) to produce a block of reconstructed samples, represented by an arrow 1356. The reconstructed samples 1356 are supplied to a reconstructed sample cache 1360 and an in-loop filtering module 1388. The in-loop filtering module 1388 produces reconstructed blocks of frame samples, represented as 1392. The frame samples 1392 are written to a frame buffer 1396. The frame buffer 1396 outputs image or video frames 1210.
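The summation stage can be sketched directly. The clipping to the valid sample range is an assumption of typical decoder behaviour; the bit depth and block layout here are illustrative.

```python
def reconstruct(pred_block, residual_block, bit_depth=8):
    """Summation module sketch: add residual samples to the decoded
    prediction block (PB) and clip each result to the valid sample
    range [0, 2^bit_depth - 1]."""
    hi = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), hi) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred_block, residual_block)]

recon = reconstruct([[100, 250], [0, 128]], [[-5, 20], [-3, 1]])
# recon == [[95, 255], [0, 129]]  (270 and -3 are clipped)
```

The reconstructed block is what feeds both the reconstructed sample cache (for later intra prediction) and the in-loop filters (for later inter prediction).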
[000203] The reconstructed sample cache 1360 operates similarly to the reference sample cache 856 of the video encoder 542. The reconstructed sample cache 1360 provides storage for reconstructed samples needed to intra predict subsequent CBs without accessing the memory 206 (e.g., by using the data 232 instead, which is typically on-chip memory). Reference samples, represented by an arrow 1364, are obtained from the reconstructed sample cache 1360 and supplied to a reference sample filter 1368 to produce filtered reference samples indicated by an arrow 1372. The filtered reference samples 1372 are supplied to an intra-frame prediction module 1376. The module 1376 produces a block of intra-predicted samples, represented by an arrow 1380, in accordance with the intra prediction mode parameter 1358 signalled in the bitstream 1206 and decoded by the entropy decoder 1320. The intra prediction module 1376 supports the modes of the encoder-side module 864, including IBC and MIP. The block of samples 1380 is generated using modes such as DC, planar or angular intra prediction.
[000204] When the prediction mode of a CB is indicated to use intra prediction in the bitstream 143, the intra-predicted samples 1380 form the decoded PB 1352 via a multiplexor module 1384. Intra prediction produces a prediction block (PB) of samples, which is a block in one colour component, derived using 'neighbouring samples' in the same colour component. The neighbouring samples are samples adjacent to the current block and, by virtue of preceding in the block decoding order, have already been reconstructed. Where luma and chroma blocks are collocated, the luma and chroma blocks may use different intra prediction modes. However, the two chroma CBs share the same intra prediction mode.
[000205] When the prediction mode of the CB is indicated to be inter prediction in the bitstream 1206, a motion compensation module 1334 produces a block of inter-predicted samples, represented as 1338. The block of inter-predicted samples 1338 is produced using a motion vector, decoded from the bitstream 143 by the entropy decoder 1320, and a reference frame index to select and filter a block of samples 1398 from the frame buffer 1396. The block of samples 1398 is obtained from a previously decoded frame stored in the frame buffer 1396. For bi-prediction, two blocks of samples are produced and blended together to produce samples for the decoded PB 1352. The frame buffer 1396 is populated with filtered block data 1392 from the in-loop filtering module 1388. As with the in-loop filtering module 868 of the video encoder 542, the in-loop filtering module 1388 applies any of the DBF, the ALF and SAO filtering operations. Generally, the motion vector is applied to both the luma and chroma channels, although the filtering processes for sub-sample interpolation in the luma and chroma channels are different.
[000206] Fig. 14 is a schematic block diagram showing an implementation 1400 of a configurable feature reconstruction module performing a decoder network topology, which may serve as the tensor decompressor 1250. A model 1405, in the example of Fig. 14 an ONNX model 1405 of the decoder network topology to be performed as the decompressor 1250, receives the tensor decompressor network topology information 1238 and the tensor weight information 1240. The structure of the tensor decompressor 1250 is selected by the ONNX model 1405 based on the information 1238. The weights for the tensor decompressor 1250 are selected by the ONNX model 1405 based on the information 1240. Based on the selections at the model 1405, a decompression model 1410 executes. In the example of Fig. 14, an ONNX runtime model 1410 executes to receive the compressed tensors 1224 and output the decompressed tensors 1254. As indicated in Fig. 14, resources required to run the ONNX model 1410 may be allocated from one or more resources, such as one or more of a CPU 1420, an FPGA 1424, a vector processing unit (VPU) 1428, a GPU 1432 and an interface model DirectML 1436. Each of the resources 1420, 1424, 1428, 1432 and 1436 can be implemented on the module 201 or can be executed across one or more similar devices.
[000207] Fig. 15 is a schematic block diagram showing a tensor decompressor 1500 using a multi-scale feature reconstruction stage, which may be selected at the step 1910 for use in the tensor decompressor 1250. In particular, the decompressor 1500 can be implemented as the runtime model 1410 of Fig. 14. The tensor decompressor 1500 includes a single-scale feature compression (SSFC) decompressor 1510. The SSFC decompressor 1510 receives the tensor 1224 having a reduced channel count, such as 64 channels, and passes the tensor 1224 to a convolution layer 1512, which outputs a tensor 1513 having a restored channel count, such as 256 channels. The tensor 1513 is passed to a batch normalisation module 1514 to produce a tensor 1515. The tensor 1515 is passed to a PReLU module 1516 to produce a tensor 1520. The tensor decompressor 1500 includes an MSFR module 1530. The MSFR module 1530 operates to produce a plurality of tensors from the tensor 1520 produced by execution of step 19120, described with reference to Fig. 19, using one or more trained convolutional layers. Upsample modules 1532, 1534, and 1536 upsample the tensor 1520 horizontally and vertically by factors of two, four, and eight, respectively, to produce tensors 1533, 1535, and 1537. The tensor 1537 forms one (P'2, 1557) output from the MSFR module 1530 and is passed to a downsample module 1542.
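The channel-restoring stage of the SSFC decompressor 1510 can be illustrated with a toy 1 × 1 convolution followed by a PReLU activation. The weight values and tensor sizes here are illustrative only (not trained values), and the batch normalisation stage is omitted.

```python
def conv1x1(x, weights):
    """1x1 convolution as a per-position channel mix.

    x is [C_in][positions], weights is [C_out][C_in]. This is the
    channel-count restoration performed by the convolution layer 1512
    (e.g. 64 channels back up to 256), shown here at toy scale.
    """
    return [[sum(w * ch[i] for w, ch in zip(wrow, x))
             for i in range(len(x[0]))]
            for wrow in weights]

def prelu(x, alpha=0.25):
    """Parametric ReLU: identity for positive values, slope alpha for
    negative values (the learned alpha is fixed here for illustration)."""
    return [[v if v > 0 else alpha * v for v in row] for row in x]

# One input channel of four positions expanded to two channels, then PReLU:
restored = prelu(conv1x1([[1.0, -2.0, 0.5, -0.5]], [[1.0], [-1.0]]))
# restored == [[1.0, -0.5, 0.5, -0.125], [-0.25, 2.0, -0.125, 0.5]]
```

Because the kernel is 1 × 1, the spatial dimensions are unchanged; only the channel count grows, matching the 64-to-256 restoration described above.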
[000208] The downsample module 1542 downsamples the tensor 1537 by a factor of two horizontally and vertically to produce a tensor 1543 having the same dimensionality as the tensor 1535. The tensor 1543 is provided to a convolution layer 1548 which outputs a tensor 1549. A summation module 1554 adds the tensors 1535 and 1549 to produce a tensor 1555 as an output (P'3) of the MSFR module 1530.
[000209] A downsample module 1540 downsamples the tensor 1535 by a factor of two horizontally and vertically to produce a tensor 1541 having the same dimensionality as the tensor 1533. The tensor 1541 is provided to a convolution layer 1546 which outputs a tensor 1547. A summation module 1552 adds the tensors 1533 and 1547 to produce a tensor 1553 as an output (P'4) of the MSFR module 1530.
[000210] A downsample module 1538 downsamples the tensor 1533 by a factor of two horizontally and vertically to produce a tensor 1539 having the same dimensionality as the tensor 1520. The tensor 1539 is provided to a convolution layer 1544 which outputs a tensor 1545. A summation module 1550 adds the tensors 1520 and 1545 to produce a tensor 1551 as an output (P'5) of the MSFR module 1530. The tensors P'2 1557, P'3 1555, P'4 1553 and P'5 1551 form the tensors 1254 of Fig. 12.
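The pyramid pattern of the MSFR module 1530, upsampling the decompressed tensor to several scales and then feeding each finer level back through a downsample-and-add skip connection, can be sketched in simplified form. Single-channel tensors, nearest-neighbour resampling and average-pool downsampling are assumptions; the trained convolution layers 1544/1546/1548 are omitted and the upsampling factors are reduced relative to the two/four/eight of the figure.

```python
def upsample2x(t):
    """Nearest-neighbour 2x upsample of a single-channel 2-D tensor."""
    out = []
    for row in t:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

def downsample2x(t):
    """2x2 average-pool downsample; the exact pooling used by the
    downsample modules 1538/1540/1542 is an assumption here."""
    return [[(t[r][c] + t[r][c + 1] + t[r + 1][c] + t[r + 1][c + 1]) / 4.0
             for c in range(0, len(t[0]), 2)]
            for r in range(0, len(t), 2)]

def add(a, b):
    """Elementwise summation (cf. modules 1550/1552/1554)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

x = [[1.0, 3.0], [5.0, 7.0]]      # stands in for the tensor 1520
p2 = upsample2x(upsample2x(x))    # finest pyramid level (cf. P'2)
# Next-coarser level: downsample the finer level and add it to the
# directly upsampled tensor as a skip connection (cf. P'3).
p3 = add(upsample2x(x), downsample2x(p2))
```

Each coarser output therefore combines a direct upsampling of the decompressed tensor with information fed back from the finer level, which is the multi-scale reconstruction described in paragraphs [000208] to [000210].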
[000211] Fig. 16A is a schematic block diagram showing an example implementation 1600 of the head portion 150 of a CNN for object detection, corresponding to a portion of a "YOLOv3" network excluding the "DarkNet-53" backbone portion. The CNN head portion 150 of Fig. 16A can be used when the CNN backbone is implemented as in Fig. 3A, for example. Depending on the task to be performed in the destination device 140, different networks may be substituted for the CNN head portion 150. Incoming tensors 149 are separated into the tensor of each layer (i.e., tensors 1610, 1620, and 1634). The tensor 1610 is passed to a CBL module 1612 to produce a tensor 1614. The tensor 1614 is passed to a detection module 1616 and an upscaler module 1622. The detection module outputs bounding boxes 1618, in the form of a detection tensor. The bounding boxes 1618 are passed to a non-maximum suppression (NMS) module 1648.
[000212] To produce bounding boxes addressing co-ordinates in the original video data 113, prior to resizing for the backbone portion of the network 114, scaling by the original video width and height is performed at the upscaler module 1622. The upscaler module 1622 receives the tensor 1614 and the tensor 1620 and produces an upscaled tensor 1624, which is passed to a CBL module 1626. The CBL module 1626 produces a tensor 1628 as output. The tensor 1628 is passed to a detection module 1630 and an upscaler module 1636. The detection module 1630 produces a detection tensor 1632, which is supplied to the NMS module 1648.
[000213] The upscaler module 1636 is another instance of the module 1622. The upscaler module 1636 receives the tensor 1628 and the tensor 1634 and outputs an upscaled tensor 1638. The upscaled tensor 1638 is passed to a CBL module 1640, which outputs a tensor 1642 to a detection module 1644. The detection module 1644 produces a detection tensor 1646, which is supplied to the NMS module 1648.
[000214] The CBL modules 1612, 1626, and 1640 each contain a concatenation of five CBL modules (e.g., the CBL model 360 shown in Fig. 3D). The upscaler modules 1622 and 1636 are each instances of an upscaler module 1660 as shown in Fig. 16B. The module 1648 receives the tensors 1618, 1632 and 1646 and outputs the task result 151.
[000215] As shown in Fig. 16B, the upscaler module 1660 accepts a tensor 1662 (for example the tensor 1614 of Fig. 16A) as an input. The tensor 1662 is passed to a CBL module 1666 (having the structure of the module 360) to produce a tensor 1668. The tensor 1668 is passed to an upsampler 1670 to produce an upsampled tensor 1672. A concatenation module 1674 produces a tensor 1676 by concatenating the upsampled tensor 1672 with a second input tensor 1664 (for example the tensor 1620 input to the upscaler 1622 in Fig. 16A).
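The upsample-then-concatenate pattern of the upscaler module 1660 can be sketched as follows. Nearest-neighbour upsampling and the [channel][row][col] layout are assumptions, and the CBL convolution stage 1666 is omitted.

```python
def upsample2x(ch):
    """Nearest-neighbour 2x upsample of one channel (cf. upsampler 1670)."""
    out = []
    for row in ch:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

def upscale_concat(deep, lateral):
    """Upsample the deeper tensor, then concatenate it with the lateral
    input along the channel axis (cf. concatenation module 1674).
    Tensors are [channel][row][col]."""
    return [upsample2x(c) for c in deep] + list(lateral)

# A one-channel 1x1 deep tensor fused with a one-channel 2x2 lateral tensor:
fused = upscale_concat([[[1.0]]], [[[0.0, 0.0], [0.0, 0.0]]])
# fused == [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
```

After upsampling, the two inputs share spatial dimensions, so the concatenation simply stacks their channels; the combined tensor then carries both coarse and fine-scale features forward.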
[000216] Thedetection
[000216] The detectionmodules modules 1616, 1616, 1630, 1630, andand 1644 1644 are are instances instances ofdetection of a a detection module1680 module 1680asasshown shown in in Fig.16C. Fig. 16C. TheThe detection detection module module 1680 1680 receives receives a tensor a tensor 1682.1682. The The tensor 1682 tensor is input 1682 is input to to aaCBL module1684 CBL module 1684 having having structureofofthe structure themodule module 360. 360. TheThe CBLCBL
module1684 module 1684generates generates a a tensor1686. tensor 1686.The The tensor1686 tensor 1686 is is passed passed to to a aconvolution convolution module module 1688, 1688,
whichimplements which implements a detectionkernel a detection kerneltotooutput outputa atensor tensor 1690. 1690.InInsome some arrangements, arrangements, thethe
detection kernel applies a 1 × 1 kernel to produce the output on feature maps at each of the detection kernel applies a 1 x 1 kernel to produce the output on feature maps at each of the
three layers three layers of ofthe thetensor. tensor.The The detection detectionkernel kernelisis1 1 × X1 × 1 (B × X(5(5+ C) X (B ), where + C) whereBBisis the the number number of bounding boxes a particular cell can predict, typically three (3), and C is the number of of bounding boxes a particular cell can predict, typically three (3), and C is the number of
classes, which may be eighty (80), resulting in a kernel size of two-hundred and fifty five (255) classes, which may be eighty (80), resulting in a kernel size of two-hundred and fifty five (255)
detection attributes (i.e. tensor 1290). The constant “5” represents four boundary box attributes detection attributes (i.e. tensor 1290). The constant "5" represents four boundary box attributes
(box centre x, y and size scale x, y) and one object confidence level (“objectness”). The result (box centre X, y and size scale X, y) and one object confidence level ("objectness"). The result
of a detection kernel has the same spatial dimensions as the input feature map, but the depth of of a detection kernel has the same spatial dimensions as the input feature map, but the depth of
the output corresponds to the detection attributes. The detection kernel is applied at each layer, the output corresponds to the detection attributes. The detection kernel is applied at each layer,
typically three typically threelayers, layers,resulting in in resulting a large number a large of of number candidate bounding candidate boundingboxes. boxes. A A process of process of
non-maximum non-maximum suppression suppression is applied is applied by the by the NMSNMS module module 1648 1648 to the to the resulting resulting bounding bounding boxes boxes
44204385_1 44204385_1
69
to discard redundant boxes, such as overlapping predictions at similar scale, resulting in a final set of bounding boxes as output for object detection.
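The kernel-depth arithmetic described above (B × (5 + C) detection attributes per cell) can be sketched as follows. This is an illustrative sketch only; the function name and its defaults are not part of the described arrangement:

```python
def detection_kernel_depth(num_boxes: int = 3, num_classes: int = 80) -> int:
    """Depth of the 1x1 YOLO detection kernel: B x (5 + C).

    The constant 5 covers the four box attributes (centre x, y and
    size scale x, y) plus one objectness score.
    """
    return num_boxes * (5 + num_classes)
```

With the typical values B = 3 and C = 80, the depth evaluates to 255, matching the kernel size stated above.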
[000217] Fig. 17 is a schematic block diagram showing a head portion 1700 of a CNN. The head portion 1700 can be implemented as the CNN head portion 150 where the CNN backbone 114 is implemented as the backbone 400, for example. The head portion 1700 forms part of an overall network known as ‘Faster RCNN’ and includes a feature network (i.e., backbone portion 400), a region proposal network, and a detection network. Input to the head portion 1700 are the tensors 149, which include P2-P6 layer tensors 1710, 1712, 1714, 1716, and 1718. The P2-P5 layer tensors 1710, 1712, 1714, and 1716 correspond to the P2 to P5 outputs 477, 475, 473, and 471 of Fig. 4. The P2-P6 tensors 1710, 1712, 1714, 1716, and 1718 are input to a region proposal network (RPN) head module 1720. The P6 tensor 1718 is produced by a max pool module 1742, operating on the P5 tensor 1716 to perform a 2×2 max pooling operation. The RPN head module 1720 performs a convolution on the input tensors, generating an intermediate tensor. The intermediate tensor is fed into two subsequent sibling layers, (i) one for classification and (ii) one for bounding box, or ‘region of interest’ (ROI), regression. A resultant output is classification and bounding boxes 1722. The classification and bounding boxes 1722 are passed to an NMS module 1724. The NMS module 1724 prunes out redundant bounding boxes by removing overlapping boxes with a lower score to produce pruned bounding boxes 1726. The bounding boxes 1726 are input to a region of interest (ROI) pooler 1728. The ROI pooler 1728 uses some of the layer tensors of the tensors 149 (described further hereafter) and the bounding boxes 1726 to produce fixed-size feature maps from various input size maps using max pooling operations. In the max pooling operation a subsampling takes the maximum value in each group of input values to produce one output value in the output tensor.
[000218] Input to the ROI pooler 1728 are the P2-P5 feature maps 1710, 1712, 1714, and 1716, and region of interest proposals 1726. Each proposal (ROI) from 1726 is associated with a portion of the feature maps (1710-1716) to produce a fixed-size map. The fixed-size map is of a size independent of the underlying portion of the feature map 1710-1716. One of the feature maps 1710-1716 is selected such that the resulting cropped map has sufficient detail, for example, according to the following rule: floor(4 + log2(sqrt(box_area) / 224)), where 224 is the canonical box size. The ROI pooler 1728 operates to crop incoming feature maps according to the proposals 1726, producing a tensor 1730.
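The level-selection rule above can be sketched as follows. Clamping the result to the available P2-P5 levels is an assumption added for illustration (the rule as stated can otherwise produce levels outside that range for very small or very large boxes):

```python
import math


def fpn_level(box_area: float, canonical_size: float = 224.0) -> int:
    """Select which feature map (P2-P5) an ROI is cropped from,
    using the rule floor(4 + log2(sqrt(box_area) / canonical_size))."""
    level = math.floor(4 + math.log2(math.sqrt(box_area) / canonical_size))
    return max(2, min(5, level))  # clamp to P2-P5 (illustrative assumption)
```

For example, a 224 × 224 proposal selects level 4 (P4), while a 112 × 112 proposal selects level 3 (P3).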
[000219] The tensor 1730 is fed into a fully connected (FC) neural network head 1732. The FC head 1732 performs two fully connected layers to produce a class score and bounding box predictor delta tensor 1734. The class score is generally an 80-element tensor, each element corresponding to a prediction score for the corresponding object category. The bounding box prediction deltas tensor is an 80×4 = 320 element tensor, containing bounding boxes for the corresponding object categories. Final processing is performed by an output layers module 1736, receiving the tensor 1734 and performing a filtering operation to produce a filtered tensor 1738. Low-scoring (low classification) objects are removed from further consideration. A non-maximum suppression module 1740 receives the filtered tensor 1738 and removes overlapping bounding boxes by removing the overlapped box with a lower classification score, resulting in an inference output tensor 1742, corresponding to the tensor 151.
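The non-maximum suppression applied by modules such as 1724 and 1740 can be sketched as a greedy procedure over scored boxes. This is a minimal illustration of the general technique, not the exact implementation of those modules:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0


def nms(boxes, scores, threshold=0.5):
    """Greedy non-maximum suppression: visit boxes in descending score
    order, keeping a box only if it does not overlap an already-kept
    box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep
```

Given two heavily overlapping boxes and one distant box, only the higher-scoring box of the overlapping pair and the distant box survive.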
[000220] Referring to Appendix E, the weights information may include a ‘no_weights_flag’, indicating that the decoder network topology to be used does not require any weights in order to operate.
[000221] In an arrangement of the source device 110 and the destination device 140, network weights are signalled in the bitstream 121 as a delta relative to another set of network weights (the ‘base weights’) that are known to the system 100 or may be obtained via external means, such as from the tensor codec repository 180. The base weights may be indicated via reference using an identifier number stored in the bitstream 121. Signalling of network weights as a delta relative to other network weights may be accomplished using a syntax such as ‘MPEG Incremental Neural Network Representation’, under development as part of ISO/IEC 15938-17.
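The delta-signalling concept can be sketched as follows. This is a conceptual illustration only: the actual coded representation would follow a syntax such as the MPEG Incremental Neural Network Representation mentioned above, not plain per-element addition, and the dictionary-of-lists weight layout is an assumption made for the sketch:

```python
def apply_weight_deltas(base_weights, deltas):
    """Reconstruct network weights as base weights plus signalled deltas.

    base_weights and deltas map layer names to flat lists of values;
    the layer names must match (illustrative assumption).
    """
    return {
        name: [b + d for b, d in zip(base, deltas[name])]
        for name, base in base_weights.items()
    }
```

A decoder holding base weights (e.g. retrieved by identifier from a repository such as 180) would apply the received deltas to obtain the operative weights.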
[000222] Methods presented herein enable efficient representation of tensors in a format amenable to compression using contemporary block-based compression standards such as VVC, HEVC, AVC or other standards. Block-based compression, although not intuitively applicable to data such as compressed feature maps or coefficients for projecting basis vectors to reconstruct feature maps, uncovers additional unexpected redundancy in blocks, such as by use of various transforms including trained secondary transforms. Although methods presented herein are described with reference to the ‘Faster RCNN’ and ‘YOLOv3’ network architectures and specific divisions of these networks into ‘backbone’ and ‘head’ portions, the methods are applicable to any neural network operating on multi-dimensional tensor data and are applicable to different divisions of such networks into ‘backbone’ and ‘head’ portions.
[000223] It should be noted that although the source device 110 and the destination device 140 are described with reference to the video source 112 comprising video and image data, other types of content such as audio data or textual data may also be supplied as input to neural networks applicable to such types of input, and the resulting intermediate feature maps may be compressed and decompressed by the modules 116 and 146 with suitable encoder and decoder network topologies.
INDUSTRIAL APPLICABILITY
[000224] The arrangements described are applicable to the computer and data processing industries and particularly for the digital signal processing for the encoding and decoding of signals such as video and image signals, achieving high compression efficiency.
[000225] Some implementations described use an inserted NAL unit that identifies the compression approach used for coding feature maps and consequently also the NAL unit header format used for NAL units in the bitstream, including those related to tensor quantisation, reduction and restoration operations, i.e., operations outside the scope of packed feature frame coding. Accordingly, such implementations allow several different compression standards to be indicated in an FCM bitstream for ‘inner coding’ of packed feature frames, while supporting signalling of higher-level metadata needed to decode the FCM bitstream. Allowing the compression standard used to be indicated in an FCM bitstream provides improved flexibility in implementation, including the ability to be back-compatible with longer-existing standards such as AVC, compatibility with more recent standards such as HEVC or VVC, and further flexibility to allow use of custom or other compression standards.
[000226] The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
[000227] In the context of this specification, the word ‘comprising’ means ‘including principally but not necessarily solely’ or ‘having’ or ‘including’, and not ‘consisting only of’. Variations of the word ‘comprising’, such as ‘comprise’ and ‘comprises’, have correspondingly varied meanings.
APPENDIX A
AVC/H.264 NAL unit format
Bytes 0 to nalUnitHeaderBytes - 1 in the NAL unit form the NAL unit header.
nal_unit( NumBytesInNALunit ) {                                                  C    Descriptor
    forbidden_zero_bit                                                           All  f(1)
    nal_ref_idc                                                                  All  u(2)
    nal_unit_type                                                                All  u(5)
    NumBytesInRBSP = 0
    nalUnitHeaderBytes = 1
    if( nal_unit_type = = 14 | | nal_unit_type = = 20 | | nal_unit_type = = 21 ) {
        if( nal_unit_type != 21 )
            svc_extension_flag                                                   All  u(1)
        else
            avc_3d_extension_flag                                                All  u(1)
        if( svc_extension_flag ) {
            nal_unit_header_svc_extension( ) /* specified in Annex G */          All
            nalUnitHeaderBytes += 3
        } else if( avc_3d_extension_flag ) {
            nal_unit_header_3davc_extension( ) /* specified in Annex J */
            nalUnitHeaderBytes += 2
        } else {
            nal_unit_header_mvc_extension( ) /* specified in Annex H */          All
            nalUnitHeaderBytes += 3
        }
    }
    for( i = nalUnitHeaderBytes; i < NumBytesInNALunit; i++ ) {
        if( i + 2 < NumBytesInNALunit && next_bits( 24 ) = = 0x000003 ) {
            rbsp_byte[ NumBytesInRBSP++ ]                                        All  b(8)
            rbsp_byte[ NumBytesInRBSP++ ]                                        All  b(8)
            i += 2
            emulation_prevention_three_byte /* equal to 0x03 */                  All  f(8)
        } else
            rbsp_byte[ NumBytesInRBSP++ ]                                        All  b(8)
    }
}
A modified version of Table 7-1 from the AVC/H.264 specification, with codes reserved for FCM parameter sets and a reserved NAL unit type necessary to distinguish from an ICI NAL unit, is shown as follows:
nal_unit_type | Content of NAL unit and RBSP syntax structure | C | Annex A NAL unit type class | Annex G and Annex H NAL unit type class | Annex I and Annex J NAL unit type class
0 | Reserved (prohibited)* | | non-VCL | non-VCL | non-VCL
1 | Coded slice of a non-IDR picture slice_layer_without_partitioning_rbsp( ) | 2, 3, 4 | VCL | VCL | VCL
2 | Coded slice data partition A slice_data_partition_a_layer_rbsp( ) | 2 | VCL | not applicable | not applicable
3 | Coded slice data partition B slice_data_partition_b_layer_rbsp( ) | 3 | VCL | not applicable | not applicable
4 | Coded slice data partition C slice_data_partition_c_layer_rbsp( ) | 4 | VCL | not applicable | not applicable
5 | Coded slice of an IDR picture slice_layer_without_partitioning_rbsp( ) | 2, 3 | VCL | VCL | VCL
6 | Supplemental enhancement information (SEI) sei_rbsp( ) | 5 | non-VCL | non-VCL | non-VCL
7 | Sequence parameter set seq_parameter_set_rbsp( ) | 0 | non-VCL | non-VCL | non-VCL
8 | Picture parameter set pic_parameter_set_rbsp( ) | 1 | non-VCL | non-VCL | non-VCL
9 | Access unit delimiter access_unit_delimiter_rbsp( ) | 6 | non-VCL | non-VCL | non-VCL
10 | End of sequence end_of_seq_rbsp( ) | 7 | non-VCL | non-VCL | non-VCL
11 | End of stream end_of_stream_rbsp( ) | 8 | non-VCL | non-VCL | non-VCL
12 | Filler data filler_data_rbsp( ) | 9 | non-VCL | non-VCL | non-VCL
13 | Sequence parameter set extension seq_parameter_set_extension_rbsp( ) | 10 | non-VCL | non-VCL | non-VCL
14 | Prefix NAL unit prefix_nal_unit_rbsp( ) | 2 | non-VCL | suffix dependent | suffix dependent
15 | Subset sequence parameter set subset_seq_parameter_set_rbsp( ) | 0 | non-VCL | non-VCL | non-VCL
16 | Depth parameter set depth_parameter_set_rbsp( ) | 11 | non-VCL | non-VCL | non-VCL
17..18 | Reserved | | non-VCL | non-VCL | non-VCL
19 | Coded slice of an auxiliary coded picture without partitioning slice_layer_without_partitioning_rbsp( ) | 2, 3, 4 | non-VCL | non-VCL | non-VCL
20 | Coded slice extension slice_layer_extension_rbsp( ) | 2, 3, 4 | non-VCL | VCL | VCL
21 | Coded slice extension for a depth view component or a 3D-AVC texture view component slice_layer_extension_rbsp( ) | 2, 3, 4 | non-VCL | non-VCL | VCL
22..23 | Reserved | | non-VCL | non-VCL | VCL
24 | FCM VMPS - Feature coding for machines - Vision model parameter set | | | |
25 | FCM SPS - Feature coding for machines - Sequence parameter set | | | |
26 | FCM PPS - Feature coding for machines - Picture parameter set | | | |
27..30 | Unspecified | | non-VCL | non-VCL | non-VCL
31 | Reserved / prohibited - avoid collision with inner codec identifier (ICI) | | | |
* nal_unit_type = 0 is marked as ‘reserved (prohibited)’ rather than ‘unspecified’ to indicate this value is not available for allocation by groups, as such NAL units using this type would otherwise need to ensure the first byte of the RBSP could not contain a zero byte (0x00) due to the inability to insert an emulation_prevention_three_byte between the first and second bytes of the RBSP.
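The emulation-prevention mechanism appearing in the nal_unit( ) syntax above can be sketched as follows. This is a simplified illustration of the 0x03 insertion and removal process, not the normative text of any of the cited standards:

```python
def insert_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert emulation_prevention_three_byte (0x03) so the byte patterns
    0x000000 through 0x000003 cannot occur in the NAL unit payload."""
    out = bytearray()
    zeros = 0
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)  # emulation_prevention_three_byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)


def remove_emulation_prevention(data: bytes) -> bytes:
    """Inverse operation, mirroring the rbsp_byte extraction loop in the
    nal_unit( ) syntax: a 0x03 following two zero bytes is discarded."""
    out = bytearray()
    zeros = 0
    for b in data:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue  # skip emulation_prevention_three_byte
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)
```

For example, the RBSP bytes 0x00 0x00 0x01 are coded as 0x00 0x00 0x03 0x01, and removal restores the original payload.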
APPENDIX B
HEVC/H.265 NAL unit header and payload

nal_unit( NumBytesInNalUnit ) {                                       Descriptor
    nal_unit_header( )
    NumBytesInRbsp = 0
    for( i = 2; i < NumBytesInNalUnit; i++ )
        if( i + 2 < NumBytesInNalUnit && next_bits( 24 ) = = 0x000003 ) {
            rbsp_byte[ NumBytesInRbsp++ ]                             b(8)
            rbsp_byte[ NumBytesInRbsp++ ]                             b(8)
            i += 2
            emulation_prevention_three_byte /* equal to 0x03 */       f(8)
        } else
            rbsp_byte[ NumBytesInRbsp++ ]                             b(8)
}

nal_unit_header( ) {                                                  Descriptor
    forbidden_zero_bit                                                f(1)
    nal_unit_type                                                     u(6)
    nuh_layer_id                                                      u(6)
    nuh_temporal_id_plus1                                             u(3)
}
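The two-byte header above can be unpacked as follows. This is an illustrative sketch, not part of the specification; the usage example assumes the FCM_VMPS type value 48 from the modified table below:

```python
def parse_hevc_nal_unit_header(header: bytes) -> dict:
    """Unpack the two-byte HEVC-style NAL unit header:
    forbidden_zero_bit f(1), nal_unit_type u(6),
    nuh_layer_id u(6), nuh_temporal_id_plus1 u(3)."""
    word = (header[0] << 8) | header[1]
    return {
        "forbidden_zero_bit": (word >> 15) & 0x1,
        "nal_unit_type": (word >> 9) & 0x3F,
        "nuh_layer_id": (word >> 3) & 0x3F,
        "nuh_temporal_id_plus1": word & 0x7,
    }
```

For example, the header bytes 0x60 0x01 decode to nal_unit_type 48, nuh_layer_id 0 and nuh_temporal_id_plus1 1.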
NAL unit type codes
nal_unit_type | Name of nal_unit_type | Content of NAL unit and RBSP syntax structure | NAL unit type class
0, 1 | TRAIL_N, TRAIL_R | Coded slice segment of a non-TSA, non-STSA trailing picture slice_segment_layer_rbsp( ) | VCL
2, 3 | TSA_N, TSA_R | Coded slice segment of a TSA picture slice_segment_layer_rbsp( ) | VCL
4, 5 | STSA_N, STSA_R | Coded slice segment of an STSA picture slice_segment_layer_rbsp( ) | VCL
6, 7 | RADL_N, RADL_R | Coded slice segment of a RADL picture slice_segment_layer_rbsp( ) | VCL
8, 9 | RASL_N, RASL_R | Coded slice segment of a RASL picture slice_segment_layer_rbsp( ) | VCL
10, 12, 14 | RSV_VCL_N10, RSV_VCL_N12, RSV_VCL_N14 | Reserved non-IRAP SLNR VCL NAL unit types | VCL
11, 13, 15 | RSV_VCL_R11, RSV_VCL_R13, RSV_VCL_R15 | Reserved non-IRAP sub-layer reference VCL NAL unit types | VCL
16, 17, 18 | BLA_W_LP, BLA_W_RADL, BLA_N_LP | Coded slice segment of a BLA picture slice_segment_layer_rbsp( ) | VCL
19, 20 | IDR_W_RADL, IDR_N_LP | Coded slice segment of an IDR picture slice_segment_layer_rbsp( ) | VCL
21 | CRA_NUT | Coded slice segment of a CRA picture slice_segment_layer_rbsp( ) | VCL
22, 23 | RSV_IRAP_VCL22, RSV_IRAP_VCL23 | Reserved IRAP VCL NAL unit types | VCL
24..31 | RSV_VCL24..RSV_VCL31 | Reserved non-IRAP VCL NAL unit types | VCL
32 | VPS_NUT | Video parameter set video_parameter_set_rbsp( ) | non-VCL
33 | SPS_NUT | Sequence parameter set seq_parameter_set_rbsp( ) | non-VCL
34 | PPS_NUT | Picture parameter set pic_parameter_set_rbsp( ) | non-VCL
35 | AUD_NUT | Access unit delimiter access_unit_delimiter_rbsp( ) | non-VCL
36 | EOS_NUT | End of sequence end_of_seq_rbsp( ) | non-VCL
37 | EOB_NUT | End of bitstream end_of_bitstream_rbsp( ) | non-VCL
38 | FD_NUT | Filler data filler_data_rbsp( ) | non-VCL
39, 40 | PREFIX_SEI_NUT, SUFFIX_SEI_NUT | Supplemental enhancement information sei_rbsp( ) | non-VCL
41..47 | RSV_NVCL41..RSV_NVCL47 | Reserved | non-VCL
48 | FCM_VMPS | Feature coding for machines - Vision model parameter set |
49 | FCM_SPS | Feature coding for machines - Sequence parameter set |
50 | FCM_PPS | Feature coding for machines - Picture parameter set |
51..63 | UNSPEC51..UNSPEC63 | Unspecified | non-VCL
APPENDIX C
VVC/H.266 NAL unit format is as follows:

nal_unit( NumBytesInNalUnit ) {                                       Descriptor
    nal_unit_header( )
    NumBytesInRbsp = 0
    for( i = 2; i < NumBytesInNalUnit; i++ )
        if( i + 2 < NumBytesInNalUnit && next_bits( 24 ) = = 0x000003 ) {
            rbsp_byte[ NumBytesInRbsp++ ]                             b(8)
            rbsp_byte[ NumBytesInRbsp++ ]                             b(8)
            i += 2
            emulation_prevention_three_byte /* equal to 0x03 */       f(8)
        } else
            rbsp_byte[ NumBytesInRbsp++ ]                             b(8)
}
VVC/H.266 NAL unit header format is as follows (always 16 bits or two bytes):

nal_unit_header( ) {                                                  Descriptor
    forbidden_zero_bit                                                f(1)
    nuh_reserved_zero_bit                                             u(1)
    nuh_layer_id                                                      u(6)
    nal_unit_type                                                     u(5)
    nuh_temporal_id_plus1                                             u(3)
}
VVC/H.266 NAL unit types:
nal_unit_type | Name of nal_unit_type | Content of NAL unit and RBSP syntax structure | NAL unit type class
0 | TRAIL_NUT | Coded slice of a trailing picture or subpicture* slice_layer_rbsp( ) | VCL
1 | STSA_NUT | Coded slice of an STSA picture or subpicture* slice_layer_rbsp( ) | VCL
2 | RADL_NUT | Coded slice of a RADL picture or subpicture* slice_layer_rbsp( ) | VCL
3 | RASL_NUT | Coded slice of a RASL picture or subpicture* slice_layer_rbsp( ) | VCL
4..6 | RSV_VCL_4..RSV_VCL_6 | Reserved non-IRAP VCL NAL unit types | VCL
7, 8 | IDR_W_RADL, IDR_N_LP | Coded slice of an IDR picture or subpicture* slice_layer_rbsp( ) | VCL
9 | CRA_NUT | Coded slice of a CRA picture or subpicture* slice_layer_rbsp( ) | VCL
10 | GDR_NUT | Coded slice of a GDR picture or subpicture* slice_layer_rbsp( ) | VCL
11 | RSV_IRAP_11 | Reserved IRAP VCL NAL unit type | VCL
12 | OPI_NUT | Operating point information operating_point_information_rbsp( ) | non-VCL
13 | DCI_NUT | Decoding capability information decoding_capability_information_rbsp( ) | non-VCL
14 | VPS_NUT | Video parameter set video_parameter_set_rbsp( ) | non-VCL
15 | SPS_NUT | Sequence parameter set seq_parameter_set_rbsp( ) | non-VCL
16 | PPS_NUT | Picture parameter set pic_parameter_set_rbsp( ) | non-VCL
17, 18 | PREFIX_APS_NUT, SUFFIX_APS_NUT | Adaptation parameter set adaptation_parameter_set_rbsp( ) | non-VCL
19 | PH_NUT | Picture header picture_header_rbsp( ) | non-VCL
20 | AUD_NUT | AU delimiter access_unit_delimiter_rbsp( ) | non-VCL
21 | EOS_NUT | End of sequence end_of_seq_rbsp( ) | non-VCL
22 | EOB_NUT | End of bitstream end_of_bitstream_rbsp( ) | non-VCL
23, 24 | PREFIX_SEI_NUT, SUFFIX_SEI_NUT | Supplemental enhancement information sei_rbsp( ) | non-VCL
25 | FD_NUT | Filler data filler_data_rbsp( ) | non-VCL
26, 27 | RSV_NVCL_26, RSV_NVCL_27 | Reserved non-VCL NAL unit types | non-VCL
28 | FCM_VMPS | Feature coding for machines - Vision model parameter set |
29 | FCM_SPS | Feature coding for machines - Sequence parameter set |
30 | FCM_PPS | Feature coding for machines - Picture parameter set |
44204385_1 44204385_1
80
nal_unit_typ nal_unit_typ Nameof Name of ContentofofNAL Content NAL unit unit andand RBSP RBSP syntax syntax structure structure NAL unit NAL unit 07 Jun 2024
e e nal_unit_type nal_unit_type typeclass type class 31 31 UNSPEC_31 UNSPEC_31 Unspecifiednon-VCL Unspecified non-VCLNALNAL unitunit types types non-VCL non-VCL 2024203901
44204385_1 44204385_1
81
APPENDIX D
Inner codec identifier NAL unit
NumBytesInNALunit shall be equal to 1.
nal_unit( NumBytesInNALunit ) {                               C     Descriptor
    forbidden_zero_bit                                        All   f(1)
    inner_codec_identifier                                    All   u(2)
    constant_value_31                                         All   u(5)
}
inner_codec_identifier specifies the inner codec as follows:
| inner_codec_identifier | Inner codec |
| 0 | AVC/H.264 |
| 1 | HEVC/H.265 |
| 2 | VVC/H.266 |
| 3 | Custom |
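By way of illustration only, the one-byte inner codec identifier NAL unit defined above could be parsed as follows; the function name is hypothetical and the fields are assumed to be packed most-significant bit first:

```python
def parse_inner_codec_identifier_nal(nal_byte: int) -> int:
    # f(1) forbidden_zero_bit, u(2) inner_codec_identifier, u(5) constant_value_31,
    # packed most-significant bit first into the single NAL unit byte.
    forbidden_zero_bit = (nal_byte >> 7) & 0x1
    inner_codec_identifier = (nal_byte >> 5) & 0x3
    constant_value_31 = nal_byte & 0x1F
    if forbidden_zero_bit != 0 or constant_value_31 != 31:
        raise ValueError("malformed inner codec identifier NAL unit")
    return inner_codec_identifier  # 0: AVC, 1: HEVC, 2: VVC, 3: custom
```

For example, a byte value of 0x3F yields inner_codec_identifier equal to 1 (HEVC/H.265).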
In the case of value 3 ("custom"), an alternative NAL unit encapsulation with FCM-specific NAL unit headers is used, supporting custom inner codecs.
The following custom inner codecs are supported:

• End-to-end learned inner codec
  o Entropy coded payloads are encoded as RBSPs.
• Bypassed inner codec
  o Quantized tensor values are encoded as RBSPs.
    ▪ Optionally with basic encoding like DeepCABAC, and optionally using a delta coding mechanism to compress runs of zeros as zero-delta values.
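The delta coding mechanism is not specified further here; one possible reading, sketched below with hypothetical function names, replaces each run of zeros with a single run-length so that only (zero-run, value) pairs need entropy coding:

```python
def encode_zero_deltas(values):
    # Collapse each run of zeros into a run-length preceding the next
    # non-zero value; a trailing run of zeros is flushed as (run, 0).
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:
        pairs.append((run, 0))
    return pairs

def decode_zero_deltas(pairs):
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        if v != 0:
            out.append(v)
    return out
```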
APPENDIX E
An example FCM VMPS, FCM SPS, and FCM PPS message format and associated semantics for representing metadata associated with tensor decompressor structure, tensor packing, and complexity indication in a bitstream are as follows:

FCM vision model parameter set
fcm_vmps( payloadSize ) {                                           Descriptor
    output_picture_width                                            u(v)
    output_picture_height                                           u(v)
}
FCM sequence parameter set
fcm_sps( payloadSize ) {                                            Descriptor
    fcm_sps_inner_decoding_bypass_flag                              u(1)
    fcm_sps_quantisation_bypass_flag                                u(1)
    fcm_sps_feature_restoration_bypass_flag                         u(1)
    fcm_sps_temporal_upsampling_enabled_flag                        u(1)
    set_level_flag                                                  u(1)
    if( set_level_flag )
        fcm_level                                                   u(8)
    update_decoder_flag                                             u(1)
    if( update_decoder_flag == 1 ) {
        no_weights_flag                                             u(1)
        explicit_signal_decoder_flag                                u(1)
        if( explicit_signal_decoder_flag ) {
            explicit_decoder_compression_idc                        u(2)
            explicit_decoder_format_idc                             u(4)
            explicit_decoder_format_version_idc                     ue(v)
            explicit_decoder_payload_len                            ue(v)
            for( i = 0; i < explicit_decoder_payload_len; i++ )
                decoder_payload[ i ]                                u(8)
            register_decoder_idc_flag                               u(1)
            if( register_decoder_idc_flag )
                decoder_idc                                         ue(v)
        } else {
            registered_decoder_idc                                  ue(v) or string or UUID
        }
    }
    if( !no_weights_flag ) {
        update_weights_flag                                         u(1)
        if( update_weights_flag ) {
            explicit_signal_weights_flag                            u(1)
            if( explicit_signal_weights_flag ) {
                explicit_weights_idc                                ue(v)
                explicit_weights_payload_len                        ue(v)
                for( i = 0; i < explicit_weights_payload_len; i++ )
                    weights_payload[ i ]                            u(8)
            }
        }
    }
    set_region_cnt_flag                                             u(1)
    if( set_region_cnt_flag )
        region_cnt                                                  ue(v)
    set_region_packing_flag                                         u(1)
    if( set_region_packing_flag ) {
        for( i = 0; i < region_cnt; i++ ) {
            top_left_rsctuaddr[ i ]                                 u(v)
            top_right_rsctuaddr[ i ]                                u(v)
            bottom_left_rsctuaddr[ i ]                              u(v)
            bottom_right_rsctuaddr[ i ]                             u(v)
            horizontal_packing_flag[ i ]                            u(1)
        }
    }
    set_reduced_tensor_info_flag                                    u(1)
    if( set_reduced_tensor_info_flag )
        for( i = 0; i < region_cnt; i++ ) {
            region_tensor_cnt[ i ]                                  ue(v)
            for( j = 0; j < region_tensor_cnt[ i ]; j++ ) {
                reduced_tensor_batch_size[ i ][ j ]                 ue(v)
                reduced_tensor_max_channels[ i ][ j ]               ue(v)
                reduced_tensor_width[ i ][ j ]                      ue(v)
                reduced_tensor_height[ i ][ j ]                     ue(v)
            }
        }
    update_tensor_channels_flag                                     u(1)
    if( update_tensor_channels_flag )
        for( i = 0; i < region_cnt; i++ )
            for( j = 0; j < region_tensor_cnt[ i ]; j++ ) {
                update_tensor_channel_flag[ i ][ j ]                u(1)
                if( update_tensor_channel_flag[ i ][ j ] )
                    tensor_channel_cnt                              ue(v)
            }
}
FCM picture parameter set
fcm_pps( payloadSize ) {                                            Descriptor
    fcm_pps_temporal_upsampling_enabled_flag                        u(1)
    if( fcm_pps_temporal_upsampling_enabled_flag ) {
        temporal_upsampling_ratio_minus2                            ue(v)
        terminate_sequence_flag                                     u(1)
        if( terminate_sequence_flag )
            trailing_picture_cnt                                    ue(v)
    }
    quantization_range_update_flag                                  u(1)
    if( quantization_range_update_flag ) {
        qr_mantissa_len                                             ue(v)
        for( i = 0; i < region_cnt; i++ )
            for( j = 0; j < region_tensor_cnt[ i ]; j++ ) {
                qr_min_exp[ i ][ j ]                                ue(v)
                qr_min_exp_sign[ i ][ j ]                           u(1)
                qr_min_mantissa[ i ][ j ]                           u(b)
                qr_min_mantissa_sign[ i ][ j ]                      u(1)
                qr_max_exp[ i ][ j ]                                ue(v)
                qr_max_exp_sign[ i ][ j ]                           u(1)
                qr_max_mantissa[ i ][ j ]                           u(b)
                qr_max_mantissa_sign[ i ][ j ]                      u(1)
            }
    }
    output_datatype_update_flag                                     u(1)
    if( output_datatype_update_flag ) {
        output_datatype_idc                                         ue(v)
        if( output_datatype_idc == 0 ) {
            output_datatype_exponent_len                            ue(v)
            output_datatype_mantissa_len                            ue(v)
            output_datatype_implicit_mantissa_flag                  u(1)
            if( output_datatype_implicit_mantissa_flag )
                output_data_implicit_mantissa_value                 u(b)
        }
    }
    output_scaling_enable_flag                                      u(1)
    if( output_scaling_enable_flag ) {
        for( i = 0; i < restored_tensor_cnt; i++ ) {
            qr_second_min_exp[ i ]                                  ue(v)
            qr_second_min_exp_sign[ i ]                             u(1)
            qr_second_min_mantissa[ i ]                             u(b)
            qr_second_min_mantissa_sign[ i ]                        u(1)
            qr_second_max_exp[ i ]                                  ue(v)
            qr_second_max_exp_sign[ i ]                             u(1)
            qr_second_max_mantissa[ i ]                             u(b)
            qr_second_max_mantissa_sign[ i ]                        u(1)
        }
    }
}
Where u(n) refers to a fixed-length codeword n bits in length and ue(v) refers to an unsigned exponential Golomb variable-length codeword.
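As a reference sketch (not part of the specification text), an unsigned exponential Golomb codeword can be produced and consumed as follows:

```python
def ue_encode(value: int) -> str:
    # ue(v): write (value + 1) in binary, preceded by one fewer leading
    # zeros than its binary length, e.g. 3 -> "00100".
    code = value + 1
    return "0" * (code.bit_length() - 1) + format(code, "b")

def ue_decode(bits: str) -> int:
    # Count leading zeros, then read that many further bits after the first 1.
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    return int(bits[zeros:2 * zeros + 1], 2) - 1
```

For example, ue_encode(0) is "1" and ue_encode(3) is "00100".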
FCM SPS and FCM PPS semantics:
fcm_sps_inner_decoding_bypass_flag set equal to one indicates that the inner decoding (module 1204) is not performed and when equal to zero indicates that the inner decoding is performed.

fcm_sps_quantisation_bypass_flag set equal to one indicates that the inverse quantisation (module 1218) is not performed and when equal to zero indicates that the inverse quantisation is performed.

fcm_sps_feature_restoration_bypass_flag set equal to one indicates that the feature restoration (module 1250) is not performed and when equal to zero indicates that the feature restoration is performed.

fcm_sps_temporal_upsampling_enabled_flag set equal to one indicates that temporal interpolation or upsampling (module 1260) may be performed according to the most recently signalled temporal_upsampling_ratio, and when set to zero indicates that the temporal upsampling is not performed.
fcm_pps_temporal_upsampling_enabled_flag set equal to one indicates that temporal interpolation or upsampling (module 1260) is performed according to the most recently signalled temporal_upsampling_ratio, and when set to zero indicates that the temporal upsampling is not performed. It is a requirement of bitstream conformance that when fcm_sps_temporal_upsampling_enabled_flag is equal to zero, fcm_pps_temporal_upsampling_enabled_flag is also equal to zero.
temporal_upsampling_ratio_minus2 signals the integer upsampling ratio minus 2, i.e., a value of zero signals an upsampling ratio of two, a value of one signals an upsampling ratio of three, and so on.
terminate_sequence_flag is set to one when temporal upsampling is enabled and the source device 110 is terminating encoding of the bitstream and wishes to signal zero or more 'trailing pictures', i.e., pictures to be output from the temporal upsampler 1260 produced using only one previous picture and no forward reference to the next picture output from the picture decoder 1204. Each trailing picture is a duplicate of the most recently decoded picture.
trailing_picture_cnt signals how many trailing pictures to output before termination of the bitstream 121. The value of trailing_picture_cnt must be between zero and the temporal upsampling ratio minus one. For example, when the temporal upsampling ratio is set to two (temporal_upsampling_ratio_minus2 equal to zero), trailing_picture_cnt is permitted to be zero or one.
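As an illustrative sketch only (the function names, the use of scalar "pictures", and the linear blend are assumptions; a real temporal upsampler such as module 1260 may use a learned interpolator), the relationship between the upsampling ratio and trailing pictures can be expressed as:

```python
def blend(a, b, t):
    # Placeholder interpolation between two 'pictures' (scalars here).
    return a + (b - a) * t

def temporal_upsample(decoded, ratio, trailing_picture_cnt):
    # Emit (ratio - 1) interpolated pictures between each pair of decoded
    # pictures; trailing pictures duplicate the last decoded picture and
    # must number fewer than the upsampling ratio.
    assert 0 <= trailing_picture_cnt < ratio
    out = []
    for prev, nxt in zip(decoded, decoded[1:]):
        out.append(prev)
        for k in range(1, ratio):
            out.append(blend(prev, nxt, k / ratio))
    out.append(decoded[-1])
    out.extend([decoded[-1]] * trailing_picture_cnt)
    return out
```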
set_level_flag equal to one indicates that the tensor decompression complexity indication is to be signalled in this instance of the FCM decoder info SEI message.
fcm_level signals the complexity indication for any tensor decompressors to be performed in the decoder. The complexity indication provides a worst-case limit on the complexity of any instantiated tensor decompressor. It is a requirement of bitstream conformance that the tensor decompression complexity indication is signalled prior to use of the FCM decoder, e.g., signalled with the first frame of packed tensor data in the bitstream. The following table shows permitted maximum values for complexity aspects for given fcm_level values:
| fcm_level | MAC count | Weight count |
| 0 | <5M | <1M |
| 1 | <15M | <5M |
| 2 | <50M | <10M |
| 3-254 | (reserved for future use) | (reserved for future use) |
| 255 | | |
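A decoder could check a candidate tensor decompressor against these limits along the following lines (the helper and table names are illustrative, not part of the specification):

```python
# Illustrative limits per fcm_level, taken from the table above,
# expressed as absolute (MAC count, weight count) upper bounds.
FCM_LEVEL_LIMITS = {
    0: (5_000_000, 1_000_000),
    1: (15_000_000, 5_000_000),
    2: (50_000_000, 10_000_000),
}

def conforms_to_level(fcm_level, mac_count, weight_count):
    # True when the decompressor fits the worst-case complexity limits.
    if fcm_level not in FCM_LEVEL_LIMITS:
        raise ValueError("reserved or unspecified fcm_level")
    max_macs, max_weights = FCM_LEVEL_LIMITS[fcm_level]
    return mac_count < max_macs and weight_count < max_weights
```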
update_decoder_flag equal to one indicates that the FCM decoder is to be updated, effective from this instance of the FCM decoder info SEI message onwards.
no_weights_flag equal to one indicates that the FCM decoder does not include any trained elements (e.g., convolutions) and therefore does not require any weights.
explicit_signal_decoder_flag equal to one indicates that the FCM decoder architecture is signalled explicitly in this instance of the FCM decoder info SEI message. When equal to zero, this instance of the FCM decoder info SEI message instead references a previously signalled FCM decoder architecture or references an FCM decoder architecture obtained by external means, e.g., a predetermined architecture or an architecture available from a publicly accessible registry.
explicit_decoder_compression_idc specifies the compression technique (if any) applied to the payload containing the representation of the FCM decoder architecture, in accordance with the following table:
| explicit_decoder_compression_idc | Compression method |
| 0 | None |
| 1 | DEFLATE |
| 2 | LZMA |
| 3 | Reserved for future use |
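Using Python's standard zlib and lzma modules, the mapping from explicit_decoder_compression_idc to a decompression step might be sketched as follows (the helper name is hypothetical, and a raw DEFLATE stream without zlib headers is assumed):

```python
import lzma
import zlib

def decompress_decoder_payload(payload: bytes, compression_idc: int) -> bytes:
    # Map explicit_decoder_compression_idc values to a decompression method.
    if compression_idc == 0:      # None
        return payload
    if compression_idc == 1:      # DEFLATE (raw stream assumed)
        return zlib.decompress(payload, wbits=-15)
    if compression_idc == 2:      # LZMA
        return lzma.decompress(payload)
    raise ValueError("reserved explicit_decoder_compression_idc")
```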
explicit_decoder_format_idc specifies the format in which the FCM decoder architecture is encoded, with the following formats supported:
| explicit_decoder_format_idc | Decoder representation format |
| 0 | ONNX |
| 1 | NNEX |
| 2 | Pytorch |
| 3 | Variable-length scheme |
| 4-15 | Reserved for future use |
explicit_decoder_format_version_idc specifies the version of the format in which the FCM decoder architecture is encoded. For each supported format, a separate enumeration of explicit_decoder_format_version_idc values to versions of the format is specified.
explicit_decoder_payload_len specifies the length of the payload containing the FCM decoder representation in bytes, after application of compression (if applicable).
decoder_payload[ i ] specifies the ith byte of the FCM decoder representation.
register_decoder_idc_flag equal to one indicates that the FCM decoder representation signalled in this instance of the FCM decoder info SEI message is to be registered (retained) in the decoder for potential future reference.
decoder_idc specifies an index value for addressing the FCM decoder representation in a registry of retained FCM decoder architectures.
explicit_signal_weights_flag equal to one indicates that weights associated with the signalled FCM decoder representation are included in this instance of the FCM decoder info SEI message.
explicit_weights_idc specifies an index for the weights signalled in this instance of the FCM decoder info SEI message.
explicit_weights_payload_len specifies the length of the weights payload in the FCM decoder info SEI message.
weights_payload[ i ] specifies the ith byte of the weights payload in the FCM decoder info SEI message.
register_weights_idc_flag equal to one specifies that the weights signalled in this instance of the FCM decoder info SEI message are stored in the FCM decoder for potential future reference.
registered_decoder_idc specifies an index coded as a null-terminated UTF-8 string or with a signalled length to address an FCM decoder representation that is either known to the decoder by external means or was registered with the FCM decoder in an earlier instance of the FCM decoder info SEI message. The registered_decoder_idc may be signalled as an index, as a variable-length string, as a universally unique identifier (UUID), or by another mechanism that enables the decoder network topology to be uniquely identified. Signalling of the registered_decoder_idc may also select associated weights to be used with the selected decoder network topology.
set_region_cnt_flag equal to one indicates that this instance of the FCM decoder info SEI message signals a count of regions into which the current and subsequent pictures are to be divided.
region_cnt indicates a count of regions into which the current and subsequent pictures are to be divided. Each region is rectangular in shape and aligned to CTU boundaries. Each region is populated with feature maps from one or more tensors.
set_region_packing_flag equal to one indicates that this instance of the FCM decoder info SEI message specifies a division of the current picture into one or more rectangular regions. This division remains in effect until the next instance of an FCM decoder info SEI message with set_region_packing_flag equal to one.
top_left_rsctuaddr[ i ] specifies the address in raster-scan order of the CTU in the top-left position in the ith region.

top_right_rsctuaddr[ i ] specifies the address in raster-scan order of the CTU in the top-right position in the ith region.

bottom_left_rsctuaddr[ i ] specifies the address in raster-scan order of the CTU in the bottom-left position in the ith region.

bottom_right_rsctuaddr[ i ] specifies the address in raster-scan order of the CTU in the bottom-right position in the ith region.
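Since the rsctuaddr syntax elements are raster-scan CTU addresses, a region's position and size in CTU units can be recovered as follows (an illustrative sketch; the picture width in CTUs is assumed known from the inner codec):

```python
def rsctu_to_xy(rs_addr, pic_width_in_ctus):
    # Convert a raster-scan CTU address into (column, row) CTU coordinates.
    return rs_addr % pic_width_in_ctus, rs_addr // pic_width_in_ctus

def region_size_in_ctus(top_left, bottom_right, pic_width_in_ctus):
    # Width and height of a rectangular region given two corner addresses.
    x0, y0 = rsctu_to_xy(top_left, pic_width_in_ctus)
    x1, y1 = rsctu_to_xy(bottom_right, pic_width_in_ctus)
    return x1 - x0 + 1, y1 - y0 + 1
```

For example, in a picture 10 CTUs wide, corner addresses 12 and 37 describe a region 6 CTUs wide and 3 CTUs tall.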
horizontal_packing_flag[ i ] equal to one specifies that when the packing or unpacking progresses from feature maps of one tensor to feature maps of the next tensor within the ith region, packing continues in a left-to-right manner. When equal to zero, upon progressing from feature maps of one tensor to feature maps of the next tensor, packing of feature maps advances to the leftmost position in the current region and below the previously packed feature maps within the current region. The value one may be used where multiple tensors, each containing few (e.g., one) feature maps, are to be packed, requiring a region generally larger in width than in height and generally smaller frame area for the regions.
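The effect of horizontal_packing_flag on packing order can be illustrated with the following sketch, which computes feature-map origins within a region (the function name and the uniform feature-map size are assumptions for illustration):

```python
def feature_map_origins(tensor_channel_counts, fm_w, fm_h, region_w, horizontal):
    # tensor_channel_counts: feature maps per tensor packed into one region.
    # Returns (x, y) origins; with horizontal False, each new tensor restarts
    # at the leftmost position below the previously packed feature maps.
    origins, x, y = [], 0, 0
    for cnt in tensor_channel_counts:
        if not horizontal and x > 0:
            x, y = 0, y + fm_h       # next tensor starts on a new row band
        for _ in range(cnt):
            if x + fm_w > region_w:  # wrap within the region width
                x, y = 0, y + fm_h
            origins.append((x, y))
            x += fm_w
    return origins
```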
explicit_cropping_enabled_flag equal to one specifies that the FCM decoder may crop the decoded tensors from the feature restoration module 1250 to match the required dimensions of the restored-domain tensors according to the crop_* syntax elements (the 'cropping parameters'). When explicit_cropping_enabled_flag is equal to zero, it is a requirement of bitstream conformance that the tensors resulting from the feature restoration module 1250 match the required dimensions of the restored-domain tensors.
set_reduced_tensor_info_flag equal to one specifies that the number of tensors in the defined regions and the dimensions of the reduced-domain tensors are signalled in this instance of the FCM decoder info SEI message.
region_tensor_cnt[ i ] specifies the number of reduced-domain tensors to be packed in the ith region.

reduced_tensor_batch_size[ i ][ j ] specifies the batch size of the jth tensor in the reduced domain being packed into the ith region.

reduced_tensor_max_channels[ i ][ j ] specifies the maximum number of feature maps (i.e., channels) of the jth tensor in the reduced domain being packed in the ith region.

reduced_tensor_width[ i ][ j ] specifies the width of feature maps of the jth tensor in the reduced domain being packed in the ith region.

reduced_tensor_height[ i ][ j ] specifies the height of feature maps of the jth tensor in the reduced domain being packed in the ith region.
set_restored_tensor_info_flag equal to one specifies that the number and dimensionality of tensors output from the FCM decoder, i.e., tensors in the restored domain, are specified in this instance of the FCM decoder info SEI message.
restored_tensor_cnt specifies the number of restored-domain tensors output from the FCM decoder.
restored_tensor_batch_size[ i ] specifies the batch size in the ith restored-domain tensor output from the FCM decoder.
restored_tensor_channels[ i ] specifies the number of channels in the ith restored-domain tensor output from the FCM decoder.
restored_tensor_width[ i ] specifies the width of the ith restored-domain tensor output from the FCM decoder.
restored_tensor_height[ i ] specifies the height of the ith restored-domain tensor output from the FCM decoder.
update_tensor_channels_flag equal to one indicates that the flags to update the packed number of feature maps for each tensor in each region are to be signalled in this instance of the FCM decoder info SEI message.
44204385_1
update_tensor_channel_flag[ i ][ j ] equal to one indicates that the packed number of feature maps for the jth tensor in the ith region is to be signalled in this instance of the FCM decoder info SEI message.
tensor_channel_cnt[ i ][ j ] specifies the packed number of feature maps (i.e., channels) for the jth tensor of the ith region. When tensor_channel_cnt[ i ][ j ] is not signalled and tensor_max_channels[ i ][ j ] is signalled, the value is inferred to be equal to the corresponding tensor_max_channels[ i ][ j ]. When tensor_channel_cnt[ i ][ j ] is not signalled or inferred in the current instance of the FCM decoder info SEI message, the value remains in effect from the previous instance of the FCM decoder SEI message (if available), otherwise the value is inferred as 0.
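The inference rule for tensor_channel_cnt[ i ][ j ] can be sketched as follows. This is an illustrative reading of the semantics above, not normative text, and the function and argument names are hypothetical.

```python
def infer_tensor_channel_cnt(signalled, max_channels, previous):
    """Resolve the packed number of feature maps for one tensor.

    signalled:     tensor_channel_cnt[i][j] if coded in this SEI
                   message instance, else None
    max_channels:  tensor_max_channels[i][j] if signalled, else None
    previous:      value in effect from the previous FCM decoder SEI
                   message instance, else None
    """
    if signalled is not None:      # explicitly coded in this instance
        return signalled
    if max_channels is not None:   # inferred from tensor_max_channels
        return max_channels
    if previous is not None:       # persists from the previous instance
        return previous
    return 0                       # no information available: infer 0
```

Each branch corresponds to one sentence of the semantics: explicit signalling takes precedence, then inference from tensor_max_channels, then persistence from the previous instance, then the default of zero.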
qr_mantissa_len specifies the number of bits to be used to encode the mantissa portion of the reduced-domain quantisation range.
qr_min_exp[ i ][ j ] specifies the exponent portion of the lower bound of the reduced-domain quantisation range for the jth tensor in the reduced domain being packed in the ith region.
qr_min_exp_sign[ i ][ j ] specifies the sign of the exponent portion of the lower bound of the reduced-domain quantisation range for the jth tensor in the reduced domain being packed in the ith region.
qr_min_mantissa[ i ][ j ] specifies the fraction portion of the lower bound of the reduced-domain quantisation range for the jth tensor in the reduced domain being packed in the ith region, with a bit width as specified by qr_mantissa_len.
qr_min_mantissa_sign[ i ][ j ] specifies the sign of the fraction portion of the lower bound of the reduced-domain quantisation range for the jth tensor in the reduced domain being packed in the ith region.
qr_max_exp[ i ][ j ] specifies the exponent portion of the upper bound of the reduced-domain quantisation range for the jth tensor in the reduced domain being packed in the ith region.
qr_max_exp_sign[ i ][ j ] specifies the sign of the exponent portion of the upper bound of the reduced-domain quantisation range for the jth tensor in the reduced domain being packed in the ith region.
qr_max_mantissa[ i ][ j ] specifies the fraction portion of the upper bound of the reduced-domain quantisation range for the jth tensor in the reduced domain being packed in the ith region, with a bit width as specified by qr_mantissa_len.
qr_max_mantissa_sign[ i ][ j ] specifies the sign of the fraction portion of the upper bound of the reduced-domain quantisation range for the jth tensor in the reduced domain being packed in the ith region.
output_datatype_update_flag equal to one specifies that this instance of the FCM decoder SEI message updates the datatype of the FCM decoder output tensors and/or their range.
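Each quantisation-range bound above is coded as a signed exponent together with a signed fixed-width fraction (mantissa). The exact reconstruction formula is not reproduced in this excerpt, so the sketch below is only one plausible, IEEE-754-style interpretation: the fraction is scaled into [0, 1) and combined as (1 + fraction) x 2^exponent. The function name and combination rule are assumptions for illustration.

```python
def reconstruct_bound(exp, exp_sign, mantissa, mantissa_sign, mantissa_len):
    """Illustrative reconstruction of one quantisation-range bound.

    exp, exp_sign:           qr_*_exp and qr_*_exp_sign (sign flag set
                             means a negative exponent)
    mantissa, mantissa_sign: qr_*_mantissa (mantissa_len bits wide, per
                             qr_mantissa_len) and qr_*_mantissa_sign
    """
    exponent = -exp if exp_sign else exp
    fraction = mantissa / (1 << mantissa_len)   # scale to [0, 1)
    if mantissa_sign:
        fraction = -fraction
    # Assumed combination rule (IEEE-754-like); the normative rule may differ.
    return (1.0 + fraction) * (2.0 ** exponent)
```

Under this assumed rule, a bound with exponent 2 and zero fraction reconstructs to 4.0, while a negative exponent of 1 with a fraction of 128/256 reconstructs to 0.75.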
output_datatype_idc equal to zero specifies a custom data format for the FCM decoder output, with other values indicating floating-point or integer data formats.
output_datatype_exponent_len specifies the length of the exponent for a custom output data format, with a value of zero indicating an integer rather than floating-point output format.
output_datatype_mantissa_len specifies the length of the mantissa for a custom output data format when the exponent length is nonzero, or the number of bits for a custom output data format when the exponent length is equal to zero.
output_datatype_implicit_mantissa_flag equal to one specifies that tensors output from the FCM decoder all use an implicit mantissa value rather than a mantissa signalled on a per-element basis for each output tensor.
output_data_implicit_mantissa_value when present signals the implicit mantissa used for all elements of all output tensors from the FCM decoder.
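To illustrate the output datatype fields above: a zero exponent length selects an integer output whose width is given by the mantissa-length field, while a nonzero exponent length selects a custom floating-point format. The helper below is a hypothetical sketch of that interpretation (the name and the returned description strings are not part of the specification).

```python
def describe_output_datatype(exponent_len, mantissa_len):
    """Interpret output_datatype_exponent_len / output_datatype_mantissa_len.

    A zero exponent length indicates an integer output of mantissa_len
    bits; a nonzero exponent length indicates a floating-point format
    with exponent_len exponent bits and mantissa_len mantissa bits.
    """
    if exponent_len == 0:
        return f"int{mantissa_len}"              # integer output format
    return f"float(e={exponent_len}, m={mantissa_len})"  # custom float
```

For example, (exponent_len, mantissa_len) of (0, 16) describes a 16-bit integer output, while (8, 23) describes an IEEE-754-single-like custom floating-point format.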
output_scaling_enable_flag equal to one indicates that this instance of the FCM decoder SEI message updates the quantisation min and max (or lower and upper bound) for the quantisation stage performed after the feature restoration stage.
qr_second_mantissa_len specifies the number of bits to be used to encode the mantissa portion of the quantisation range.
qr_second_min_exp[ i ] specifies the exponent portion of the lower bound of the output quantisation range for the ith tensor in the restored domain.
qr_second_min_exp_sign[ i ] specifies the sign of the exponent portion of the lower bound of the output quantisation range for the ith tensor in the restored domain.
qr_second_min_mantissa_sign[ i ] specifies the sign of the fraction portion of the lower bound of the output quantisation range for the ith tensor in the restored domain.
qr_second_max_exp[ i ] specifies the exponent portion of the upper bound of the output quantisation range for the ith tensor in the restored domain.
qr_second_max_exp_sign[ i ] specifies the sign of the exponent portion of the upper bound of the output quantisation range for the ith tensor in the restored domain.
qr_second_max_mantissa[ i ] specifies the fraction portion of the upper bound of the output quantisation range for the ith tensor in the restored domain, with a bit width as specified by qr_second_mantissa_len.
qr_second_max_mantissa_sign[ i ] specifies the sign of the fraction portion of the upper bound of the output quantisation range for the ith tensor in the restored domain.


CLAIMS

1. A method of decoding a bitstream to produce tensors for use by a neural network second portion, the method comprising:
decoding a network abstraction layer (NAL) unit from the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of one inner codec of a plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length;
selecting an inner codec from the plurality of inner codecs based on the decoded NAL unit of the predetermined length; and
decoding the bitstream using the selected inner codec to produce the tensors.
2. The method according to claim 1, wherein the bitstream includes a plurality of NAL units and the decoded NAL unit is a NAL unit header.
3. The method according to claim 2, wherein the header is present in the bitstream prior to any units of data to be provided to the inner codec.
4. The method according to claim 2, wherein multiple instances of the NAL unit header are present in the bitstream.
5. The method according to claim 4, wherein one or more instances of the NAL unit header are present prior to random access entry points in the bitstream.
6. The method according to claim 5, wherein a plurality of inner codecs are used in the bitstream.
7. The method according to claim 1, wherein the predetermined length is one byte.
8. The method according to claim 1, wherein the plurality of inner codecs includes at least one of advanced video coding (AVC), high efficiency video coding (HEVC) and versatile video coding (VVC).
41663381_1
9. The method according to claim 1, wherein the plurality of inner codecs includes AVC, HEVC, VVC and a custom codec.
10. The method according to claim 1, wherein the NAL unit follows a start code in the bitstream.
11. A method of encoding tensors to a bitstream for use by a neural network second portion, the method comprising:
selecting an inner codec from the plurality of inner codecs for use in encoding tensors to the bitstream;
encoding a network abstraction layer (NAL) unit to the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of the selected inner codec of the plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; and
encoding the tensors to the bitstream using the selected inner codec.
12. The method according to claim 11, wherein the bitstream includes a plurality of NAL units and the decoded NAL unit is a NAL unit header.
13. The method according to claim 12, wherein the header is present in the bitstream prior to any units of data to be provided to the inner codec.
14. The method according to claim 12, wherein multiple instances of the NAL unit header are present in the bitstream.
15. The method according to claim 14, wherein one or more instances of the NAL unit header are present prior to random access entry points in the bitstream.
16. The method according to claim 1, wherein the plurality of inner codecs includes at least one of advanced video coding (AVC), high efficiency video coding (HEVC) and versatile video coding (VVC).
17. A decoder for decoding a bitstream to produce tensors for use by a neural network second portion, the decoder configured to:
decode a network abstraction layer (NAL) unit from the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of one inner codec of a plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length;
select an inner codec from the plurality of inner codecs based on the decoded NAL unit of the predetermined length; and
decode the bitstream using the selected inner codec to produce the tensors.
18. A non-transitory computer-readable storage medium which stores a program for executing a method of decoding a bitstream to produce tensors for use by a neural network second portion, the method comprising:
decoding a network abstraction layer (NAL) unit from the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of one inner codec of a plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length;
selecting an inner codec from the plurality of inner codecs based on the decoded NAL unit of the predetermined length; and
decoding the bitstream using the selected inner codec to produce the tensors.
19. An encoder for encoding tensors to a bitstream for use by a neural network second portion, the encoder configured to:
select an inner codec from the plurality of inner codecs for use in encoding tensors to the bitstream;
encode a network abstraction layer (NAL) unit to the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of the selected inner codec of the plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; and
encode the tensors to the bitstream using the selected inner codec.
20. A non-transitory computer-readable storage medium which stores a program for executing a method of encoding tensors to a bitstream for use by a neural network second portion, the method comprising:
selecting an inner codec from the plurality of inner codecs for use in encoding tensors to the bitstream;
encoding a network abstraction layer (NAL) unit to the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of the selected inner codec of the plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; and
encoding the tensors to the bitstream using the selected inner codec.
CANON KABUSHIKI KAISHA
Patent Attorneys for the Applicant
Spruson & Ferguson
AU2024203901A 2024-06-07 2024-06-07 Method, apparatus and system for encoding and decoding tensors Pending AU2024203901A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2024203901A AU2024203901A1 (en) 2024-06-07 2024-06-07 Method, apparatus and system for encoding and decoding tensors
PCT/AU2025/050437 WO2025251103A1 (en) 2024-06-07 2025-05-02 Method, apparatus and system for encoding and decoding tensors


Publications (1)

Publication Number Publication Date
AU2024203901A1 true AU2024203901A1 (en) 2026-01-08

Family

ID=97959853




Also Published As

Publication number Publication date
WO2025251103A1 (en) 2025-12-11
