
AU2024203901A1 - Method, apparatus and system for encoding and decoding tensors - Google Patents

Method, apparatus and system for encoding and decoding tensors

Info

Publication number
AU2024203901A1
Authority
AU
Australia
Prior art keywords
codec
nal
bitstream
unit
tensors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
AU2024203901A
Inventor
Thi Hong Nhung NGUYEN
Christopher James ROSEWARNE
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2024203901A priority Critical patent/AU2024203901A1/en
Priority to PCT/AU2025/050437 priority patent/WO2025251103A1/en
Publication of AU2024203901A1 publication Critical patent/AU2024203901A1/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/188Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a video data packet, e.g. a network abstraction layer [NAL] unit
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/12Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/02Conversion to or from weighted codes, i.e. the weight given to a digit depending on the position of the digit within the block or code word
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/26Conversion to or from stochastic codes
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/4006Conversion to or from arithmetic code
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96Tree coding, e.g. quad-tree coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Discrete Mathematics (AREA)
  • Neurology (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A system and method of decoding a bitstream to produce tensors for use by a neural network second portion. The method comprises decoding a network abstraction layer (NAL) unit from the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a network abstraction layer (NAL) unit format of one inner codec of a plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; selecting an inner codec from the plurality of inner codecs based on the decoded NAL unit of the predetermined length; and decoding the bitstream using the selected inner codec to produce the tensors.
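As an illustrative aside (not part of the patent disclosure), the selection step summarised in the abstract, choosing an inner codec based on the length of a decoded marker NAL unit, could be sketched as follows. The codec names and marker lengths below are invented for the example; the patent does not specify them.

```python
# Hypothetical mapping: each inner codec is signalled by a marker NAL unit
# of a distinct, predetermined length (in bytes). Lengths are illustrative.
CODEC_BY_NAL_LENGTH = {
    2: "VVC",
    3: "HEVC",
    5: "AVC",
}


def select_inner_codec(marker_nal_unit: bytes) -> str:
    """Return the inner codec indicated by the marker NAL unit's length."""
    try:
        return CODEC_BY_NAL_LENGTH[len(marker_nal_unit)]
    except KeyError:
        raise ValueError(
            f"unrecognised marker NAL unit length {len(marker_nal_unit)}")


# The decoder would then route the remainder of the bitstream to the
# selected codec to reconstruct the tensors.
print(select_inner_codec(b"\x00\x01"))  # -> VVC
```

The point of keying on length alone is that the demultiplexer can select a codec before it understands that codec's NAL unit syntax.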

Description

[Fig. 12 (sheet 13/20): block diagram of a tensor decoding pipeline, including a NAL unit demultiplexer, picture decoder, tensor unpacker, inverse quantiser (Q⁻¹), tensor decompressors, temporal upsampler, metadata parser, storage, and a tensor weight repository (reference numerals 1200–1270).]
METHOD, APPARATUS AND SYSTEM FOR ENCODING AND DECODING TENSORS

TECHNICAL FIELD
[0001] The present invention relates generally to digital video signal processing and, in particular, to a method, apparatus and system for encoding and decoding tensors for a convolutional neural network. The present invention also relates to a computer program product including a computer readable medium having recorded thereon a computer program for encoding and decoding tensors for a convolutional neural network using video compression technology.
BACKGROUND
[0002] Convolutional neural networks (CNNs) are an emerging technology addressing, among other things, use cases involving machine vision such as object detection, instance segmentation, object tracking, human pose estimation, and action recognition. Applications for CNNs can involve use of 'edge devices', with sensors and some processing capability, coupled to application servers as part of a 'cloud'. CNNs can require relatively high computational complexity, more than can typically be afforded either in computing capacity or power consumption by an edge device. Executing a CNN in a distributed manner has emerged as one solution to running leading-edge networks using limited capability edge devices without requiring all computational complexity to be incurred within cloud servers whilst edge devices have potentially under-utilised inferencing resources. In other words, distributed processing allows legacy edge devices to still provide the capability of leading-edge CNNs by distributing processing between the edge device and other processing means, such as cloud servers. Such a distributed network architecture may be referred to as 'collaborative intelligence' (CI) and offers benefits such as re-using a partial result from a first portion of the network with several different second portions, perhaps with each portion being optimised for a different task. CI architectures introduce a need for efficient compression of tensor data, for transmission over a network such as a WAN.
[0003] CNNs typically include many layers, such as convolution layers and fully connected layers, with data passing from one layer to the next in the form of 'tensors'. Splitting a network across different devices introduces a need to compress the intermediate multi-dimensional tensor data that passes from one layer to the next within a CNN in order to facilitate transmission over a network having bandwidth limitations or costs. Compression of such tensors may be referred to as 'feature compression' and the intermediate tensor data is often referred to as 'features' or 'feature maps'. Features or feature maps are generally a collection of two-dimensional (2D) arrays of values which, when combined into a 3D (or 4D) data structure, form a tensor, with each feature map corresponding to one 'channel' of the tensor. Intermediate tensor data represents a partially processed form of input, such as an image frame or video frame, encountered within a neural network. Although a unit of data to be processed in a neural network is typically a tensor, operations performed may be described in relation to a feature map, in which case it is understood that the operation is to be performed on each feature map in the tensor. International Organisation for Standardisation / International Electrotechnical Commission Joint Technical Committee 1 / Subcommittee 29 / Working Group 4 (ISO/IEC JTC1/SC29/WG4), also known as 'Moving Picture Experts Group' (MPEG) Video Coding, is tasked with developing a standard for feature compression, known as the 'feature coding for machines' (FCM) standard. Previously WG 2 'MPEG Technical Requirements' had completed a 'Call for Proposals' which received responses that demonstrated significant outperformance over feature compression results achieved using state-of-the-art standardised video compression technology directly applied to the tensors.
[0004] CNNs typically require weights for each of the layers to be predetermined in a training stage, where a very large amount of training data is passed through the CNN and a result determined by the network undergoing training is compared to ground truth associated with the training data. Discrepancy between the obtained and desired result is expressed as a 'loss' and measured with a 'loss function'. Using the determined loss, a process for updating network weights, such as stochastic gradient descent (SGD), is performed. Network weight update typically involves a process of back-propagation of 'gradients' that begins at the output layer of the network and proceeds backward to terminate when the input layer to the network is updated, propagating through intermediate, or 'hidden', layers of the network. Gradients are indicative of deltas to be applied to network weights and are themselves updated as part of the back-propagation process. The rate of weight update is set by a 'learning rate' hyperparameter, typically set to facilitate the training process in finding a global minimum in terms of loss (i.e., highest possible task performance for the network architecture and training data) while avoiding the training process becoming 'stuck' in a local minimum. Becoming stuck in a local minimum corresponds to obtaining sub-optimal task performance for the network architecture and being incapable of finding new weight values that could lead to higher task performance. Network weights are repeatedly updated by supplying input data and ground truth data organised into 'batches' to iteratively refine the network performance until further improvement in accuracy is no longer achievable. An iteration through the entire training dataset forms one 'epoch' of training, and training typically requires performing multiple epochs to achieve a high level of performance for the task. Weights for a trained network are then available for deployment, and the network operates in a mode where weights are fixed and gradients for weight update are omitted. The process of executing a pretrained CNN with an input and progressively transforming the input into an output according to a topology of the CNN is commonly referred to as 'inferencing'.
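As an editorial illustration only (not from the patent), the loss, gradient, learning rate and epoch concepts described above can be reduced to a minimal gradient-descent sketch on a single weight; the function and data here are invented for the example.

```python
# Minimal sketch of weight update by gradient descent: fit y = w * x
# using a squared loss. One pass over the samples is one 'epoch'.
def train(samples, epochs=200, learning_rate=0.05):
    w = 0.0
    for _ in range(epochs):
        for x, y_true in samples:
            y_pred = w * x
            # Gradient of (y_pred - y_true)^2 with respect to w.
            loss_grad = 2.0 * (y_pred - y_true) * x
            # The learning rate scales the delta applied to the weight.
            w -= learning_rate * loss_grad
    return w


# Ground truth relation is y = 3x, so training should recover w close to 3.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w = train(data)
print(round(w, 3))  # -> 3.0
```

A real CNN applies the same update rule to millions of weights, with the per-weight gradients obtained by back-propagation rather than by a closed-form derivative as above.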
[0005] Generally, a tensor is an array of elements having four dimensions, namely: batch, channels, height and width. The first dimension, 'batch', is typically of size one when inferencing on video data and indicates that one frame is passed through a CNN as one batch. When training a network, the value of the batch dimension may be increased so that multiple frames are passed through the network in each batch before the network weights are updated, according to a predetermined 'batch size'. A multi-frame video may be passed through as a single tensor with the batch dimension increased in size according to the number of frames of a given video. However, for practical considerations relating to memory consumption and access, inferencing on video data is typically performed on a frame-wise basis. The 'channels' dimension indicates the number of concurrent 'feature maps' for a given tensor and the height and width dimensions indicate the size of the feature maps at the particular stage of the CNN. Channel count varies through the layers of a CNN according to the network architecture. Feature map size also varies, depending on subsampling or upsampling occurring in specific network layers.
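To make the four-dimensional layout described above concrete, the following editorial sketch (not part of the patent text) builds a (batch, channels, height, width) structure from plain nested lists; the sizes are arbitrary examples.

```python
# Stand-in for a 4-D tensor ordered (batch, channels, height, width),
# using nested lists so no external library is required.
def make_tensor(batch, channels, height, width, fill=0.0):
    return [[[[fill] * width for _ in range(height)]
             for _ in range(channels)]
            for _ in range(batch)]


# Frame-wise inferencing: batch size 1; here 8 feature maps (channels),
# each 4 samples high and 6 wide.
tensor = make_tensor(batch=1, channels=8, height=4, width=6)
shape = (len(tensor), len(tensor[0]),
         len(tensor[0][0]), len(tensor[0][0][0]))
print(shape)  # -> (1, 8, 4, 6)
```

Each `tensor[0][c]` is one 2-D feature map, matching the statement that each feature map corresponds to one channel of the tensor.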
[0006] The overall complexity of the CNN tends to be relatively high, with relatively large numbers of multiply-accumulate (MAC) operations being performed and numerous intermediate tensors being written to and read from memory, along with reading weights for performance of each layer of the CNN. As such, dividing a neural network into portions allows implementation of more complex networks even in systems containing less capable edge devices, without requiring cloud servers to bear the full burden of performing the network.
[0007] Feature compression may benefit from existing video compression standards, such as ISO/IEC 23090-3 'Versatile Video Coding' (VVC)/H.266, developed by the Joint Video Experts Team (JVET), a joint activity by ISO/IEC and ITU-T. VVC is anticipated to address ongoing demand for ever-higher compression performance, especially as video formats increase in capability (for example, with higher resolution and higher frame rate) and to address increasing market demand for service delivery over WANs, where bandwidth costs are relatively high. VVC is implementable in contemporary silicon processes and offers an acceptable trade-off between achieved performance versus implementation cost. The implementation cost may be considered, for example, in terms of one or more of silicon area, CPU processor load, memory utilisation and bandwidth. Other video compression standards, such as ISO/IEC 23008-2 'High Efficiency Video Coding' (HEVC)/H.265 or ISO/IEC 14496-10 'Advanced Video Coding' may also be used for feature compression applications. Other standards such as AV-1, developed by the Alliance for Open Media (AOMedia), may also be used.
[0008] Video data includes a sequence of frames of image data, each frame including one or more colour channels. Where feature map data is to be represented in a packed frame, generally a monochrome frame having luminance only and no chroma channels is adequate. When only luma samples are present, the resulting monochrome frames are said to use a '4:0:0 chroma format'.
[0009] The VVC standard specifies a ‘block based’ architecture, in which frames are firstly divided into an array of square regions known as ‘coding tree units’ (CTUs). In VVC, CTUs generally occupy 128×128 luma samples. Other possible CTU sizes when using the VVC standard are 32×32 and 64×64. However, CTUs at the right and bottom edge of each frame may be smaller in area, with implicit splitting occurring to ensure coding blocks remain in the frame. Associated with each CTU is a ‘coding tree’ defining a decomposition of the area of the CTU into a set of blocks, also referred to as ‘coding units’ (CUs). Blocks applicable to only the luma channel or only the chroma channels are referred to as ‘coding blocks’ (CBs). A prediction of the contents of a coding block is held in a ‘prediction block’ (PB) or ‘prediction unit’ (PU) and a residual block defining an array of sample values to be additively combined with the PB or PU is referred to as a ‘transform block’ (TB) or ‘transform unit’ (TU), owing to the typical use of a transformation process in the generation of the TB or TU. In the case of HEVC, the CTU size may be 64×64, 32×32, or 16×16 luma samples. In the case of advanced video coding (AVC), a “Macroblock” is the analogue of a CTU and has a size of 16×16 luma samples.
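The implicit splitting at frame edges reduces to simple ceiling arithmetic. The following sketch is illustrative only; the function name and return shape are not taken from any standard:

```python
def ctu_grid(frame_w, frame_h, ctu=128):
    # Number of CTU columns/rows: ceiling division of the frame dimensions.
    cols = -(-frame_w // ctu)
    rows = -(-frame_h // ctu)
    # Edge CTUs are clipped so that coding blocks remain inside the frame.
    right_w = frame_w - (cols - 1) * ctu
    bottom_h = frame_h - (rows - 1) * ctu
    return cols, rows, right_w, bottom_h

# A 1920x1080 frame with 128x128 CTUs: a 15x9 grid, where the bottom row
# of CTUs covers only 56 luma rows.
print(ctu_grid(1920, 1080))  # (15, 9, 128, 56)
```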
44204385_1
[00010] Notwithstanding the above distinction between ‘units’ and ‘blocks’, the term ‘block’ may be used as a general term to refer to areas or regions of a frame for which operations are applied to all colour channels.
[00011] For each CU, a prediction unit (PU) of the contents (sample values) of the corresponding area of frame data is generated. Further, a representation of the difference (or ‘spatial domain’ residual) between the prediction and the contents of the area as seen at input to the encoder is formed. The difference in each colour channel may be transformed and coded as a sequence of residual coefficients, forming one or more TUs for a given CU. The applied transform may be a Discrete Cosine Transform (DCT) or other transform, applied to each block of residual values. The transform is applied separably (i.e., the two-dimensional transform is performed in two passes, one horizontally and one vertically). The block is firstly transformed by applying a one-dimensional transform to each row of samples in the block. Then, the partial result is transformed by applying a one-dimensional transform to each column of the partial result to produce a final block of transform coefficients that substantially decorrelates the residual samples. Transforms of various sizes are supported by the VVC standard, including transforms of rectangular-shaped blocks, with each side dimension being a power of two. Transform coefficients are quantised for entropy encoding into a bitstream.
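The row-then-column application of the one-dimensional transform can be sketched as below. A floating-point orthonormal DCT-II is used purely for illustration; the standards specify integer-approximated transforms:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis: row k holds the k-th cosine basis vector.
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

def separable_transform(block):
    # Pass 1: 1-D transform of each row (horizontal).
    # Pass 2: 1-D transform of each column of the partial result (vertical).
    h, w = block.shape
    partial = block @ dct_matrix(w).T
    return dct_matrix(h) @ partial

residual = np.arange(16.0).reshape(4, 4)  # toy 4x4 residual block
coeffs = separable_transform(residual)    # coeffs[0, 0] is the DC term
```

Because the basis matrix is orthonormal, `dct_matrix(4).T @ coeffs @ dct_matrix(4)` recovers the residual, mirroring the inverse transform performed in a decoder.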
[00012] PBs or PUs in VVC may be generated using either an intra-frame prediction or an inter-frame prediction process. Intra-frame prediction uses previously processed samples in a frame to generate a prediction of a current block of data samples in the frame. Inter-frame prediction involves generating a prediction of a current block of samples in a frame using a block of samples obtained from one or two previously decoded frames. The block of samples obtained from a previously decoded frame is offset from the spatial location of the current block according to a motion vector, which often has filtering applied. Intra-frame prediction blocks can be (i) a uniform sample value (“DC intra prediction”), (ii) a plane having an offset and horizontal and vertical gradient (“planar intra prediction”), (iii) a population of the block with neighbouring samples applied in a particular direction (“angular intra prediction”), or (iv) the result of a matrix multiplication using neighbouring samples and selected matrix coefficients.
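Two of the listed intra modes can be illustrated in a few lines. This is a simplified sketch; real codecs add reference-sample filtering, fractional angles, and availability rules:

```python
import numpy as np

def dc_intra(above, left):
    # (i) DC intra prediction: every sample in the block takes the mean
    # of the neighbouring reference samples.
    n = above.size
    dc = int(round(float(above.sum() + left.sum()) / (2 * n)))
    return np.full((n, n), dc)

def horizontal_intra(left):
    # (iii) One angular direction: propagate each left neighbour
    # horizontally across its row of the block.
    n = left.size
    return np.tile(left[:, None], (1, n))

above = np.array([10, 10, 12, 12])
left = np.array([10, 11, 12, 13])
pred = dc_intra(above, left)  # 4x4 block filled with the mean value
```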
[00013] Encoders and decoders conforming to different video encoding standards may be used to compress intermediate feature maps from a first portion (a ‘backbone’) of a neural network separated into two portions. In compression, the feature maps from the backbone are arranged into a frame and quantised from a floating-point domain to a sample domain suitable for compression as video data. Neural network layers, such as convolutions, batch normalisations, and activation functions, may be applied to reduce the dimensionality of the tensors prior to compression using a video compression standard such as VVC. Dimensionality reduction of tensors reduces the volume of data to be compressed, improving compression efficiency and reducing the runtime of the VVC encoding and decoding stages. Dimensionality reduction introduces complexity, offsetting the reduction in runtime seen in the VVC encoding. A need exists to support the use of encoders and decoders conforming to various video encoding standards to improve flexibility and multi-encoder compatibility of FCM implementations.
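The floating-point-to-sample-domain quantisation mentioned above can be sketched as a linear mapping into the range of a 10-bit frame. The min/max normalisation and the helper name are illustrative assumptions; FCM pipelines may define other mappings and signal the range differently:

```python
import numpy as np

def quantise_feature_maps(t, bit_depth=10):
    # Map floating-point tensor values linearly onto integer samples in
    # [0, 2^bit_depth - 1] so that they can be coded as video data.
    lo, hi = float(t.min()), float(t.max())
    scale = ((1 << bit_depth) - 1) / (hi - lo) if hi > lo else 0.0
    samples = np.round((t - lo) * scale).astype(np.uint16)
    return samples, (lo, hi)  # (lo, hi) must accompany the bitstream

t = np.linspace(-1.0, 1.0, 24, dtype=np.float32).reshape(2, 3, 4)
samples, (lo, hi) = quantise_feature_maps(t)
```

The `(lo, hi)` range is returned because the decoder needs it to map decoded samples back to the floating-point domain.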
SUMMARY
[00014] It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
[00015] One aspect of the present disclosure provides a method of decoding a bitstream to produce tensors for use by a neural network second portion, the method comprising: decoding a network abstraction layer (NAL) unit from the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of one inner codec of a plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; selecting an inner codec from the plurality of inner codecs based on the decoded NAL unit of the predetermined length; and decoding the bitstream using the selected inner codec to produce the tensors.
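A minimal sketch of the selection step follows. The byte lengths, codec identifiers, and the convention that the identifying NAL unit's final byte carries a codec ID are all assumptions made for illustration; they are not taken from the disclosure or from any standard:

```python
PREDETERMINED_LENGTH = 3                        # assumed identifying length
INNER_CODECS = {0: "AVC", 1: "HEVC", 2: "VVC"}  # assumed codec IDs

def select_inner_codec(nal_units):
    # Scan the bitstream's NAL unit payloads; only the identifying unit
    # has the predetermined length (other codecs' units differ in length).
    for payload in nal_units:
        if len(payload) == PREDETERMINED_LENGTH:
            return INNER_CODECS[payload[-1]]    # final byte: codec ID
    raise ValueError("identifying NAL unit not found")

units = [b"\x00\x00\x02", b"\x40\x01\x0c\x05"]  # toy bitstream
codec = select_inner_codec(units)               # selects "VVC"
```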
[00016] Another aspect of the present disclosure provides a method of encoding tensors to a bitstream for use by a neural network second portion, the method comprising: selecting an inner codec from a plurality of inner codecs for use in encoding tensors to the bitstream; encoding a network abstraction layer (NAL) unit to the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of the selected inner codec of the plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; and encoding the tensors to the bitstream using the selected inner codec.
[00017] Another aspect of the present disclosure provides a decoder for decoding a bitstream to produce tensors for use by a neural network second portion, the decoder configured to: decode a network abstraction layer (NAL) unit from the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of one inner codec of a plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; select an inner codec from the plurality of inner codecs based on the decoded NAL unit of the predetermined length; and decode the bitstream using the selected inner codec to produce the tensors.
[00018] Another aspect of the present disclosure provides a non-transitory computer-readable storage medium which stores a program for executing a method of decoding a bitstream to produce tensors for use by a neural network second portion, the method comprising: decoding a network abstraction layer (NAL) unit from the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of one inner codec of a plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; selecting an inner codec from the plurality of inner codecs based on the decoded NAL unit of the predetermined length; and decoding the bitstream using the selected inner codec to produce the tensors.
[00019] Another aspect of the present disclosure provides an encoder for encoding tensors to a bitstream for use by a neural network second portion, the encoder configured to: select an inner codec from a plurality of inner codecs for use in encoding tensors to the bitstream; encode a network abstraction layer (NAL) unit to the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of the selected inner codec of the plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; and encode the tensors to the bitstream using the selected inner codec.
[00020] Another aspect of the present disclosure provides a non-transitory computer-readable storage medium which stores a program for executing a method of encoding tensors to a bitstream for use by a neural network second portion, the method comprising: selecting an inner codec from a plurality of inner codecs for use in encoding tensors to the bitstream; encoding a network abstraction layer (NAL) unit to the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of the selected inner codec of the plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; and encoding the tensors to the bitstream using the selected inner codec.
[00021] Other aspects are also disclosed.
BRIEF DESCRIPTION OF THE DRAWINGS
[00022] At least one embodiment of the present invention will now be described with reference to the following drawings and appendices, in which:
[00023] Fig. 1 is a schematic block diagram showing a distributed machine task system;

[00024] Figs. 2A and 2B form a schematic block diagram of a general-purpose computer system upon which the distributed machine task system of Fig. 1 may be practiced;

[00025] Fig. 3A is a schematic block diagram showing functional modules of a backbone portion of a CNN;

[00026] Fig. 3B is a schematic block diagram showing a residual block of Fig. 3A;

[00027] Fig. 3C is a schematic block diagram showing a residual unit of Fig. 3A;

[00028] Fig. 3D is a schematic block diagram showing a CBL module of Fig. 3A;

[00029] Fig. 4 is a schematic block diagram showing functional modules of an alternative backbone portion of a CNN;

[00030] Fig. 5 is a schematic block diagram of a tensor encoder using a configurable tensor compressor stage;

[00031] Fig. 6 is a schematic block diagram showing a multi-scale feature fusion stage for a tensor compressor;

[00032] Fig. 7 shows a picture structure with low delay and one level of temporal interpolation;

[00033] Fig. 8 is a schematic block diagram showing functional modules of a video encoder;

[00034] Figs. 9A and 9B are schematic block diagrams showing an arrangement of regions or subpictures for holding compressed feature map data from compressed tensor data;

[00035] Fig. 10 is a schematic block diagram showing the structure of a network abstraction layer (NAL) unit;
[00036] Fig. 11 is a schematic block diagram showing a bitstream holding NAL units for various compression standards;

[00037] Fig. 12 is a schematic block diagram showing a tensor decoder with a configurable tensor decompressor;

[00038] Fig. 13 is a schematic block diagram showing functional modules of a video decoder;

[00039] Fig. 14 is a schematic block diagram showing an implementation of a configurable feature reconstruction module performing a decoder network topology;

[00040] Fig. 15 is a schematic block diagram showing an embodiment of a multi-scale feature reconstruction stage;

[00041] Fig. 16A is a schematic block diagram showing a head portion of a CNN;

[00042] Fig. 16B is a schematic block diagram showing an upscaler module of Fig. 16A;

[00043] Fig. 16C is a schematic block diagram showing a detection module of Fig. 16A;

[00044] Fig. 17 is a schematic block diagram showing an alternative head portion of a CNN;

[00045] Fig. 18 shows a method for performing a first portion of a CNN, selecting a feature map compression standard, compressing tensors using the selected feature map compressor, and encoding resulting compressed tensors into a bitstream;

[00046] Fig. 19 shows a method for decoding a bitstream, determining a selected feature map compression standard, reconstructing tensors according to the selected feature map compression standard, and performing a second portion of the CNN; and

[00047] Appendix A shows a syntax table for NAL units conforming to the AVC standard;

[00048] Appendix B shows a syntax table for NAL units conforming to the HEVC standard;

[00049] Appendix C shows a syntax table for NAL units conforming to the VVC standard;

[00050] Appendix D shows a syntax structure for identifying one manner of feature map compression, such as a video compression standard, out of a plurality of manners of feature map compression; and
[00051] Appendix E shows syntax structures for signalling a feature coding for machines vision model parameter set (FCM VMPS), sequence parameter set (FCM SPS), and picture parameter set (FCM PPS).
DETAILED DESCRIPTION INCLUDING BEST MODE
[00052] Where reference is made in any one or more of the accompanying drawings to steps and/or features which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
[00053] A distributed machine task system may include an edge device, such as a network camera or smartphone, producing intermediate compressed data. The distributed machine task system may also include a final device, such as a server farm based (‘cloud’) application, operating on the intermediate compressed data to produce a task result. Additionally, the edge device functionality may be embodied in the cloud, and the intermediate compressed data may be stored for later processing, potentially for multiple different tasks depending on need.
[00054] A convenient form of intermediate compressed data is a compressed video bitstream, owing to the availability of high-performing compression standards and implementations thereof. Video compression standards typically operate on integer samples of some given bit depth, such as 10 bits, arranged in planar arrays. Colour video has three planar arrays, corresponding, for example, to colour components Y, Cb, Cr, or R, G, B, depending on application. CNNs typically operate on floating point data in the form of tensors but may also operate on integer data, also forming tensors. Tensors generally have a relatively smaller spatial dimensionality compared to incoming video data upon which the CNN operates, while having more channels than the three channels typical of colour video data, for example 128, 256, or 512 channels.
[00055] Tensors typically have the following dimensions: frames, channels, height, and width. For example, a tensor of dimensions [1, 256, 76, 136] would be said to contain floating-point or integer values for one frame comprising an array of two-hundred and fifty-six (256) feature maps (channels), each of size 136×76. For video data, inferencing is typically performed one frame at a time (frame or ‘batch’ value of 1), rather than using tensors containing multiple frames. VVC, HEVC, and AVC support a division of a picture into ‘slices’, or contiguous sequences of coded CTUs, or Macroblocks in the case of AVC. In VVC and HEVC, a ‘tile’ mechanism is also available to divide a picture into a number of independently decodeable regions.
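For instance, the 256 feature maps of a [1, 256, 76, 136] tensor can be tiled into a single monochrome frame before video coding. The 16-channels-per-row layout below is an assumed arrangement for illustration, not one mandated by the description:

```python
import numpy as np

def pack_tensor(t, cols=16):
    # Tile the channels of a [C, H, W] tensor into one monochrome frame,
    # placing 'cols' channel tiles per row of tiles.
    c, h, w = t.shape
    rows = -(-c // cols)  # ceiling division
    frame = np.zeros((rows * h, cols * w), dtype=t.dtype)
    for i in range(c):
        r, q = divmod(i, cols)
        frame[r * h:(r + 1) * h, q * w:(q + 1) * w] = t[i]
    return frame

t = np.zeros((256, 76, 136), dtype=np.uint16)  # one frame's feature maps
frame = pack_tensor(t)  # 16x16 grid of 136x76 tiles -> a 1216x2176 frame
```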
[00056] Fig. 1 is a schematic block diagram showing functional modules of a distributed machine task system 100, capable of performing a machine task network in a distributed manner. The division of a particular neural network into two portions requires specifying a 'split point' in the network. Layers in the network from the input layer up to the split point are performed in a first device and the resulting intermediate tensor(s) are compressed. Layers from the split point up to the last layer in the network are performed using decompressed tensor(s) from the first device as input to the layer(s) immediately following the split point. At the split point there may be one or more tensors that need to be compressed for conveyance over a communication channel with limited bandwidth compared to the bandwidth requirement for transmission of uncompressed tensors. Where a 'feature pyramid network' (FPN) is in use, it is common for layers in the FPN to be related in width and height such that a given layer is half the width and half the height of an adjacent layer among the layers. FPN architectures may also involve the width and height halving alternately from one layer to the next layer. In some architectures, multiple tensors of the same width and height are produced within the FPN. An FPN may occur relatively early in the neural network topology, resulting in a necessity for a split point to occur within the FPN in order for a useful division of the network workload across the edge device and the cloud to be achieved. When a split occurs within the FPN of the machine task network, performance of a variety of machine task networks where layers up to the split point are common among the machine task networks ('shared backbone' architecture) may be achieved. Where a split point occurs within the FPN, tensor compression methods may exploit redundancies across the FPN layers to improve compression performance. Compression methods applicable to the various network topologies used in contemporary CNNs are therefore beneficial for application in a wide range of scenarios.
[00057] The system 100 may be used for implementing methods for decorrelating, packing and quantising feature maps into planar frames for encoding, and decoding feature maps from encoded data, for various neural networks. Various neural networks may be split at different points and may result in intermediate tensors of various number and dimensionality. A feature compression scheme capable of adapting to different types of intermediate data and capable of providing different quality reconstruction results in advantageous flexibility. Moreover, the system 100 provides flexibility to interface neural networks of various architectures and for various applications subjected to splitting into portions (e.g., for distributed execution).
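The notion of packing feature maps into a planar frame may be sketched as follows; this is a minimal illustration over nested lists, and the grid layout and function name are assumptions rather than the specific packing of the described arrangements:

```python
def pack_channels(tensor, cols):
    """Tile the C channels of a C x H x W tensor (nested lists) into a single
    planar frame laid out as a grid with `cols` channels per grid row."""
    c = len(tensor)
    h, w = len(tensor[0]), len(tensor[0][0])
    rows = (c + cols - 1) // cols            # grid rows needed for all channels
    frame = [[0] * (cols * w) for _ in range(rows * h)]
    for ch in range(c):
        gy, gx = divmod(ch, cols)            # grid cell for this channel
        for y in range(h):
            for x in range(w):
                frame[gy * h + y][gx * w + x] = tensor[ch][y][x]
    return frame

# Four 2x2 channels packed two-wide give a single 4x4 planar frame.
t = [[[ch * 10 + y * 2 + x for x in range(2)] for y in range(2)] for ch in range(4)]
frame = pack_channels(t, cols=2)
```

Packing the channels into one planar frame allows a conventional picture-oriented codec to carry the feature maps.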
[00058] The system 100 includes a source device 110 for generating frame data 113. The frame data 113 is passed to a CNN backbone 114 to produce tensors 115. The tensors 115 are passed to a tensor encoder 116, which produces an encoded bitstream 121. The system 100 also includes a destination device 140 for decoding tensor data in the form of a received bitstream 143. The destination device 140 may be used for decoding the tensor data (or tensors) for content (e.g., of audio data, video data, image data, and textual data) of the bitstream 143.
[00059] A communication channel 130 is used to communicate the encoded bitstream 121 from the source device 110 to the destination device 140. In some arrangements, the source device 110 and destination device 140 may either or both comprise respective mobile telephone handsets (e.g., "smartphones") or network cameras and cloud applications. The communication channel 130 may be a wired connection, such as Ethernet, or a wireless connection, such as WiFi or 5G, including connections across a Wide Area Network (WAN). The communication channel 130 may also be implemented across ad-hoc connections. Moreover, the source device 110 and the destination device 140 may comprise applications where encoded video data is captured on some computer-readable storage medium, such as a hard disk drive in a file server or memory. Although the system 100 is described as including the video source 112, which would provide the frame data 113 for a neural network targeting a computer vision application, other types of source data, such as audio or text, may be input to a suitable neural network implemented in the CNN backbone 114 and a CNN head 150. The CNN backbone 114 may also be referred to as a neural network first portion or NN part 1. The CNN head 150 may also be referred to as a neural network second portion or NN part 2.
[00060] As shown in Fig. 1, the source device 110 includes a video source 112, the CNN backbone 114, the tensor encoder 116, and a transmitter 122. The video source 112 typically comprises a source of captured video frame data (shown as 113), such as an image capture sensor, a previously captured video sequence stored on a non-transitory recording medium, or a video feed from a remote image capture sensor. The video source 112 may also be an output of a computer graphics card, for example, displaying the video output of an operating system and various applications executing upon a computing device (e.g., a tablet computer). Examples of source devices 110 that may include an image capture sensor as the video source 112 include smart-phones, video camcorders, professional video cameras, and network video cameras. The video source 112 may produce independent images or may produce temporally sequential images, i.e., a video.
[00061] The neural network implemented in the CNN backbone 114 and the CNN head 150 may depend on the application. For example, a 'YOLOv3' network may be used as one part of an object tracking system and a 'FasterRCNN' network may be used as an object detection system. The number and dimensionality of tensors 115 depends on a particular network performed in the system 100 and the split point of the particular network.
[00062] The CNN backbone 114 receives the video frame data 113 and performs specific layers of an overall CNN, such as layers corresponding to the 'backbone' of the CNN, outputting tensors 115. The backbone layers of the CNN may produce multiple tensors as output, for example, corresponding to different spatial scales of an input image represented by the video frame data 113 when splitting the network within the FPN. An FPN may result in three tensors, corresponding to three layers, output from the backbone 114 as the tensors 115 (e.g., if a 'YOLOv3' network is performed by the system 100), with varying spatial resolution and channel count. When the system 100 is performing networks such as 'Faster RCNN X101-FPN' or 'Mask RCNN X101-FPN' the tensors 115 may include tensors for four layers (P2-P5). Use of an FPN results in a plurality of tensors forming a hierarchical representation for a single frame to be encoded to (and decoded from) the bitstream when the split point of the network occurs within the FPN, as described hereafter. The tensor encoder 116 produces the encoded bitstream 121 from the tensors 115.
[00063] The bitstream 121 is supplied to the transmitter 122 for transmission over the communications channel 130, or the bitstream 121 is written to storage 132 for later use.
[00064] The source device 110 supports a particular network for the CNN backbone 114. However, the destination device 140 may use one of several networks for the head CNN 150. In using one of several networks for the head CNN 150, partially processed data in the form of packed feature maps may be stored for later use in performing various tasks without needing to again perform the operation of the CNN backbone 114.
[00065] The bitstream 121 is transmitted by the transmitter 122 over the communication channel 130 as encoded data. The bitstream 121 can in some implementations be stored in a storage memory 132, where the storage 132 is a non-transitory storage device such as a "Flash" memory or a hard disk drive, until later being transmitted over the communication channel 130 (or in-lieu of transmission over the communication channel 130). For example, encoded video data may be served upon demand to customers over a wide area network (WAN) for a video analytics application.
[00066] The destination device 140 includes a receiver 142, a tensor decoder 146, the CNN head 150, and a CNN task result buffer 152. The receiver 142 receives encoded video data from the communication channel 130 and passes the bitstream 143 to the tensor decoder 146. The tensor decoder 146 outputs decoded tensors 149, which are supplied to the CNN head 150. The CNN head 150 receives the tensors 149 and performs the later layers of the neural network that began with the CNN backbone 114 to produce a task result 151. The task result 151 is stored in the task result buffer 152. The contents of the task result buffer 152 may be presented to the user (e.g., via a graphical user interface), or provided to an analytics application where some action is decided based on the task result, which may include summary level presentation of aggregated task results to a user. It is also possible for the functionality of each of the source device 110 and the destination device 140 to be embodied in a single device, examples of which include mobile telephone handsets and tablet computers and cloud applications.
[00067] As seen in Fig. 1, the system 100 also comprises a tensor codec repository 180. The codec repository 180 may include network topologies covering a variety of neural networks and associated split points, and reconstruction fidelity levels. The network topologies may be stored in the tensor codec repository 180 for future reference (or use). The tensor codec repository 180 may be accessed 'out of band' or separately stored in each of the source device 110 and the destination device 140. In other words, the tensor codec repository 180 may be accessed over a network by the source device 110 and the destination device 140 rather than via the bitstream 143. A network topology identifier 174 and 176 may be sent by the tensor encoder 116 and the tensor decoder 146, respectively, to the tensor codec repository 180. The network topology identifiers 174 and 176 may be used for determining a given network topology from the bitstream 143.
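The out-of-band repository access described above may be sketched as a simple keyed lookup; the identifiers and entries below are illustrative placeholders only, not actual catalogue contents of the repository 180:

```python
# Illustrative out-of-band repository keyed by network topology identifier.
# The identifiers and entries are placeholders for the purpose of this sketch.
CODEC_REPOSITORY = {
    1: {"network": "yolov3-split-fpn", "layers": 3},
    2: {"network": "faster-rcnn-x101-fpn", "layers": 4},  # P2-P5
}

def fetch_topology(identifier):
    """Return a network topology entry for `identifier`, in the manner that
    the repository 180 responds to identifiers sent by the encoder or decoder."""
    try:
        return CODEC_REPOSITORY[identifier]
    except KeyError:
        raise ValueError("unknown network topology identifier: %d" % identifier)
```

Because both the encoder and decoder resolve the same identifier against the same repository, only the identifier, and not the topology itself, need be associated with the bitstream.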
[00068] As a result of a request for a given network topology, a network topology 172 and 178 may be returned by the tensor codec repository 180 to the tensor encoder 116 and the tensor decoder 146, respectively. As described in detail below, the information including the network topology identifier 178 may be decoded and used by the tensor decoder 146 for producing decoded tensors using the determined network topology. The tensor codec repository 180 may be accessible via a public file repository or within a private network accessible to the source device 110 and the destination device 140. A given network topology defines the composition and interconnection of a set of machine learning primitive operations, including convolutions, batch normalisations, activation functions, and concatenations. A network topology may be available for a split point but with a mismatch in the supported dimensionality; in particular, the spatial dimensions of the feature maps may differ from those provided by the CNN backbone 114. Moreover, the data type provided from the CNN backbone 114 and supplied to the CNN head 150 may differ from that used internally by the network topology. For example, integer inferencing is commonly used due to its reduced complexity compared to floating-point inferencing. Where a network topology is available but configured to use floating-point inferencing, an adaptation between integer and floating-point domains is needed to couple the network topology implemented in the tensor encoder 116 and the tensor decoder 146 with the CNN backbone 114 and the CNN head 150.
[00069] The video source 112 can provide vision model parameters 113a to the tensor encoder 116, as described hereafter. The vision model parameters 113a include the spatial resolution of the frame data 113, used for bounding boxes (an example of the task result 151) to be scaled to correspond to the resolution of the frame data 113.
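The scaling of bounding boxes to the resolution of the frame data 113 may be illustrated as follows (a sketch only; the corner-coordinate box format and the specific resolutions are assumptions):

```python
def scale_box(box, net_size, frame_size):
    """Scale a bounding box (x0, y0, x1, y1) from the network input
    resolution back to the original frame resolution."""
    net_w, net_h = net_size
    frame_w, frame_h = frame_size
    sx, sy = frame_w / net_w, frame_h / net_h   # per-axis scale factors
    x0, y0, x1, y1 = box
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)

# A box detected on a 640x384 network input, mapped to a 1920x1152 frame:
scaled = scale_box((10, 10, 20, 20), (640, 384), (1920, 1152))
```

Conveying the spatial resolution as a vision model parameter is what makes this mapping possible at the destination.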
[00070] The arrangements described allow a different 'inner codec' to be selected and used based on implementation requirements. In the context of the arrangements described, the 'inner codec' relates to the functionality for encoding tensors reduced in dimensionality compared to the tensors 115 from the CNN backbone 114 (or a feature pyramid) for transmission between the source device 110 and the destination device 140, and correspondingly decoding a bitstream to produce compressed tensors after reception at the destination device 140, where the compressed tensors will be further processed to produce tensors 149, restored in dimensionality to correspond to the tensors 115. The 'inner codec' generates and decodes a bitstream in the examples described. In other implementations, a different encoded output, for example a packed frame, may be used.
[00071] Notwithstanding the example devices mentioned above, each of the source device 110 and destination device 140 may be configured within a general-purpose computing system, typically through a combination of hardware and software components. Fig. 2A illustrates such a computer system 200, which includes: a computer module 201; input devices such as a keyboard 202, a mouse pointer device 203, a scanner 226, a camera 227, which may be configured as the video source 112, and a microphone 280; and output devices including a printer 215, a display device 214 and loudspeakers 217. An external Modulator-Demodulator (Modem) transceiver device 216 may be used by the computer module 201 for communicating to and from a communications network 220 via a connection 221. The communications network 220, which may represent the communication channel 130, may be a wide area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 221 is a telephone line, the modem 216 may be a traditional "dial-up" modem. Alternatively, where the connection 221 is a high capacity (e.g., cable or optical) connection, the modem 216 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 220. The transceiver device 216 may provide the functionality of the transmitter 122 and the receiver 142, and the communication channel 130 may be embodied in the connection 221.
[00072] The computer module 201 typically includes at least one processor unit 205, and a memory unit 206. For example, the memory unit 206 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 201 also includes a number of input/output (I/O) interfaces including: an audio-video interface 207 that couples to the video display 214, loudspeakers 217 and microphone 280; an I/O interface 213 that couples to the keyboard 202, mouse 203, scanner 226, camera 227 and optionally a joystick or other human interface device (not illustrated); and an interface 208 for the external modem 216 and printer 215. The signal from the audio-video interface 207 to the computer monitor 214 is generally the output of a computer graphics card. In some implementations, the modem 216 may be incorporated within the computer module 201, for example within the interface 208. The computer module 201 also has a local network interface 211, which permits coupling of the computer system 200 via a connection 223 to a local-area communications network 222, known as a Local Area Network (LAN). As illustrated in Fig. 2A, the local communications network 222 may also couple to the wide network 220 via a connection 224, which would typically include a so-called "firewall" device or device of similar functionality. The local network interface 211 may comprise an Ethernet™ circuit card, a Bluetooth™ wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practised for the interface 211. The local network interface 211 may also provide the functionality of the transmitter 122 and the receiver 142, and the communication channel 130 may also be embodied in the local communications network 222.
[00073] The I/O interfaces 208 and 213 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 209 are provided and typically include a hard disk drive (HDD) 210. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 212 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g. CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the computer system 200. Typically, any of the HDD 210, optical drive 212, and networks 220 and 222 may also be configured to operate as the video source 112, or as a destination for decoded video data to be stored for reproduction via the display 214. The source device 110 and the destination device 140 of the system 100 may be embodied in the computer system 200.
[00074] The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 and in a manner that results in a conventional mode of operation of the computer system 200 known to those in the relevant art. For example, the processor 205 is coupled to the system bus 204 using a connection 218. Likewise, the memory 206 and optical disk drive 212 are coupled to the system bus 204 by connections 219. Examples of computers on which the described arrangements can be practised include IBM-PC's and compatibles, Sun SPARCstations, Apple Mac or alike computer systems.
[00075] The tensor encoder 116, the tensor decoder 146 and methods to be described, may be implemented as one or more software application programs 233 executable within the computer system 200. In particular, the tensor encoder 116, the tensor decoder 146 and the steps of the described methods are effected by instructions 231 (see Fig. 2B) in the software 233 that are carried out within the computer system 200. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the described methods and a second part and the corresponding code modules manage a user interface between the first part and the user.
[00076] The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 200 from the computer readable medium, and then executed by the computer system 200. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program
product in the computer system 200 preferably effects an advantageous apparatus for implementing the source device 110 and the destination device 140 and the described methods.
[00077] The software 233 is typically stored in the HDD 210 or the memory 206. The software is loaded into the computer system 200 from a computer readable medium, and executed by the computer system 200. Thus, for example, the software 233 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 225 that is read by the optical disk drive 212.
[00078] In some instances, the application programs 233 may be supplied to the user encoded on one or more CD-ROMs 225 and read via the corresponding drive 212, or alternatively may be read by the user from the networks 220 or 222. Still further, the software can also be loaded into the computer system 200 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 200 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray Disc, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 201. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of the software, application programs, instructions and/or video data or encoded video data to the computer module 201 include radio or infra-red transmission channels, as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
[00079] The second part of the application program 233 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 214. Through manipulation of typically the keyboard 202 and the mouse 203, a user of the computer system 200 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 217 and user voice commands input via the microphone 280.
[00080] Fig. 2B is a detailed schematic block diagram of the processor 205 and a "memory" 234. The memory 234 represents a logical aggregation of all the memory modules (including the storage devices 209 and semiconductor memory 206) that can be accessed by the computer module 201 in Fig. 2A.
[00081] When the computer module 201 is initially powered up, a power-on self-test (POST) program 250 executes. The POST program 250 is typically stored in a ROM 249 of the semiconductor memory 206 of Fig. 2A. A hardware device such as the ROM 249 storing software is sometimes referred to as firmware. The POST program 250 examines hardware within the computer module 201 to ensure proper functioning and typically checks the processor 205, the memory 234 (209, 206), and a basic input-output systems software (BIOS) module 251, also typically stored in the ROM 249, for correct operation. Once the POST program 250 has run successfully, the BIOS 251 activates the hard disk drive 210 of Fig. 2A. Activation of the hard disk drive 210 causes a bootstrap loader program 252 that is resident on the hard disk drive 210 to execute via the processor 205. This loads an operating system 253 into the RAM memory 206, upon which the operating system 253 commences operation. The operating system 253 is a system level application, executable by the processor 205, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.
[00082] The operating system 253 manages the memory 234 (209, 206) to ensure that each process or application running on the computer module 201 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the computer system 200 of Fig. 2A need to be used properly so that each process can run effectively. Accordingly, the aggregated memory 234 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 200 and how such memory is used.
[00083] As shown in Fig. 2B, the processor 205 includes a number of functional modules including a control unit 239, an arithmetic logic unit (ALU) 240, and a local or internal memory 248, sometimes called a cache memory. The cache memory 248 typically includes a number of storage registers 244-246 in a register section. One or more internal busses 241 functionally interconnect these functional modules. The processor 205 typically also has one or
more interfaces 242 for communicating with external devices via the system bus 204, using the connection 218. The memory 234 is coupled to the bus 204 using the connection 219.
[00084] The application program 233 includes a sequence of instructions 231 that may include conditional branch and loop instructions. The program 233 may also include data 232 which is used in execution of the program 233. The instructions 231 and the data 232 are stored in memory locations 228, 229, 230 and 235, 236, 237, respectively. Depending upon the relative size of the instructions 231 and the memory locations 228-230, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 230. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 228 and 229.
[00085] In general, the processor 205 is given a set of instructions which are executed therein. The processor 205 waits for a subsequent input, to which the processor 205 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 202, 203, data received from an external source across one of the networks 220, 222, data retrieved from one of the storage devices 206, 209 or data retrieved from a storage medium 225 inserted into the corresponding reader 212, all depicted in Fig. 2A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 234.
[00086] The tensor encoder 116, the tensor decoder 146 and the described methods may use input variables 254, which are stored in the memory 234 in corresponding memory locations 255, 256, 257. The tensor encoder 116, the tensor decoder 146 and the described methods produce output variables 261, which are stored in the memory 234 in corresponding memory locations 262, 263, 264. Intermediate variables 258 may be stored in memory locations 259, 260, 266 and 267.
[00087] Referring to the processor 205 of Fig. 2B, the registers 244, 245, 246, the arithmetic logic unit (ALU) 240, and the control unit 239 work together to perform sequences of micro-operations needed to perform "fetch, decode, and execute" cycles for every instruction in the instruction set making up the program 233. Each fetch, decode, and execute cycle comprises:
a fetch operation, which fetches or reads an instruction 231 from a memory location 228, 229, 230;

a decode operation in which the control unit 239 determines which instruction has been fetched; and

an execute operation in which the control unit 239 and/or the ALU 240 execute the instruction.
[00088] Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 239 stores or writes a value to a memory location 232.
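The repeated cycle described above can be illustrated with a minimal sketch. All opcodes, memory locations and values below are hypothetical and purely illustrative; a real processor fetches and decodes binary machine instructions rather than Python tuples:

```python
# Toy memory: locations hold (opcode, operand) instruction tuples, and
# location 232 is a data cell that a store cycle writes to.
memory = {228: ("LOAD", 5), 229: ("ADD", 3), 230: ("STORE", 232), 232: None}
accumulator = 0

for location in (228, 229, 230):
    opcode, operand = memory[location]   # fetch: read the instruction
    if opcode == "LOAD":                 # decode + execute
        accumulator = operand
    elif opcode == "ADD":
        accumulator += operand
    elif opcode == "STORE":              # store cycle: write a value
        memory[operand] = accumulator    # back to a memory location
```

After the three cycles the accumulator holds 5 + 3 and the store cycle has written that value to location 232.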
[00089] Each step or sub-process in the methods of Figs. 18 and 19, to be described, is associated with one or more segments of the program 233 and is typically performed by the register section 244, 245, 246, the ALU 240, and the control unit 239 in the processor 205 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 233.
[00090] Fig. 3A is a schematic block diagram 300 showing functional modules of a backbone portion 310 of a CNN. The diagram 300 may serve as an implementation of the CNN backbone 114 when the system 100 is configured to perform a 'YOLOv3' network. The backbone portion 114 is sometimes referred to as 'DarkNet-53', although different backbones are also possible, resulting in a different number of and dimensionality of layers of the tensors 115 for each frame. In one implementation, the backbone portion 310 may be used as a person detector for the purpose of object tracking.
[00091] As shown in Fig. 3A, the video data 113 is passed to a resizer module 304. The resizer module 304 resizes each frame of the video data 113 to a resolution suitable for processing by the CNN backbone 310, producing resized frame data 312. If the resolution of the video data 113 is already suitable for the CNN backbone 310, operation of the resizer module 304 is not needed. The resized frame data 312 is passed to a convolutional batch normalisation leaky rectified linear (CBL) module 314 to produce tensors 316. The CBL module 314 contains modules as described with reference to a CBL module 360 as shown in Fig. 3D.
[00092] The CBL module 360 takes as input a tensor 361 of the resized frame data 312. The tensor 361 is passed to a convolutional layer 362 to produce tensor 363. If the convolutional layer 362 has a stride of one, the tensor 363 has the same spatial dimensions as the tensor 361. If the convolution layer 362 has a larger stride, such as two, the tensor 363 has smaller spatial dimensions compared to the tensor 361, for example, halved in width and height for the stride of two. Regardless of the stride, the size of the channel dimension of the tensor 363 may vary compared to the channel dimension of the tensor 361 for a particular CBL block. The tensor 363 is passed to a batch normalisation module 364, which outputs a tensor 365. The batch normalisation module 364 normalises the input tensor 363 and applies a scaling factor and an offset value to produce the output tensor 365. The scaling factor and offset value are derived from a training process. The tensor 365 is passed to a leaky rectified linear activation ("LeakyReLU") module 366 to produce a tensor 367. The module 366 provides a 'leaky' activation function whereby positive values in the tensor are passed through and negative values are severely reduced in magnitude, for example, to 0.1x their former value.
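The batch normalisation and leaky activation stages of a CBL module can be sketched as follows. This is a NumPy illustration only: the function names are hypothetical, the convolution stage (362) is omitted for brevity, and the per-channel mean, variance, scaling factor and offset would in practice come from a training process rather than being identity values as here:

```python
import numpy as np

def batch_norm_infer(x, mean, var, scale, offset, eps=1e-5):
    """Inference-time batch normalisation (cf. module 364): normalise per
    channel, then apply a trained scaling factor and offset value."""
    return scale * (x - mean) / np.sqrt(var + eps) + offset

def leaky_relu(x, slope=0.1):
    """'Leaky' activation (cf. module 366): positive values pass through,
    negative values are reduced to 0.1x their former value."""
    return np.where(x > 0, x, slope * x)

# Toy tensor in NCHW layout: 1 batch, 2 channels, 2x2 spatial samples.
x = np.array([[[[1.0, -1.0], [2.0, -2.0]],
               [[0.5, -0.5], [1.5, -1.5]]]])
mean = np.zeros((1, 2, 1, 1)); var = np.ones((1, 2, 1, 1))
scale = np.ones((1, 2, 1, 1)); offset = np.zeros((1, 2, 1, 1))

y = leaky_relu(batch_norm_infer(x, mean, var, scale, offset))
# Output keeps the input shape; negative entries end up at about 0.1x
# their former magnitude while positive entries pass through.
```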
[00093] Returning to Fig. 3A, the tensor 316 is passed from the CBL block 314 to a residual block module 320, such as a 'res1+2+8' module (also referred to as a res11 module) containing a concatenation of three residual blocks, the residual blocks containing one (1) residual unit, two (2) residual units, and eight (8) residual units, respectively. The spatial resolution of the tensors is halved horizontally and halved vertically in each of the residual blocks (see Fig. 3B) by a convolution with stride equal to two in a CBL block 344.
[00094] A residual block is described with reference to a ResBlock 340 as shown in Fig. 3B. The ResBlock 340 receives a tensor 341. The tensor 341 is zero-padded by a zero-padding module 342 to produce a tensor 343. The tensor 343 is passed to the CBL module 344 to produce a tensor 345. The CBL module 344 contains a convolution (for example 362) with a stride parameter set to two, resulting in the tensor 345 having half the width and half the height of the tensor 343. The tensor 345 is passed to a residual unit 346. The residual unit 346 contains a series of concatenated residual units, based on the number of the residual block (for example, eleven (11) units for the block 320). The last residual unit of the residual units 346 outputs a tensor 347.
[00095] A residual unit is described with reference to a ResUnit 350 as shown in Fig. 3C. The ResUnit 350 takes a tensor 351 as input. The tensor 351 is passed to a CBL module 352 to produce a tensor 353. The tensor 353 is passed to a second CBL unit 354 to produce a tensor 355. An add module 356 sums the tensor 355 with the tensor 351 to produce a tensor 357. The
add module 356 may also be referred to as a 'shortcut' as the input tensor 351 substantially influences the output tensor 357. For an untrained network, ResUnit 350 acts to pass-through tensors. As training is performed, the CBL modules 352 and 354 act to deviate the tensor 357 away from the tensor 351 in accordance with training data and ground truth data.
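The shortcut structure of a residual unit can be sketched as follows. This is a NumPy illustration with hypothetical names: the CBL branch is reduced to a per-element weighting plus the leaky activation rather than a full convolution with batch normalisation:

```python
import numpy as np

def leaky_relu(x, slope=0.1):
    return np.where(x > 0, x, slope * x)

def cbl(x, weight):
    """Hypothetical stand-in for a CBL module (cf. 352, 354): a simple
    weighting followed by the leaky activation."""
    return leaky_relu(weight * x)

def res_unit(x, w1, w2):
    """Residual unit (cf. ResUnit 350): two CBL modules in series, then a
    'shortcut' add (module 356) sums the branch output with the input."""
    return cbl(cbl(x, w1), w2) + x

x = np.array([[1.0, -2.0], [3.0, -4.0]])
# With zero branch weights the CBL branch contributes nothing, so the
# unit passes the input through unchanged -- illustrating the
# pass-through behaviour of an untrained network.
y = res_unit(x, w1=0.0, w2=0.0)
```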
[00096] Returning to Fig. 3A, the Res11 module 320 outputs a tensor 322. The tensor 322 is output from the backbone module 310 as one of the layers and also provided to a Res8 module 324. The Res8 module 324 is a residual block (i.e., 340), which includes eight residual units (i.e., 350). The Res8 module 324 produces a tensor 326. The tensor 326 is passed to a Res4 module 328 and output from the backbone module 310 as one of the layers. The Res4 module is a residual block (i.e., 340), which includes four residual units (i.e., 350). The Res4 module 328 produces a tensor 329. The tensor 329 is output from the backbone module 310 as one of the layers. Collectively, the layer tensors 322, 326, and 329 are output as the tensors 115 and may be referred to as layers 0-2 or L0, L1, and L2, respectively. The backbone CNN 310 may take as input a video frame of resolution 1088×608 and produce three tensors, corresponding to three layers, with the following dimensions: [1, 256, 76, 136], [1, 512, 38, 68], [1, 1024, 19, 34]. Another example of the three tensors 115 corresponding to three layers may be [1, 512, 34, 19], [1, 256, 68, 38], [1, 128, 136, 76], which are respectively separated at layer index 75, 90, and 105 when the layers are enumerated according to the YOLOv3 software implementation of the backbone 300 and a head 1200.
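The spatial dimensions quoted above can be reproduced from the input resolution, assuming the usual YOLOv3 layer strides of 8, 16, and 32 (the strides are an assumption here; the channel counts 256/512/1024 are taken from the text):

```python
# Spatial dimensions of the three layer tensors for a 1088x608 input frame.
width, height = 1088, 608
channels = [256, 512, 1024]
strides = [8, 16, 32]          # assumed per-layer downsampling factors
shapes = [[1, c, height // s, width // s] for c, s in zip(channels, strides)]
print(shapes)
# Reproduces [1, 256, 76, 136], [1, 512, 38, 68], [1, 1024, 19, 34]
```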
[00097] Each of the Res11 320, Res8 324 and Res4 328 operates in a similar manner to ResBlock 340. Each of the CBL 314, the CBL 344 and the CBL 354 operates in a similar manner to the CBL 360.
[00098] Fig. 4 is a schematic block diagram showing functional modules of an alternative backbone portion 400 of a CNN, which may serve as an implementation of the CNN backbone 114 when the system 100 is configured to perform a "FasterRCNN" or "MaskRCNN" ResNet 101 network. Frame data 113 is input and passes through a stem network 408, a res2 module 412, a res3 module 416, a res4 module 420, and a res5 module 424 via tensors 409, 413, 417, 421, 425 respectively. The backbone portion 400 may be used as part of a general object detector or for instance segmentation, with various classes of object supported.
[00099] The stem network 408 includes a convolution with a kernel size of 7x7 and a stride of two (2) and a max pooling operation. The res2 module 412, the res3 module 416, the res4 module 420 and the res5 module 424 perform convolution operations, such as LeakyReLU activations. Each module 412, 416, 420 and 424 also performs one halving of the width and height of the processed tensors via a stride setting of two. Each of the tensors 413, 417, 421 and 425 are passed to one of 1x1 lateral convolution modules 446, 444, 442 and 440 respectively. The modules 446, 444, 442, and 440 produce tensors 447, 445, 443 and 441 respectively. The tensor 441 is passed to a 3x3 output convolution module 470, which produces an output tensor P5 471.
[000100] The tensor 441 is also passed to upsampler module 450 to produce an upsampled tensor 451. A summation module 460 sums the tensors 443 and 451 to produce a tensor 461. The tensor 461 is passed to an upsampler module 452 and a 3x3 lateral convolution module 472. The module 472 outputs a P4 tensor 473. The upsampler module 452 produces an upsampled tensor 453. A summation module 462 sums tensors 445 and 453 to produce a tensor 463. The tensor 463 is passed to a 3x3 lateral convolution module 474 and an upsampler module 454. The module 474 outputs a P3 tensor 475. The upsampler module 454 outputs an upsampled tensor 455. A summation module 464 sums the tensors 447 and 455 to produce tensor 465, which is passed to a 3x3 lateral convolution module 476. The module 476 outputs a P2 tensor 477. The upsampler modules 450, 452, and 454 use nearest neighbour interpolation for low computational complexity. The tensors 471, 473, 475, and 477 form the output tensors 115 of the CNN backbone 400. Although Fig. 4 shows a particular backbone portion of the Faster RCNN network architecture (a 'P-layer' split point), different divisions into backbone and head are possible. Splitting the network at tensor 409 is termed a 'stem' split point. Splitting the network at tensors 447, 445, 443, and 441 is termed a 'C-layer' split point.
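A minimal sketch of one step of the top-down pathway of Fig. 4, assuming 2× nearest-neighbour upsampling as described; the toy channel counts here are illustrative, as the real counts depend on the network configuration:

```python
import numpy as np

def upsample_nearest_2x(t):
    # 2x nearest-neighbour upsampling (cf. upsampler modules 450, 452, 454):
    # every spatial sample is repeated along both height and width.
    return np.repeat(np.repeat(t, 2, axis=-2), 2, axis=-1)

# Toy tensors standing in for lateral convolution outputs
# (batch 1, 4 channels; real channel counts are configuration-dependent).
c5 = np.random.randn(1, 4, 2, 2)   # analogous to tensor 441
c4 = np.random.randn(1, 4, 4, 4)   # analogous to tensor 443

# Summation module 460: upsample the coarser tensor, then element-wise sum.
p4_pre = c4 + upsample_nearest_2x(c5)   # analogous to tensor 461
assert p4_pre.shape == (1, 4, 4, 4)
```

The same upsample-and-sum step repeats at each resolution (modules 462 and 464), before the 3x3 output convolutions produce the P-layer tensors.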
[000101] The bitstream includes a plurality of network abstraction layer (NAL) units. Fig. 10 is a schematic block diagram showing the structure of a NAL unit 1000. Each NAL unit is prefixed with a start code 1010, consisting of three contiguous bytes having values of 0x00, 0x00, and 0x01. The start code 1010 is followed by a NAL unit header 1012. The NAL unit header 1012 is of a format as described with reference to Appendices A, B, and C, for AVC, HEVC, and VVC, respectively. For HEVC and VVC the NAL unit header 1012 is a predetermined length that is always two bytes. For AVC, the NAL unit header 1012 is a predetermined length of either one, three or four bytes, i.e., different to the predetermined length for the other codecs. The NAL unit header 1012 includes a nal_unit_type, used to identify the parsing process to be applied to parse a raw byte sequence payload (RBSP) 1020. The NAL unit type is either a five- or six-bit fixed length code and includes 'reserved' values, which may be defined as part of a future revision of the respective specification (AVC, HEVC, or VVC), and 'unspecified' values, which will not be used in future versions of the AVC, HEVC, or VVC standards and are instead available for use by other bodies wishing to extend to encoding methods other than AVC, HEVC, or VVC. To avoid detection of false start codes that may be present in the RBSP by chance, the RBSP 1020 is encapsulated into a NAL unit payload 1014 with a process of insertion of 'emulation prevention bytes'. In forming the NAL unit payload 1014, whenever a two-byte sequence 0x00 0x00 is encountered in the RBSP 1020, such as zero bytes 1016, an 'emulation_prevention_three_byte', such as byte 1018, having value 0x03, is inserted immediately following the zero bytes 1016. The process of 'emulation_prevention_three_byte' insertion to form the NAL unit payload 1014 from the RBSP 1020 ensures an absence of any false start codes that would trigger erroneous parsing of the bitstream 121. The start of the RBSP 1020 is the earliest position at which detection of two zero bytes for the purpose of emulation_prevention_three_byte insertion is possible, which would take place between the second and third byte of the RBSP 1020. To prevent false start code detection in the early bytes (first or second) of the RBSP 1020, the last byte of the NAL unit header 1012 needs to be nonzero.
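The insertion process can be sketched as follows. Note this follows the simplified rule stated above (insert 0x03 after every 0x00 0x00 pair); the AVC/HEVC/VVC standards insert the byte only when the byte following the pair is in the range 0x00 to 0x03, so this sketch is slightly more aggressive than the standards' rule:

```python
def insert_emulation_prevention(rbsp: bytes) -> bytes:
    # Form a NAL unit payload from an RBSP: after every run of two
    # consecutive zero bytes, insert an emulation_prevention_three_byte
    # (0x03) so no start-code pattern (0x00 0x00 0x01) can appear.
    out = bytearray()
    zeros = 0
    for b in rbsp:
        out.append(b)
        if b == 0:
            zeros += 1
            if zeros == 2:
                out.append(0x03)   # emulation_prevention_three_byte
                zeros = 0
        else:
            zeros = 0
    return bytes(out)

# The would-be start code 00 00 01 inside the RBSP is broken up:
assert insert_emulation_prevention(b"\x00\x00\x00\x01") == b"\x00\x00\x03\x00\x01"
```

Removing the 0x03 bytes at the decoder recovers the original RBSP, which is why the process is lossless despite modifying the byte stream.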
[000102] For HEVC and VVC, the last syntax element of the NAL unit header 1012 is 'nuh_temporal_id_plus1', coded with a three-bit fixed-length codeword and prohibited from using the bit string "000". For HEVC and VVC the NAL unit header 1012 is always a predetermined length of two bytes. For AVC, when using a one-byte NAL unit header (that is, the predetermined length of the NAL unit header is one byte rather than three bytes or four bytes), a nal_unit_type of 0 needs to be avoided. For AVC, nal_unit_type equal to 0 may be marked as 'reserved' or 'prohibited', indicating that the nal_unit_type is not available for use by bodies other than the one responsible for issuing new versions of the AVC specification. In the implementation described, the nal_unit_type of 0 shall not be used, preventing the possibility of a NAL unit header for AVC consisting of a zero byte.
[000103] A bit 'forbidden_zero_bit', a single bit always set to value zero, is coded as the first bit to be parsed (in bit position 7, as bit consumption in each byte progresses from bit 7 down to bit 0) in the NAL unit header 1012, regardless of the usage of AVC, HEVC, or VVC as "inner codec". The forbidden_zero_bit may be used to signal an alternative NAL unit header format, containing a different space for NAL unit types, such as the FCM_VMPS, FCM_SPS, and FCM_PPS NAL units. Standards such as RFC6184 include functionality to set the forbidden_zero_bit in a NAL unit header when transmission errors are detected in received data that forms a NAL unit payload associated with the NAL unit header.
[000104] A decoder conforming to AVC, HEVC, or VVC is not required to decode NAL units with the forbidden_zero_bit set. In some implementations, the decoder may attempt to decode NAL units with the forbidden_zero_bit set anyway to provide output frame data that is likely corrupted but may be more desirable than discarding the NAL unit and not providing any associated output frame data. In the context of FCM, tensor compression involves a reduction into a smaller dimensionality space, quantisation, and generally packing into a video frame and coding the video frame with a conventional video codec. Coding of the video frame is known as an 'inner codec' as this is performed as a stage in the FCM 'outer' codec. Reuse of a conventional video codec (such as VVC, HEVC or AVC) permits deployment of the FCM standard onto existing system-on-chips by reusing codec implementations already provided by ASIC vendors.
[000105] Fig. 5 is a schematic block diagram 500 of an example implementation of the tensor encoder 116 using a configurable tensor compressor stage, a tensor data type adaptation stage, a tensor dimensionality adaptation stage, and a selectable video encoder 542, also referred to as the "inner codec". The video encoder 542 implements one of several video standards, such as AVC, HEVC, or VVC. The video encoder 542 may also implement a customised compression method, such as lossless CABAC encoding of each quantised value in the input tensors 115, using an algorithm such as ISO/IEC 15938-17, also known as "deepCABAC".
[000106] Fig. 11 is a schematic block diagram showing the bitstream 121 or 143 holding encoded packed feature maps, parameter sets for the FCM codec and parameter sets for the inner codec.
[000107] Fig. 18 shows a method 1800 for performing a first portion of a CNN, selecting a tensor compressor, compressing tensors using the selected tensor compressor, and encoding resulting compressed tensors using a video encoder conforming to a selected video compression standard. The tensor encoder 116 (and the example implementation 500) and the method 1800 may be implemented as one or more software application programs 233 executable within the computer system 200. The tensor encoder 116 and the method 1800 may be effected by instructions 231 (see Fig. 2B) in the software 233 that are carried out within the computer system 200. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks. The method 1800 commences at a select inner codec step 1801.
[000108] At the step 1801, the processor 205 selects one video compression standard out of a plurality of video compression standards. AVC, HEVC, and VVC may be options for selection at step 1801 and the video compression standard selected may be dependent on considerations such as the capabilities of the source device 110 and the destination device 140. A capabilities negotiation may take place between the source device 110 and the destination device 140 whereby a selection is made such that the same compression standard is used in each device, prioritising VVC over HEVC and HEVC over AVC. Selection at step 1801 may be constrained based on a profile of the FCM standard, such that a smaller set of compression standards is available, such as HEVC and AVC, from which one is selected based on the aforementioned capabilities negotiation step. Rather than selecting a repurposed video codec for compressing features, specific FCM profiles may select a customised approach such as compressing quantized values of features using deepCABAC, with or without prediction of values within a feature map or from one feature map to another. Such customised approaches may be targeted at applications where achieving low bitrate is a secondary consideration compared to achieving very low complexity, for example. In typical use, the selection of the step 1801 is performed one time and thus does not change during the course of encoding one bitstream. Arrangements may select a different inner codec during coding of the bitstream 121 provided that the switch from one inner codec to a different inner codec is made prior to encoding a new group-of-pictures (GOP), i.e., prior to a new "intra random access picture" (IRAP) or "instantaneous decoder refresh" (IDR) picture. Control in the processor 205 progresses from the step 1801 to an encode inner codec identifier step 1802.
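The capabilities negotiation described above can be sketched as a priority-ordered intersection of capability sets (the function name and set representation are illustrative, not part of the standard):

```python
def negotiate_inner_codec(source_caps, destination_caps):
    # Pick a compression standard both devices support, prioritising
    # VVC over HEVC and HEVC over AVC, as described above.
    for codec in ("VVC", "HEVC", "AVC"):
        if codec in source_caps and codec in destination_caps:
            return codec
    raise ValueError("no common inner codec between devices")

# A source supporting AVC+HEVC and a destination supporting HEVC+VVC
# agree on HEVC, the highest-priority common option.
assert negotiate_inner_codec({"AVC", "HEVC"}, {"HEVC", "VVC"}) == "HEVC"
```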
[000109] At the step 1802, a metadata encoder 544, under execution of the processor 205, encodes an identifier for the selection of which inner codec to use from step 1801 to FCM metadata 548. The identifier is encoded using a NAL unit, in particular as an "inner codec identifier (ICI) NAL unit" 1110 (see Fig. 11). A NAL unit multiplexor 550 multiplexes the NAL unit 1110 from the FCM NAL units bitstream 548 into the bitstream 121. The syntax described with reference to Appendix D is used to encode the selected inner codec. As described in Appendix D, an inner codec identifier NAL unit 1110 has a fixed length of one byte, i.e., having a one-byte header and no RBSP, distinguishing such NAL units from all HEVC and VVC NAL units. A five-bit code of "11111" (decimal 31), which occupies the bit positions corresponding to "nal_unit_type" in the AVC standard, distinguishes the NAL unit from an extension to the AVC standard. Notably, the "forbidden_zero_bit", common to AVC, HEVC, and VVC, is retained as a bit always set to the value of zero. The "forbidden_zero_bit" remains free to be used for other purposes, such as indication of errors at the transport layer, which is outside of the scope of the AVC, HEVC, and VVC standards. A two-bit codeword "inner_codec_identifier", occupying the same bit positions as the "nal_ref_idc" syntax element of the AVC standard, signals which one of AVC, HEVC, VVC, or a custom codec is to be used. An instance of the inner codec identifier NAL unit is required at each random-access entry point into the bitstream, that is, at the start of the bitstream and prior to each IRAP or IDR picture 1122 and associated parameter sets, such as an SPS (sequence parameter set) 1118 and a PPS (picture parameter set) 1120, as shown in Fig. 11. Each of the SPS 1118 and the PPS 1120 contains a NAL unit header of the format indicated by the inner codec identifier 1110. As a result of the inner codec identifier NAL unit 1110, the format of NAL unit headers of subsequent NAL units is known. Thus, the NAL unit headers are parseable when any one of AVC, HEVC, or VVC is used as the inner codec in the bitstream 121. The step 1802 can operate to encode the inner codec identifier NAL unit 1110 and associated codewords forbidden_zero_bit, inner_codec_identifier and constant_value_31 (as per the example of Appendix D) for the selected inner encoder to the bitstream 121. The step 1802 operates to encode a NAL unit to the bitstream having a predetermined length, wherein the NAL unit of the predetermined length corresponds to possible NAL units of one of the selectable inner codecs (AVC) but the bit field (at bit positions four down to zero) that would indicate nal_unit_type in the case of AVC indicates a reserved or prohibited codeword (such as 0x1f or 31 or 0b11111). Instead, the presence of binary value 0b11111 at bit positions four down to zero indicates that this NAL unit selects one inner codec out of a plurality of inner codecs (AVC, VVC, HEVC, or custom, for example). The other potential inner codecs (HEVC, VVC, custom) have NAL unit lengths different to the predetermined length possible for AVC (i.e., one byte), as described above. Since a NAL unit with one-byte length (excluding the start code) only appears for AVC and for the inner codec identifier, any such NAL unit can be unambiguously parsed, based on bits four down to bit zero, to determine whether the NAL unit is intended for parsing by an AVC inner codec or for an inner codec identification purpose. A bitstream must include an inner codec identifier NAL unit, to select an inner codec, prior to any NAL units intended to be parsed by the selected inner codec.
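Under the AVC one-byte header layout (forbidden_zero_bit at bit 7, nal_ref_idc at bits 6-5, nal_unit_type at bits 4-0), the inner codec identifier byte can be packed and recognised as follows. The mapping of the two-bit inner_codec_identifier values to particular codecs is assumed for illustration, as the actual assignment is given by Appendix D:

```python
ICI_CODECS = {0: "AVC", 1: "HEVC", 2: "VVC", 3: "custom"}  # assumed mapping

def make_ici_header(codec_id):
    # forbidden_zero_bit = 0 at bit 7, inner_codec_identifier in the two
    # bits AVC uses for nal_ref_idc (bits 6-5), and the constant 0b11111
    # (decimal 31) in the five nal_unit_type bits (bits 4-0).
    assert 0 <= codec_id <= 3
    return (codec_id << 5) | 0b11111

def is_ici_header(byte):
    # Bits four down to zero equal 0b11111, a reserved/prohibited value
    # for a genuine AVC NAL unit, so a one-byte NAL unit with this
    # pattern is unambiguously an inner codec identifier.
    return byte & 0b11111 == 0b11111

header = make_ici_header(2)
assert header == 0x5F and is_ici_header(header)
assert ICI_CODECS[(header >> 5) & 0b11] == "VVC"
```

Because the five low bits are a value an AVC encoder never emits, a parser seeing a one-byte NAL unit can branch on this pattern alone, as described above.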
[000110] Additional NAL units conveying parameters for modules aside from the inner codec, such as an FCM VMPS (vision model parameter set) 1112, an FCM SPS 1114, and an FCM PPS 1116, use the same NAL unit header format as the inner codec and thus must also follow the inner codec identifier NAL unit 1110. As indicated in Appendix D, a custom inner codec may also be selected at step 1801 and encoded at step 1802. When a custom inner codec is selected, a custom NAL unit header format is used (and encoded at step 1802), which may duplicate the bit fields of an existing standard such as VVC. A custom codec typically requires a custom enumeration of NAL unit types and support for the selection of one out of a plurality of inner codecs, to provide an extensibility mechanism. Selections can include direct deepCABAC coding of tensor values, intra-predictive deepCABAC coding of tensor values, and tensor encoding using an end-to-end learned codec such as the approach described in the paper "Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules" by Cheng et al. NAL units (such as the FCM VMPS 1112, the FCM SPS 1114, and the FCM PPS 1116) defining parameters within the FCM standard scope but outside of the inner codec scope are referred to as 'FCM NAL units', and the enumerated nal_unit_type of FCM NAL units is dependent on the selected inner codec, since each standard of AVC, HEVC, and VVC has different enumerations of nal_unit_type and different 'unspecified' values, available for use such as by the FCM standard. In one example, the nal_unit_type of the FCM VMPS 1112 is described with reference to Appendices A-C for AVC, HEVC and VVC inner codecs respectively (the nal_unit_type of the FCM VMPS is 24 for AVC, 48 for HEVC, and 28 for VVC). Also, the nal_unit_type of the FCM_SPS and the nal_unit_type of the FCM_PPS are described with reference to Appendices A-C for AVC, HEVC and VVC inner codecs respectively. As with the HLS (high level syntax) design of the inner codec, the syntax of the FCM_PPS and the
such as such as by the FCM by the standard.InInone FCM standard. one example, example, nal_unit_type nal_unit_type of the of the FCMFCM VMPS VMPS 1112 is1112 is described with described with reference reference to to Appendices A-C Appendices A-C forfor AVC, AVC, HEVCHEVC and and VVC VVCcodecs inner inner codecs respectively (nal_unit_type respectively (nal_unit_type of of FCM VMPS FCM VMPS is for is 24 24 for AVC, AVC, 48HEVC, 48 for for HEVC, and 28 and for 28 for VVC). VVC). Also, nal_unit_type Also, nal_unit_type of of the the FCM_SPS FCM_SPS and and nal_unit_type nal_unit_type of the of the FCM_PPS FCM_PPS are described are described with with reference to reference to Appendices A-C Appendices A-C forAVC, for AVC, HEVC HEVC andinner and VVC VVCcodecs inner respectively. codecs respectively. As with As thewith the HLS(high HLS (highlevel levelsyntax) syntax)design designofofthe the inner inner codec, codec, the the syntax syntax of of the the FCM_PPS FCM_PPS andand thethe
FCM_SPS FCM_SPS are are intended intended to avoid to avoid parsing parsing dependencies, dependencies, i.e., i.e., anan FCM_PPS FCM PPS can becan be parsed parsed
regardless of the loss of the FCM_SPS for the bitstream. It should be noted that due to the regardless of the loss of the FCM_SPS for the bitstream. It should be noted that due to the
differing NAL differing unitheader NAL unit headerformat formatamong amongthethe inner inner codecs, codecs, lossofofthe loss theICI ICINAL NAL unit unit prevents prevents
parsing of parsing of any any other other NAL unitsininthe NAL units the bitstream. bitstream. Also, Also, as as with with the the inner innercodec codec NAL format, NAL format,
emulationprevention emulation preventionbytes bytesare are inserted inserted as as needed into the needed into the FCM NAL FCM NAL units units to to avoid avoid possible possible
false start false startcode codedetection. detection.The Theinner innercodec codecidentifier identifierNAL NAL unit unit1110 1110 needs needs to to precede precede the the FCM FCM
NALunits NAL unitsininorder orderfor for the the tensor tensor decoder 146to decoder 146 to parse parse the the bitstream bitstream 121. TheNAL 121. The NAL unit unit 1110 1110
is therefore usually encoded at the start of the bitstream by the step 1802. However, multiple is therefore usually encoded at the start of the bitstream by the step 1802. However, multiple
instances of instances of the the NAL unit 1110 NAL unit 1110including includingthe theNAL NAL unit unit header header cancan be be present present in in thebitstream the bitstream 121, as encoded 121, as encodedby by stepstep 1802. 1802. The multiple The multiple instances instances can, for can, forbeexample example bethe present at present start at of the start of
the bitstream the bitstream before before other other NAL unitsand NAL units andone oneorormore moreinstances instancesmay maybe be priortotoany prior any'random ‘random access’ (entry) point into the bitstream, such as prior to periodic intra random access pictures access' (entry) point into the bitstream, such as prior to periodic intra random access pictures
(IRAPs) that may be coded in the bitstream 121. If a different inner codec is selected at step (IRAPs) that may be coded in the bitstream 121. If a different inner codec is selected at step
1801 duringcoding 1801 during codingofofthe the bitstream bitstream 121 121as as describe describe above, above,i.e., i.e., aaswitch switchfrom from one one inner inner codec codec
to a different inner codec, a plurality of inner codecs are used in the bitstream and NAL units to a different inner codec, a plurality of inner codecs are used in the bitstream and NAL units
1110 are encoded 1110 are encodedtotothe the bitstream bitstream correspondingly correspondinglyatat step step 1802. 1802. Also, Also,regardless regardlessofof the the inner inner codec in codec in use use and and the the assigned nal_unit_type, the assigned nal_unit_type, the syntax syntax structure structure for foreach each parameter parameter set set (FCM (FCM
VMPS112, VMPS112, thethe FCMFCM SPS1114, SPS1114, and and the thePPS FCM FCM PPS 1116) is 1116) is unchanged. unchanged. Control inControl in the processor the processor
44204385_1 44204385_1
30
progresses from progresses fromthe the step step 1802 1802toto an an encode encodeFCM FCM Vision Vision model model parameter parameter set (VMPS) set (VMPS) 07 Jun 2024
step 1803. step 1803.
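The emulation prevention insertion mentioned above can be sketched as follows. This is a minimal illustration of the byte-stuffing rule shared by AVC, HEVC and VVC — after two consecutive zero bytes, an escape byte 0x03 is inserted whenever the next payload byte is 0x03 or less, so the payload can never mimic a start code (0x000001). The function name is illustrative and not part of the FCM syntax.

```python
def insert_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert emulation prevention bytes (0x03) so that the payload cannot
    reproduce the 0x000000/0x000001/0x000002 patterns reserved by the
    AVC/HEVC/VVC NAL unit byte stream, avoiding false start code detection."""
    out = bytearray()
    zero_run = 0
    for b in rbsp:
        if zero_run >= 2 and b <= 0x03:
            out.append(0x03)  # escape byte breaks the run of zeros
            zero_run = 0
        out.append(b)
        zero_run = zero_run + 1 if b == 0x00 else 0
    return bytes(out)
```

A decoder reverses the process by deleting any 0x03 byte that follows two zero bytes before parsing the RBSP.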
[000111] At the step 1803, the metadata encoder 544, under execution of the processor 205, encodes vision model parameters 113a, used for the operation of the CNN head 150, as the FCM VMPS 1112 as FCM metadata 548. In one implementation, the FCM_VMPS 1112 may include output_picture_width and output_picture_height (width and height of output pictures) for the vision model parameters 113a, as shown in the example of Appendix E. The NAL unit type (nal_unit_type) of the FCM VMPS 1112 is dependent on the inner codec selected at step 1801. NAL unit types of the FCM VMPS are described with reference to Appendices A-C for AVC, HEVC and VVC inner codecs respectively. The vision model parameters 113a include the spatial resolution of the frame data 113, needed for bounding boxes (an example of the task result 151) to be scaled to correspond to the resolution of the frame data 113, which is not otherwise known by the destination device 140. Control in the processor 205 progresses from the step 1803 to a select set of tensor compressors/decompressors step 1805.
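As a hedged illustration of why the VMPS carries output_picture_width and output_picture_height, the sketch below rescales bounding boxes from the coded grid back to the original frame resolution signalled in the FCM VMPS. The helper name and the (x0, y0, x1, y1) box layout are assumptions for illustration, not part of the described syntax.

```python
def scale_boxes(boxes, coded_w, coded_h, output_picture_width, output_picture_height):
    """Rescale (x0, y0, x1, y1) boxes from the coded width/height to the
    original frame resolution carried in the (hypothetical) VMPS fields."""
    sx = output_picture_width / coded_w
    sy = output_picture_height / coded_h
    return [(x0 * sx, y0 * sy, x1 * sx, y1 * sy) for x0, y0, x1, y1 in boxes]
```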
[000112] At the step 1805, a tensor compressor selector 510, under execution of the processor 205, selects a set of mechanisms that may be used for compressing and decompressing intermediate tensors. Each mechanism of the set forms a 'bottleneck' and corresponds to an encoder network topology coupled to a decoder network topology. The interface between the encoder network topology and the decoder network topology is the narrowest and hence the layer with the most reduced dimensionality. The interface between the encoder network topology and the decoder network topology also includes one or more tensors that may be referred to as 'compressed tensors'. The one or more tensors may be produced from operations such as convolutions, batch normalisations, activation functions, or matrix multiplications, tensor additions and/or subtractions. The dimensionality of tensors at the interface between the encoder network topology and the decoder network topology may vary from one invocation of the method 1800 to a next invocation of the method 1800 (e.g., the channel count may vary). Support for a plurality of mechanisms in one bitstream enables adapting to changing network conditions and application requirements by switching from one mechanism to another dynamically. For example, an object segmentation network may operate using a mechanism providing low bitrate at the expense of lower quality output (resulting in lower mAP) of the task result 151 from the CNN head 150.
[000113] Each mechanism selected at the step 1805 needs to match the dimensionality of the tensors 115 at the input to the encoder network topology and output of the decoder network topology in order to be compatible with the neural network formed by the backbone 114 and the head 150. Tensors in compressed form (i.e., at the 'bottleneck' point or output from the encoder network topology and input to the decoder network topology) may have varying number and dimensionality. Where a mechanism involves the use of trainable elements, such as convolutions, the tensor compressor selector 510 also determines selected weights 516 to be used by the encoder network topology and the decoder network topology. Multiple weights may be available for a given encoder network topology and decoder network topology, such as different weights targeting different quality operating points. Control in the processor 205 progresses from the step 1805 to a select tensor compressor/decompressor step 1810.
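The dimensionality-matching constraint can be sketched as a filter over candidate mechanisms. The record layout ('io_shapes' listing the bottleneck interface shapes) is hypothetical; the point is that only mechanisms whose interface matches the backbone's tensor shapes remain selectable at step 1810.

```python
def compatible_mechanisms(mechanisms, tensor_shapes):
    """Keep only bottleneck mechanisms whose encoder-input (and decoder-
    output) shapes match the tensors produced by the CNN backbone.
    'mechanisms' is a hypothetical list of dicts with an 'io_shapes' key."""
    return [m for m in mechanisms if m["io_shapes"] == tensor_shapes]
```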
[000114] At the step 1810, the tensor compressor selector 510, under execution of the processor 205, selects a mechanism to be used for compression and decompression of the tensors 115. The tensor compressor selector 510 outputs a selected tensor decompressor 512 and associated metadata 520. The selection made at the step 1810 is from the set determined at the step 1805. The selection may be the result of a request by the destination device via an out-of-band signalling mechanism to increase or decrease the decoded quality (and hence the bit-rate) of the bitstream 121. Where a mechanism is parameterizable (e.g., the channel count of one or more of the compressed tensors may be varied dynamically), a suitable value is selected at the step 1810.
[000115] The system 100 performs a given neural network which is divided into a first portion, performed by the CNN backbone 114, and a second portion, performed by the CNN head 150. The first portion of the neural network may be a Darknet-53 backbone as described with reference to Figs. 3A-3D, a backbone of a FasterRCNN or MaskRCNN network, as described with reference to Fig. 4, or a first portion of some other neural network. The number and dimensionality of the tensors 115 depends on the network being implemented in the system 100 and the division of the network into a first portion, executed in the CNN backbone 114, and a second portion, executed in the CNN head 150. Compression and decompression mechanisms involve an encoder network topology, to be performed in the source device 110, and a decoder network topology, to be performed in the destination device 140.
[000116] The encoder network topology and decoder network topology may involve the use of trained layers, such as convolutions, in which case weights are also needed. The encoder network topology and the decoder network topology form a 'bottleneck' between the first network portion (i.e., the CNN backbone 114) and the second network portion (i.e., the CNN head 150), with the interface between the encoder network topology and the decoder network topology forming the 'narrowest' part of the bottleneck and thus the lowest bitrate when compressed in the form of packed video frames. Different mechanisms (encoder network topologies and corresponding decoder network topologies) may be selected, signalled, and activated dynamically to adapt the bitrate of the bitstream 121 to network conditions or to adjust to meet application requirements for quality. Mechanisms or topologies providing higher quality generally have larger dimensions of the compressed tensors and hence require a larger packed frame area, resulting in a higher bitrate for the bitstream 121. Control in the processor 205 progresses from the step 1810 to an instantiate tensor compressor step 1815.
[000117] At the step 1815, the source device 110, under execution of the processor 205, obtains a tensor structural description 522 from a tensor codec repository 514 based on the selected tensor decompressor 512. The source device 110 instantiates the tensor structural description 522 into a form suitable for execution by the tensor compressor 530. The instantiating step 1815 may involve declaring required memory and initialising data structures in the memory 205 associated with the tensor structural description 522, or allocating resources in a reconfigurable hardware device such as a field programmable gate array (FPGA). Operations defined in the tensor structural description 522 may be converted to a form more amenable for execution by the processor 205 as part of the instantiation step 1815. 'Just-in-time compilation' is one approach for obtaining a representation such as a 'byte code' that may be executed more rapidly by the processor 205 than interpreting the tensor structural description 522 directly to perform each tensor compression operation. Where the instantiated tensor compressor 530 contains trainable elements, such as convolutions, a tensor weight repository 518 is accessed to obtain necessary weights 524 for use by the trainable elements, with the weights selected based on the weight selection 516. The tensor codec repository 514 and the tensor weight repository 518 may be populated from the tensor codec repository 180. Control in the processor 205 progresses from the step 1815 to a determine complexity indication step 1830.
[000118] At the step 1830, the source device 110, under execution of the processor 205, determines an indication representative of a worst-case complexity for any decoder network topology that could be signalled for the bitstream 121. Where the source device 110 may select one topology from multiple possible decoder network topologies at the step 1810, it is desirable for the destination device 140 to know, at the beginning of decoding the bitstream 121, whether the destination device 140 will be able to decode the entirety of the bitstream. The first signalled decoder network topology in the bitstream 121 may not be the most complex topology used for decoding that bitstream 121. For example, the system 100 may commence operation in a low bitrate mode, later increasing bitrate (and required decoder network topology) based on some criteria. Aspects of decoder network complexity include the number of multiply-and-accumulate (MAC) operations and the number of weights required to execute the decoder network topology. The decoder network complexity indication is configured to indicate the highest complexity of all possible decoder network topologies that the source device 110 may instruct the destination device 140 to perform. The decoder network complexity indication may be based on a decoded capability indication. In one arrangement, the decoder network complexity indication may be a scalar value mapped onto each aspect of the network complexity. For example, the network complexity indication may be a scalar value that relates to aspects such as MAC count and weight count by reference to look-up tables, with the network complexity indication set to accommodate the worst-case aspect of each aspect of the set of decompressors determined at the step 1805. Control in the processor 205 progresses from the step 1830 to a perform neural network first portion step 1840.
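One possible reading of the scalar complexity indication is sketched below, assuming hypothetical look-up tables of per-level MAC and weight limits: the signalled level is the smallest one whose limits cover the worst case over every decoder topology that might be selected for the bitstream. The table values and record layout are illustrative assumptions only.

```python
# Hypothetical look-up tables: for each complexity "level", the maximum
# MAC count and weight count a conforming decoder must support.
MAC_LIMITS = {1: 1e9, 2: 1e10, 3: 1e11}
WEIGHT_LIMITS = {1: 1e6, 2: 1e7, 3: 1e8}

def complexity_indication(decoders):
    """Return the smallest scalar level whose per-aspect limits accommodate
    the worst-case MAC and weight counts over all selectable topologies."""
    worst_macs = max(d["macs"] for d in decoders)
    worst_weights = max(d["weights"] for d in decoders)
    for level in sorted(MAC_LIMITS):
        if MAC_LIMITS[level] >= worst_macs and WEIGHT_LIMITS[level] >= worst_weights:
            return level
    raise ValueError("no level accommodates the decoder set")
```

Note that the level is driven by the worst case of each aspect independently, so a set mixing a MAC-heavy topology with a weight-heavy one may need a higher level than any single topology alone.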
[000119] At the step 1840, the CNN backbone 114, under execution of the processor 205, performs the first portion of a neural network using frame data 113 from the video source 112 as input. The step 1840 outputs the tensors 115. Control in the processor 205 progresses from the step 1840 to a perform tensor downsampling step 1850.
[000120] At the step 1850, a tensor downsampler 520 performs a temporal decimation operation on the tensors 115 to produce temporally downsampled tensors 524. When a downsampling ratio of two is selected, tensors of every alternate frame (e.g., frames with an odd picture order count) are dropped, resulting in a halving of the frame rate for the tensors 524 compared to the frame rate of the frame data 113. Other downsampling ratios, such as three to one or four to one, are possible, with signalling to support any integer ratio. However, a maximum limit, such as an example maximum ratio of four, is needed to prevent the need for excessive tensor buffering. The downsampling ratio is signalled in the FCM PPS 1116 when fcm_pps_temporal_upsampling_enabled_flag is equal to one, allowing the ratio to be altered during the course of one bitstream. The tensor downsampler 520 may be configured into an active state, where tensor downsampling is performed, or into a bypass state, where the tensors 115 are propagated to the tensors 524 with no alteration. Configuration of the tensor downsampler 520 into the active or bypassed state may be predetermined, e.g., by user configuration, or may be altered during operation of the source device 110, such as in response to available bandwidth of the communications channel 130 or the level of detected activity such as the number of bounding boxes in the task result 151. The fcm_pps_temporal_upsampling_enabled_flag in the FCM PPS 1116 is encoded (see 18110 below) regardless of the value of the fcm_sps_temporal_upsampling_enabled_flag in the FCM SPS 1114 to avoid a parsing dependency of the FCM PPS 1116 on the FCM SPS 1114. However, the fcm_pps_temporal_upsampling_enabled_flag is not permitted to be enabled (1) when the fcm_sps_temporal_upsampling_enabled_flag is set to disabled (0). Control in the processor 205 progresses from the step 1850 to a perform tensor compression step 1860.
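The temporal decimation rule can be sketched as follows, keying tensors by picture order count (POC); the dictionary-based interface and function name are illustrative assumptions, not the described syntax.

```python
def temporal_downsample(tensors_by_poc, ratio, active=True):
    """Temporal decimation: keep only tensors whose picture order count is
    a multiple of the downsampling ratio. In the bypass state (or at a
    ratio of one) the tensors pass through unchanged."""
    if not active or ratio == 1:
        return dict(tensors_by_poc)
    # The text gives four as an example maximum ratio, bounding buffering.
    assert 1 <= ratio <= 4
    return {poc: t for poc, t in tensors_by_poc.items() if poc % ratio == 0}
```

With a ratio of two, odd-POC tensors are dropped and the tensor frame rate halves, matching the behaviour described for step 1850.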
[000121] At the step 1860, a tensor compressor 530, under execution of the processor 205, compresses the tensors 524 to produce compressed tensors 532. The compressed tensors 532 are fewer in number than the tensors 524 and reduced in dimensionality (i.e., reduced in either or both of channel count and feature map width and height). The compressed tensors 532 form a representation of the tensors 524 that may be referred to as the 'reduced domain' or 'feature reduced domain' and the operation of the tensor compressor 530 may be referred to as 'feature reduction'. The tensor compressor 530 may implement the instantiated tensor compressor 512 in the form of precompiled 'byte code' or machine code or other form more amenable to direct execution by the processor 205, including by an inferencing engine as part of or associated with the processor 205, such as a graphics processing unit (GPU). The step 1860 operates to produce the tensors 532 from the tensors produced at step 1840. The tensor compressor 530 may be configured into an 'active' state where the instantiated tensor compressor 512 is used to produce the tensors 532 or into a 'bypass' state where the tensors 524 are passed along as the tensors 532 without modification. When in the active state, the tensors 532 have at least a smaller tensor count, a smaller channel count, or a smaller spatial size compared to the tensors 524. Control in the processor 205 progresses from the step 1860 to a quantise tensors step 1870.
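A sketch of the dimensionality constraint on the active state: a compressed tensor must shrink in channel count and/or spatial size relative to its input, and never grow in any dimension. The (C, H, W) shape-tuple convention and function name are assumptions for illustration.

```python
def is_feature_reduced(in_shape, out_shape):
    """True when a compressed tensor is reduced in channel count and/or
    spatial size relative to the input; shapes are (C, H, W) tuples."""
    c_in, h_in, w_in = in_shape
    c_out, h_out, w_out = out_shape
    reduced = c_out < c_in or (h_out < h_in and w_out < w_in)
    not_larger = c_out <= c_in and h_out <= h_in and w_out <= w_in
    return reduced and not_larger
```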
[000122] At the step 1870, a quantiser module 534, under execution of the processor 205, when configured into an 'active' state quantises floating-point values in each tensor of the compressed tensors 532 to produce quantised compressed tensors 536. The quantised compressed tensors 536 have integer values and occupy a range within a sample range as defined by the operational bit depth of the video encoder 542. For example, when encoding video using 8-bit or 10-bit samples, integer values in the interval [0, 255] or [0, 1023], respectively, are permitted. Quantisation firstly normalises elements from the tensor 532 into a [0.0, 1.0] floating-point range, resulting in one minimum and one maximum floating-point value for the tensor 532. A tensor normalised into the [0.0, 1.0] range is then converted and rescaled into an integer sample range, such as [0, 1023] or [0, 255]. For each tensor, the minimum and maximum floating-point values form a quantisation range 526 for the first normalisation and the range for the second normalisation (to integer sample range) is dependent
on the bit-depth of the video encoder 542. The normalisation to integer range may operate on a minimum and a maximum value that is updated from one frame to the next such that the minimum value is either decreased based on the current tensors 532 or retains the same value as derived for the previous tensors 532 (i.e., the tensors from the previous performance of the step 1860). The maximum value of the integer range is either increased based on the current tensors 532 or retains the same value as computed for the previous tensors 532. The quantiser module 534 can be configured into a 'bypass' state where the tensors 532 are passed along as the tensors 536. Configuration into the 'bypass' state may be applied when the tensors 532 already contain integer values or when the selected video encoder 542 is capable of encoding tensor values in floating-point format. Control in the processor 205 progresses from the step 1870 to a pack tensors step 1880.
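The two-stage quantisation described above may be sketched as follows. This is a minimal illustration, not the reference implementation; the function name and the `prev_range` argument (carrying the running minimum and maximum forward between frames) are assumptions.

```python
import numpy as np

def quantise_tensor(tensor, bit_depth=10, prev_range=None):
    """Sketch of two-stage quantisation: normalise to [0.0, 1.0] using the
    tensor's min/max (the quantisation range), then rescale into the integer
    sample range implied by the codec bit depth."""
    lo, hi = float(tensor.min()), float(tensor.max())
    if prev_range is not None:
        # The running range only widens frame to frame: the minimum may
        # decrease and the maximum may increase, otherwise each retains
        # its previously derived value.
        lo = min(lo, prev_range[0])
        hi = max(hi, prev_range[1])
    max_sample = (1 << bit_depth) - 1           # 255 for 8-bit, 1023 for 10-bit
    normalised = (tensor - lo) / (hi - lo)      # first normalisation -> [0.0, 1.0]
    samples = np.round(normalised * max_sample).astype(np.int32)
    return samples, (lo, hi)                    # (lo, hi) is the quantisation range

# 10-bit example: values in [-2, 6] map to samples in [0, 1023].
t = np.array([[-2.0, 0.0], [1.0, 6.0]], dtype=np.float32)
q, qrange = quantise_tensor(t, bit_depth=10)
```

The returned range plays the role of the quantisation range 526, which must be signalled so the decoder can invert the mapping.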
[000123] At the step 1880, a packer module 538, under execution of the processor 205, packs the feature maps of the tensor 536 into a frame, forming a packed feature frame 540. Operation of the packer module 538 generally results in placement of the two-dimensional feature maps into an arrangement as described with reference to Fig. 9B. When multiple tensors are present in the tensors 536, a frame 900b is of sufficient size to hold feature maps for all the tensors of the tensors 536. Control in the processor 205 progresses from the step 1880 to an encode frame step 1890.
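A simplified row-major tiling of feature maps into a frame can be sketched as below. The helper name and the fixed column count are illustrative assumptions; the actual arrangement follows Fig. 9B.

```python
import numpy as np

def pack_feature_maps(tensor, cols):
    """Tile the 2D feature maps of a (channels, h, w) tensor into one frame,
    row-major, with `cols` maps per row."""
    c, h, w = tensor.shape
    rows = (c + cols - 1) // cols               # enough rows to hold every map
    frame = np.zeros((rows * h, cols * w), dtype=tensor.dtype)
    for i in range(c):
        r, col = divmod(i, cols)                # grid position of map i
        frame[r * h:(r + 1) * h, col * w:(col + 1) * w] = tensor[i]
    return frame

maps = np.arange(6 * 2 * 2).reshape(6, 2, 2)    # six 2x2 feature maps
frame = pack_feature_maps(maps, cols=3)         # 3 maps per row -> 4x6 frame
```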
[000124] At the step 1890, the video encoder 542 (selected at operation of the step 1801 and having a corresponding identifier encoded at step 1802), under execution of the processor 205, compresses the video frame 540 to produce a compressed video bitstream 546. The encoder 542 is selected to embody one compression approach out of multiple compression approaches in accordance with the selection of step 1801. In the case of the use of H.266/VVC, operation of the video encoder 542 is described with reference to Fig. 8. In the case of H.265/HEVC or H.264/AVC, operation generally involves subsets of the functional modules as described with reference to Fig. 8. The first packed frame 540 to be coded results in the SPS 1118 and the PPS 1120, followed by the IRAP picture 1122 (referred to as an 'instantaneous decoder refresh' picture in H.264/AVC) as shown in Fig. 11. When using a low-delay coding configuration a subsequent picture would be coded as inter-picture 1124. In the case of a customised compression approach, a method such as directly compressing each value in the tensors 536 using an arithmetic coder such as DeepCABAC or variable-length coding such as exponential Golomb coding may be applied, with the frame packing step 1880 omitted. Control in the processor 205 progresses from the step 1890 to an encode FCM SPS step 18100.
[000125] At the step 18100, the metadata encoder 544 encodes sequence-level parameters needed for the FCM decoder into the FCM sequence parameter set 1114 as part of the FCM metadata 548. The FCM SPS 1114 includes tensor information specifying the dimensionality of the compressed tensors 532 and the placement of feature maps, as packing information, for each tensor among the compressed tensors 532 into a video frame 540. The tensor information includes, for each tensor, a maximum channel count and a used channel count. The frame area for a region must be sufficient for the tensors within the region to be packed up to the maximum channel count, i.e., the maximum number of feature maps. Flags signalling the application or bypass of the inner decoding, corresponding to the encoding at step 1890, inverse quantisation, corresponding to quantisation performed at step 1870, feature restoration, corresponding to feature compression performed at step 1860, and temporal upsampling, corresponding to downsampling performed at step 1850, are also included in the FCM SPS 1114.
[000126] The metadata encoder 544, under execution of the processor 205, at step 18100 encodes the selected tensor decompressor 520 into the bitstream 121 as a decoder network topology in the FCM SPS 1114, as described with reference to Fig. 11. The selected tensor decompressor 520 may be signalled as an explicit network topology, using a textual representation (or syntax) such as Open Neural Network Exchange ('ONNX') format or Neural Network Exchange Format ('NNEF'), from the Khronos Group, or using other formats including a short code fragment such as a PyTorch function. Compression of textual representations of the tensor decompressor using techniques such as a 'DEFLATE' or 'LZMA' algorithm may be applied to reduce the overhead of the metadata when stored in the bitstream 121. As such, information (such as the textual representation or the syntax described above) representing the network topology is signalled in compressed form.
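As an illustration of the reduction DEFLATE offers on a repetitive textual topology, a sketch using Python's zlib follows. The topology fragment is hypothetical, not a normative ONNX or NNEF serialisation.

```python
import zlib

# Hypothetical textual topology fragment; the repetition mimics the
# structural redundancy of real textual network descriptions.
topology_text = b"""
conv2d(in=1024, out=256, k=1)
batch_norm(256)
tanh()
""" * 20

compressed = zlib.compress(topology_text, level=9)   # DEFLATE
restored = zlib.decompress(compressed)               # lossless round-trip
```

The compressed form is what would be carried in the bitstream, with the decoder inflating it before parsing the topology.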
[000127] The decoder network topology information representing the decoder network topology is to be decoded from the bitstream 143 on the destination device 140 to determine the network topology to be used on the destination device 140. The decoder network topology specifies operations to be performed in the destination device 140 to convert tensors from a compressed representation to their original number and dimensionality, such that the uncompressed tensors may be supplied to the CNN head 150. When the network topology is selected by reference to a collection of network topologies, such as available in the tensor codec repository 514, and associated weights as available in the tensor weight repository 518, a registered_decoder_idc syntax element is used. The registered_decoder_idc syntax element may provide a direct index into a look-up table, a string or universally unique identifier (UUID)
to perform an associative look-up to obtain the selected decoder network topology and weights. The network weights may also be signalled in the FCM SPS 1114 using a format such as ISO/IEC 15938-17 "Compression of Neural Networks for Multimedia Description and Analysis". When signalling information representing a given network topology, the given network topology may be registered with the destination device 140 for reference and activation in subsequent bitstreams, avoiding the need to signal information representing the network topology with every bitstream. In one arrangement, the system 100 may provide predetermined network topologies that do not need to be explicitly signalled in the bitstream. Predetermined network topologies may be activated in the destination device 140 via a reference to an identifier. Predetermined network topologies may be made available to the destination device 140 via external means, such as download from a repository or registry of network topologies. Repositories or registries of network topologies may be accessible publicly or may be accessible within some private scope, for example, obtained via a private network or a secure network (e.g., a VPN) available to instances of the destination device 140 but not the general public. Where the destination device 140 is known to have access (either already registered or available for download from an external server) to the desired network topology, the source device 110 may encode the reference to identify and activate the specific network topology required to be used when decoding the bitstream 121. Appendix E shows an example syntax structure for information encoded at operation of the step 18100. Control in the processor 205 progresses from the step 18100 to an encode FCM PPS step 18110.
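The associative look-up driven by registered_decoder_idc might resemble the following sketch; the registry contents, topology names, and weight identifiers are all hypothetical.

```python
# Hypothetical registry mapping registered_decoder_idc values (direct
# indices or UUID strings) to decoder network topologies and weight sets.
DECODER_REGISTRY = {
    0: ("msfc_decoder_4layer_256ch", "weights_v1"),
    1: ("msfc_decoder_4layer_64ch", "weights_v2"),
    "6fa459ea-ee8a-3ca4-894e-db77e160355e": ("custom_decoder", "weights_site"),
}

def resolve_decoder(registered_decoder_idc):
    """Associative look-up of a decoder topology and its weights from a
    direct index or a UUID string, as the syntax element permits."""
    try:
        return DECODER_REGISTRY[registered_decoder_idc]
    except KeyError:
        raise ValueError(f"unknown registered_decoder_idc: {registered_decoder_idc!r}")

topology, weights = resolve_decoder(1)
```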
[000128] At the step 18110, the metadata encoder 544, under execution of the processor 205, encodes the quantisation range 526 for each tensor of the compressed tensors 532 into an FCM PPS 1116, as additional FCM metadata 548. Appendix E shows an example syntax structure for information encoded at operation of the step 18110. Quantisation ranges are used in the bitstream 121 to enable inverse quantisation to the correct range by the destination device 140. The quantisation range signalled in the FCM PPS 1116 is effective from the picture that the FCM PPS 1116 precedes onwards, in output order from a picture decoder 1204, i.e., the IRAP 1122, which has a picture order count (POC) of 0. An FCM PPS 1123 precedes inter picture 1124, which has a POC of 1, and thus a quantisation range coded in the FCM PPS 1123 applies from POC 1 onwards (until a subsequent FCM PPS is encountered preceding another picture with higher POC, and the subsequent FCM PPS signals another quantisation range update). Accordingly, when there is no change in the quantisation range for a given frame, the FCM PPS 1116 need not be encoded for that frame. The FCM PPS 1116 may explicitly encode the picture order count from which the FCM PPS 1116 parameters apply, i.e., a POC of 0,
(from that POC until another FCM PPS with a higher POC is decoded, such as the FCM PPS 1123). A fixed number of least significant bits of the POC, such as 8 or 12 bits, may be coded to avoid coding the entire 32-bit POC with each FCM PPS. The MSBs of the POC may be inferred based on the pattern that POC is increasing over time (with localised exception when a random-access configuration is used, due to localised difference in coding order vs output order). In one arrangement FCM PPSs (e.g., 1116 and 1123) include a picture parameter set ID syntax element, which corresponds to the picture parameter set ID of any PPSs present in the bitstream, such as the PPS 1120. The slice header or picture header of each picture includes a picture parameter set ID ("ph_pic_parameter_set_id" in VVC) which activates one of the previously signalled PPSs and FCM PPSs, i.e., parameters in the FCM PPS identified by a particular picture parameter set ID are selected by the ph_pic_parameter_set_id decoded from the slice header or picture header of a picture or slice. The NAL unit multiplexor 550 operates to combine the NAL units of the FCM metadata 548 and the compressed video bitstream 546 to produce the bitstream 121, such that the inner coded identifier 1110 is coded firstly, followed by the FCM VMPS 1112, the FCM SPS 1114, and the FCM PPS 1116. Following the FCM PPS 1116 the NAL units produced by the video encoder 542 are present, such as the SPS 1118, the PPS 1120, an IRAP picture 1122, and an inter picture such as the inter picture 1124. The FCM SPS 1114 needs to be coded with the IRAP picture 1122 as the decoder needs to know tensor dimensionality information and decoder network topology information to proceed. One instance of the FCM PPS 1116 is needed with the IRAP picture 1122 in order for inverse quantisation to operate; subsequent instances are needed only when there is a change in the quantisation range 526 to be used with a given picture. An instance of the FCM PPS 1116 is effective from the next coded picture. The method 1800 terminates and processing progresses to the next instance of the source data 113 (e.g., the next frame from the video source 112).
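The POC LSB signalling described above can be sketched as follows. The helper is illustrative and assumes strictly increasing POC; a conforming decoder follows the full derivation in the relevant codec specification (e.g., the HEVC POC decoding process), which tolerates reordering around random-access points.

```python
def decode_poc(poc_lsb, prev_poc, lsb_bits=8):
    """Recover a full POC from its coded LSBs, assuming POC increases over
    time: take the MSB part of the previously decoded POC and add the new
    LSBs, moving up one wrap if the result would go backwards."""
    max_lsb = 1 << lsb_bits
    msb = prev_poc - (prev_poc % max_lsb)   # MSB portion of the previous POC
    poc = msb + poc_lsb
    if poc < prev_poc:                      # the LSBs wrapped around
        poc += max_lsb
    return poc

# With 8 LSB bits, coding LSB value 2 after POC 255 recovers full POC 258.
```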
[000129] Fig. 6 is a schematic block diagram showing one type of multi-scale feature fusion (MSFF) module 600, which may serve as the tensor compressor 530. The MSFF module 600 takes the tensors 115 and produces a compressed tensor 532, having reduced dimensionality compared to the tensors 115 and thus resulting in a reduction in bitrate when encoded as part of a packed frame. The MSFF module 600 uses trained network layers and requires a corresponding module in the tensor decoder 146 to restore tensor dimensionality so the tensors 149 may be supplied to the CNN head 150. The MSFF module 600 takes four tensors as input and requires each one of the tensors to have two-hundred and fifty-six (256) channels, so that the MSFF module 600 is compatible with the P-layers of the FasterRCNN or
MaskRCNN networks. However, variants of the MSFF module 600 compatible with different numbers of layers and different channel counts are possible.
[000130] The MSFF module 600 produces one tensor as output with sixty-four (64) channels and a feature map size corresponding to the P5 layer seen at the input; however, variants with different channel counts are also possible. Each variant of the MSFF module 600 requires different weights to be used for proper operation. Where several variants of the MSFF module 600 are able to be used in the system 100 for a given network, the packing format may be set to a worst-case feature map count of the compressed tensors of the currently used decoder network topology, and the actual used channel count may be updated at runtime as part of the tensor information.
[000131] The MSFC module 600 includes an MSFF block 610 shown in Fig. 6, which produces a single tensor from the plurality of tensors 115 using one or more downsampling filters. The MSFF block 610, under execution of the processor 205, combines each tensor of a first set of tensors (i.e., 602, 603, 604, 605) to produce a combined tensor 629. The combined tensor 629 forms a representation of the FPN layer tensors. Downsample modules 622a, 622b, and 622c operate on the tensors having larger spatial scale, i.e., P4 604 at (2h, 2w, 256), P3 603 at (4h, 4w, 256), and P2 602 at (8h, 8w, 256), respectively. Modules 622a, 622b, and 622c perform downsampling to match the spatial scale of the smallest tensor, i.e., P5 605 at (h, w, 256), producing tensors 623a, 623b, and 623c downscaled to the P5 scale, respectively. A concatenation module 624 performs a channel-wise concatenation of the tensors 605, 623a, 623b, and 623c to produce concatenated tensor 625, of dimensions (h, w, 1024). The concatenated tensor 625 is passed to a squeeze and excitation (SE) module 626 to produce a tensor 627. The SE module 626 sequentially performs a global pooling, a fully-connected layer with reduction in channel count, a rectified linear unit (ReLU) activation, a second fully-connected layer restoring the channel count, and a sigmoid activation function to produce a scaling tensor. The tensor 625 is scaled according to the scaling tensor to produce the output as the tensor 627. The SE block 626 is capable of being trained to adaptively alter the weighting of different channels in the tensor passed through, based on the first fully-connected layer output.
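A squeeze-and-excitation block of the kind performed by the SE module 626 can be sketched in plain NumPy. The weights and shapes below are illustrative stand-ins, not trained values from the described system.

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-excitation sketch for an (h, w, c) tensor: global average
    pool per channel, channel-reducing FC + ReLU, channel-restoring FC +
    sigmoid, then scale the input channels by the resulting weights.
    w1: (c, c_reduced), w2: (c_reduced, c)."""
    pooled = x.mean(axis=(0, 1))                     # squeeze: one value per channel
    hidden = np.maximum(w1.T @ pooled, 0.0)          # FC with channel reduction + ReLU
    scale = 1.0 / (1.0 + np.exp(-(w2.T @ hidden)))   # FC restoring count + sigmoid
    return x * scale                                 # excite: reweight each channel

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4, 8))                   # toy (h, w, c) tensor
y = se_block(x, rng.standard_normal((8, 2)) * 0.1, rng.standard_normal((2, 8)) * 0.1)
```

Because the sigmoid output lies in (0, 1), each output channel is an attenuated copy of the corresponding input channel.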
[000132] The first fully-connected layer output reduces each feature map for each channel to a single value. Each single value is passed through a non-linear activation unit (ReLU) to create a conditional representation of the single value, suitable for weighting of other channels, with restoration to the full channel count performed by the second fully-connected layer. The SE block 626 is thus capable of extracting non-linear inter-channel correlation in producing the
tensor 627 from the tensor 625, to a greater extent than is possible purely with convolutional (linear) layers. The tensor 627 is passed to a convolutional layer 628. The convolutional layer 628 implements one or more convolutional layers to produce the combined tensor 629, with channel count reduced to F channels, typically 256 channels (i.e., F = 256). Further reduction in the channel count is achieved by a single-scale feature compression (SSFC) module 650.
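The squeeze-and-excitation weighting described in the preceding paragraphs (global pooling of each feature map to one value per channel, a reducing fully-connected layer with ReLU, and a restoring fully-connected layer that yields per-channel weights) can be sketched in plain Python. The function name `se_block`, the sigmoid on the restored outputs, and the toy layer sizes are illustrative assumptions, not details of the SE block 626:

```python
import math

def se_block(feature_maps, w1, w2):
    """feature_maps: list of C channels, each a flat list of samples.
    w1: reducing fully-connected weights; w2: restoring weights (C rows)."""
    # Squeeze: reduce each channel's feature map to a single value.
    squeezed = [sum(ch) / len(ch) for ch in feature_maps]
    # First fully-connected layer followed by a ReLU activation.
    hidden = [max(0.0, sum(s * w for s, w in zip(squeezed, row))) for row in w1]
    # Second fully-connected layer restores the full channel count; a
    # sigmoid maps each output to a per-channel weight (an assumption here).
    weights = [1.0 / (1.0 + math.exp(-sum(h * w for h, w in zip(hidden, row))))
               for row in w2]
    # Excite: scale every channel of the tensor by its learned weight.
    return [[x * wc for x in ch] for ch, wc in zip(feature_maps, weights)]
```

With zero weights the sigmoid outputs 0.5, so every channel is halved, illustrating how the learned weights modulate inter-channel emphasis.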
[000133] The SSFC module 650 receives the tensor 629 and applies a convolution 652 to reduce the channel count from F (256) down to C' (nominally set to 64 channels) to produce tensor 653. The tensor 653 is then passed to a batch normalisation module 654 to produce batch normalised tensor 655, which is passed to a hyperbolic tangent activation layer 656 to produce the compressed tensor 532. The output of the MSFC module 600 is one tensor per frame with a fixed feature map size and fixed channel count.
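A minimal sketch of this channel reduction, batch normalisation and tanh activation follows. The function name `ssfc_compress`, the 1×1-convolution-as-weighted-sum formulation, and the use of per-channel statistics for normalisation are simplifying assumptions rather than details of the SSFC module 650:

```python
import math

def ssfc_compress(tensor, weights, eps=1e-5):
    """tensor: F input channels (flat lists); weights: C' rows of F mixing weights."""
    # 1x1 convolution: each output channel is a weighted sum over input channels.
    reduced = [[sum(w * ch[i] for w, ch in zip(row, tensor))
                for i in range(len(tensor[0]))] for row in weights]
    out = []
    for ch in reduced:
        # Normalisation step (per-channel statistics assumed for the sketch).
        mean = sum(ch) / len(ch)
        var = sum((x - mean) ** 2 for x in ch) / len(ch)
        normed = [(x - mean) / math.sqrt(var + eps) for x in ch]
        # Hyperbolic tangent bounds the compressed values to (-1, 1).
        out.append([math.tanh(x) for x in normed])
    return out
```

The tanh stage ensures the compressed tensor occupies a bounded range, which suits subsequent packing into fixed-bit-depth video samples.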
[000134] Fig. 7 is a schematic block diagram showing an example picture structure 700 with one level of temporal interpolation added to a low-delay bi-predicted coding structure. The video encoder 542 may be configured to implement the picture structure 700, providing an alternative to the use of the temporal downsampler 520 and a temporal upsampler 1260, described with reference to Fig. 12. The picture structure 700 operates such that odd-numbered pictures, by picture order count (POC), refer to the immediately preceding and following pictures for inter prediction, via list 0 (L0) and list 1 (L1), respectively. For example, POC #1 refers to POC #0 and POC #2. This requires POC #2 to be decoded prior to POC #1, resulting in one frame of structural delay implicit in the picture structure 700. Then, each even-numbered POC includes a reference to the previous picture with an even-numbered POC, for example POC #2 refers to POC #0. To assist with coding of relatively stable scenes, past references to pictures with POC modulus 8 equal to 0 are also kept, up to a limit, such as the last two or three such pictures. For example, POC #6 also refers to POC #0, and POC #14 (i.e., belonging to the subsequent GOP) refers to POC #8 (the first picture of the second GOP) and POC #0 (the first picture of the first GOP). Each picture with an even-numbered POC references the previous even-numbered POC and pictures with POC modulus 8 equal to 0 from the current and as many previous GOPs as possible, up to a limit, such as the decoded picture buffer size limitation of six pictures (in HEVC) or eight pictures (in VVC), with one picture slot reserved for the current picture, resulting in a maximum of five or seven reference pictures, respectively. The GOP structure shown in Fig. 7 repeats every eight frames, so where prior references with negative numbers are shown (e.g., -8 or -16), these are to be interpreted as references to preceding GOPs. In the case of pictures with even-numbered POCs, both reference lists include the same set of preceding pictures with the same ordering.
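The reference selection just described can be approximated as follows. The function name `reference_pocs` and the retention limit of two anchor pictures are assumptions for illustration, and details such as reference list reordering and buffer eviction are omitted:

```python
def reference_pocs(poc, max_anchors=2):
    """Return reference lists L0/L1 for the given picture order count (POC)."""
    if poc % 2 == 1:
        # Odd pictures: nearest preceding (L0) and following (L1) pictures.
        return {"L0": [poc - 1], "L1": [poc + 1]}
    refs = [poc - 2] if poc >= 2 else []
    # Retain up to max_anchors past anchor pictures (POC modulus 8 equal to 0),
    # most recent first, as an assumed retention policy.
    anchors = [a for a in range(0, poc - 1, 8)][::-1][:max_anchors]
    refs += [a for a in anchors if a not in refs]
    # Even pictures use the same references, in the same order, in both lists.
    return {"L0": refs, "L1": list(refs)}
```

For example, POC #1 yields L0 = [0] and L1 = [2], matching the one frame of structural delay, while POC #6 yields [4, 0] in both lists.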
[000135] Fig. 8 is a schematic block diagram showing functional modules of a video encoder 800 which may be implemented as the video encoder 542. The video encoder 542 may be implemented using a general-purpose computer system 200, as shown in Figs. 2A and 2B, where the various functional modules may be implemented by dedicated hardware within the computer system 200, or by software executable within the computer system 200, such as one or more software code modules of the software application program 233 resident on the hard disk drive 205 and controlled in its execution by the processor 205. Alternatively, the video encoder 542 may be implemented by a combination of dedicated hardware and software executable within the computer system 200. The video encoder 542 and the described methods may alternatively be implemented in dedicated hardware, such as one or more integrated circuits performing the functions or sub-functions of the described methods. Such dedicated hardware may include graphics processing units (GPUs), digital signal processors (DSPs), application-specific standard products (ASSPs), application-specific integrated circuits (ASICs), FPGAs or one or more microprocessors and associated memories. In particular, the video encoder 542 comprises modules 810-890 which may each be implemented as one or more software code modules of the software application program 233.
[000136] Although the video encoder 542 of Fig. 8 is an example of a versatile video coding (VVC) video encoder, other video codecs may also be used to perform the processing stages described herein. For example, HEVC or AVC or other types of encoders may be used. The examples described generate a bitstream of encoded data. If other codecs were used, some implementations may pack data into a different format, such as a frame format or the like. The video encoder 800 receives frame data 712, each frame including one or more colour channels. The frame data 712 corresponds to the tensors 540 in packed form, as implemented at the step 1890. The frame data 712 may be in any chroma format and bit depth supported by the profile in use, for example 4:0:0 or 4:2:0 for the 'Main 10' profile of the VVC standard, at eight (8) to ten (10) bits in sample precision.
[000137] As seen in Fig. 8, a block partitioner 810 firstly divides the frame data 712 into CTUs, generally square in shape and configured such that a particular size for the CTUs is used. The maximum enabled size of the CTUs may be 32×32, 64×64, or 128×128 luma samples for example, configured by a 'sps_log2_ctu_size_minus5' syntax element present in the 'sequence parameter set' (i.e., the SPS 1118). The 'sps_log2_ctu_size_minus5' syntax element uses values 0, 1, and 2 to correspond to CTU sizes of 32×32, 64×64, and 128×128, respectively. The CTU size also provides a maximum CU size, as a CTU with no further splitting will contain one CU. Ternary splitting is prohibited when a CU has one or more dimensions of length 128 luma samples. As a consequence, processing may fully handle each 64×64 quadrant of a 128×128 CTU before progressing from one quadrant to the next quadrant. Large CUs such as 64×128 are processed as a pair of 64×64 regions. As a result of quadrant-based processing (sometimes referred to as 'virtual pipeline data units' or 'VPDUs'), internal storage in the video encoder 800, and in a corresponding video decoder 1204 (also referred to as a picture decoder), is only needed for 64×64 samples even when the CTU size is configured as 128×128.
Feature maps are typically smaller than video frame size, due to use of intervening pooling operations or convolution operations with stride parameter greater than one. Feature maps do not require the large CU sizes provided by VVC. Use of a 32×32 CTU size provides sufficient flexibility in block structure to efficiently encode structural detail found in feature maps with a smaller amount of memory required for intermediate storage in the memory 206, i.e., storage for partially decoded data from a bitstream 1206 but prior to a frame buffer 1396, to be described. Use of a smaller CTU size reduces the variety of CU sizes that are able to be tested in the block partitioner 810, reducing runtime. Constraining the CTU size to 32×32 indicates a reduced memory consumption in the video decoder 1204 required for decoding the bitstream 1206; however, the worst case of 128×128 would need to be supported should such a bitstream be encountered. A collection of syntax elements forming a 'general_constraints_info' syntax structure may be present in the SPS 1118 that constrains allowed values of other syntax elements in the SPS 1118 and indicates a compatibility point other than a profile defined in the H.266/VVC specification. Such compatibility points are known as 'subprofiles' and enable application-specific definition of a subset of the tools of a given H.266/VVC profile. A gci_three_minus_max_log2_ctu_size_constraint_idc syntax element with values 0, 1, 2 constrains the maximum allowed CTU size in the SPS 1118 to 128×128, 64×64, or 32×32, respectively. A general constraint restricting the maximum CTU size to 32×32 may form a subprofile (or part of a subprofile), enabling the worst-case complexity requirement of the video decoder 1204 to be reduced compared to the case where the full H.266/VVC profile were required to be supported. One instance of the SPS 1118 is needed prior to the first picture in the bitstream 121 and also at any subsequent entry points (or 'random access points') into the bitstream 121 from which decoding can commence. The block partitioner 810 further divides each CTU into one or more CBs according to a luma coding tree and a chroma coding tree. The luma channel may also be referred to as a primary colour channel. Each chroma channel may also be referred to as a secondary colour channel. The CBs have a variety of sizes, and may include both square and non-square aspect ratios. However, in the VVC standard, CBs, CUs, PUs, and TUs always have side lengths that are powers of two. Thus, a current CB, represented
as 812, is output from the block partitioner 810, progressing in accordance with an iteration over the one or more blocks of the CTU, in accordance with the luma coding tree and the chroma coding tree of the CTU.
[000138] The CTUs resulting from the first division of the frame data 712 may be scanned in raster scan order and may be grouped into one or more 'slices'. A slice may be an 'intra' (or 'I') slice. An intra slice (I slice) indicates that every CU in the slice is intra predicted. Generally, the first picture in a coded layer video sequence (CLVS) contains only I slices, and is referred to as an 'intra picture'. The CLVS may contain periodic intra pictures, forming 'random access points' (i.e., intermediate frames in a video sequence upon which decoding can commence). Alternatively, a slice may be uni- or bi-predicted (a 'P' or 'B' slice, respectively), indicating additional availability of uni- and bi-prediction in the slice, respectively.
[000139] The video encoder 542 encodes sequences of pictures according to a picture structure. One picture structure is 'low delay', in which case pictures using inter-prediction may only reference pictures occurring previously in the sequence. Low delay enables each picture to be output as soon as the picture is decoded, in addition to being stored for possible reference by a subsequent picture. Another picture structure is 'random access', whereby the coding order of pictures differs from the display order. Random access allows inter-predicted pictures to reference other pictures that, although decoded, have not yet been output. A degree of picture buffering is needed so that reference pictures in the future in terms of display order are present in the decoded picture buffer, resulting in a latency of multiple frames.
[000140] When a chroma format other than 4:0:0 is in use, in an I slice, the coding tree of each CTU may diverge below the 64×64 level into two separate coding trees, one for luma and another for chroma. Use of separate trees allows different block structure to exist between luma and chroma within a luma 64×64 area of a CTU. For example, a large chroma CB may be collocated with numerous smaller luma CBs and vice versa. In a P or B slice, a single coding tree of a CTU defines a block structure common to luma and chroma. The resulting blocks of the single tree may be intra predicted or inter predicted.
[000141] In addition to a division of pictures into slices, pictures may also be divided into 'tiles'. A tile is a sequence of CTUs covering a rectangular region of a picture. CTU scanning occurs in a raster-scan manner within each tile and progresses from one tile to the next. A slice can be either an integer number of tiles, or an integer number of consecutive rows of CTUs within a given tile.
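The tile-ordered CTU scan described above can be sketched as follows, with tiles visited in raster order and CTUs scanned in raster order within each tile; the boundary-list representation of tile columns and rows is an assumed simplification:

```python
def ctu_scan_order(tile_col_bounds, tile_row_bounds):
    """tile_*_bounds: CTU-unit tile boundaries, e.g. [0, 2, 4] for two tiles.
    Returns (x, y) CTU positions in decoding order."""
    order = []
    # Tiles themselves are visited in raster order across the picture.
    for r0, r1 in zip(tile_row_bounds, tile_row_bounds[1:]):
        for c0, c1 in zip(tile_col_bounds, tile_col_bounds[1:]):
            # Raster scan of the CTUs inside this tile.
            order += [(x, y) for y in range(r0, r1) for x in range(c0, c1)]
    return order
```

For a 4×2-CTU picture split into two 2×2-CTU tiles, all CTUs of the left tile are scanned before any CTU of the right tile.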
[000142] For each CTU, the video encoder 542 as shown in Fig. 8 operates in two stages. In the first stage (referred to as a 'search' stage), the block partitioner 810 tests various potential configurations of a coding tree. Each potential configuration of a coding tree has associated 'candidate' CBs. The first stage involves testing various candidate CBs to select CBs providing relatively high compression efficiency with relatively low distortion. The testing generally involves a Lagrangian optimisation whereby a candidate CB is evaluated based on a weighted combination of rate (i.e., coding cost) and distortion (i.e., error with respect to the input frame data 712). 'Best' candidate CBs (i.e., the CBs with the lowest evaluated rate/distortion) are selected for subsequent encoding into the bitstream portion 121. Included in the evaluation of candidate CBs is an option to use a CB for a given area or to further split the area according to various splitting options and code each of the smaller resulting areas with further CBs, or split the areas even further. As a consequence, both the coding tree and the CBs themselves are selected in the search stage.
[000143] The video encoder 542 produces a prediction block (PB), indicated by an arrow 820, for each CB, for example, CB 812. The PB 820 is a prediction of the contents of the associated CB 812. A subtracter module 822 produces a difference, indicated as 824 (or 'residual', referring to the difference being in the spatial domain), between the PB 820 and the CB 812. The difference 824 is a block-size difference between corresponding samples in the PB 820 and the CB 812. The difference 824 is transformed, quantised and represented as a transform block (TB), indicated by an arrow 836. The PB 820 and associated TB 836 are typically chosen from one of many possible candidate CBs, for example, based on evaluated cost or distortion.
[000144] A candidate coding block (CB) is a CB resulting from one of the prediction modes available to the video encoder 542 for the associated PB and the resulting residual. When combined with the predicted PB in the video encoder 542, the TB 836 reduces the difference between a decoded CB and the original CB 812 at the expense of additional signalling in a bitstream.
[000145] Each candidate coding block (CB) (i.e., a prediction block (PB) in combination with a transform block (TB)) has an associated coding cost (or 'rate') and an associated difference (or 'distortion'). The distortion of the CB is typically estimated as a difference in sample values, such as a sum of absolute differences (SAD), a sum of squared differences (SSD) or a Hadamard transform applied to the differences. The estimate resulting from each candidate PB may be determined by a mode selector 886 using the difference 824 to determine a prediction mode 887. The prediction mode 887 indicates the decision to use a particular prediction mode for the current CB, for example, intra-frame prediction or inter-frame prediction. Estimation of the coding costs associated with each candidate prediction mode and corresponding residual coding may be performed at significantly lower cost than entropy coding of the residual. Accordingly, a number of candidate modes may be evaluated to determine an optimum mode in a rate-distortion sense, even in a real-time video encoder.
[000146] Determining a preferred mode in terms of rate-distortion is typically achieved using a variation of Lagrangian optimisation. Lagrangian or similar optimisation processing can be employed both to select a preferred partitioning of a CTU into CBs (by the block partitioner 810) and to select a prediction mode from a plurality of possibilities. Through application of a Lagrangian optimisation process to the candidate modes in the mode selector module 886, the intra prediction mode with the lowest cost measurement is selected as a 'best' mode. The lowest cost mode includes a selected secondary transform index 888, which is encoded in the bitstream 121 by an entropy encoder 838.
[000147] In the second stage of operation of the video encoder 542 (referred to as a 'coding' stage), an iteration over the determined coding tree(s) of each CTU is performed in the video encoder 542. For a CTU using separate trees, for each 64×64 luma region of the CTU, a luma coding tree is firstly encoded followed by a chroma coding tree. Within the luma coding tree, only luma CBs are encoded and within the chroma coding tree only chroma CBs are encoded. For a CTU using a shared tree, a single tree describes the CUs (i.e., the luma CBs and the chroma CBs) according to the common block structure of the shared tree.
[000148] The entropy encoder 838 supports bitwise coding of syntax elements using variable-length and fixed-length codewords, and an arithmetic coding mode for syntax elements. Portions of the bitstream such as 'parameter sets', for example the SPS, the picture parameter set (PPS), and the picture header (PH), use a combination of fixed-length codewords and variable-length codewords. Slices, also referred to as contiguous portions, have a slice header that uses variable-length coding followed by slice data, which uses arithmetic coding. The slice header defines parameters specific to the current slice, such as slice-level quantisation parameter offsets, and may include an instance of the PH. The slice data includes the syntax
44204385_1
46
elements of each CTU in the slice. Use of variable-length coding and arithmetic coding requires sequential parsing within each portion of the bitstream. The portions may be delineated with a start code to form 'network abstraction layer units' or 'NAL units'. Arithmetic coding is supported using a context-adaptive binary arithmetic coding process.
[000149] Arithmetically coded syntax elements consist of sequences of one or more 'bins'. Bins, like bits, have a value of '0' or '1'. However, bins are not encoded in a bitstream portion 716 (corresponding to the bitstream 546) as discrete bits. Bins have an associated predicted (or 'likely' or 'most probable') value and an associated probability, known as a 'context'. When the actual bin to be coded matches the predicted value, a 'most probable symbol' (MPS) is coded. Coding a most probable symbol is relatively inexpensive in terms of consumed bits in the bitstream portion 121, including costs that amount to less than one discrete bit. When the actual bin to be coded mismatches the likely value, a 'least probable symbol' (LPS) is coded. Coding a least probable symbol has a relatively high cost in terms of consumed bits. The bin coding techniques enable efficient coding of bins where the probability of a '0' versus a '1' is skewed. For a syntax element with two possible values (i.e., a 'flag'), a single bin is adequate. For syntax elements with many possible values, a sequence of bins is needed. The convention for converting values of a syntax element into a sequence of bins is termed 'binarisation'. Where the values '0' and '1' for a bin are equally (or near equally) likely, it is possible to omit use of a context and assume an equiprobable distribution. Bins with a context are termed 'context-coded bins' and bins omitting a context are termed 'bypass-coded bins'. The binarisation of a syntax element into one or more bins may result in a combination of context-coded and bypass-coded bins. Unlike directly coding one bit into the bitstream, a bypass-coded bin uses the arithmetic coding engine, which facilitates mixing context-coded and bypass-coded bins into syntax element binarisations.
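The cost asymmetry between MPS and LPS bins described above can be illustrated with an idealised information-theoretic sketch (a simplification for illustration only, not the actual CABAC engine, which uses a table-driven approximation):

```python
import math

def bin_cost_bits(p_mps: float, is_mps: bool) -> float:
    """Ideal cost, in bits, of arithmetically coding one bin against a
    context whose most-probable-symbol probability is p_mps."""
    p = p_mps if is_mps else 1.0 - p_mps
    return -math.log2(p)

# With a skewed context (MPS probability 0.9), coding the MPS consumes
# well under one bit, while coding the LPS consumes several bits.
mps_cost = bin_cost_bits(0.9, True)
lps_cost = bin_cost_bits(0.9, False)
```

Under this model a bypass-coded bin corresponds to p = 0.5 and always consumes exactly one bit, which is why contexts are reserved for bins with skewed statistics.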
[000150] For a given binarisation, the presence of later bins in the sequence may be determined based on the value of earlier bins in the sequence, resulting in variable-length binarisations. Additionally, each bin may be associated with more than one context, with one context selected for use in coding a specific instance of the bin. The selection of a particular context may be dependent on earlier bins in the syntax element, the decoded values of neighbouring syntax elements (i.e., those from neighbouring blocks), and the like. Each time a context-coded bin is encoded, the context that was selected for that bin (if any) is updated in a manner reflective of the new bin value. As such, the binary arithmetic coding scheme is said to be adaptive.
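As an example of a variable-length binarisation in which the presence of later bins depends on the values of earlier bins, a truncated-unary binarisation (one common scheme in VVC-style coders, shown here as an illustrative sketch) terminates as soon as a '0' bin is seen:

```python
def truncated_unary_bins(value: int, cmax: int) -> list:
    """Binarise 'value' as 'value' one-bins followed by a terminating
    zero-bin; the terminator is omitted when value == cmax, since the
    decoder can infer it."""
    bins = [1] * value
    if value < cmax:
        bins.append(0)
    return bins

def decode_truncated_unary(bins: list, cmax: int) -> int:
    """Count one-bins until a zero-bin, or cmax bins, are consumed."""
    value = 0
    for b in bins:
        if value == cmax or b == 0:
            break
        value += 1
    return value
```

For example, with cmax = 4 the value 2 maps to the bins [1, 1, 0], while the value 4 maps to [1, 1, 1, 1] with no terminator needed.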
[000151] The absence of a context for bypass-coded bins saves memory and reduces complexity, and thus bypass bins are used where the distribution of values for the particular bin is not skewed. One example of an entropy coder employing context and adaptation is known in the art as CABAC (context adaptive binary arithmetic coder) and many variants of this coder have been employed in video coding.
[000152] A QP controller 890 determines a quantisation parameter 892, used to establish a quantisation step size for use by a quantiser 834 and a dequantiser 840. A larger quantisation step size results in primary transform coefficients 828 being quantised into smaller values, reducing the bitrate of the bitstream portion 716 at the expense of a reduction in the fidelity of inverse transform coefficients 846.
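The mapping from quantisation parameter to step size is not detailed here; assuming the HEVC/VVC-style convention that the step size approximately doubles for every increase of six in QP, the relationship can be sketched as:

```python
def quantisation_step(qp: int) -> float:
    """Approximate quantisation step size for a given QP, assuming the
    HEVC/VVC-style convention: step = 2 ** ((qp - 4) / 6)."""
    return 2.0 ** ((qp - 4) / 6.0)
```

Under this assumption, raising QP by six doubles the step size, roughly halving the magnitudes of the quantised coefficients and trading bitrate for fidelity.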
[000153] The entropy encoder 838 encodes the quantisation parameter 892 and, if in use for the current CB, the LFNST index 888, using a combination of context-coded and bypass-coded bins. The quantisation parameter 892 is encoded at the beginning of each slice, and changes in the quantisation parameter 892 within a slice are coded using a 'delta QP' syntax element. The delta QP syntax element is signalled at most once in each area known as a 'quantisation group'. The quantisation parameter 892 is applied to residual coefficients of the luma CB. An adjusted quantisation parameter is applied to the residual coefficients of collocated chroma CBs. The adjusted quantisation parameter may include mapping from the luma quantisation parameter 892 according to a mapping table and a CU-level offset, selected from a list of offsets. The secondary transform index 888 is signalled when the residual associated with the transform block includes significant residual coefficients only in those coefficient positions subject to transforming into primary coefficients by application of a secondary transform.
[000154] Residual coefficients of each TB associated with a CB are coded using a residual syntax. The residual syntax is designed to efficiently encode coefficients with low magnitudes, using mainly arithmetically coded bins to indicate significance of coefficients, along with lower-valued magnitudes, and reserving bypass bins for higher-magnitude residual coefficients. Accordingly, residual blocks comprising very low magnitude values and sparse placement of significant coefficients are efficiently compressed. Moreover, two residual coding schemes are present. A regular residual coding scheme is optimised for TBs with significant coefficients predominantly located in the upper-left corner of the TB, as is seen when a transform is applied. A transform-skip residual coding scheme is available for TBs where a transform is not
performed and is able to efficiently encode residual coefficients regardless of their distribution throughout the TB.
[000155] A multiplexer module 884 outputs the PB 820 from an intra-frame prediction module 864 according to the determined best intra prediction mode, selected from the tested prediction modes of each candidate CB. The candidate prediction modes need not include every conceivable prediction mode supported by the video encoder 542. Intra prediction falls into three types: first, 'DC intra prediction', which involves populating a PB with a single value representing the average of nearby reconstructed samples; second, 'planar intra prediction', which involves populating a PB with samples according to a plane, with a DC offset and a vertical and horizontal gradient being derived from nearby reconstructed neighbouring samples (the nearby reconstructed samples typically include a row of reconstructed samples above the current PB, extending to the right of the PB to an extent, and a column of reconstructed samples to the left of the current PB, extending downwards beyond the PB to an extent); and, third, 'angular intra prediction', which involves populating a PB with reconstructed neighbouring samples filtered and propagated across the PB in a particular direction (or 'angle'). In VVC, sixty-five (65) angles are supported, with rectangular blocks able to utilise additional angles, not available to square blocks, to produce a total of eighty-seven (87) angles.
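Of the three types, DC intra prediction is the simplest. A minimal sketch, ignoring the refinements an actual VVC encoder applies for non-square blocks, is:

```python
def dc_intra_prediction(above, left, width, height):
    """'DC intra prediction': populate a width x height PB with a single
    value, the rounded average of the reconstructed samples in the row
    above and the column to the left of the block."""
    refs = list(above[:width]) + list(left[:height])
    dc = (sum(refs) + len(refs) // 2) // len(refs)  # integer rounding
    return [[dc] * width for _ in range(height)]
```

For example, with an above row of value 100 and a left column of value 200, every sample of the PB is predicted as their rounded average, 150.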
[000156] A fourth type of intra prediction is available to chroma PBs, whereby the PB is generated from collocated luma reconstructed samples according to a 'cross-component linear model' (CCLM) mode. Three different CCLM modes are available, each mode using a different model derived from the neighbouring luma and chroma samples. The derived model is used to generate a block of samples for the chroma PB from the collocated luma samples. Luma blocks may be intra predicted using a matrix multiplication of the reference samples, using one matrix selected from a predefined set of matrices. This matrix intra prediction (MIP) achieves gain by using matrices trained on a large set of video data, with the matrices representing relationships between reference samples and a predicted block that are not easily captured in angular, planar, or DC intra prediction modes.
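A CCLM mode can be sketched as a per-sample linear mapping from collocated luma to chroma. The min/max parameter derivation below is a simplified stand-in for the normative derivation, assuming the neighbouring luma samples are not all equal:

```python
def derive_cclm_params(neigh_luma, neigh_chroma):
    """Fit chroma = alpha * luma + beta through the neighbouring sample
    pairs with the smallest and largest luma values (simplified)."""
    lo = min(range(len(neigh_luma)), key=lambda i: neigh_luma[i])
    hi = max(range(len(neigh_luma)), key=lambda i: neigh_luma[i])
    alpha = (neigh_chroma[hi] - neigh_chroma[lo]) / (neigh_luma[hi] - neigh_luma[lo])
    beta = neigh_chroma[lo] - alpha * neigh_luma[lo]
    return alpha, beta

def cclm_predict(luma_block, alpha, beta):
    """Predict each chroma sample from its collocated luma sample."""
    return [[alpha * l + beta for l in row] for row in luma_block]
```

The three CCLM modes mentioned above differ in which neighbouring samples contribute to the model; the prediction step itself is the same linear mapping.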
[000157] The module 864 may also produce a prediction unit by copying a block from nearby the current frame using an 'intra block copy' (IBC) method. The location of the reference block is constrained to an area equivalent to one CTU, divided into 64×64 regions known as VPDUs, with the area covering the processed VPDUs of the current CTU and VPDUs of the previous CTU(s) within each row of CTUs and within each slice or tile, up to the area limit
corresponding to 128×128 luma samples, regardless of the configured CTU size for the bitstream. This area is known as an 'IBC virtual buffer' and limits the IBC reference area, thus limiting the required storage. The IBC buffer is populated with reconstructed samples 854 (i.e., prior to loop filtering), and so a separate buffer to the frame buffer 872 is needed. When the CTU size is 128×128 the virtual buffer includes samples only from the CTU adjacent to the left of the current CTU. When the CTU size is 32×32 or 64×64 the virtual buffer includes CTUs from up to the sixteen or four CTUs, respectively, to the left of the current CTU. Regardless of the CTU size, access to neighbouring CTUs for obtaining samples for IBC reference blocks is constrained by boundaries such as edges of pictures, slices, or tiles. Particularly for feature maps of FPN layers having smaller dimensions, use of a CTU size such as 32×32 or 64×64 results in a reference area more aligned to cover a set of previous feature maps. Where feature map placement is ordered based on SAD, SSE, or another difference metric, access to similar feature maps for IBC prediction offers a coding efficiency advantage.
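The CTU counts above follow from dividing the fixed 128×128-sample buffer area by the CTU area; the arithmetic can be sketched as follows (illustrative only, as the normative buffer tracking is more involved):

```python
def ibc_buffer_ctus(ctu_size: int, buffer_area: int = 128 * 128) -> int:
    """Number of whole CTUs of the given size that fit within the IBC
    virtual buffer area of 128x128 luma samples."""
    return buffer_area // (ctu_size * ctu_size)
```

A 64×64 CTU size therefore yields a buffer spanning four CTUs and a 32×32 CTU size yields sixteen, which is why smaller CTU sizes give IBC a reference area covering more previously coded feature maps.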
[000158] The residual for a predicted block when encoding feature map data is different to the residual seen for natural video. Natural video is typically captured by an image sensor, or is screen content, as generally seen in operating system user interfaces and the like. Feature map residuals tend to contain much detail. The level of detail in feature map residuals makes them more amenable to transform-skip coding than to transformation into the predominantly low-frequency coefficients of the various transforms. An intra-predicted luma coding block may be partitioned into a set of equal-sized prediction blocks, either vertically or horizontally, each block having a minimum area of sixteen (16) luma samples.
[000159] Where previously reconstructed neighbouring samples are unavailable, for example at the edge of the frame, a default half-tone value of one half the range of the samples is used. For example, for 10-bit video a value of five hundred and twelve (512) is used. As no previous samples are available for a CB located at the top-left position of a frame, angular and planar intra-prediction modes produce the same output as the DC prediction mode (i.e., a flat plane of samples having the half-tone value as magnitude).
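The default half-tone value is simply the midpoint of the sample range for the configured bit depth:

```python
def default_reference_value(bit_depth: int) -> int:
    """Half of the sample range, i.e. 2 ** (bit_depth - 1)."""
    return 1 << (bit_depth - 1)
```

For 10-bit video this gives 512, matching the example above; for 8-bit video it gives 128.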
[000160] For inter-frame prediction, a prediction block 882 is produced using samples from one or two frames preceding the current frame in the coding order of frames in the bitstream, by a motion compensation module 880, and output as the PB 820 by the multiplexer module 884. Moreover, for inter-frame prediction, a single coding tree is typically used for both the luma channel and the chroma channels. The order of coding frames in the bitstream may differ from
the order of the frames when captured or displayed. When one frame is used for prediction, the block is said to be 'uni-predicted' and has one associated motion vector. When two frames are used for prediction, the block is said to be 'bi-predicted' and has two associated motion vectors. For a P slice, each CU may be intra predicted or uni-predicted. For a B slice, each CU may be intra predicted, uni-predicted, or bi-predicted.
[000161] Frames are typically coded using a 'group of pictures' structure, enabling a temporal hierarchy of frames. Frames may be divided into multiple slices, each of which encodes a portion of the frame. A temporal hierarchy of frames allows a frame to reference a preceding and a subsequent picture in the order of displaying the frames. The images are coded in the order necessary to ensure the dependencies for decoding each frame are met. An affine inter prediction mode is available where, instead of using one or two motion vectors to select and filter reference sample blocks for a prediction unit, the prediction unit is divided into multiple smaller blocks and a motion field is produced so each smaller block has a distinct motion vector. The motion field uses the motion vectors of points nearby to the prediction unit as 'control points'. Affine prediction allows coding of motion different to translation with less need to use deeply split coding trees. A bi-prediction mode available to VVC performs a geometric blend of the two reference blocks along a selected axis, with angle and offset from the centre of the block signalled. This geometric partitioning mode ('GPM') allows larger coding units to be used along the boundary between two objects, with the geometry of the boundary coded for the coding unit as an angle and centre offset. Motion vector differences, instead of using a cartesian (x, y) offset, may be coded as a direction (up/down/left/right) and a distance, with a set of power-of-two distances supported. The motion vector predictor is obtained from a neighbouring block ('merge mode') as if no offset is applied. The current block will share the same motion vector as the selected neighbouring block.
[000162] The samples are selected according to a motion vector 878 and reference picture index. The motion vector 878 and reference picture index apply to all colour channels and thus inter prediction is described primarily in terms of operation upon PUs rather than PBs. The decomposition of each CTU into one or more inter-predicted blocks is described with a single coding tree. Inter prediction methods may vary in the number of motion parameters and their precision. Motion parameters typically comprise a reference frame index, indicating which reference frame(s) from lists of reference frames are to be used, plus a spatial translation for each of the reference frames, but may include more frames, special frames, or complex affine
parameters such as scaling and rotation. In addition, a pre-determined motion refinement process may be applied to generate dense motion estimates based on referenced sample blocks.
[000163] Havingdetermined
[000163] Having determined andand selected selected thethe PB PB 820820 and and subtracted subtracted the the PB 820 PB 820 from from the the
original sample block at the subtractor 822, a residual with lowest coding cost, represented original sample block at the subtractor 822, a residual with lowest coding cost, represented
as 824, is obtained and subjected to lossy compression. The lossy compression process comprises the steps of transformation, quantisation and entropy coding. A forward primary transform module 826 applies a forward transform to the difference 824, converting the difference 824 from the spatial domain to the frequency domain, and producing primary transform coefficients represented by an arrow 828. The largest primary transform size in one dimension is either a 32-point DCT-2 or a 64-point DCT-2 transform, configured by a 'sps_max_luma_transform_size_64_flag' in the sequence parameter set. If the CB being encoded is larger than the largest supported primary transform size expressed as a block size (e.g., 64×64 or 32×32), the primary transform 826 is applied in a tiled manner to transform all samples of the difference 824. Where a non-square CB is used, tiling is also performed using the largest available transform size in each dimension of the CB. For example, when a maximum transform size of thirty-two (32) is used, a 64×16 CB uses two 32×16 primary transforms arranged in a tiled manner. When a CB is larger in size than the maximum supported transform size, the CB is filled with TBs in a tiled manner. For example, a 128×128 CB with 64-pt transform maximum size is filled with four 64×64 TBs in a 2×2 arrangement. A 64×128 CB with a 32-pt transform maximum size is filled with eight 32×32 TBs in a 2×4 arrangement.
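The tiling examples above can be reproduced with a small helper. The function below is a sketch written for this description (the name and interface are illustrative, not part of the VVC standard) and assumes the CB dimensions are multiples of the resulting TB size, as in the examples.

```python
def tile_transform_blocks(cb_w, cb_h, max_tb):
    """Split a coding block (CB) into transform blocks (TBs) no larger
    than the maximum supported transform size in each dimension.
    Illustrative helper for the tiling rule described above."""
    tb_w = min(cb_w, max_tb)
    tb_h = min(cb_h, max_tb)
    cols = cb_w // tb_w   # TBs across the CB width
    rows = cb_h // tb_h   # TBs down the CB height
    return tb_w, tb_h, cols, rows

# A 128x128 CB with a 64-pt maximum is covered by four 64x64 TBs (2x2).
print(tile_transform_blocks(128, 128, 64))   # (64, 64, 2, 2)
# A 64x128 CB with a 32-pt maximum is covered by eight 32x32 TBs (2x4).
print(tile_transform_blocks(64, 128, 32))    # (32, 32, 2, 4)
# A 64x16 CB with a 32-pt maximum uses two 32x16 primary transforms.
print(tile_transform_blocks(64, 16, 32))     # (32, 16, 2, 1)
```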
[000164] Application of the transform 826 results in multiple TBs for the CB. Where each application of the transform operates on a TB of the difference 824 larger than 32×32, e.g., 64×64, all resulting primary transform coefficients 828 outside of the upper-left 32×32 area of the TB are set to zero (i.e., discarded). The remaining primary transform coefficients 828 are passed to the quantiser module 834. The primary transform coefficients 828 are quantised according to the quantisation parameter 892 associated with the CB to produce primary transform coefficients 832. In addition to the quantisation parameter 892, the quantiser module 834 may also apply a 'scaling list' to allow non-uniform quantisation within the TB by further scaling residual coefficients according to their spatial position within the TB. The quantisation parameter 892 may differ for a luma CB versus each chroma CB. The primary transform coefficients 832 are passed to a forward secondary transform module 830 to produce transform coefficients represented by the arrow 836 by performing either a non-separable secondary transform (NSST) operation or bypassing the secondary transform. The forward primary transform 826 is typically separable, transforming a set of rows and then a set of columns of each TB. The forward primary transform module 826 uses either a type-II discrete cosine transform (DCT-2) in the horizontal and vertical directions, or bypass of the transform horizontally and vertically, or combinations of a type-VII discrete sine transform (DST-7) and a type-VIII discrete cosine transform (DCT-8) in either horizontal or vertical directions for luma TBs not exceeding 16 samples in width and height. Use of combinations of a DST-7 and DCT-8 is referred to as 'multi transform selection set' (MTS) in the VVC standard.
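The zero-out of high-frequency coefficients for large TBs can be illustrated as follows; the helper is hypothetical and operates on a plain list-of-rows representation rather than actual codec data structures.

```python
def zero_high_freq(coeffs, keep=32):
    """Zero all primary transform coefficients outside the upper-left
    keep x keep area of a TB, as applied to TBs larger than 32x32.
    Sketch of the behaviour described above, not VVC code."""
    h = len(coeffs)
    w = len(coeffs[0]) if h else 0
    return [[coeffs[y][x] if (x < keep and y < keep) else 0
             for x in range(w)] for y in range(h)]

# For a 64x64 TB, only the upper-left 32x32 coefficients survive.
tb = [[1] * 64 for _ in range(64)]
out = zero_high_freq(tb)
print(sum(map(sum, out)))   # 1024 coefficients retained (32*32)
```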
[000165] The forward secondary transform of the module 830 is generally a non-separable transform, which is only applied for the residual of intra-predicted CUs and may nonetheless also be bypassed. The forward secondary transform operates either on sixteen (16) samples (arranged as the upper-left 4×4 sub-block of the primary transform coefficients 828) or forty-eight (48) samples (arranged as three 4×4 sub-blocks in the upper-left 8×8 coefficients of the primary transform coefficients 828) to produce a set of secondary transform coefficients. The set of secondary transform coefficients may be fewer in number than the set of primary transform coefficients from which they are derived. Due to application of the secondary transform to only a set of coefficients adjacent to each other and including the DC coefficient, the secondary transform is referred to as a 'low frequency non-separable secondary transform' (LFNST). Such secondary transforms may be obtained through a training process and, due to their non-separable nature and trained origin, exploit additional redundancy in the residual signal not able to be captured by separable transforms such as variants of DCT and DST. Moreover, when the LFNST is applied, all remaining coefficients in the TB are zero, both in the primary transform domain and the secondary transform domain.
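The choice between the sixteen- and forty-eight-sample LFNST input regions can be sketched as below. The selection condition shown is a simplification assumed for illustration; the precise rule in the VVC standard depends on further details of the TB geometry.

```python
def lfnst_input_size(tb_w, tb_h):
    """Number of primary coefficients fed to the LFNST: sixteen when a
    TB dimension is 4 (a single upper-left 4x4 sub-block), forty-eight
    otherwise (three 4x4 sub-blocks within the upper-left 8x8).
    Assumed simplified condition, for illustration only."""
    if tb_w == 4 or tb_h == 4:
        return 16
    return 48

print(lfnst_input_size(4, 8), lfnst_input_size(8, 8))   # 16 48
```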
[000166] The quantisation parameter 892 is constant for a given TB and thus results in a uniform scaling for producing residual coefficients in the primary transform domain for a TB. The quantisation parameter 892 may vary periodically with a signalled 'delta quantisation parameter'. The delta quantisation parameter (delta QP) is signalled once for CUs contained within a given area, referred to as a 'quantisation group'. If a CU is larger than the quantisation group size, delta QP is signalled once with one of the TBs of the CU. That is, the delta QP is signalled by the entropy encoder 838 once for the first quantisation group of the CU and not signalled for any subsequent quantisation groups of the CU. A non-uniform scaling is also possible by application of a 'quantisation matrix', whereby the scaling factor applied for each residual coefficient is derived from a combination of the quantisation parameter 892 and the corresponding entry in a scaling matrix. The scaling matrix may have a size that is smaller than the size of the TB, and when applied to the TB a nearest-neighbour approach is used to provide scaling values for each residual coefficient from a scaling matrix smaller in size than the TB size. The residual coefficients 836 are supplied to the entropy encoder 838 for encoding in the bitstream portion 716. Typically, the residual coefficients of each TB with at least one significant residual coefficient of the TU are scanned to produce an ordered list of values, according to a scan pattern. The scan pattern generally scans the TB as a sequence of 4×4 'sub-blocks', providing a regular scanning operation at the granularity of 4×4 sets of residual coefficients, with the arrangement of sub-blocks dependent on the size of the TB. The scan within each sub-block and the progression from one sub-block to the next typically follow a backward diagonal scan pattern. Additionally, the quantisation parameter 892 is encoded into the bitstream portion 716 using a delta QP syntax element and a slice QP for the initial value in a given slice or subpicture, and the secondary transform index 888 is encoded in the bitstream portion 716.
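The backward diagonal pattern can be sketched for a single 4×4 sub-block as below; the full TB scan applies the same pattern at sub-block granularity, and the exact VVC ordering differs in detail, so this is illustrative only.

```python
def diag_scan_positions(n=4):
    """Backward diagonal scan positions for an n x n sub-block: the
    scan starts at the bottom-right coefficient and ends at the DC
    (top-left) position. Simplified sketch of the pattern described."""
    positions = []
    for d in range(2 * n - 2, -1, -1):   # diagonals, last diagonal first
        for y in range(n - 1, -1, -1):   # traverse each diagonal up-right
            x = d - y
            if 0 <= x < n:
                positions.append((x, y))
    return positions

scan = diag_scan_positions()
print(scan[0], scan[-1])   # (3, 3) (0, 0)
```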
[000167] As described above, the video encoder 542 needs access to a frame representation corresponding to the decoded frame representation seen in the video decoder. Thus, the residual coefficients 836 are passed through an inverse secondary transform module 844, operating in accordance with the secondary transform index 888 to produce intermediate inverse transform coefficients, represented by an arrow 842. The intermediate inverse transform coefficients 842 are inverse quantised by the dequantiser module 840 according to the quantisation parameter 892 to produce the inverse transform coefficients, represented by an arrow 846. The dequantiser module 840 may also perform an inverse non-uniform scaling of residual coefficients using a scaling list, corresponding to the forward scaling performed in the quantiser module 834. The inverse transform coefficients 846 are passed to an inverse primary transform module 848 to produce residual samples, represented by an arrow 850, of the TU. The inverse primary transform module 848 applies DCT-2 transforms horizontally and vertically, constrained by the maximum available transform size as described with reference to the forward primary transform module 826. The types of inverse transform performed by the inverse secondary transform module 844 correspond with the types of forward transform performed by the forward secondary transform module 830. The types of inverse transform performed by the inverse primary transform module 848 correspond with the types of primary transform performed by the primary transform module 826. A summation module 852 adds the residual samples 850 and the PB 820 to produce reconstructed samples (indicated by an arrow 854) of the CU.
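The uniform scaling performed between the quantiser 834 and the dequantiser 840 can be illustrated with a minimal round trip. The step-size mapping below follows the common convention of doubling every six QP values; it is an illustrative approximation, not the exact integer arithmetic of the standard.

```python
def qp_to_step(qp):
    """Map a quantisation parameter to a step size that doubles every
    six QP values, mirroring the HEVC/VVC convention. Illustrative."""
    return 2 ** ((qp - 4) / 6)

def quantise(coeffs, qp):
    """Forward uniform quantisation of transform coefficients."""
    step = qp_to_step(qp)
    return [int(round(c / step)) for c in coeffs]

def dequantise(levels, qp):
    """Inverse quantisation: scale levels back to coefficient values."""
    step = qp_to_step(qp)
    return [l * step for l in levels]

levels = quantise([100.0, -40.0, 3.0], 22)   # step size is 8 at QP 22
print(levels)                 # [12, -5, 0]
print(dequantise(levels, 22))  # [96.0, -40.0, 0.0] -- lossy round trip
```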
[000168] The reconstructed samples 854 are passed to a reference sample cache 856 and an in-loop filters module 868. The reference sample cache 856, typically implemented using static RAM on an ASIC to avoid costly off-chip memory access, provides minimal sample storage needed to satisfy the dependencies for generating intra-frame PBs for subsequent CUs in the frame. The minimal dependencies typically include a 'line buffer' of samples along the bottom of a row of CTUs, for use by the next row of CTUs, and column buffering, the extent of which is set by the height of the CTU. The reference sample cache 856 supplies reference samples (represented by an arrow 858) to a reference sample filter 860. The sample filter 860 applies a smoothing operation to produce filtered reference samples (indicated by an arrow 862). The filtered reference samples 862 are used by the intra-frame prediction module 864 to produce an intra-predicted block of samples, represented by an arrow 866. For each candidate intra prediction mode, the intra-frame prediction module 864 produces a block of samples, that is, 866. The block of samples 866 is generated by the module 864 using techniques such as DC, planar or angular intra prediction. The block of samples 866 may also be produced using a matrix-multiplication approach with neighbouring reference samples as input and a matrix selected from a set of matrices by the video encoder 800, with the selected matrix signalled in the bitstream 121 using an index to identify which matrix of the set of matrices is to be used by the video decoder.
[000169] The in-loop filters module 868 applies several filtering stages to the reconstructed samples 854. The filtering stages include a 'deblocking filter' (DBF), which applies smoothing aligned to the CU boundaries to reduce artefacts resulting from discontinuities. Another filtering stage present in the in-loop filters module 868 is an 'adaptive loop filter' (ALF), which applies a Wiener-based adaptive filter to further reduce distortion. A further available filtering stage in the in-loop filters module 868 is a 'sample adaptive offset' (SAO) filter. The SAO filter operates by firstly classifying reconstructed samples into one or multiple categories and, according to the allocated category, applying an offset at the sample level.
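The classify-then-offset structure of the SAO filter can be sketched with its band-offset mode. The helper below is hypothetical, assuming 8-bit samples divided into 32 bands by their upper bits; signalled offsets apply only to their band.

```python
def sao_band_offset(samples, offsets, shift=3):
    """Classify each 8-bit sample into one of 32 bands by its upper
    bits, then add the offset signalled for that band (zero if none),
    clipping to the valid sample range. Illustrative sketch only."""
    return [min(255, max(0, s + offsets.get(s >> shift, 0)))
            for s in samples]

# Samples 16 and 17 fall in band 2 and receive the +3 offset;
# sample 200 falls in band 25, which has no signalled offset.
print(sao_band_offset([16, 17, 200], {2: 3}))   # [19, 20, 200]
```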
[000170] Filtered samples, represented by an arrow 870, are output from the in-loop filters module 868. The filtered samples 870 are stored in a frame buffer 872. The frame buffer 872 typically has the capacity to store several (e.g., up to sixteen (16)) pictures and thus is stored in the memory 206. The frame buffer 872 is not typically stored using on-chip memory due to the large memory consumption required. As such, access to the frame buffer 872 is costly in terms of memory bandwidth. The frame buffer 872 provides reference frames (represented by an arrow 874) to a motion estimation module 876 and the motion compensation module 880. The reference frames 874 are output as a reconstructed frame 718 of the encoder module 542. In the example of Fig. 8, the reconstructed frame is a result of operation of lossy VVC encoding, that is, due to operation of the modules 810 to 890.
[000171] The motion estimation module 876 estimates a number of 'motion vectors' (indicated as 878), each being a Cartesian spatial offset from the location of the present CB, referencing a block in one of the reference frames in the frame buffer 872. A filtered block of reference samples (represented as 882) is produced for each motion vector. The filtered reference samples 882 form further candidate modes available for potential selection by the mode selector 886. Moreover, for a given CU, the PU 820 may be formed using one reference block ('uni-predicted') or may be formed using two reference blocks ('bi-predicted'). For the selected motion vector, the motion compensation module 880 produces the PB 820 in accordance with a filtering process supportive of sub-pixel accuracy in the motion vectors. As such, the motion estimation module 876 (which operates on many candidate motion vectors) may perform a simplified filtering process compared to that of the motion compensation module 880 (which operates on the selected candidate only) to achieve reduced computational complexity. When the video encoder 542 selects inter prediction for a CU, the motion vector 878 is encoded into the bitstream portion 121.
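The search performed by a motion estimator such as the module 876 can be sketched as an exhaustive integer-pel minimisation of the sum of absolute differences (SAD). Real encoders use fast search strategies and sub-pixel refinement, so the helper below is illustrative only.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-size blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def best_motion_vector(cur, ref, bx, by, bs, search):
    """Exhaustive search for the (dx, dy) offset within +/-search that
    minimises SAD between the current block at (bx, by) of size bs and
    the reference frame. Hypothetical helper sketching what a motion
    estimator evaluates. Returns (cost, dx, dy)."""
    cur_blk = [row[bx:bx + bs] for row in cur[by:by + bs]]
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if 0 <= x <= len(ref[0]) - bs and 0 <= y <= len(ref) - bs:
                cand = [row[x:x + bs] for row in ref[y:y + bs]]
                cost = sad(cur_blk, cand)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best

# Reference frame with distinct values; current frame shifted by (1, 1).
ref = [[x + y * 10 for x in range(8)] for y in range(8)]
cur = [row[1:] + [0] for row in ref[1:]] + [[0] * 8]
print(best_motion_vector(cur, ref, 0, 0, 4, 2))   # (0, 1, 1)
```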
[000172] Although the video encoder 542 of Fig. 8 is described with reference to versatile video coding (VVC), other video coding standards or implementations may also employ the processing stages of modules 810-890. The frame data 712 (and bitstream 716) may also be read from (or written to) memory 206, the hard disk drive 210, a CD-ROM, a Blu-ray™ disk or other computer readable storage medium. Additionally, the frame data 712 (and bitstream 716) may be received from (or transmitted to) an external source, such as a server connected to the communications network 220 or a radio-frequency receiver. The communications network 220 may provide limited bandwidth, necessitating the use of rate control in the video encoder 120 to avoid saturating the network at times when the frame data 712 is difficult to compress.
[000173] The bitstream 716 may be constructed from one or more slices, representing spatial sections (collections of CTUs) of the frame data 712, produced by one or more instances of the video encoder 542, each producing the bitstream portion 716 and operating in a co-ordinated manner under control of the processor 205. The bitstream portion 716 may also contain one slice that corresponds to one region to be output as a collection of subpictures forming one picture, each being independently encodable and independently decodable with respect to any of the other slices or subpictures in the picture.
[000174] Figs. 9A & 9B are schematic block diagrams showing an arrangement for holding or packing compressed feature map data from compressed tensor data. A feature map, corresponding to one channel of a tensor, is packed or stored in a rectangular area of the frame. The feature maps of each channel are packed typically in a left-to-right manner firstly and a top-to-bottom manner secondly, filling the frame width in the order of incrementing channel count. Fig. 9A shows a frame 900 that contains a region 910 in which feature maps of a tensor are to be packed. Frames containing feature maps packed onto the area of the frame may be referred to as "feature frames". The size of the frame 900 may be specified in terms of width and height in units of samples, smallest CU width/height, or CTU width/height. Fig. 9B shows the frame 900b which corresponds to the frame 900 once feature maps, i.e., feature maps obtained from the tensor 532, are packed. Where the tensor compressor 530 was configured to perform the feature reduction network topology described with reference to Fig. 6, the tensor 532 contains feature maps corresponding to the P5 layer, such as a feature map 930.
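The left-to-right, top-to-bottom packing of feature maps into a frame can be sketched as follows. The helper is a simplification assuming equal-size feature maps whose width divides the frame width; quantisation of feature values and padding are omitted.

```python
def pack_feature_maps(maps, frame_w, frame_h):
    """Pack equal-size feature maps (one per channel) into a frame,
    left-to-right then top-to-bottom in channel order, as described
    for Figs. 9A & 9B. Illustrative sketch only."""
    fm_h, fm_w = len(maps[0]), len(maps[0][0])
    per_row = frame_w // fm_w          # feature maps per frame row
    frame = [[0] * frame_w for _ in range(frame_h)]
    for c, fm in enumerate(maps):
        ox = (c % per_row) * fm_w      # horizontal placement of channel c
        oy = (c // per_row) * fm_h     # vertical placement of channel c
        for y in range(fm_h):
            for x in range(fm_w):
                frame[oy + y][ox + x] = fm[y][x]
    return frame

# Three 2x2 channels packed into a 4x4 frame: channels 0 and 1 fill the
# top row of the frame; channel 2 starts the next row of feature maps.
maps = [[[c + 1] * 2 for _ in range(2)] for c in range(3)]
for row in pack_feature_maps(maps, 4, 4):
    print(row)
```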
[000175] Fig. 12 is a schematic block diagram 1200 showing an example implementation of the tensor decoder 146. In the example of Fig. 12, the tensor decoder 146 includes a configurable tensor decompressor 1250 and a selectable picture decoder 1204. Fig. 19 shows a method 1900 for decoding a bitstream, including reconstructing tensors according to an indicated tensor decompressor, and performing a second portion of the CNN. In the example described, the method 1900 is configured for decoding an FCM bitstream where the inner coding is performed using one of several compression standards, each of which has a different NAL unit format, affecting signalling of metadata outside the scope of the inner coding stage.
[000176] The tensor decoder 1200 (146) and the method 1900 may be implemented as one or more software application programs 233 executable within the computer system 200. The tensor decoder 146 and the method 1900 may be effected by instructions 231 (see Fig. 2B) in the software 233 that are carried out within the computer system 200. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks. The method 1900 begins at a decode codec identifier NAL unit step 1902.
[000177] At the step 1902, a NAL unit demultiplexor 1202 passes NAL units 1207 received in the bitstream 143 to a metadata parser 1208, under execution of the processor 205, to decode received NAL units. At this stage, the specific inner codec is not known and so the only NAL unit format that can be unambiguously decoded is the inner codec identifier NAL unit as described with reference to Appendix D. In particular, an inner_codec_identifier syntax element is decoded from the inner codec identifier (ICI) 1110 at step 1902. For example, if AVC was selected as the inner codec at step 1802, step 1902 operates to decode a NAL unit from the bitstream having a predetermined length. As described in relation to Appendix D, the NAL unit of the predetermined length indicates a NAL unit format of one inner codec of a plurality of inner codecs. Each other inner codec (such as HEVC, VVC or 'custom') has a NAL unit length different to the predetermined length. The bitstream includes a plurality of NAL units and the decoded NAL unit of predetermined length for identifying a particular inner codec is the NAL unit header 1012.
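By way of illustration, the length-based identification performed at step 1902 can be sketched as follows. The two-byte ICI length, the codec enumeration values, and the payload layout below are illustrative assumptions for this sketch only; the normative definitions are those of Appendix D.

```python
# Hypothetical sketch of step 1902: the inner-codec identifier (ICI) NAL unit
# has a fixed, predetermined length, so it can be parsed before the inner
# codec itself is known.  All constants here are assumptions, not normative.
INNER_CODEC_NAMES = {0: "AVC", 1: "HEVC", 2: "VVC", 3: "custom"}
ICI_NAL_LENGTH = 2  # assumed predetermined length in bytes

def decode_inner_codec_identifier(bitstream: bytes) -> str:
    """Decode the inner_codec_identifier syntax element from the first
    (assumed fixed-length) NAL unit of the bitstream."""
    if len(bitstream) < ICI_NAL_LENGTH:
        raise ValueError("bitstream too short for ICI NAL unit")
    nal_unit = bitstream[:ICI_NAL_LENGTH]
    inner_codec_identifier = nal_unit[1]  # payload byte after the header byte
    try:
        return INNER_CODEC_NAMES[inner_codec_identifier]
    except KeyError:
        raise ValueError(f"unknown inner codec id {inner_codec_identifier}")
```

The returned codec name then drives the selection of the NAL unit header format used to parse the remainder of the bitstream.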
[000178] Control in the processor 205 progresses from the step 1902 to a select inner codec step 1904.
[000179] At the step 1904, the tensor decoder 146, under execution of the processor 205, selects one inner codec from a plurality of inner codecs based on the NAL unit of the predetermined length decoded at step 1902. The inner codec, i.e., the compression standard to be performed by the picture decoder 1204, is determined from the inner_codec_identifier syntax element decoded at the step 1902. Control in the processor 205 progresses from the step 1904 to a decode FCM VMPS step 1906.
[000180] At the step 1906, the NAL unit demultiplexor 1202, configured to parse NAL unit headers in accordance with the selected inner codec, passes the FCM VMPS 1112 to the metadata parser 1208. The demultiplexor 1202 is able to distinguish NAL units for the metadata parser (FCM VMPS, FCM SPS, and FCM PPS) from NAL units for the picture decoder 1204 based on the nal_unit_type enumerations described with reference to Appendices A-C. The metadata parser 1208 decodes the FCM VMPS 1112 in accordance with the syntax structure shown in Appendix E to produce vision model parameters (output_picture_width and output_picture_height in the example of Appendix E), which are passed to the CNN head 150. The vision model parameters produced at step 1906 correspond to the parameters 113a of Fig. 1. Vision model parameters may include items such as the dimensions of the frame data 113, needed for bounding boxes to be scaled correctly. Control in the processor 205 progresses from the step 1906 to a decode FCM SPS step 1907.
[000181] At the step 1907, the metadata parser 1208 parses the FCM SPS 1114 received from the bitstream 143 via the demultiplexor 1202 to obtain tensor information relating to dimensionality of compressed tensors and placement of feature maps as packing information for each tensor in the bitstream 143. The FCM SPS 1114 is parsed to obtain the information encoded at step 18100. The FCM SPS 1114 is parsed at step 1907 in accordance with the syntax structure and semantics described with reference to Appendix E, for example. Control in the processor 205 progresses from the step 1907 to a decode FCM PPS step 1908.
[000182] At the step 1908, the metadata parser 1208 parses the FCM PPS 1116 if received from the bitstream 143 via the demultiplexor 1202. The FCM PPS 1116 is parsed to obtain information encoded at step 18110. The step 1908 operates to decode and parse information in accordance with the syntax structure and semantics described with reference to Appendix E, for example. For example, the FCM PPS 1116 includes information relating to quantisation ranges in elements qr_min_exp, qr_min_exp_sign, qr_min_mantissa, qr_min_mantissa_sign, qr_max_exp, qr_max_exp_sign, qr_max_mantissa, and qr_max_mantissa_sign in the example of Appendix E. Control in the processor 205 progresses from the step 1908 to a determine tensor decompressor step 1910.
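To illustrate how a quantisation bound might be assembled from the sign, exponent, and mantissa syntax elements listed above, the following sketch uses a value = sign × mantissa × 2^(±exponent) interpretation. That interpretation is an assumption made for illustration; the normative semantics of the qr_* elements are given in Appendix E.

```python
# Hedged sketch: combining qr_*_exp / qr_*_mantissa syntax elements from the
# FCM PPS into a floating-point quantisation bound.  The sign/exponent/
# mantissa composition below is an illustrative assumption, not normative.
def decode_quantisation_bound(exp: int, exp_sign: int,
                              mantissa: int, mantissa_sign: int) -> float:
    """Return sign * mantissa * 2**(+/-exp) from the decoded elements."""
    signed_exp = -exp if exp_sign else exp
    signed_mantissa = -mantissa if mantissa_sign else mantissa
    return signed_mantissa * (2.0 ** signed_exp)

# A quantisation range is then the pair (minimum, maximum):
qr_min = decode_quantisation_bound(exp=3, exp_sign=1, mantissa=5, mantissa_sign=1)
qr_max = decode_quantisation_bound(exp=2, exp_sign=0, mantissa=3, mantissa_sign=0)
# qr_min = -5 * 2**-3 = -0.625 ; qr_max = 3 * 2**2 = 12.0
```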
[000183] At the step 1910, the tensor decoder 146 determines a decoder network topology to be used for restoring dimensionality of compressed tensors to a dimensionality compatible for use as input to the CNN head 150. The metadata parser 1208, under execution of the processor 205, decodes information representing the decoder topology from the FCM SPS 1114 as a full decoder network topology. Alternatively, the metadata parser 1208 decodes a reference to a description of the decoder network topology, which may have been previously included in the bitstream 143 or may have been obtained via external means (e.g., downloaded from an internet connection). The decoder network topology may be encoded using formats such as ONNX, NNEX, or Pytorch code. The decoded network topology may include a format indication, signalling which format is in use. For a given format, multiple versions may be defined (or new versions may be created in future) and so a format version indicator may also be included in the decoded network topology. Formats may be textual in nature and thus use of an optional compression stage, using a technique such as ZIP or LZMA, may be signalled to minimise the storage overhead of the decoder network topology in the bitstream 143.
[000184] As seen in Fig. 12, tensor decompressor information 1230 is output by the parser 1208. The tensor decompressor information 1230 specifies a decoder network topology either by reference or by structure. If the tensor decompressor information 1230 specifies the decoder network topology by reference, the structure of the decoder network topology is obtained from a tensor decompressor repository 1232 (if available) or the repository 1232 obtains the topology from the tensor codec repository 180. The decoder network topology may be retained in the repository 1232 for future use even after a different decoder network topology is used. The repository 1232 outputs information 1238 corresponding to the decoder topology to the tensor decompressor 1250. The decoder network topology may be retained in the repository 1232 for future use even after a subsequent ICI (e.g., 1110) is received, indicating the commencement of a new bitstream. If the decoder network topology requires weights, weight information 1234 is provided from the metadata parser 1208 to a tensor weights repository 1236 to specify weights either by reference or by value. The repository 1236 outputs information 1240 corresponding to the weights information. A given feature restoration network topology may allow specific dimensions of the input and output tensors to be changed at runtime, sometimes referred to as 'dynamic axes'. Such dynamic axes may correspond to the width and height of feature maps of the tensors being compressed or restored. Control in the processor 205 progresses from the step 1910 to a decode complexity indication step 1920.
[000185] At the step 1920, the metadata parser 1208, under execution of the processor 205, decodes one or more syntax elements indicating the worst-case complexity of any decoder network topology that will be implemented in the tensor decompressor 1250 as the complexity indication from the FCM SPS 1114. Worst-case complexity includes one or more of the worst case in terms of storage of intermediate tensors within the tensor decompressor 1250, the worst-case number of MAC operations to be performed by the tensor decompressor 1250, and the worst-case floating-point operations of any kind to be performed in the tensor decompressor 1250. Control in the processor 205 progresses from the step 1920 to a determine tensor decompressor complexity step 1930.
[000186] At the step 1930, the tensor decoder 146, under execution of the processor 205, determines the required complexity to perform the determined decoder network topology. An operation count is produced by performing a traversal of the stages defined in the determined decoder network topology and counting operations implied by each stage without performing the stage. Dimensionality of any persistent tensor data (i.e., tensors retained from one invocation of the tensor decompressor 1250 to the next invocation) is recorded. The volume of intermediate tensor data is retained, such that the maximum amount of intermediate tensor data concurrently retained in the memory 206 is determined. Control in the processor 205 progresses from the step 1930 to a complexity test step 1950.
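The traversal described above, counting operations without executing any stage, can be sketched as follows. The per-stage schema (a MAC count and an output element count per stage) is a hypothetical simplification; a real topology walk would derive these from each stage's type and tensor shapes.

```python
# Hedged sketch of step 1930: walk the stages of a (hypothetical) decoder
# network topology, summing the MAC count each stage implies and tracking the
# peak intermediate-tensor storage, without executing any stage.
def estimate_complexity(stages):
    """Each stage is a dict with 'macs' and 'output_elems' (assumed schema).
    Returns (total MACs, peak concurrently-live intermediate elements)."""
    total_macs = 0
    live_elems = 0        # elements of the tensor flowing between stages
    peak_elems = 0
    for stage in stages:
        total_macs += stage["macs"]
        # a stage's input and its output are live at the same time
        peak_elems = max(peak_elems, live_elems + stage["output_elems"])
        live_elems = stage["output_elems"]
    return total_macs, peak_elems

macs, peak = estimate_complexity([
    {"macs": 1000, "output_elems": 64},
    {"macs": 4000, "output_elems": 256},
    {"macs": 2000, "output_elems": 128},
])
# macs == 7000; peak == 256 + 128 == 384 (stage 3's input plus its output)
```

The resulting pair is what the step 1950 test would compare against the complexity indication decoded from the FCM SPS.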
[000187] At the step 1950, the tensor decoder 146, under execution of the processor 205, compares the decoder network topology complexity determined at the step 1930 with the complexity indication decoded at the step 1920. If the determined complexity is less than or equal to the complexity indication ("OK" at step 1950), control in the processor 205 progresses from the step 1950 to an instantiate tensor decompressor step 1970. If the determined complexity is greater than the complexity indication ("NOT OK" at step 1950), control in the processor 205 progresses from the step 1950 to an error condition step 1960.
[000188] At the step 1960, the tensor decoder 146 enters an error state and decoding cannot continue due to the possibility of a signalled decoder network topology exceeding the capabilities of the tensor decompressor 1250. The method 1900 terminates.
[000189] At the step 1970, the tensor decompressor 1250 is initialized in accordance with the decoder network topology as determined at the step 1910. The step 1970 is performed only when a new decoder network topology was determined at the step 1910, i.e., subsequent invocations of the method 1900 for which no new decoder network topology is determined may reuse resources allocated at the step 1970. Sufficient storage memory is allocated to hold any persistent tensors (i.e., tensors retained from one invocation of the method 1900 to the next invocation of the method 1900), along with memory to hold the maximum concurrently used intermediate tensors in performing the decoder network topology. In the case where hardware acceleration is applied for the decoder network topology, reservation of sufficient execution units, such as MACs or DSP blocks, e.g., in an FPGA, may also take place. In the case where the decoder network topology is to be performed in software, sufficient execution time on available resources such as CPU or GPU is reserved to enable real-time operation of the method 1900 (i.e., sufficient to allow repeated invocations of the method 1900 as incoming packed frames are decoded without accumulated stalling, jitter, or buffering delay). Control in the processor 205 progresses from the step 1970 to a decode packed frame step 1980.
[000190] At the step 1980, the picture decoder 1204, under execution of the processor 205, decodes one packed frame from the bitstream 1206 to produce a decoded frame 1210, provided fcm_sps_inner_decoding_bypass_flag was set to disabled (i.e., zero, or do not bypass the inner decoding step). The bitstream 1206 includes NAL units from the bitstream 143 having NAL unit types that are not allocated as FCM VMPS, FCM SPS, or FCM PPS. Due to operation of step 1890 at the encoding stage, the step 1980 executes to produce one or more integer tensors from the bitstream. The step 1980 operates to decode the bitstream using the selected inner codec of step 1904 to produce the tensors to be provided to the neural network head (second portion) 150. Operation of the picture decoder 1204 is described with reference to Fig. 13. Control in the processor 205 progresses from the step 1980 to an unpack tensors step 1990.
[000191] At the step 1990, an unpacker 1214, under execution of the processor 205, reads feature maps from the decoded frame 1210 in accordance with the packing format as determined at the step 1906 in decoding the FCM VMPS 1112 and described with reference to Figs. 9A & 9B. For each tensor, a number of feature maps are decoded, the number corresponding to the number of used channels in the tensor as signalled in the tensor information. Aspects of the tensor information used for packing and unpacking are reduced-domain tensor dimensionality and placement of each reduced-domain tensor in the decoded frame 1210. The channels for each tensor are unpacked as two-dimensional feature maps. The number of feature maps or channels to decode for a given tensor is decoded from the bitstream 143 as a 'channel count'. The unpacker 1214 outputs integer tensors 1216, where the tensors 1216 have been decoded using the decoder topology for the tensor decoder 146. Due to use of the video decoder 1204, the tensors 1216 contain integer elements in the range afforded by the bit depth in use in the video decoder 1204. Control in the processor 205 progresses from the step 1990 to an inverse quantise tensors step 19100.
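The unpacking of two-dimensional feature maps out of a packed frame can be sketched as array slicing over signalled placements. The placement schema (a top-left corner per channel, with one common feature-map size) is a hypothetical simplification of the packing information of Figs. 9A & 9B.

```python
import numpy as np

# Illustrative sketch of step 1990: copy each tensor's feature maps out of the
# packed decoded frame.  The placement schema (per-channel top-left corner plus
# a common feature-map width/height) is an assumption for illustration.
def unpack_tensor(frame: np.ndarray, placements, fm_h: int, fm_w: int):
    """frame: 2-D packed frame; placements: (y, x) top-left corner of each
    decoded feature map.  Returns an integer tensor of shape (C, fm_h, fm_w),
    where C is the signalled channel count (len(placements))."""
    channels = [frame[y:y + fm_h, x:x + fm_w] for (y, x) in placements]
    return np.stack(channels, axis=0)

frame = np.arange(8 * 8, dtype=np.int32).reshape(8, 8)
tensor = unpack_tensor(frame, placements=[(0, 0), (0, 4), (4, 0)], fm_h=4, fm_w=4)
# three 4x4 feature maps unpacked into one (3, 4, 4) integer tensor
```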
[000192] At the step 19100, an inverse quantiser 1218, under execution of the processor 205, performs inverse quantisation on the integer tensors 1216 to produce inverse quantised tensors 1220, provided fcm_sps_quantisation_bypass_flag is set to zero (i.e., do not bypass inverse quantisation). The inverse quantiser 1218 applies quantisation ranges decoded from the bitstream 143, to the determined channel count of each tensor, also converting the resultant tensor to a floating-point data format. The quantisation ranges indicate a maximum and minimum value (or lower and upper bound) used in the floating-point domain at the output from the feature reduction network or tensor compressor 530, i.e., 532. To perform inverse quantisation, the quantisation range is decoded from the FCM PPS 1116 by the metadata parser 1208 (as 1270) at the step 1908. The step 19100 therefore operates to decode the quantisation range and produce one or more floating-point tensors (1220) from the integer tensor(s) produced at the step 1990 using the range information. The quantisation range indicates a range of values, and the tensors 1220 are produced so that each element of each feature map of each tensor has a value within the indicated range. The quantisation range may be obtained by decoding syntax elements qr_min_exp, qr_min_exp_sign, qr_min_mantissa, qr_min_mantissa_sign, qr_max_exp, qr_max_exp_sign, qr_max_mantissa, and qr_max_mantissa_sign, as described with reference to Appendix E. Control in the processor 205 progresses from the step 19100 to a buffer quantised tensors step 19110.
[000193] At the step 19110, a tensor storage module 1222, under execution of the processor 205, provides inter-frame storage of the inverse quantised tensors 1220. Of the tensors for each region of the packing format, each tensor with at least one channel or feature map decoded is stored in the tensor storage module 1222. The tensor storage module 1222 produces output tensors 1224, including the most recent tensor for each tensor where at least one feature map was decoded. In other words, where a tensor was not decoded for a current frame (i.e., a channel count of zero was determined), the most recent value for the tensor where a nonzero channel count was decoded is used. Control in the processor 205 progresses from the step 19110 to a perform tensor decompression step 19120.
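The fallback behaviour described above, reusing the most recent decoded tensor when a frame carries a channel count of zero for a region, can be sketched as follows. The class name and interface are hypothetical.

```python
# Illustrative sketch of the tensor storage module (1222): keep the most
# recently decoded tensor per packing region, and fall back to it when the
# current frame signals a channel count of zero for that region.
class TensorStorage:
    def __init__(self):
        self._latest = {}                  # region id -> most recent tensor

    def update(self, region: int, tensor, channel_count: int):
        if channel_count > 0:              # tensor was decoded for this frame
            self._latest[region] = tensor
        return self._latest[region]        # otherwise reuse the stored tensor

store = TensorStorage()
out0 = store.update(0, "tensor_frame0", channel_count=4)   # stored and returned
out1 = store.update(0, None, channel_count=0)              # falls back to frame 0
```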
[000194] At the step 19120, the tensor decompressor 1250, under execution of the processor 205, performs the steps specified by the decoder network topology using the tensors 1224 as input to produce decoded tensors 1254 when fcm_sps_feature_restoration_bypass_flag is set to zero (i.e., do not bypass the tensor decompression or feature restoration step). Operation of the step 19120 is described further with reference to Fig. 14. By virtue of the check performed at the step 1950, performance of the decoder network topology will not consume resources beyond those already allocated by the destination device 140 for the purpose of tensor decompression and hence will succeed in producing an output. Control in the processor 205 progresses from the step 19120 to a perform upsampling step 19130.
[000195]AtAtthe
[000195] thestep step 19130, 19130,aatemporal temporalupsampler upsampler 1260, 1260, under under execution execution of the of the processor processor 205, 205,
performsaa temporal performs temporalupsampling upsampling (interpolation)totoproduce (interpolation) producethe thetensors tensors149 149from fromthethe tensors 1254. tensors Thetemporal 1254. The temporalupsampler upsampler is is activewhen active when fcm_pps_temporal_upsampling_enabled_flag is to fcm_pps_temporal_upsampling_enabled_flag is set set indicate to indicate application application of of temporal temporal
upsampling,inin accordance upsampling, accordancewith withthe theratio ratio indicated indicated by by temporal_upsampling_ratio_minus2. temporal_upsampling_ratio_minus2. Eachtemporal Each temporalupsampling upsampling operation operation takes takes twotwo consecutive consecutive sets sets of of tensorsfrom tensors from thethe
tensors 1254 tensors andproduces 1254 and producesone oneorormore more intermediate intermediate tensors,output tensors, outputalong alongwith withthe thetensors tensors1254 1254 to produce to tensors 149. produce tensors 149. Due Duetotouse useofofthe the second secondset set of of tensors tensors from 1254toto produce from 1254 produce
44204385_1
63
07 Jun 2024
intermediate tensors, structural delay is introduced when temporal upsampling is enabled, hence temporal upsampling is suited to applications that can tolerate a degree of latency. Control in the processor 205 progresses from the step 19130 to a perform neural network second portion step 19140.
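The interpolation performed by the temporal upsampler 1260 can be sketched as follows. The linear blend and the list-based tensor representation are illustrative assumptions only; the actual interpolation filter, and the mapping of temporal_upsampling_ratio_minus2 to an upsampling ratio, are defined by the codec rather than by this sketch.

```python
def temporal_upsample(tensor_sets, ratio):
    """Insert (ratio - 1) interpolated tensor sets between each pair of
    consecutive decoded tensor sets.

    `tensor_sets` is a list of flat lists of floats, one list per time
    instant. Linear interpolation is an assumption here; the real
    temporal upsampler 1260 may use a different filter.
    """
    out = []
    for a, b in zip(tensor_sets, tensor_sets[1:]):
        out.append(a)
        for k in range(1, ratio):
            t = k / ratio
            out.append([x * (1 - t) + y * t for x, y in zip(a, b)])
    out.append(tensor_sets[-1])
    return out

# Doubling the temporal rate (ratio 2) inserts one midpoint per pair:
dense = temporal_upsample([[0.0, 4.0], [2.0, 8.0]], 2)
# dense == [[0.0, 4.0], [1.0, 6.0], [2.0, 8.0]]
```

Because each interpolated set depends on the following decoded set, the one-set structural delay described above falls directly out of the loop over consecutive pairs.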
[000196] At the step 19140, the CNN head 150, under execution of the processor 205, performs the remaining layers of the neural network implemented by the system 100, using the tensors 149 as input. The method 1900 terminates and the processor 205 may reinvoke the method 1900 upon receiving the next packed frame in the bitstream 143.
2024203901
[000197] Fig. 13 is a schematic block diagram 1300 showing functional modules of an example implementation of the video decoder 1204. The video decoder 1204 may be implemented as one or more software application programs 233 executable within the computer system 200. The video decoder 1204 may be effected by instructions 231 (see Fig. 2B) in the software 233 that are carried out within the computer system 200. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks.
[000198] The bitstream 1206 is input to an entropy decoder module 1320. The entropy decoder module 1320 extracts syntax elements from the bitstream 143 by decoding sequences of 'bins' and passes the values of the syntax elements to other modules in the video decoder 1204. The entropy decoder module 1320 uses variable-length and fixed-length decoding to decode the SPS, PPS or slice header, using an arithmetic decoding engine to decode syntax elements of the slice data as a sequence of one or more bins. Each bin may use one or more 'contexts', with a context describing probability levels to be used for coding a 'one' and a 'zero' value for the bin. Where multiple contexts are available for a given bin, a 'context modelling' or 'context selection' step is performed to choose one of the available contexts for decoding the bin. The process of decoding bins forms a sequential feedback loop, where each slice may be decoded in entirety by a given entropy decoder 1320 instance.
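The per-bin probability adaptation that makes the context models effective can be illustrated with a toy context class. This is an illustrative stand-in, not the standardised CABAC rule: a real CABAC engine stores quantised probability states and decodes bins through a range-based arithmetic coder.

```python
class BinContext:
    """Simplified context model for one bin position. Tracks an estimate of
    the probability that the next bin decoded with this context is 'one'.
    The exponential update below is an illustrative stand-in for CABAC's
    quantised state-transition tables."""

    def __init__(self, p_one=0.5, rate=1 / 32):
        self.p_one = p_one   # current estimate of P(bin == 1)
        self.rate = rate     # adaptation speed

    def update(self, bin_value):
        # Move the estimate toward the bin value just decoded.
        target = 1.0 if bin_value else 0.0
        self.p_one += self.rate * (target - self.p_one)

ctx = BinContext()
for b in [1, 1, 1, 0, 1]:        # bins decoded with this context
    ctx.update(b)
# After mostly 'one' bins the estimate skews above 0.5.
```

The sequential feedback loop described above arises because each `update` depends on the bin just decoded, which itself was decoded using the current state of the context.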
[000199] The entropy decoder module 1320 applies an arithmetic coding algorithm, for example 'context adaptive binary arithmetic coding' (CABAC), to decode syntax elements from the bitstream 143. The decoded syntax elements are used to reconstruct parameters within the video decoder 1204. Parameters include residual coefficients (represented by an arrow 1324), a quantisation parameter 1374, a secondary transform index 1370, and mode selection information such as an intra prediction mode (represented by an arrow 1358). The mode selection information also includes information such as motion vectors, and the partitioning of each CTU into one or more CBs. Parameters are used to generate PBs, typically in combination with sample data from previously decoded CBs.
[000200] The residual coefficients 1324 are passed to an inverse secondary transform module 1336 where either a secondary transform is applied or no operation is performed (bypass) according to a secondary transform index. The inverse secondary transform module 1336 produces reconstructed transform coefficients 1332. That is, the module 1336 produces primary transform domain coefficients from secondary transform domain coefficients. The reconstructed transform coefficients 1332 are input to a dequantiser module 1328. The dequantiser module 1328 performs inverse quantisation (or 'scaling') on the residual coefficients 1332, that is, in the primary transform coefficient domain, to create reconstructed intermediate transform coefficients, represented by an arrow 1340, according to the quantisation parameter 1374. The dequantiser module 1328 may also apply a scaling matrix to provide non-uniform dequantisation within the TB, corresponding to operation of the dequantiser module 840. Should use of a non-uniform inverse quantisation matrix be indicated in the bitstream 1206, the video decoder 1204 reads a quantisation matrix from the bitstream 143 as a sequence of scaling factors and arranges the scaling factors into a matrix. The inverse scaling uses the quantisation matrix in combination with the quantisation parameter to create the reconstructed intermediate transform coefficients 1340.
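The combination of quantisation parameter and scaling matrix can be sketched as follows. The floating-point arithmetic and the exact step-size formula are simplifying assumptions; real codecs perform inverse quantisation in integer arithmetic with per-block-size normalisation.

```python
def dequantise(levels, qp, scaling_matrix=None):
    """Scale decoded residual levels back to transform coefficients.

    Uses an HEVC/VVC-style step size of 2^((qp - 4) / 6). When a
    non-uniform scaling matrix is present, each coefficient position gets
    its own extra weight; a weight of 16 is the neutral value.
    """
    step = 2.0 ** ((qp - 4) / 6.0)
    if scaling_matrix is None:
        scaling_matrix = [[16] * len(row) for row in levels]
    return [[lvl * step * (w / 16.0) for lvl, w in zip(lrow, wrow)]
            for lrow, wrow in zip(levels, scaling_matrix)]

coeffs = dequantise([[4, 0], [0, -2]], qp=4)   # qp 4 gives step size 1.0
# coeffs == [[4.0, 0.0], [0.0, -2.0]]
```

With a non-uniform matrix, larger weights in high-frequency positions coarsen those coefficients, which is the non-uniform dequantisation within the TB described above.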
[000201] The reconstructed transform coefficients 1340 are passed to an inverse primary transform module 1344. The module 1344 transforms the coefficients 1340 from the frequency domain back to the spatial domain. The inverse primary transform module 1344 applies inverse DCT-2 transforms horizontally and vertically, constrained by the maximum available transform size as described with reference to the forward primary transform module 826. The result of operation of the module 1344 is a block of residual samples, represented by an arrow 1348. The block of residual samples 1348 is equal in size to the corresponding CB. The residual samples 1348 are supplied to a summation module 1350.
[000202] At the summation module 1350, the residual samples 1348 are added to a decoded PB (represented as 1352) to produce a block of reconstructed samples, represented by an arrow 1356. The reconstructed samples 1356 are supplied to a reconstructed sample cache 1360 and an in-loop filtering module 1388. The in-loop filtering module 1388 produces reconstructed blocks of frame samples, represented as 1392. The frame samples 1392 are written to a frame buffer 1396. The frame buffer 1396 outputs image or video frames 1210.
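The summation stage can be sketched directly. The clipping to the valid sample range is an assumption of typical decoder behaviour; the bit depth and block layout here are illustrative.

```python
def reconstruct(pred_block, residual_block, bit_depth=8):
    """Summation module sketch: add residual samples to the decoded
    prediction block (PB) and clip each result to the valid sample
    range [0, 2^bit_depth - 1]."""
    hi = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), hi) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred_block, residual_block)]

recon = reconstruct([[100, 250], [0, 128]], [[-5, 20], [-3, 1]])
# recon == [[95, 255], [0, 129]]  (270 and -3 are clipped)
```

The reconstructed block is what feeds both the reconstructed sample cache (for later intra prediction) and the in-loop filters (for later inter prediction).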
[000203] The reconstructed sample cache 1360 operates similarly to the reference sample cache 856 of the video encoder 542. The reconstructed sample cache 1360 provides storage for reconstructed samples needed to intra predict subsequent CBs without accessing the memory 206 (e.g., by using the data 232 instead, which is typically on-chip memory). Reference samples, represented by an arrow 1364, are obtained from the reconstructed sample cache 1360 and supplied to a reference sample filter 1368 to produce filtered reference samples indicated by an arrow 1372. The filtered reference samples 1372 are supplied to an intra-frame prediction module 1376. The module 1376 produces a block of intra-predicted samples, represented by an arrow 1380, in accordance with the intra prediction mode parameter 1358 signalled in the bitstream 1206 and decoded by the entropy decoder 1320. The intra prediction module 1376 supports the modes of the encoder-side module 864, including IBC and MIP. The block of samples 1380 is generated using modes such as DC, planar or angular intra prediction.
[000204] When the prediction mode of a CB is indicated to use intra prediction in the bitstream 143, the intra-predicted samples 1380 form the decoded PB 1352 via a multiplexor module 1384. Intra prediction produces a prediction block (PB) of samples, which is a block in one colour component, derived using 'neighbouring samples' in the same colour component. The neighbouring samples are samples adjacent to the current block and, by virtue of preceding in the block decoding order, have already been reconstructed. Where luma and chroma blocks are collocated, the luma and chroma blocks may use different intra prediction modes. However, the two chroma CBs share the same intra prediction mode.
[000205] When the prediction mode of the CB is indicated to be inter prediction in the bitstream 1206, a motion compensation module 1334 produces a block of inter-predicted samples, represented as 1338. The block of inter-predicted samples 1338 is produced using a motion vector, decoded from the bitstream 143 by the entropy decoder 1320, and a reference frame index to select and filter a block of samples 1398 from the frame buffer 1396. The block of samples 1398 is obtained from a previously decoded frame stored in the frame buffer 1396. For bi-prediction, two blocks of samples are produced and blended together to produce samples for the decoded PB 1352. The frame buffer 1396 is populated with filtered block data 1392 from the in-loop filtering module 1388. As with the in-loop filtering module 868 of the video encoder 542, the in-loop filtering module 1388 applies any of the DBF, the ALF and SAO filtering operations. Generally, the motion vector is applied to both the luma and chroma channels, although the filtering processes for sub-sample interpolation in the luma and chroma channels are different.
[000206] Fig. 14 is a schematic block diagram showing an implementation 1400 of a configurable feature reconstruction module performing a decoder network topology, which may serve as the tensor decompressor 1250. A model 1405, in the example of Fig. 14 an ONNX model 1405 of the decoder network topology to be performed as the decompressor 1250, receives the tensor decompressor network topology information 1238 and the tensor weight information 1240. The structure of the tensor decompressor 1250 is selected by the ONNX model 1405 based on the information 1238. The weights for the tensor decompressor 1250 are selected by the ONNX model 1405 based on the information 1240. Based on the selections at the model 1405, a decompression model 1410 executes. In the example of Fig. 14, an ONNX runtime model 1410 executes to receive the compressed tensors 1224 and output the decompressed tensors 1254. As indicated in Fig. 14, resources required to run the ONNX model 1410 may be allocated from one or more resources, such as one or more of a CPU 1420, an FPGA 1424, a vector processing unit (VPU) 1428, a GPU 1432 and an interface model DirectML 1436. Each of the resources 1420, 1424, 1428, 1432 and 1436 can be implemented on the module 201 or can be executed across one or more similar devices.
[000207] Fig. 15 is a schematic block diagram showing a tensor decompressor 1500 using a multi-scale feature reconstruction stage, which may be selected at the step 1910 for use in the tensor decompressor 1250. In particular, the decompressor 1500 can be implemented as the runtime model 1410 of Fig. 14. The tensor decompressor 1500 includes a single-scale feature compression (SSFC) decompressor 1510. The SSFC decompressor 1510 receives the tensor 1224 having a reduced channel count, such as 64 channels, and passes the tensor 1224 to a convolution layer 1512, which outputs a tensor 1513 having a restored channel count, such as 256 channels. The tensor 1513 is passed to a batch normalisation module 1514 to produce a tensor 1515. The tensor 1515 is passed to a PReLU module 1516 to produce a tensor 1520. The tensor decompressor 1500 includes an MSFR module 1530. The MSFR module 1530 operates to produce a plurality of tensors from the tensor 1520 produced by execution of step 19120, described with reference to Fig. 19, using one or more trained convolutional layers. Upsample modules 1532, 1534, and 1536 upsample the tensor 1520 horizontally and vertically by factors of two, four, and eight, respectively, to produce tensors 1533, 1535, and 1537. The tensor 1537 forms one (P'2, 1557) output from the MSFR module 1530 and is passed to a downsample module 1542.
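The channel-restoring stage of the SSFC decompressor 1510 can be illustrated with a toy 1 × 1 convolution followed by a PReLU activation. The weight values and tensor sizes here are illustrative only (not trained values), and the batch normalisation stage is omitted.

```python
def conv1x1(x, weights):
    """1x1 convolution as a per-position channel mix.

    x is [C_in][positions], weights is [C_out][C_in]. This is the
    channel-count restoration performed by the convolution layer 1512
    (e.g. 64 channels back up to 256), shown here at toy scale.
    """
    return [[sum(w * ch[i] for w, ch in zip(wrow, x))
             for i in range(len(x[0]))]
            for wrow in weights]

def prelu(x, alpha=0.25):
    """Parametric ReLU: identity for positive values, slope alpha for
    negative values (the learned alpha is fixed here for illustration)."""
    return [[v if v > 0 else alpha * v for v in row] for row in x]

# One input channel of four positions expanded to two channels, then PReLU:
restored = prelu(conv1x1([[1.0, -2.0, 0.5, -0.5]], [[1.0], [-1.0]]))
# restored == [[1.0, -0.5, 0.5, -0.125], [-0.25, 2.0, -0.125, 0.5]]
```

Because the kernel is 1 × 1, the spatial dimensions are unchanged; only the channel count grows, matching the 64-to-256 restoration described above.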
[000208] The downsample module 1542 downsamples the tensor 1537 by a factor of two horizontally and vertically to produce a tensor 1543 having the same dimensionality as the tensor 1535. The tensor 1543 is provided to a convolution layer 1548 which outputs a tensor 1549. A summation module 1554 adds the tensors 1535 and 1549 to produce a tensor 1555 as an output (P'3) of the MSFR module 1530.
[000209] A downsample module 1540 downsamples the tensor 1535 by a factor of two horizontally and vertically to produce a tensor 1541 having the same dimensionality as the tensor 1533. The tensor 1541 is provided to a convolution layer 1546 which outputs a tensor 1547. A summation module 1552 adds the tensors 1533 and 1547 to produce a tensor 1553 as an output (P'4) of the MSFR module 1530.
[000210] A downsample module 1538 downsamples the tensor 1533 by a factor of two horizontally and vertically to produce a tensor 1539 having the same dimensionality as the tensor 1520. The tensor 1539 is provided to a convolution layer 1544 which outputs a tensor 1545. A summation module 1550 adds the tensors 1520 and 1545 to produce a tensor 1551 as an output (P'5) of the MSFR module 1530. The tensors P'2 1557, P'3 1555, P'4 1553 and P'5 1551 form the tensors 1254 of Fig. 12.
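The pyramid pattern of the MSFR module 1530, upsampling the decompressed tensor to several scales and then feeding each finer level back through a downsample-and-add skip connection, can be sketched in simplified form. Single-channel tensors, nearest-neighbour resampling and average-pool downsampling are assumptions; the trained convolution layers 1544/1546/1548 are omitted and the upsampling factors are reduced relative to the two/four/eight of the figure.

```python
def upsample2x(t):
    """Nearest-neighbour 2x upsample of a single-channel 2-D tensor."""
    out = []
    for row in t:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

def downsample2x(t):
    """2x2 average-pool downsample; the exact pooling used by the
    downsample modules 1538/1540/1542 is an assumption here."""
    return [[(t[r][c] + t[r][c + 1] + t[r + 1][c] + t[r + 1][c + 1]) / 4.0
             for c in range(0, len(t[0]), 2)]
            for r in range(0, len(t), 2)]

def add(a, b):
    """Elementwise summation (cf. modules 1550/1552/1554)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

x = [[1.0, 3.0], [5.0, 7.0]]      # stands in for the tensor 1520
p2 = upsample2x(upsample2x(x))    # finest pyramid level (cf. P'2)
# Next-coarser level: downsample the finer level and add it to the
# directly upsampled tensor as a skip connection (cf. P'3).
p3 = add(upsample2x(x), downsample2x(p2))
```

Each coarser output therefore combines a direct upsampling of the decompressed tensor with information fed back from the finer level, which is the multi-scale reconstruction described in paragraphs [000208] to [000210].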
[000211] Fig. 16A is a schematic block diagram showing an example implementation 1600 of the head portion 150 of a CNN for object detection, corresponding to a portion of a "YOLOv3" network excluding the "DarkNet-53" backbone portion. The CNN head portion 150 of Fig. 16A can be used when the CNN backbone is implemented as in Fig. 3A, for example. Depending on the task to be performed in the destination device 140, different networks may be substituted for the CNN head portion 150. Incoming tensors 149 are separated into the tensor of each layer (i.e., tensors 1610, 1620, and 1634). The tensor 1610 is passed to a CBL module 1612 to produce a tensor 1614. The tensor 1614 is passed to a detection module 1616 and an upscaler module 1622. The detection module outputs bounding boxes 1618, in the form of a detection tensor. The bounding boxes 1618 are passed to a non-maximum suppression (NMS) module 1648.
[000212] To produce bounding boxes addressing co-ordinates in the original video data 113, prior to resizing for the backbone portion of the network 114, scaling by the original video width and height is performed at the upscaler module 1622. The upscaler module 1622 receives the tensor 1614 and the tensor 1620 and produces an upscaled tensor 1624, which is passed to a CBL module 1626. The CBL module 1626 produces a tensor 1628 as output. The tensor 1628 is passed to a detection module 1630 and an upscaler module 1636. The detection module 1630 produces a detection tensor 1632, which is supplied to the NMS module 1648.
[000213] The upscaler module 1636 is another instance of the module 1622. The upscaler module 1636 receives the tensor 1628 and the tensor 1634 and outputs an upscaled tensor 1638. The upscaled tensor 1638 is passed to a CBL module 1640, which outputs a tensor 1642 to a detection module 1644. The detection module 1644 produces a detection tensor 1646, which is supplied to the NMS module 1648.
[000214] The CBL modules 1612, 1626, and 1640 each contain a concatenation of five CBL modules (e.g., the CBL model 360 shown in Fig. 3D). The upscaler modules 1622 and 1636 are each instances of an upscaler module 1660 as shown in Fig. 16B. The module 1648 receives the tensors 1618, 1632 and 1646 and outputs the task result 151.
[000215] As shown in Fig. 16B, the upscaler module 1660 accepts a tensor 1662 (for example the tensor 1614 of Fig. 16A) as an input. The tensor 1662 is passed to a CBL module 1666 (having the structure of the module 360) to produce a tensor 1668. The tensor 1668 is passed to an upsampler 1670 to produce an upsampled tensor 1672. A concatenation module 1674 produces a tensor 1676 by concatenating the upsampled tensor 1672 with a second input tensor 1664 (for example the tensor 1620 input to the upscaler 1622 in Fig. 16A).
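The upsample-then-concatenate pattern of the upscaler module 1660 can be sketched as follows. Nearest-neighbour upsampling and the [channel][row][col] layout are assumptions, and the CBL convolution stage 1666 is omitted.

```python
def upsample2x(ch):
    """Nearest-neighbour 2x upsample of one channel (cf. upsampler 1670)."""
    out = []
    for row in ch:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

def upscale_concat(deep, lateral):
    """Upsample the deeper tensor, then concatenate it with the lateral
    input along the channel axis (cf. concatenation module 1674).
    Tensors are [channel][row][col]."""
    return [upsample2x(c) for c in deep] + list(lateral)

# A one-channel 1x1 deep tensor fused with a one-channel 2x2 lateral tensor:
fused = upscale_concat([[[1.0]]], [[[0.0, 0.0], [0.0, 0.0]]])
# fused == [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
```

After upsampling, the two inputs share spatial dimensions, so the concatenation simply stacks their channels; the combined tensor then carries both coarse and fine-scale features forward.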
[000216] Thedetection
[000216] The detectionmodules modules 1616, 1616, 1630, 1630, andand 1644 1644 are are instances instances ofdetection of a a detection module1680 module 1680asasshown shown in in Fig.16C. Fig. 16C. TheThe detection detection module module 1680 1680 receives receives a tensor a tensor 1682.1682. The The tensor 1682 tensor is input 1682 is input to to aaCBL module1684 CBL module 1684 having having structureofofthe structure themodule module 360. 360. TheThe CBLCBL
module1684 module 1684generates generates a a tensor1686. tensor 1686.The The tensor1686 tensor 1686 is is passed passed to to a aconvolution convolution module module 1688, 1688,
whichimplements which implements a detectionkernel a detection kerneltotooutput outputa atensor tensor 1690. 1690.InInsome some arrangements, arrangements, thethe
detection kernel applies a 1 × 1 kernel to produce the output on feature maps at each of the detection kernel applies a 1 x 1 kernel to produce the output on feature maps at each of the
three layers three layers of ofthe thetensor. tensor.The The detection detectionkernel kernelisis1 1 × X1 × 1 (B × X(5(5+ C) X (B ), where + C) whereBBisis the the number number of bounding boxes a particular cell can predict, typically three (3), and C is the number of of bounding boxes a particular cell can predict, typically three (3), and C is the number of
classes, which may be eighty (80), resulting in a kernel size of two-hundred and fifty five (255) classes, which may be eighty (80), resulting in a kernel size of two-hundred and fifty five (255)
detection attributes (i.e. tensor 1290). The constant “5” represents four boundary box attributes detection attributes (i.e. tensor 1290). The constant "5" represents four boundary box attributes
(box centre x, y and size scale x, y) and one object confidence level (“objectness”). The result (box centre X, y and size scale X, y) and one object confidence level ("objectness"). The result
of a detection kernel has the same spatial dimensions as the input feature map, but the depth of of a detection kernel has the same spatial dimensions as the input feature map, but the depth of
the output corresponds to the detection attributes. The detection kernel is applied at each layer, the output corresponds to the detection attributes. The detection kernel is applied at each layer,
typically three typically threelayers, layers,resulting in in resulting a large number a large of of number candidate bounding candidate boundingboxes. boxes. A A process of process of
non-maximum non-maximum suppression suppression is applied is applied by the by the NMSNMS module module 1648 1648 to the to the resulting resulting bounding bounding boxes boxes
44204385_1 44204385_1
69
to discard redundant boxes, such as overlapping predictions at similar scale, resulting in a final set of bounding boxes as output for object detection.
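The kernel-depth arithmetic described above (B × (5 + C) detection attributes per cell) can be sketched as follows. This is an illustrative sketch only; the function name and its defaults are not part of the described arrangement:

```python
def detection_kernel_depth(num_boxes: int = 3, num_classes: int = 80) -> int:
    """Depth of the 1x1 YOLO detection kernel: B x (5 + C).

    The constant 5 covers the four box attributes (centre x, y and
    size scale x, y) plus one objectness score.
    """
    return num_boxes * (5 + num_classes)
```

With the typical values B = 3 and C = 80, the depth evaluates to 255, matching the kernel size stated above.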
[000217] Fig. 17 is a schematic block diagram showing a head portion 1700 of a CNN. The head portion 1700 can be implemented as the CNN head portion 150 where the CNN backbone 114 is implemented as the backbone 400, for example. The head portion 1700 forms part of an overall network known as ‘Faster RCNN’ and includes a feature network (i.e., backbone portion 400), a region proposal network, and a detection network. Input to the head portion 1700 are the tensors 149, which include P2-P6 layer tensors 1710, 1712, 1714, 1716, and 1718. The P2-P5 layer tensors 1710, 1712, 1714, and 1716 correspond to the P2 to P5 outputs 477, 475, 473, and 471 of Fig. 4. The P2-P6 tensors 1710, 1712, 1714, 1716, and 1718 are input to a region proposal network (RPN) head module 1720. The P6 tensor 1718 is produced by a max pool module 1742, operating on the P5 tensor 1716 to perform a 2×2 max pooling operation. The RPN head module 1720 performs a convolution on the input tensors, generating an intermediate tensor. The intermediate tensor is fed into two subsequent sibling layers, (i) one for classification and (ii) one for bounding box, or ‘region of interest’ (ROI), regression. A resultant output is classification and bounding boxes 1722. The classification and bounding boxes 1722 are passed to an NMS module 1724. The NMS module 1724 prunes out redundant bounding boxes by removing overlapping boxes with a lower score to produce pruned bounding boxes 1726. The bounding boxes 1726 are input to a region of interest (ROI) pooler 1728. The ROI pooler 1728 uses some of the layer tensors of the tensors 149 (described further hereafter) and the bounding boxes 1726 to produce fixed-size feature maps from various input size maps using max pooling operations. In the max pooling operation a subsampling takes the maximum value in each group of input values to produce one output value in the output tensor.
[000218] Input to the ROI pooler 1728 are the P2-P5 feature maps 1710, 1712, 1714, and 1716, and region of interest proposals 1726. Each proposal (ROI) from 1726 is associated with a portion of the feature maps (1710-1716) to produce a fixed-size map. The fixed-size map is of a size independent of the underlying portion of the feature map 1710-1716. One of the feature maps 1710-1716 is selected such that the resulting cropped map has sufficient detail, for example, according to the following rule: floor(4 + log2(sqrt(box_area) / 224)), where 224 is the canonical box size. The ROI pooler 1728 operates to crop incoming feature maps according to the proposals 1726, producing a tensor 1730.
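The level-selection rule above can be sketched as follows. Clamping the result to the available P2-P5 levels is an assumption added for illustration (the rule as stated can otherwise produce levels outside that range for very small or very large boxes):

```python
import math


def fpn_level(box_area: float, canonical_size: float = 224.0) -> int:
    """Select which feature map (P2-P5) an ROI is cropped from,
    using the rule floor(4 + log2(sqrt(box_area) / canonical_size))."""
    level = math.floor(4 + math.log2(math.sqrt(box_area) / canonical_size))
    return max(2, min(5, level))  # clamp to P2-P5 (illustrative assumption)
```

For example, a 224 × 224 proposal selects level 4 (P4), while a 112 × 112 proposal selects level 3 (P3).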
[000219] The tensor 1730 is fed into a fully connected (FC) neural network head 1732. The FC head 1732 performs two fully connected layers to produce a class score and bounding box predictor delta tensor 1734. The class score is generally an 80-element tensor, each element corresponding to a prediction score for the corresponding object category. The bounding box prediction deltas tensor is an 80×4 = 320 element tensor, containing bounding boxes for the corresponding object categories. Final processing is performed by an output layers module 1736, receiving the tensor 1734 and performing a filtering operation to produce a filtered tensor 1738. Low-scoring (low classification) objects are removed from further consideration. A non-maximum suppression module 1740 receives the filtered tensor 1738 and removes overlapping bounding boxes by removing the overlapped box with a lower classification score, resulting in an inference output tensor 1742, corresponding to the tensor 151.
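The non-maximum suppression applied by modules such as 1724 and 1740 can be sketched as a greedy procedure over scored boxes. This is a minimal illustration of the general technique, not the exact implementation of those modules:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0


def nms(boxes, scores, threshold=0.5):
    """Greedy non-maximum suppression: visit boxes in descending score
    order, keeping a box only if it does not overlap an already-kept
    box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep
```

Given two heavily overlapping boxes and one distant box, only the higher-scoring box of the overlapping pair and the distant box survive.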
[000220] Referring to Appendix E, the weights information may include a ‘no_weights_flag’, indicating that the decoder network topology to be used does not require any weights in order to operate.
[000221] In an arrangement of the source device 110 and the destination device 140, network weights are signalled in the bitstream 121 as a delta relative to another set of network weights (the ‘base weights’) that are known to the system 100 or may be obtained via external means, such as from the tensor codec repository 180. The base weights may be indicated via reference using an identifier number stored in the bitstream 121. Signalling of network weights as a delta relative to other network weights may be accomplished using a syntax such as ‘MPEG Incremental Neural Network Representation’, under development as part of ISO/IEC 15938-17.
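The delta-signalling concept can be sketched as follows. This is a conceptual illustration only: the actual coded representation would follow a syntax such as the MPEG Incremental Neural Network Representation mentioned above, not plain per-element addition, and the dictionary-of-lists weight layout is an assumption made for the sketch:

```python
def apply_weight_deltas(base_weights, deltas):
    """Reconstruct network weights as base weights plus signalled deltas.

    base_weights and deltas map layer names to flat lists of values;
    the layer names must match (illustrative assumption).
    """
    return {
        name: [b + d for b, d in zip(base, deltas[name])]
        for name, base in base_weights.items()
    }
```

A decoder holding base weights (e.g. retrieved by identifier from a repository such as 180) would apply the received deltas to obtain the operative weights.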
[000222] Methods presented herein enable efficient representation of tensors in a format amenable to compression using contemporary block-based compression standards such as VVC, HEVC, AVC or other standards. Block-based compression, although not intuitively applicable to data such as compressed feature maps or coefficients for projecting basis vectors to reconstruct feature maps, uncovers additional unexpected redundancy in blocks, such as by use of various transforms including trained secondary transforms. Although methods presented herein are described with reference to the ‘Faster RCNN’ and ‘YOLOv3’ network architectures and specific divisions of these networks into ‘backbone’ and ‘head’ portions, the methods are applicable to any neural network operating on multi-dimensional tensor data and are applicable to different divisions of such networks into ‘backbone’ and ‘head’ portions.
[000223] It should be noted that although the source device 110 and the destination device 140 are described with reference to the video source 112 comprising video and image data, other types of content such as audio data or textual data may also be supplied as input to neural networks applicable to such types of input, and the resulting intermediate feature maps may be compressed and decompressed by the modules 116 and 146 with suitable encoder and decoder network topologies.
INDUSTRIAL APPLICABILITY
[000224] The arrangements described are applicable to the computer and data processing industries and particularly for the digital signal processing for the encoding and decoding of signals such as video and image signals, achieving high compression efficiency.
[000225] Some implementations described use an inserted NAL unit that identifies the compression approach used for coding feature maps and consequently also the NAL unit header format used for NAL units in the bitstream, including those related to tensor quantisation, reduction and restoration operations, i.e., operations outside the scope of packed feature frame coding. Accordingly, such implementations allow several different compression standards to be indicated in an FCM bitstream for ‘inner coding’ of packed feature frames, while supporting signalling of higher-level metadata needed to decode the FCM bitstream. Allowing the compression standard used to be indicated in an FCM bitstream provides improved flexibility in implementation, including the ability to be back-compatible with longer-existing standards such as AVC, compatibility with more recent standards such as HEVC or VVC, and further flexibility to allow use of custom or other compression standards.
[000226] The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
[000227] In the context of this specification, the word ‘comprising’ means ‘including principally but not necessarily solely’ or ‘having’ or ‘including’, and not ‘consisting only of’. Variations of the word ‘comprising’, such as ‘comprise’ and ‘comprises’, have correspondingly varied meanings.
APPENDIX A
AVC/H.264 NAL unit format
Bytes 0 to nalUnitHeaderBytes - 1 in the NAL unit form the NAL unit header.
nal_unit( NumBytesInNALunit ) {                                                  C    Descriptor
    forbidden_zero_bit                                                           All  f(1)
    nal_ref_idc                                                                  All  u(2)
    nal_unit_type                                                                All  u(5)
    NumBytesInRBSP = 0
    nalUnitHeaderBytes = 1
    if( nal_unit_type = = 14 | | nal_unit_type = = 20 | | nal_unit_type = = 21 ) {
        if( nal_unit_type != 21 )
            svc_extension_flag                                                   All  u(1)
        else
            avc_3d_extension_flag                                                All  u(1)
        if( svc_extension_flag ) {
            nal_unit_header_svc_extension( ) /* specified in Annex G */          All
            nalUnitHeaderBytes += 3
        } else if( avc_3d_extension_flag ) {
            nal_unit_header_3davc_extension( ) /* specified in Annex J */
            nalUnitHeaderBytes += 2
        } else {
            nal_unit_header_mvc_extension( ) /* specified in Annex H */          All
            nalUnitHeaderBytes += 3
        }
    }
    for( i = nalUnitHeaderBytes; i < NumBytesInNALunit; i++ ) {
        if( i + 2 < NumBytesInNALunit && next_bits( 24 ) = = 0x000003 ) {
            rbsp_byte[ NumBytesInRBSP++ ]                                        All  b(8)
            rbsp_byte[ NumBytesInRBSP++ ]                                        All  b(8)
            i += 2
            emulation_prevention_three_byte /* equal to 0x03 */                  All  f(8)
        } else
            rbsp_byte[ NumBytesInRBSP++ ]                                        All  b(8)
    }
}
A modified version of Table 7-1 from the AVC/H.264 specification, with codes reserved for FCM parameter sets and a reserved NAL unit type necessary to distinguish from an ICI NAL unit, is shown as follows:
nal_unit_type | Content of NAL unit and RBSP syntax structure | C | Annex A NAL unit type class | Annex G and Annex H NAL unit type class | Annex I and Annex J NAL unit type class
0 | Reserved (prohibited)* | | non-VCL | non-VCL | non-VCL
1 | Coded slice of a non-IDR picture slice_layer_without_partitioning_rbsp( ) | 2, 3, 4 | VCL | VCL | VCL
2 | Coded slice data partition A slice_data_partition_a_layer_rbsp( ) | 2 | VCL | not applicable | not applicable
3 | Coded slice data partition B slice_data_partition_b_layer_rbsp( ) | 3 | VCL | not applicable | not applicable
4 | Coded slice data partition C slice_data_partition_c_layer_rbsp( ) | 4 | VCL | not applicable | not applicable
5 | Coded slice of an IDR picture slice_layer_without_partitioning_rbsp( ) | 2, 3 | VCL | VCL | VCL
6 | Supplemental enhancement information (SEI) sei_rbsp( ) | 5 | non-VCL | non-VCL | non-VCL
7 | Sequence parameter set seq_parameter_set_rbsp( ) | 0 | non-VCL | non-VCL | non-VCL
8 | Picture parameter set pic_parameter_set_rbsp( ) | 1 | non-VCL | non-VCL | non-VCL
9 | Access unit delimiter access_unit_delimiter_rbsp( ) | 6 | non-VCL | non-VCL | non-VCL
10 | End of sequence end_of_seq_rbsp( ) | 7 | non-VCL | non-VCL | non-VCL
11 | End of stream end_of_stream_rbsp( ) | 8 | non-VCL | non-VCL | non-VCL
12 | Filler data filler_data_rbsp( ) | 9 | non-VCL | non-VCL | non-VCL
13 | Sequence parameter set extension seq_parameter_set_extension_rbsp( ) | 10 | non-VCL | non-VCL | non-VCL
14 | Prefix NAL unit prefix_nal_unit_rbsp( ) | 2 | non-VCL | suffix dependent | suffix dependent
15 | Subset sequence parameter set subset_seq_parameter_set_rbsp( ) | 0 | non-VCL | non-VCL | non-VCL
16 | Depth parameter set depth_parameter_set_rbsp( ) | 11 | non-VCL | non-VCL | non-VCL
17..18 | Reserved | | non-VCL | non-VCL | non-VCL
19 | Coded slice of an auxiliary coded picture without partitioning slice_layer_without_partitioning_rbsp( ) | 2, 3, 4 | non-VCL | non-VCL | non-VCL
20 | Coded slice extension slice_layer_extension_rbsp( ) | 2, 3, 4 | non-VCL | VCL | VCL
21 | Coded slice extension for a depth view component or a 3D-AVC texture view component slice_layer_extension_rbsp( ) | 2, 3, 4 | non-VCL | non-VCL | VCL
22..23 | Reserved | | non-VCL | non-VCL | VCL
24 | FCM VMPS - Feature coding for machines - Vision model parameter set | | | |
25 | FCM SPS - Feature coding for machines - Sequence parameter set | | | |
26 | FCM PPS - Feature coding for machines - Picture parameter set | | | |
27..30 | Unspecified | | non-VCL | non-VCL | non-VCL
31 | Reserved / prohibited - avoid collision with inner codec identifier (ICI) | | | |
* nal_unit_type = 0 is marked as ‘reserved (prohibited)’ rather than ‘unspecified’ to indicate this value is not available for allocation by groups, as such NAL units using this type would otherwise need to ensure the first byte of the RBSP could not contain a zero byte (0x00) due to the inability to insert an emulation_prevention_three_byte between the first and second bytes of the RBSP.
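The emulation-prevention mechanism appearing in the nal_unit( ) syntax above can be sketched as follows. This is a simplified illustration of the 0x03 insertion and removal process, not the normative text of any of the cited standards:

```python
def insert_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert emulation_prevention_three_byte (0x03) so the byte patterns
    0x000000 through 0x000003 cannot occur in the NAL unit payload."""
    out = bytearray()
    zeros = 0
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)  # emulation_prevention_three_byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)


def remove_emulation_prevention(data: bytes) -> bytes:
    """Inverse operation, mirroring the rbsp_byte extraction loop in the
    nal_unit( ) syntax: a 0x03 following two zero bytes is discarded."""
    out = bytearray()
    zeros = 0
    for b in data:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue  # skip emulation_prevention_three_byte
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)
```

For example, the RBSP bytes 0x00 0x00 0x01 are coded as 0x00 0x00 0x03 0x01, and removal restores the original payload.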
APPENDIX B
HEVC/H.265 NAL unit header and payload

nal_unit( NumBytesInNalUnit ) {                                       Descriptor
    nal_unit_header( )
    NumBytesInRbsp = 0
    for( i = 2; i < NumBytesInNalUnit; i++ )
        if( i + 2 < NumBytesInNalUnit && next_bits( 24 ) = = 0x000003 ) {
            rbsp_byte[ NumBytesInRbsp++ ]                             b(8)
            rbsp_byte[ NumBytesInRbsp++ ]                             b(8)
            i += 2
            emulation_prevention_three_byte /* equal to 0x03 */       f(8)
        } else
            rbsp_byte[ NumBytesInRbsp++ ]                             b(8)
}

nal_unit_header( ) {                                                  Descriptor
    forbidden_zero_bit                                                f(1)
    nal_unit_type                                                     u(6)
    nuh_layer_id                                                      u(6)
    nuh_temporal_id_plus1                                             u(3)
}
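The two-byte header above can be unpacked as follows. This is an illustrative sketch, not part of the specification; the usage example assumes the FCM_VMPS type value 48 from the modified table below:

```python
def parse_hevc_nal_unit_header(header: bytes) -> dict:
    """Unpack the two-byte HEVC-style NAL unit header:
    forbidden_zero_bit f(1), nal_unit_type u(6),
    nuh_layer_id u(6), nuh_temporal_id_plus1 u(3)."""
    word = (header[0] << 8) | header[1]
    return {
        "forbidden_zero_bit": (word >> 15) & 0x1,
        "nal_unit_type": (word >> 9) & 0x3F,
        "nuh_layer_id": (word >> 3) & 0x3F,
        "nuh_temporal_id_plus1": word & 0x7,
    }
```

For example, the header bytes 0x60 0x01 decode to nal_unit_type 48, nuh_layer_id 0 and nuh_temporal_id_plus1 1.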
NAL unit type codes
nal_unit_type | Name of nal_unit_type | Content of NAL unit and RBSP syntax structure | NAL unit type class
0, 1 | TRAIL_N, TRAIL_R | Coded slice segment of a non-TSA, non-STSA trailing picture slice_segment_layer_rbsp( ) | VCL
2, 3 | TSA_N, TSA_R | Coded slice segment of a TSA picture slice_segment_layer_rbsp( ) | VCL
4, 5 | STSA_N, STSA_R | Coded slice segment of an STSA picture slice_segment_layer_rbsp( ) | VCL
6, 7 | RADL_N, RADL_R | Coded slice segment of a RADL picture slice_segment_layer_rbsp( ) | VCL
8, 9 | RASL_N, RASL_R | Coded slice segment of a RASL picture slice_segment_layer_rbsp( ) | VCL
10, 12, 14 | RSV_VCL_N10, RSV_VCL_N12, RSV_VCL_N14 | Reserved non-IRAP SLNR VCL NAL unit types | VCL
11, 13, 15 | RSV_VCL_R11, RSV_VCL_R13, RSV_VCL_R15 | Reserved non-IRAP sub-layer reference VCL NAL unit types | VCL
16, 17, 18 | BLA_W_LP, BLA_W_RADL, BLA_N_LP | Coded slice segment of a BLA picture slice_segment_layer_rbsp( ) | VCL
19, 20 | IDR_W_RADL, IDR_N_LP | Coded slice segment of an IDR picture slice_segment_layer_rbsp( ) | VCL
21 | CRA_NUT | Coded slice segment of a CRA picture slice_segment_layer_rbsp( ) | VCL
22, 23 | RSV_IRAP_VCL22, RSV_IRAP_VCL23 | Reserved IRAP VCL NAL unit types | VCL
24..31 | RSV_VCL24..RSV_VCL31 | Reserved non-IRAP VCL NAL unit types | VCL
32 | VPS_NUT | Video parameter set video_parameter_set_rbsp( ) | non-VCL
33 | SPS_NUT | Sequence parameter set seq_parameter_set_rbsp( ) | non-VCL
34 | PPS_NUT | Picture parameter set pic_parameter_set_rbsp( ) | non-VCL
35 | AUD_NUT | Access unit delimiter access_unit_delimiter_rbsp( ) | non-VCL
36 | EOS_NUT | End of sequence end_of_seq_rbsp( ) | non-VCL
37 | EOB_NUT | End of bitstream end_of_bitstream_rbsp( ) | non-VCL
38 | FD_NUT | Filler data filler_data_rbsp( ) | non-VCL
39, 40 | PREFIX_SEI_NUT, SUFFIX_SEI_NUT | Supplemental enhancement information sei_rbsp( ) | non-VCL
41..47 | RSV_NVCL41..RSV_NVCL47 | Reserved | non-VCL
48 | FCM_VMPS | Feature coding for machines - Vision model parameter set |
49 | FCM_SPS | Feature coding for machines - Sequence parameter set |
50 | FCM_PPS | Feature coding for machines - Picture parameter set |
51..63 | UNSPEC51..UNSPEC63 | Unspecified | non-VCL
APPENDIX C
VVC/H.266 NAL unit format is as follows:

nal_unit( NumBytesInNalUnit ) {                                       Descriptor
    nal_unit_header( )
    NumBytesInRbsp = 0
    for( i = 2; i < NumBytesInNalUnit; i++ )
        if( i + 2 < NumBytesInNalUnit && next_bits( 24 ) = = 0x000003 ) {
            rbsp_byte[ NumBytesInRbsp++ ]                             b(8)
            rbsp_byte[ NumBytesInRbsp++ ]                             b(8)
            i += 2
            emulation_prevention_three_byte /* equal to 0x03 */       f(8)
        } else
            rbsp_byte[ NumBytesInRbsp++ ]                             b(8)
}
VVC/H.266 NAL unit header format is as follows (always 16 bits or two bytes):

nal_unit_header( ) {                                                  Descriptor
    forbidden_zero_bit                                                f(1)
    nuh_reserved_zero_bit                                             u(1)
    nuh_layer_id                                                      u(6)
    nal_unit_type                                                     u(5)
    nuh_temporal_id_plus1                                             u(3)
}
VVC/H.266 NAL unit types:
nal_unit_type | Name of nal_unit_type | Content of NAL unit and RBSP syntax structure | NAL unit type class
0 | TRAIL_NUT | Coded slice of a trailing picture or subpicture* slice_layer_rbsp( ) | VCL
1 | STSA_NUT | Coded slice of an STSA picture or subpicture* slice_layer_rbsp( ) | VCL
2 | RADL_NUT | Coded slice of a RADL picture or subpicture* slice_layer_rbsp( ) | VCL
3 | RASL_NUT | Coded slice of a RASL picture or subpicture* slice_layer_rbsp( ) | VCL
4..6 | RSV_VCL_4..RSV_VCL_6 | Reserved non-IRAP VCL NAL unit types | VCL
7, 8 | IDR_W_RADL, IDR_N_LP | Coded slice of an IDR picture or subpicture* slice_layer_rbsp( ) | VCL
9 | CRA_NUT | Coded slice of a CRA picture or subpicture* slice_layer_rbsp( ) | VCL
10 | GDR_NUT | Coded slice of a GDR picture or subpicture* slice_layer_rbsp( ) | VCL
11 | RSV_IRAP_11 | Reserved IRAP VCL NAL unit type | VCL
12 | OPI_NUT | Operating point information operating_point_information_rbsp( ) | non-VCL
13 | DCI_NUT | Decoding capability information decoding_capability_information_rbsp( ) | non-VCL
14 | VPS_NUT | Video parameter set video_parameter_set_rbsp( ) | non-VCL
15 | SPS_NUT | Sequence parameter set seq_parameter_set_rbsp( ) | non-VCL
16 | PPS_NUT | Picture parameter set pic_parameter_set_rbsp( ) | non-VCL
17, 18 | PREFIX_APS_NUT, SUFFIX_APS_NUT | Adaptation parameter set adaptation_parameter_set_rbsp( ) | non-VCL
19 | PH_NUT | Picture header picture_header_rbsp( ) | non-VCL
20 | AUD_NUT | AU delimiter access_unit_delimiter_rbsp( ) | non-VCL
21 | EOS_NUT | End of sequence end_of_seq_rbsp( ) | non-VCL
22 | EOB_NUT | End of bitstream end_of_bitstream_rbsp( ) | non-VCL
23, 24 | PREFIX_SEI_NUT, SUFFIX_SEI_NUT | Supplemental enhancement information sei_rbsp( ) | non-VCL
25 | FD_NUT | Filler data filler_data_rbsp( ) | non-VCL
26, 27 | RSV_NVCL_26, RSV_NVCL_27 | Reserved non-VCL NAL unit types | non-VCL
28 | FCM_VMPS | Feature coding for machines - Vision model parameter set |
29 | FCM_SPS | Feature coding for machines - Sequence parameter set |
30 | FCM_PPS | Feature coding for machines - Picture parameter set |
44204385_1 44204385_1
80
nal_unit_typ nal_unit_typ Nameof Name of ContentofofNAL Content NAL unit unit andand RBSP RBSP syntax syntax structure structure NAL unit NAL unit 07 Jun 2024
e e nal_unit_type nal_unit_type typeclass type class 31 31 UNSPEC_31 UNSPEC_31 Unspecifiednon-VCL Unspecified non-VCLNALNAL unitunit types types non-VCL non-VCL 2024203901
44204385_1 44204385_1
81
APPENDIX D
Inner codec identifier NAL unit
NumBytesInNALunit shall be equal to 1.
nal_unit( NumBytesInNALunit ) {                               C     Descriptor
    forbidden_zero_bit                                        All   f(1)
    inner_codec_identifier                                    All   u(2)
    constant_value_31                                         All   u(5)
}
inner_codec_identifier specifies the inner codec as follows:
| inner_codec_identifier | Inner codec |
| 0 | AVC/H.264 |
| 1 | HEVC/H.265 |
| 2 | VVC/H.266 |
| 3 | Custom |
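By way of illustration only, the one-byte inner codec identifier NAL unit defined above could be parsed as follows; the function name is hypothetical and the fields are assumed to be packed most-significant bit first:

```python
def parse_inner_codec_identifier_nal(nal_byte: int) -> int:
    # f(1) forbidden_zero_bit, u(2) inner_codec_identifier, u(5) constant_value_31,
    # packed most-significant bit first into the single NAL unit byte.
    forbidden_zero_bit = (nal_byte >> 7) & 0x1
    inner_codec_identifier = (nal_byte >> 5) & 0x3
    constant_value_31 = nal_byte & 0x1F
    if forbidden_zero_bit != 0 or constant_value_31 != 31:
        raise ValueError("malformed inner codec identifier NAL unit")
    return inner_codec_identifier  # 0: AVC, 1: HEVC, 2: VVC, 3: custom
```

For example, a byte value of 0x3F yields inner_codec_identifier equal to 1 (HEVC/H.265).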
In the case of value 3 ("custom"), an alternative NAL unit encapsulation with FCM-specific NAL unit headers is used, supporting custom inner codecs.
The following custom inner codecs are supported:

• End-to-end learned inner codec
  o Entropy coded payloads are encoded as RBSPs.
• Bypassed inner codec
  o Quantized tensor values are encoded as RBSPs.
    ▪ Optionally with basic encoding like DeepCABAC, and optionally using a delta coding mechanism to compress runs of zeros as zero-delta values.
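The delta coding mechanism is not specified further here; one possible reading, sketched below with hypothetical function names, replaces each run of zeros with a single run-length so that only (zero-run, value) pairs need entropy coding:

```python
def encode_zero_deltas(values):
    # Collapse each run of zeros into a run-length preceding the next
    # non-zero value; a trailing run of zeros is flushed as (run, 0).
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:
        pairs.append((run, 0))
    return pairs

def decode_zero_deltas(pairs):
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        if v != 0:
            out.append(v)
    return out
```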
APPENDIX E
An example FCM VMPS, FCM SPS, and FCM PPS message format and associated semantics for representing metadata associated with tensor decompressor structure, tensor packing, and complexity indication in a bitstream are as follows:

FCM vision model parameter set
fcm_vmps( payloadSize ) {                                           Descriptor
    output_picture_width                                            u(v)
    output_picture_height                                           u(v)
}
FCM sequence parameter set
fcm_sps( payloadSize ) {                                            Descriptor
    fcm_sps_inner_decoding_bypass_flag                              u(1)
    fcm_sps_quantisation_bypass_flag                                u(1)
    fcm_sps_feature_restoration_bypass_flag                         u(1)
    fcm_sps_temporal_upsampling_enabled_flag                        u(1)
    set_level_flag                                                  u(1)
    if( set_level_flag )
        fcm_level                                                   u(8)
    update_decoder_flag                                             u(1)
    if( update_decoder_flag == 1 ) {
        no_weights_flag                                             u(1)
        explicit_signal_decoder_flag                                u(1)
        if( explicit_signal_decoder_flag ) {
            explicit_decoder_compression_idc                        u(2)
            explicit_decoder_format_idc                             u(4)
            explicit_decoder_format_version_idc                     ue(v)
            explicit_decoder_payload_len                            ue(v)
            for( i = 0; i < explicit_decoder_payload_len; i++ )
                decoder_payload[ i ]                                u(8)
            register_decoder_idc_flag                               u(1)
            if( register_decoder_idc_flag )
                decoder_idc                                         ue(v)
        } else {
            registered_decoder_idc                                  ue(v) or string or UUID
        }
    }
    if( !no_weights_flag ) {
        update_weights_flag                                         u(1)
        if( update_weights_flag ) {
            explicit_signal_weights_flag                            u(1)
            if( explicit_signal_weights_flag ) {
                explicit_weights_idc                                ue(v)
                explicit_weights_payload_len                        ue(v)
                for( i = 0; i < explicit_weights_payload_len; i++ )
                    weights_payload[ i ]                            u(8)
            }
        }
    }
    set_region_cnt_flag                                             u(1)
    if( set_region_cnt_flag )
        region_cnt                                                  ue(v)
    set_region_packing_flag                                         u(1)
    if( set_region_packing_flag ) {
        for( i = 0; i < region_cnt; i++ ) {
            top_left_rsctuaddr[ i ]                                 u(v)
            top_right_rsctuaddr[ i ]                                u(v)
            bottom_left_rsctuaddr[ i ]                              u(v)
            bottom_right_rsctuaddr[ i ]                             u(v)
            horizontal_packing_flag[ i ]                            u(1)
        }
    }
    set_reduced_tensor_info_flag                                    u(1)
    if( set_reduced_tensor_info_flag )
        for( i = 0; i < region_cnt; i++ ) {
            region_tensor_cnt[ i ]                                  ue(v)
            for( j = 0; j < region_tensor_cnt[ i ]; j++ ) {
                reduced_tensor_batch_size[ i ][ j ]                 ue(v)
                reduced_tensor_max_channels[ i ][ j ]               ue(v)
                reduced_tensor_width[ i ][ j ]                      ue(v)
                reduced_tensor_height[ i ][ j ]                     ue(v)
            }
        }
    update_tensor_channels_flag                                     u(1)
    if( update_tensor_channels_flag )
        for( i = 0; i < region_cnt; i++ )
            for( j = 0; j < region_tensor_cnt[ i ]; j++ ) {
                update_tensor_channel_flag[ i ][ j ]                u(1)
                if( update_tensor_channel_flag[ i ][ j ] )
                    tensor_channel_cnt                              ue(v)
            }
}
FCM picture parameter set
fcm_pps( payloadSize ) {                                            Descriptor
    fcm_pps_temporal_upsampling_enabled_flag                        u(1)
    if( fcm_pps_temporal_upsampling_enabled_flag ) {
        temporal_upsampling_ratio_minus2                            ue(v)
        terminate_sequence_flag                                     u(1)
        if( terminate_sequence_flag )
            trailing_picture_cnt                                    ue(v)
    }
    quantization_range_update_flag                                  u(1)
    if( quantization_range_update_flag ) {
        qr_mantissa_len                                             ue(v)
        for( i = 0; i < region_cnt; i++ )
            for( j = 0; j < region_tensor_cnt[ i ]; j++ ) {
                qr_min_exp[ i ][ j ]                                ue(v)
                qr_min_exp_sign[ i ][ j ]                           u(1)
                qr_min_mantissa[ i ][ j ]                           u(b)
                qr_min_mantissa_sign[ i ][ j ]                      u(1)
                qr_max_exp[ i ][ j ]                                ue(v)
                qr_max_exp_sign[ i ][ j ]                           u(1)
                qr_max_mantissa[ i ][ j ]                           u(b)
                qr_max_mantissa_sign[ i ][ j ]                      u(1)
            }
    }
    output_datatype_update_flag                                     u(1)
    if( output_datatype_update_flag ) {
        output_datatype_idc                                         ue(v)
        if( output_datatype_idc == 0 ) {
            output_datatype_exponent_len                            ue(v)
            output_datatype_mantissa_len                            ue(v)
            output_datatype_implicit_mantissa_flag                  u(1)
            if( output_datatype_implicit_mantissa_flag )
                output_data_implicit_mantissa_value                 u(b)
        }
    }
    output_scaling_enable_flag                                      u(1)
    if( output_scaling_enable_flag ) {
        for( i = 0; i < restored_tensor_cnt; i++ ) {
            qr_second_min_exp[ i ]                                  ue(v)
            qr_second_min_exp_sign[ i ]                             u(1)
            qr_second_min_mantissa[ i ]                             u(b)
            qr_second_min_mantissa_sign[ i ]                        u(1)
            qr_second_max_exp[ i ]                                  ue(v)
            qr_second_max_exp_sign[ i ]                             u(1)
            qr_second_max_mantissa[ i ]                             u(b)
            qr_second_max_mantissa_sign[ i ]                        u(1)
        }
    }
}
Where u(n) refers to a fixed-length codeword n bits in length and ue(v) refers to an unsigned exponential Golomb variable-length codeword.
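As a reference sketch (not part of the specification text), an unsigned exponential Golomb codeword can be produced and consumed as follows:

```python
def ue_encode(value: int) -> str:
    # ue(v): write (value + 1) in binary, preceded by one fewer leading
    # zeros than its binary length, e.g. 3 -> "00100".
    code = value + 1
    return "0" * (code.bit_length() - 1) + format(code, "b")

def ue_decode(bits: str) -> int:
    # Count leading zeros, then read that many further bits after the first 1.
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    return int(bits[zeros:2 * zeros + 1], 2) - 1
```

For example, ue_encode(0) is "1" and ue_encode(3) is "00100".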
FCM SPS and FCM PPS semantics:
fcm_sps_inner_decoding_bypass_flag set equal to one indicates that the inner decoding (module 1204) is not performed and when equal to zero indicates that the inner decoding is performed.

fcm_sps_quantisation_bypass_flag set equal to one indicates that the inverse quantisation (module 1218) is not performed and when equal to zero indicates that the inverse quantisation is performed.

fcm_sps_feature_restoration_bypass_flag set equal to one indicates that the feature restoration (module 1250) is not performed and when equal to zero indicates that the feature restoration is performed.

fcm_sps_temporal_upsampling_enabled_flag set equal to one indicates that temporal interpolation or upsampling (module 1260) may be performed according to the most recently signalled temporal_upsampling_ratio, and when set to zero indicates that the temporal upsampling is not performed.
fcm_pps_temporal_upsampling_enabled_flag set equal to one indicates that temporal interpolation or upsampling (module 1260) is performed according to the most recently signalled temporal_upsampling_ratio, and when set to zero indicates that the temporal upsampling is not performed. It is a requirement of bitstream conformance that when fcm_sps_temporal_upsampling_enabled_flag is equal to zero, fcm_pps_temporal_upsampling_enabled_flag is also equal to zero.
temporal_upsampling_ratio_minus2 signals the integer upsampling ratio minus 2, i.e., a value of zero signals an upsampling ratio of two, a value of one signals an upsampling ratio of three, and so on.
terminate_sequence_flag is set to one when temporal upsampling is enabled and the source device 110 is terminating encoding of the bitstream and wishes to signal zero or more 'trailing pictures', i.e., pictures to be output from the temporal upsampler 1260 produced using only one previous picture and no forward reference to the next picture output from the picture decoder 1204. Each trailing picture is a duplicate of the most recently decoded picture.
trailing_picture_cnt signals how many trailing pictures to output before termination of the bitstream 121. The value of trailing_picture_cnt must be between zero and the temporal upsampling ratio minus one. For example, when the temporal upsampling ratio is set to two (temporal_upsampling_ratio_minus2 equal to zero), trailing_picture_cnt is permitted to be zero or one.
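As an illustrative sketch only (the function names, the use of scalar "pictures", and the linear blend are assumptions; a real temporal upsampler such as module 1260 may use a learned interpolator), the relationship between the upsampling ratio and trailing pictures can be expressed as:

```python
def blend(a, b, t):
    # Placeholder interpolation between two 'pictures' (scalars here).
    return a + (b - a) * t

def temporal_upsample(decoded, ratio, trailing_picture_cnt):
    # Emit (ratio - 1) interpolated pictures between each pair of decoded
    # pictures; trailing pictures duplicate the last decoded picture and
    # must number fewer than the upsampling ratio.
    assert 0 <= trailing_picture_cnt < ratio
    out = []
    for prev, nxt in zip(decoded, decoded[1:]):
        out.append(prev)
        for k in range(1, ratio):
            out.append(blend(prev, nxt, k / ratio))
    out.append(decoded[-1])
    out.extend([decoded[-1]] * trailing_picture_cnt)
    return out
```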
set_level_flag equal to one indicates that the tensor decompression complexity indication is to be signalled in this instance of the FCM decoder info SEI message.
fcm_level signals the complexity indication for any tensor decompressors to be performed in the decoder. The complexity indication provides a worst-case limit on the complexity of any instantiated tensor decompressor. It is a requirement of bitstream conformance that the tensor decompression complexity indication is signalled prior to use of the FCM decoder, e.g., signalled with the first frame of packed tensor data in the bitstream. The following table shows permitted maximum values for complexity aspects for given fcm_level values:
| fcm_level | MAC count | Weight count |
| 0 | <5M | <1M |
| 1 | <15M | <5M |
| 2 | <50M | <10M |
| 3-254 | (reserved for future use) | (reserved for future use) |
| 255 | | |
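A decoder could check a candidate tensor decompressor against these limits along the following lines (the helper and table names are illustrative, not part of the specification):

```python
# Illustrative limits per fcm_level, taken from the table above,
# expressed as absolute (MAC count, weight count) upper bounds.
FCM_LEVEL_LIMITS = {
    0: (5_000_000, 1_000_000),
    1: (15_000_000, 5_000_000),
    2: (50_000_000, 10_000_000),
}

def conforms_to_level(fcm_level, mac_count, weight_count):
    # True when the decompressor fits the worst-case complexity limits.
    if fcm_level not in FCM_LEVEL_LIMITS:
        raise ValueError("reserved or unspecified fcm_level")
    max_macs, max_weights = FCM_LEVEL_LIMITS[fcm_level]
    return mac_count < max_macs and weight_count < max_weights
```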
update_decoder_flag equal to one indicates that the FCM decoder is to be updated, effective from this instance of the FCM decoder info SEI message onwards.
no_weights_flag equal to one indicates that the FCM decoder does not include any trained elements (e.g., convolutions) and therefore does not require any weights.
explicit_signal_decoder_flag equal to one indicates that the FCM decoder architecture is signalled explicitly in this instance of the FCM decoder info SEI message. When equal to zero, this instance of the FCM decoder info SEI message instead references a previously signalled FCM decoder architecture or references an FCM decoder architecture obtained by external means, e.g., a predetermined architecture or an architecture available from a publicly accessible registry.
explicit_decoder_compression_idc specifies the compression technique (if any) applied to the payload containing the representation of the FCM decoder architecture, in accordance with the following table:
| explicit_decoder_compression_idc | Compression method |
| 0 | None |
| 1 | DEFLATE |
| 2 | LZMA |
| 3 | Reserved for future use |
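Using Python's standard zlib and lzma modules, the mapping from explicit_decoder_compression_idc to a decompression step might be sketched as follows (the helper name is hypothetical, and a raw DEFLATE stream without zlib headers is assumed):

```python
import lzma
import zlib

def decompress_decoder_payload(payload: bytes, compression_idc: int) -> bytes:
    # Map explicit_decoder_compression_idc values to a decompression method.
    if compression_idc == 0:      # None
        return payload
    if compression_idc == 1:      # DEFLATE (raw stream assumed)
        return zlib.decompress(payload, wbits=-15)
    if compression_idc == 2:      # LZMA
        return lzma.decompress(payload)
    raise ValueError("reserved explicit_decoder_compression_idc")
```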
explicit_decoder_format_idc specifies the format in which the FCM decoder architecture is encoded, with the following formats supported:
| explicit_decoder_format_idc | Decoder representation format |
| 0 | ONNX |
| 1 | NNEX |
| 2 | Pytorch |
| 3 | Variable-length scheme |
| 4-15 | Reserved for future use |
explicit_decoder_format_version_idc specifies the version of the format in which the FCM decoder architecture is encoded. For each supported format, a separate enumeration of explicit_decoder_format_version_idc values to versions of the format is specified.
explicit_decoder_payload_len specifies the length of the payload containing the FCM decoder representation in bytes, after application of compression (if applicable).
decoder_payload[ i ] specifies the ith byte of the FCM decoder representation.
register_decoder_idc_flag equal to one indicates that the FCM decoder representation signalled in this instance of the FCM decoder info SEI message is to be registered (retained) in the decoder for potential future reference.
decoder_idc specifies an index value for addressing the FCM decoder representation in a registry of retained FCM decoder architectures.
explicit_signal_weights_flag equal to one indicates that weights associated with the signalled FCM decoder representation are included in this instance of the FCM decoder info SEI message.
explicit_weights_idc specifies an index for the weights signalled in this instance of the FCM decoder info SEI message.
explicit_weights_payload_len specifies the length of the weights payload in the FCM decoder info SEI message.
weights_payload[ i ] specifies the ith byte of the weights payload in the FCM decoder info SEI message.
register_weights_idc_flag equal to one specifies that the weights signalled in this instance of the FCM decoder info SEI message are stored in the FCM decoder for potential future reference.
registered_decoder_idc specifies an index coded as a null-terminated UTF-8 string or with a signalled length to address an FCM decoder representation that is either known to the decoder by external means or was registered with the FCM decoder in an earlier instance of the FCM decoder info SEI message. The registered_decoder_idc may be signalled as an index, as a variable-length string, as a universally unique identifier (UUID), or by another mechanism that enables the decoder network topology to be uniquely identified. Signalling of the registered_decoder_idc may also select associated weights to be used with the selected decoder network topology.
set_region_cnt_flag equal to one indicates that this instance of the FCM decoder info SEI message signals a count of regions into which the current and subsequent pictures are to be divided.
region_cnt indicates a count of regions into which the current and subsequent pictures are to be divided. Each region is rectangular in shape and aligned to CTU boundaries. Each region is populated with feature maps from one or more tensors.
set_region_packing_flag equal to one indicates that this instance of the FCM decoder info SEI message specifies a division of the current picture into one or more rectangular regions. This division remains in effect until the next instance of an FCM decoder info SEI message with set_region_packing_flag equal to one.
top_left_rsctuaddr[ i ] specifies the address in raster-scan order of the CTU in the top-left position in the ith region.

top_right_rsctuaddr[ i ] specifies the address in raster-scan order of the CTU in the top-right position in the ith region.

bottom_left_rsctuaddr[ i ] specifies the address in raster-scan order of the CTU in the bottom-left position in the ith region.

bottom_right_rsctuaddr[ i ] specifies the address in raster-scan order of the CTU in the bottom-right position in the ith region.
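Since the rsctuaddr syntax elements are raster-scan CTU addresses, a region's position and size in CTU units can be recovered as follows (an illustrative sketch; the picture width in CTUs is assumed known from the inner codec):

```python
def rsctu_to_xy(rs_addr, pic_width_in_ctus):
    # Convert a raster-scan CTU address into (column, row) CTU coordinates.
    return rs_addr % pic_width_in_ctus, rs_addr // pic_width_in_ctus

def region_size_in_ctus(top_left, bottom_right, pic_width_in_ctus):
    # Width and height of a rectangular region given two corner addresses.
    x0, y0 = rsctu_to_xy(top_left, pic_width_in_ctus)
    x1, y1 = rsctu_to_xy(bottom_right, pic_width_in_ctus)
    return x1 - x0 + 1, y1 - y0 + 1
```

For example, in a picture 10 CTUs wide, corner addresses 12 and 37 describe a region 6 CTUs wide and 3 CTUs tall.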
horizontal_packing_flag[ i ] equal to one specifies that when the packing or unpacking progresses from feature maps of one tensor to feature maps of the next tensor within the ith region, packing continues in a left-to-right manner. When equal to zero, upon progressing from feature maps of one tensor to feature maps of the next tensor, packing of feature maps advances to the leftmost position in the current region and below the previously packed feature maps within the current region. The value one may be used where multiple tensors, each containing few (e.g., one) feature maps, are to be packed, requiring a region generally larger in width than in height and generally smaller frame area for the regions.
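The effect of horizontal_packing_flag on packing order can be illustrated with the following sketch, which computes feature-map origins within a region (the function name and the uniform feature-map size are assumptions for illustration):

```python
def feature_map_origins(tensor_channel_counts, fm_w, fm_h, region_w, horizontal):
    # tensor_channel_counts: feature maps per tensor packed into one region.
    # Returns (x, y) origins; with horizontal False, each new tensor restarts
    # at the leftmost position below the previously packed feature maps.
    origins, x, y = [], 0, 0
    for cnt in tensor_channel_counts:
        if not horizontal and x > 0:
            x, y = 0, y + fm_h       # next tensor starts on a new row band
        for _ in range(cnt):
            if x + fm_w > region_w:  # wrap within the region width
                x, y = 0, y + fm_h
            origins.append((x, y))
            x += fm_w
    return origins
```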
explicit_cropping_enabled_flag equal to one specifies that the FCM decoder may crop the decoded tensors from the feature restoration module 1250 to match the required dimensions of the restored-domain tensors according to the crop_* syntax elements (the 'cropping parameters'). When explicit_cropping_enabled_flag is equal to zero, it is a requirement of bitstream conformance that the tensors resulting from the feature restoration module 1250 match the required dimensions of the restored-domain tensors.
set_reduced_tensor_info_flag equal to one specifies that the number of tensors in the defined regions and the dimensions of the reduced-domain tensors are signalled in this instance of the FCM decoder info SEI message.
region_tensor_cnt[ i ] specifies the number of reduced-domain tensors to be packed in the ith region.

reduced_tensor_batch_size[ i ][ j ] specifies the batch size of the jth tensor in the reduced domain being packed into the ith region.

reduced_tensor_max_channels[ i ][ j ] specifies the maximum number of feature maps (i.e., channels) of the jth tensor in the reduced domain being packed in the ith region.

reduced_tensor_width[ i ][ j ] specifies the width of feature maps of the jth tensor in the reduced domain being packed in the ith region.

reduced_tensor_height[ i ][ j ] specifies the height of feature maps of the jth tensor in the reduced domain being packed in the ith region.
set_restored_tensor_info_flag equal to one specifies that the number and dimensionality of tensors output from the FCM decoder, i.e., tensors in the restored domain, are specified in this instance of the FCM decoder info SEI message.
restored_tensor_cnt specifies the number of restored-domain tensors output from the FCM decoder.
restored_tensor_batch_size[ i ] specifies the batch size in the ith restored-domain tensor output from the FCM decoder.
restored_tensor_channels[ i ] specifies the number of channels in the ith restored-domain tensor output from the FCM decoder.
restored_tensor_width[ i ] specifies the width of the ith restored-domain tensor output from the FCM decoder.
restored_tensor_height[ i ] specifies the height of the ith restored-domain tensor output from the FCM decoder.
update_tensor_channels_flag equal to one indicates that the flags to update the packed number of feature maps for each tensor in each region are to be signalled in this instance of the FCM decoder info SEI message.
44204385_1
update_tensor_channel_flag[ i ][ j ] equal to one indicates that the packed number of feature maps for the jth tensor in the ith region is to be signalled in this instance of the FCM decoder info SEI message.
tensor_channel_cnt[ i ][ j ] specifies the packed number of feature maps (i.e., channels) for the jth tensor of the ith region. When tensor_channel_cnt[ i ][ j ] is not signalled and tensor_max_channels[ i ][ j ] is signalled, the value is inferred to be equal to the corresponding tensor_max_channels[ i ][ j ]. When tensor_channel_cnt[ i ][ j ] is not signalled or inferred in the current instance of the FCM decoder info SEI message, the value remains in effect from the previous instance of the FCM decoder SEI message (if available), otherwise the value is inferred as 0.
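The inference rule for tensor_channel_cnt[ i ][ j ] can be sketched as follows. This is an illustrative reading of the semantics above, not normative text, and the function and argument names are hypothetical.

```python
def infer_tensor_channel_cnt(signalled, max_channels, previous):
    """Resolve the packed number of feature maps for one tensor.

    signalled:     tensor_channel_cnt[i][j] if coded in this SEI
                   message instance, else None
    max_channels:  tensor_max_channels[i][j] if signalled, else None
    previous:      value in effect from the previous FCM decoder SEI
                   message instance, else None
    """
    if signalled is not None:      # explicitly coded in this instance
        return signalled
    if max_channels is not None:   # inferred from tensor_max_channels
        return max_channels
    if previous is not None:       # persists from the previous instance
        return previous
    return 0                       # no information available: infer 0
```

Each branch corresponds to one sentence of the semantics: explicit signalling takes precedence, then inference from tensor_max_channels, then persistence from the previous instance, then the default of zero.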
qr_mantissa_len specifies the number of bits to be used to encode the mantissa portion of the reduced-domain quantisation range.
qr_min_exp[ i ][ j ] specifies the exponent portion of the lower bound of the reduced-domain quantisation range for the jth tensor in the reduced domain being packed in the ith region.
qr_min_exp_sign[ i ][ j ] specifies the sign of the exponent portion of the lower bound of the reduced-domain quantisation range for the jth tensor in the reduced domain being packed in the ith region.
qr_min_mantissa[ i ][ j ] specifies the fraction portion of the lower bound of the reduced-domain quantisation range for the jth tensor in the reduced domain being packed in the ith region, with a bit width as specified by qr_mantissa_len.
qr_min_mantissa_sign[ i ][ j ] specifies the sign of the fraction portion of the lower bound of the reduced-domain quantisation range for the jth tensor in the reduced domain being packed in the ith region.
qr_max_exp[ i ][ j ] specifies the exponent portion of the upper bound of the reduced-domain quantisation range for the jth tensor in the reduced domain being packed in the ith region.
qr_max_exp_sign[ i ][ j ] specifies the sign of the exponent portion of the upper bound of the reduced-domain quantisation range for the jth tensor in the reduced domain being packed in the ith region.
qr_max_mantissa[ i ][ j ] specifies the fraction portion of the upper bound of the reduced-domain quantisation range for the jth tensor in the reduced domain being packed in the ith region, with a bit width as specified by qr_mantissa_len.
qr_max_mantissa_sign[ i ][ j ] specifies the sign of the fraction portion of the upper bound of the reduced-domain quantisation range for the jth tensor in the reduced domain being packed in the ith region.
output_datatype_update_flag equal to one specifies that this instance of the FCM decoder SEI message updates the datatype of the FCM decoder output tensors and/or their range.
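Each quantisation-range bound above is coded as a signed exponent together with a signed fixed-width fraction (mantissa). The exact reconstruction formula is not reproduced in this excerpt, so the sketch below is only one plausible, IEEE-754-style interpretation: the fraction is scaled into [0, 1) and combined as (1 + fraction) x 2^exponent. The function name and combination rule are assumptions for illustration.

```python
def reconstruct_bound(exp, exp_sign, mantissa, mantissa_sign, mantissa_len):
    """Illustrative reconstruction of one quantisation-range bound.

    exp, exp_sign:           qr_*_exp and qr_*_exp_sign (sign flag set
                             means a negative exponent)
    mantissa, mantissa_sign: qr_*_mantissa (mantissa_len bits wide, per
                             qr_mantissa_len) and qr_*_mantissa_sign
    """
    exponent = -exp if exp_sign else exp
    fraction = mantissa / (1 << mantissa_len)   # scale to [0, 1)
    if mantissa_sign:
        fraction = -fraction
    # Assumed combination rule (IEEE-754-like); the normative rule may differ.
    return (1.0 + fraction) * (2.0 ** exponent)
```

Under this assumed rule, a bound with exponent 2 and zero fraction reconstructs to 4.0, while a negative exponent of 1 with a fraction of 128/256 reconstructs to 0.75.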
output_datatype_idc equal to zero specifies a custom data format for the FCM decoder output, with other values indicating floating-point or integer data formats.
output_datatype_exponent_len specifies the length of the exponent for a custom output data format, with a value of zero indicating an integer rather than floating-point output format.
output_datatype_mantissa_len specifies the length of the mantissa for a custom output data format when the exponent length is nonzero, or the number of bits for a custom output data format when the exponent length is equal to zero.
output_datatype_implicit_mantissa_flag equal to one specifies that tensors output from the FCM decoder all use an implicit mantissa value rather than a mantissa signalled on a per-element basis for each output tensor.
output_data_implicit_mantissa_value when present signals the implicit mantissa used for all elements of all output tensors from the FCM decoder.
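To illustrate the output datatype fields above: a zero exponent length selects an integer output whose width is given by the mantissa-length field, while a nonzero exponent length selects a custom floating-point format. The helper below is a hypothetical sketch of that interpretation (the name and the returned description strings are not part of the specification).

```python
def describe_output_datatype(exponent_len, mantissa_len):
    """Interpret output_datatype_exponent_len / output_datatype_mantissa_len.

    A zero exponent length indicates an integer output of mantissa_len
    bits; a nonzero exponent length indicates a floating-point format
    with exponent_len exponent bits and mantissa_len mantissa bits.
    """
    if exponent_len == 0:
        return f"int{mantissa_len}"              # integer output format
    return f"float(e={exponent_len}, m={mantissa_len})"  # custom float
```

For example, (exponent_len, mantissa_len) of (0, 16) describes a 16-bit integer output, while (8, 23) describes an IEEE-754-single-like custom floating-point format.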
output_scaling_enable_flag equal to one indicates that this instance of the FCM decoder SEI message updates the quantisation min and max (or lower and upper bound) for the quantisation stage performed after the feature restoration stage.
qr_second_mantissa_len specifies the number of bits to be used to encode the mantissa portion of the quantisation range.
qr_second_min_exp[ i ] specifies the exponent portion of the lower bound of the output quantisation range for the ith tensor in the restored domain.
qr_second_min_exp_sign[ i ] specifies the sign of the exponent portion of the lower bound of the output quantisation range for the ith tensor in the restored domain.
qr_second_min_mantissa_sign[ i ] specifies the sign of the fraction portion of the lower bound of the output quantisation range for the ith tensor in the restored domain.
qr_second_max_exp[ i ] specifies the exponent portion of the upper bound of the output quantisation range for the ith tensor in the restored domain.
qr_second_max_exp_sign[ i ] specifies the sign of the exponent portion of the upper bound of the output quantisation range for the ith tensor in the restored domain.
qr_second_max_mantissa[ i ] specifies the fraction portion of the upper bound of the output quantisation range for the ith tensor in the restored domain, with a bit width as specified by qr_second_mantissa_len.
qr_second_max_mantissa_sign[ i ] specifies the sign of the fraction portion of the upper bound of the output quantisation range for the ith tensor in the restored domain.


CLAIMS

1. A method of decoding a bitstream to produce tensors for use by a neural network second portion, the method comprising:
decoding a network abstraction layer (NAL) unit from the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of one inner codec of a plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length;
selecting an inner codec from the plurality of inner codecs based on the decoded NAL unit of the predetermined length; and
decoding the bitstream using the selected inner codec to produce the tensors.
2. The method according to claim 1, wherein the bitstream includes a plurality of NAL units and the decoded NAL unit is a NAL unit header.
3. The method according to claim 2, wherein the header is present in the bitstream prior to any units of data to be provided to the inner codec.
4. The method according to claim 2, wherein multiple instances of the NAL unit header are present in the bitstream.
5. The method according to claim 4, wherein one or more instances of the NAL unit header are present prior to random access entry points in the bitstream.
6. The method according to claim 5, wherein a plurality of inner codecs are used in the bitstream.
7. The method according to claim 1, wherein the predetermined length is one byte.
8. The method according to claim 1, wherein the plurality of inner codecs includes at least one of advanced video coding (AVC), high efficiency video coding (HEVC) and versatile video coding (VVC).
41663381_1
9. The method according to claim 1, wherein the plurality of inner codecs includes AVC, HEVC, VVC and a custom codec.
10. The method according to claim 1, wherein the NAL unit follows a start code in the bitstream.
11. A method of encoding tensors to a bitstream for use by a neural network second portion, the method comprising:
selecting an inner codec from the plurality of inner codecs for use in encoding tensors to the bitstream;
encoding a network abstraction layer (NAL) unit to the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of the selected inner codec of the plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; and
encoding the tensors to the bitstream using the selected inner codec.
12. The method according to claim 11, wherein the bitstream includes a plurality of NAL units and the decoded NAL unit is a NAL unit header.
13. The method according to claim 12, wherein the header is present in the bitstream prior to any units of data to be provided to the inner codec.
14. The method according to claim 12, wherein multiple instances of the NAL unit header are present in the bitstream.
15. The method according to claim 14, wherein one or more instances of the NAL unit header are present prior to random access entry points in the bitstream.
16. The method according to claim 1, wherein the plurality of inner codecs includes at least one of advanced video coding (AVC), high efficiency video coding (HEVC) and versatile video coding (VVC).
17. A decoder for decoding a bitstream to produce tensors for use by a neural network second portion, the decoder configured to:
decode a network abstraction layer (NAL) unit from the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of one inner codec of a plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length;
select an inner codec from the plurality of inner codecs based on the decoded NAL unit of the predetermined length; and
decode the bitstream using the selected inner codec to produce the tensors.
18. A non-transitory computer-readable storage medium which stores a program for executing a method of decoding a bitstream to produce tensors for use by a neural network second portion, the method comprising:
decoding a network abstraction layer (NAL) unit from the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of one inner codec of a plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length;
selecting an inner codec from the plurality of inner codecs based on the decoded NAL unit of the predetermined length; and
decoding the bitstream using the selected inner codec to produce the tensors.
19. An encoder for encoding tensors to a bitstream for use by a neural network second portion, the encoder configured to:
select an inner codec from the plurality of inner codecs for use in encoding tensors to the bitstream;
encode a network abstraction layer (NAL) unit to the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of the selected inner codec of the plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; and
encode the tensors to the bitstream using the selected inner codec.
20. A non-transitory computer-readable storage medium which stores a program for executing a method of encoding tensors to a bitstream for use by a neural network second portion, the method comprising:
selecting an inner codec from the plurality of inner codecs for use in encoding tensors to the bitstream;
encoding a network abstraction layer (NAL) unit to the bitstream having a predetermined length, wherein the NAL unit of the predetermined length indicates a NAL unit format of the selected inner codec of the plurality of inner codecs, each other inner codec having NAL unit lengths different to the predetermined length; and
encoding the tensors to the bitstream using the selected inner codec.
CANON KABUSHIKI KAISHA
Patent Attorneys for the Applicant
Spruson & Ferguson
AU2024203901A 2024-06-07 2024-06-07 Method, apparatus and system for encoding and decoding tensors Pending AU2024203901A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2024203901A AU2024203901A1 (en) 2024-06-07 2024-06-07 Method, apparatus and system for encoding and decoding tensors
PCT/AU2025/050437 WO2025251103A1 (en) 2024-06-07 2025-05-02 Method, apparatus and system for encoding and decoding tensors


Publications (1)

Publication Number Publication Date
AU2024203901A1 true AU2024203901A1 (en) 2026-01-08

Family

ID=97959853




Also Published As

Publication number Publication date
WO2025251103A1 (en) 2025-12-11
