
GB2641548A - An apparatus and method for controlling codec capability level - Google Patents

An apparatus and method for controlling codec capability level

Info

Publication number
GB2641548A
GB2641548A (Application GB2407975.8A / GB202407975A)
Authority
GB
United Kingdom
Prior art keywords
codec
level
capability level
codec capability
capability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2407975.8A
Other versions
GB202407975D0 (en)
Inventor
Lasse Juhani Laaksonen
Lauros Anton Pajunen
Anssi Sakari Rämö
Arvi Ilmari Lintervo
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Priority to GB2407975.8A priority Critical patent/GB2641548A/en
Publication of GB202407975D0 publication Critical patent/GB202407975D0/en
Priority to PCT/EP2025/063364 priority patent/WO2025252427A1/en
Publication of GB2641548A publication Critical patent/GB2641548A/en
Legal status: Pending


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing


Abstract

An apparatus for conversational immersive audio coding, the apparatus comprising means configured to: support at least two different coded formats; obtain one or more first codec capability level for a first coded format of the supported at least two different coded formats; obtain one or more second codec capability level for a second coded format of the supported at least two different coded formats, the one or more second codec capability level being different to the one or more first codec capability level; control the one or more first codec capability level based on the one or more second codec capability level; and determine coding parameters for supporting coding the conversational immersive audio based on the controlled one or more first codec capability level.

Description

[0001] AN APPARATUS AND METHOD FOR CONTROLLING CODEC CAPABILITY LEVEL
[0002] Field
[0003] The present application relates to apparatus and methods for controlling codec capability levels and, not exclusively, to controlling codec capability levels in two or more coding formats within an immersive voice and audio service environment.
[0004] Background
[0005] Immersive audio codecs are being implemented supporting a multitude of operating points ranging from low bit rate operation to transparency. An example of such a codec is the immersive voice and audio services (IVAS) codec, which is designed to be suitable for use over a communications network such as a 3GPP 4G/5G network. Such immersive services include uses, for example, in immersive voice and audio for applications such as virtual reality (VR), augmented reality (AR) and mixed reality (MR), as well as spatial voice communication including teleconferencing. This audio codec handles the encoding, decoding and rendering of speech, music and generic audio. It provides backwards interoperable EVS mono operation and furthermore supports stereo, scene-based audio (SBA), metadata-assisted spatial audio (MASA), channel-based audio, object-based audio, and certain combinations of these input formats. The codec operates with low latency to enable conversational services and supports high error robustness under various transmission conditions.
[0006] The input signals are presented to the IVAS encoder in one of the supported formats (and in some allowed combinations of the formats). In addition, combinations of formats are supported such as: Objects with MASA (OMASA) and Objects with SBA (OSBA). The IVAS output formats include mono, stereo, multichannel (including custom loudspeaker layouts), FOA, HOA2, HOA3, and binaural. In addition, so-called pass-through operation is possible allowing, e.g., MASA output for MASA input. IVAS furthermore includes a bit-exact implementation of the EVS codec standard.
[0007] Similarly, the decoder can output the audio in several supported formats including a pass-through operation, where the audio can be provided in its original format after transmission (encoding/decoding).
[0008] As a spatial audio codec supporting at least three degrees of rotational freedom (yaw, pitch, roll) for all spatial inputs, the IVAS codec is expected to be used in a variety of scenarios, not all of which can be known beforehand.
[0009] The IVAS codec algorithm is described in 3GPP TS 26.253 (Codec for Immersive Voice and Audio Services; Detailed Algorithmic Description incl. RTP payload format and SDP parameter definitions). Furthermore the IVAS codec floating-point C code is provided in 3GPP TS 26.258.
[0010] Additionally, RTP (Real-Time Transport Protocol) is intended for end-to-end, real-time transfer of streaming media and provides facilities for jitter compensation and for detection of packet loss and out-of-order delivery. RTP allows data transfer to multiple destinations through IP multicast or to a specific destination through IP unicast. Most RTP implementations are built on top of the User Datagram Protocol (UDP), although other transport protocols may also be utilized. RTP is used together with other protocols such as H.323 and the Real Time Streaming Protocol (RTSP).
[0011] The RTP specification describes two protocols: RTP and RTCP. RTP is used for the transfer of multimedia data, and its companion protocol (RTCP) is used to periodically send control information and QoS (Quality of Service) parameters. For a class of applications (e.g., audio, video), an RTP profile may be defined. For a media format (e.g., a specific video coding format), an associated RTP payload format may be defined. Every instantiation of RTP in a particular application may require profile and payload format specifications.
[0012] The profile defines the codecs used to encode the payload data and their mapping to payload format codes in the protocol field Payload Type (PT) of the RTP header.
[0013] For example, the RTP profile for audio and video conferences with minimal control is defined in RFC 3551. The profile defines a set of static payload type assignments, and a dynamic mechanism for mapping between a payload format and a PT value using the Session Description Protocol (SDP). The latter mechanism is used for newer video codecs such as the RTP payload format for H.264 video defined in RFC 6184 or the RTP payload format for High Efficiency Video Coding (HEVC) defined in RFC 7798.
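By way of a hedged illustration (the port number and the dynamic payload type value 98 are arbitrary examples, not values mandated by any specification), an SDP media description mapping a dynamic payload type to the H.264 payload format of RFC 6184 might look like:

```
m=video 49170 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42e01f; packetization-mode=1
```

Here the `a=rtpmap` line binds payload type 98 to the H264 encoding name with its 90 kHz RTP clock rate, and the `a=fmtp` line carries format-specific capability parameters, analogous in role to the codec capability parameters discussed for IVAS in this application.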
[0014] The IVAS RTP payload format is currently being developed, and the latest state is described in TS 26.253 Annex A (v2.0.0, S4-240376). Additional updates are provided in a CR on TS 26.253 (S4-241325).
[0015] Summary
[0016] There is provided according to a first aspect an apparatus for conversational immersive audio coding, the apparatus comprising means configured to: support at least two different coded formats; obtain one or more first codec capability level for a first coded format of the supported at least two different coded formats; obtain one or more second codec capability level for a second coded format of the supported at least two different coded formats, the one or more second codec capability level being different to the one or more first codec capability level; control the one or more first codec capability level based on the one or more second codec capability level; and determine coding parameters for supporting coding the conversational immersive audio based on the controlled one or more first codec capability level.
[0017] The one or more first codec capability level may be at least one of: one or more codec capability level; one or more codec mode set; one or more codec profile; one or more capability for bitrate; and one or more capability for bandwidth. The one or more second codec capability level may be at least one of: one or more codec capability level; one or more codec mode set; one or more codec profile; one or more capability for bitrate; and one or more capability for bandwidth.
[0018] The controlled one or more first codec capability level may be at least one of: one or more codec capability level; one or more codec mode set; one or more codec profile; one or more capability for bitrate; and one or more capability for bandwidth.
[0019] The one or more first codec capability level may be a first codec capability list or profile defining at least one of: a first supported bitrate; and a first supported audio bandwidth.
[0020] The one or more second codec capability level may be a second codec capability list or profile defining at least one of: a second supported bitrate; and a second supported audio bandwidth.
[0021] The one or more second codec capability level may have a lower capability than the one or more first codec capability level, wherein the means configured to control the one or more first codec capability level based on the one or more second codec capability level may be configured to adapt to the one or more second codec capability level.
[0022] The one or more first codec capability level and the one or more second codec capability level may be one or more IVAS codec capability level.
[0023] The means configured to control the one or more first codec capability level based on the one or more second codec capability level may be configured to perform at least one of: set a maximum supported bitrate for the first coded format to a maximum supported bitrate of the second coded format; and set a maximum supported bandwidth for the first coded format to a maximum supported bandwidth of the second coded format.
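The control described in the paragraph above can be sketched as follows. This is an illustrative sketch only: the class and function names and the example bitrate and bandwidth values are hypothetical, not taken from the IVAS specification.

```python
# Hypothetical sketch of paragraph [0023]: capping a first coded format's
# capability level at the limits of a second coded format.
from dataclasses import dataclass


@dataclass
class CodecCapability:
    """Illustrative capability profile: supported bitrate (bit/s) and audio bandwidth (Hz)."""
    max_bitrate: int
    max_bandwidth: int


def control_capability(first: CodecCapability, second: CodecCapability) -> CodecCapability:
    """Set the first format's maximum bitrate and bandwidth to be no greater
    than those of the second format."""
    return CodecCapability(
        max_bitrate=min(first.max_bitrate, second.max_bitrate),
        max_bandwidth=min(first.max_bandwidth, second.max_bandwidth),
    )


# Example values (hypothetical): a higher-capability first format is
# controlled by a lower-capability second format.
first = CodecCapability(max_bitrate=512_000, max_bandwidth=24_000)
second = CodecCapability(max_bitrate=256_000, max_bandwidth=16_000)
controlled = control_capability(first, second)
print(controlled.max_bitrate, controlled.max_bandwidth)  # 256000 16000
```

The controlled capability can then feed the coding-parameter determination of paragraph [0016], for example as the value advertised in a subsequent session negotiation.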
[0024] The means configured to determine the coding parameters for supporting coding the conversational immersive audio based on the controlled one or more first codec capability level may be configured to negotiate with at least one further apparatus the controlled one or more first codec capability level, for supporting coding the conversational immersive audio.
[0025] The means configured to negotiate with at least one further apparatus the controlled one or more first codec capability level, for supporting coding the conversational immersive audio, may be configured to perform an SDP negotiation.
[0026] The second coded format may be a coded format adapted by a virtual codec capability level based on an apparatus constraint.
[0027] The apparatus constraint may comprise at least one of: an apparatus battery level; and an apparatus current consumption level.
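The apparatus-constraint adaptation of the two paragraphs above might be sketched as follows; the function name and the battery thresholds are hypothetical illustrations, not values drawn from any specification.

```python
# Hypothetical sketch of paragraphs [0026]-[0027]: deriving a virtual codec
# capability level from a device constraint such as the battery level.
def virtual_capability_level(supported_level: int, battery_percent: float) -> int:
    """Reduce the advertised capability level when the battery is low,
    never dropping below the basic level 1 (example thresholds only)."""
    if battery_percent < 15.0:
        return max(1, supported_level - 2)
    if battery_percent < 40.0:
        return max(1, supported_level - 1)
    return supported_level


print(virtual_capability_level(3, 10.0))  # 1
print(virtual_capability_level(3, 30.0))  # 2
print(virtual_capability_level(3, 80.0))  # 3
```

A device that nominally supports the highest level could thus present a lower virtual level to the negotiation, reducing encoding complexity and current consumption.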
[0028] The controlled one or more first codec capability level may be one or more adapted codec capability level.
[0029] According to a second aspect there is provided a method for an apparatus for conversational immersive audio coding, the method comprising: supporting at least two different coded formats; obtaining one or more first codec capability level for a first coded format of the supported at least two different coded formats; obtaining one or more second codec capability level for a second coded format of the supported at least two different coded formats, the one or more second codec capability level being different to the one or more first codec capability level; controlling the one or more first codec capability level based on the one or more second codec capability level; and determining coding parameters for supporting coding the conversational immersive audio based on the controlled one or more first codec capability level.
[0030] The one or more first codec capability level may be at least one of: one or more codec capability level; one or more codec mode set; one or more codec profile; one or more capability for bitrate; and one or more capability for bandwidth. The one or more second codec capability level may be at least one of: one or more codec capability level; one or more codec mode set; one or more codec profile; one or more capability for bitrate; and one or more capability for bandwidth.
[0031] The controlled one or more first codec capability level may be at least one of: one or more codec capability level; one or more codec mode set; one or more codec profile; one or more capability for bitrate; and one or more capability for bandwidth.
[0032] The one or more first codec capability level may be a first codec capability list or profile defining at least one of: a first supported bitrate; and a first supported audio bandwidth.
[0033] The one or more second codec capability level may be a second codec capability list or profile defining at least one of: a second supported bitrate; and a second supported audio bandwidth.
[0034] The one or more second codec capability level may have a lower capability than the one or more first codec capability level, wherein controlling the one or more first codec capability level based on the one or more second codec capability level may comprise adapting to the one or more second codec capability level.
[0035] The one or more first codec capability level and the one or more second codec capability level may be one or more IVAS codec capability level.
[0036] Controlling the one or more first codec capability level based on the one or more second codec capability level may comprise at least one of: setting a maximum supported bitrate for the first coded format to a maximum supported bitrate of the second coded format; and setting a maximum supported bandwidth for the first coded format to a maximum supported bandwidth of the second coded format.
[0037] Determining the coding parameters for supporting coding the conversational immersive audio based on the controlled one or more first codec capability level may comprise negotiating with at least one further apparatus the controlled one or more first codec capability level, for supporting coding the conversational immersive audio.
[0038] Negotiating with at least one further apparatus the controlled one or more first codec capability level, for supporting coding the conversational immersive audio, may comprise performing an SDP negotiation.
[0039] The second coded format may be a coded format adapted by a virtual codec capability level based on an apparatus constraint.
[0040] The apparatus constraint may comprise at least one of: an apparatus battery level; and an apparatus current consumption level.
[0041] The controlled one or more first codec capability level may be one or more adapted codec capability level.
[0042] According to a third aspect there is provided an apparatus for conversational immersive audio coding, the apparatus comprising at least one processor and at least one memory including a computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: support at least two different coded formats; obtain one or more first codec capability level for a first coded format of the supported at least two different coded formats; obtain one or more second codec capability level for a second coded format of the supported at least two different coded formats, the one or more second codec capability level being different to the one or more first codec capability level; control the one or more first codec capability level based on the one or more second codec capability level; and determine coding parameters for supporting coding the conversational immersive audio based on the controlled one or more first codec capability level.
[0043] The one or more first codec capability level may be at least one of: one or more codec capability level; one or more codec mode set; one or more codec profile; one or more capability for bitrate; and one or more capability for bandwidth.
[0044] The one or more second codec capability level may be at least one of: one or more codec capability level; one or more codec mode set; one or more codec profile; one or more capability for bitrate; and one or more capability for bandwidth. The controlled one or more first codec capability level may be at least one of: one or more codec capability level; one or more codec mode set; one or more codec profile; one or more capability for bitrate; and one or more capability for bandwidth.
[0045] The one or more first codec capability level may be a first codec capability list or profile defining at least one of: a first supported bitrate; and a first supported audio bandwidth.
[0046] The one or more second codec capability level may be a second codec capability list or profile defining at least one of: a second supported bitrate; and a second supported audio bandwidth.
[0047] The one or more second codec capability level may have a lower capability than the one or more first codec capability level, wherein the apparatus caused to control the one or more first codec capability level based on the one or more second codec capability level may be caused to adapt to the one or more second codec capability level.
[0048] The one or more first codec capability level and the one or more second codec capability level may be one or more IVAS codec capability level.
[0049] The apparatus caused to control the one or more first codec capability level based on the one or more second codec capability level may be caused to perform at least one of: set a maximum supported bitrate for the first coded format to a maximum supported bitrate of the second coded format; and set a maximum supported bandwidth for the first coded format to a maximum supported bandwidth of the second coded format.
[0050] The apparatus caused to determine the coding parameters for supporting coding the conversational immersive audio based on the controlled one or more first codec capability level may be caused to negotiate with at least one further apparatus the controlled one or more first codec capability level, for supporting coding the conversational immersive audio.
[0051] The apparatus caused to negotiate with at least one further apparatus the controlled one or more first codec capability level, for supporting coding the conversational immersive audio, may be caused to perform an SDP negotiation. The second coded format may be a coded format adapted by a virtual codec capability level based on an apparatus constraint.
[0052] The apparatus constraint may comprise at least one of: an apparatus battery level; and an apparatus current consumption level.
[0053] The controlled one or more first codec capability level may be one or more adapted codec capability level.
[0054] According to a fourth aspect there is provided an apparatus for conversational immersive audio coding, the apparatus comprising: supporting circuitry configured to support at least two different coded formats; obtaining circuitry configured to obtain one or more first codec capability level for a first coded format of the supported at least two different coded formats; obtaining circuitry configured to obtain one or more second codec capability level for a second coded format of the supported at least two different coded formats, the one or more second codec capability level being different to the one or more first codec capability level; controlling circuitry configured to control the one or more first codec capability level based on the one or more second codec capability level; and determining circuitry configured to determine coding parameters for supporting coding the conversational immersive audio based on the controlled one or more first codec capability level.
[0055] According to a fifth aspect there is provided a computer program comprising instructions [or a computer readable medium comprising program instructions] for causing an apparatus for conversational immersive audio coding to perform at least the following: supporting at least two different coded formats; obtaining one or more first codec capability level for a first coded format of the supported at least two different coded formats; obtaining one or more second codec capability level for a second coded format of the supported at least two different coded formats, the one or more second codec capability level being different to the one or more first codec capability level; controlling the one or more first codec capability level based on the one or more second codec capability level; and determining coding parameters for supporting coding the conversational immersive audio based on the controlled one or more first codec capability level.
[0056] According to a sixth aspect there is provided a non-transitory computer readable medium comprising program instructions for causing an apparatus for conversational immersive audio coding to perform at least the following: supporting at least two different coded formats; obtaining one or more first codec capability level for a first coded format of the supported at least two different coded formats; obtaining one or more second codec capability level for a second coded format of the supported at least two different coded formats, the one or more second codec capability level being different to the one or more first codec capability level; controlling the one or more first codec capability level based on the one or more second codec capability level; and determining coding parameters for supporting coding the conversational immersive audio based on the controlled one or more first codec capability level.
[0057] According to a seventh aspect there is provided an apparatus for conversational immersive audio coding, the apparatus comprising: means for supporting at least two different coded formats; means for obtaining one or more first codec capability level for a first coded format of the supported at least two different coded formats; means for obtaining one or more second codec capability level for a second coded format of the supported at least two different coded formats, the one or more second codec capability level being different to the one or more first codec capability level; means for controlling the one or more first codec capability level based on the one or more second codec capability level; and means for determining coding parameters for supporting coding the conversational immersive audio based on the controlled one or more first codec capability level.
[0058] According to an eighth aspect there is provided a computer readable medium comprising program instructions for causing an apparatus for conversational immersive audio coding to perform at least the following: supporting at least two different coded formats; obtaining one or more first codec capability level for a first coded format of the supported at least two different coded formats; obtaining one or more second codec capability level for a second coded format of the supported at least two different coded formats, the one or more second codec capability level being different to the one or more first codec capability level; controlling the one or more first codec capability level based on the one or more second codec capability level; and determining coding parameters for supporting coding the conversational immersive audio based on the controlled one or more first codec capability level.
[0059] An apparatus comprising means for performing the actions of the method as described above.
[0060] An apparatus configured to perform the actions of the method as described above.
[0061] A computer program comprising program instructions for causing a computer to perform the method as described above.
[0062] A computer program product stored on a medium may cause an apparatus to perform the method as described herein.
[0063] An electronic device may comprise apparatus as described herein.
[0064] A chipset may comprise apparatus as described herein.
[0065] Embodiments of the present application aim to address problems associated with the state of the art.
[0066] Summary of the Figures
[0067] For a better understanding of the present application, reference will now be made by way of example to the accompanying drawings in which:
Fig. 1 shows schematically example server and peer-to-peer teleconferencing systems within which embodiments may be implemented;
Fig. 2 shows schematically an example nested or onion level based codec;
Figs. 3a and 3b show schematically example separate or combined level based codecs;
Fig. 4 shows schematically an example SBA based level codec and bitrates;
Fig. 5 shows schematically an example MASA based level codec and bitrates;
Fig. 6 shows schematically an example adapted MASA based level codec and bitrates according to some embodiments;
Figs. 7 to 9 show flow diagrams of an example operation of the adaptation operation according to some embodiments;
Fig. 10 shows a table of various codecs and bit rates and associated processing requirements; and
Fig. 11 shows an example device suitable for implementing the apparatus shown.
[0068] Embodiments of the Application
The following describes in further detail suitable apparatus and possible mechanisms for the provision of efficient IVAS audio.
[0069] An example system within which embodiments may be implemented is shown in Fig. 1.
[0070] Fig. 1, for example, shows an example teleconferencing system within which some embodiments can be implemented. For example in some embodiments a bidirectional immersive audio call or conversational immersive audio can be implemented using such a teleconferencing system. In this example there are shown two sites or rooms, Room A 100 and Room B 102. Room A 100 comprises a 'talker' or user, Talker TX 103. Room B 102 comprises one 'talker' or user, Talker RX 141.
[0071] In the following example within room A is a suitable teleconference apparatus (or more generally telecommunications apparatus 110) configured to spatially capture and encode the audio environment and furthermore is configured to render a spatial audio signal to the room. The apparatus can in some embodiments be implemented by a user equipment (UE) operating within a cellular communications system or accessing any suitable access network. Within each of the other rooms may be a suitable teleconference apparatus (or more generally telecommunications apparatus such as apparatus 120 within room B) configured to render a spatial audio signal to the room and furthermore is configured to capture and encode at least a mono audio and optionally configured to spatially capture and encode the audio environment.
[0072] The apparatus or UEs 110, 120 can comprise any suitable type of device. The apparatus or UEs 110, 120 can comprise personal communication devices such as mobile phones, devices within a teleconferencing system and/or any other types of communication devices. The apparatus or UEs 110, 120 can comprise microphones and/or any other suitable means for capturing audio. The apparatus or UEs 110, 120 can furthermore comprise, or can be coupled to, one or more playback devices or other means for playing back rendered audio. The means for playing back audio can comprise loudspeaker or headsets or any other suitable means.
[0073] The apparatus or UEs 110, 120 can be implemented as shown in Fig. 11. The apparatus or UEs 110, 120 can comprise immersive audio or conversational immersive audio codecs, for example the immersive voice and audio services (IVAS) codec. Other codecs could be used in other examples.
[0074] Thus in the following examples each room is provided with the means to spatially capture, encode spatial audio signals, receive spatial audio signals and render these to a suitable listener.
[0075] It would be understood that there may be other embodiments where the system comprises some apparatus configured to only capture and encode audio signals (in other words the apparatus is a 'transmit' only apparatus), and other apparatus configured to only receive and render audio signals (in other words the apparatus is a 'receive' only apparatus). In such embodiments the system within which embodiments may be implemented may comprise apparatus with varying abilities to capture/render audio signals.
[0076] The teleconference apparatus (for each site or room) 110, 120 can be configured to call into a teleconference controlled by and implemented over a server 111.
[0077] In some embodiments the communications or teleconferencing system comprises a (peer-to-peer) communications system (rather than the server-based system shown in Fig. 1) within which some embodiments can be implemented. Thus, for example, two or more UEs can be configured to interact directly with each other (for example to implement an immersive audio phone call between users).
[0078] The teleconference apparatus can be configured to spatially capture and encode the audio environment and furthermore can be configured to render a spatial audio signal to the room. In this example only the communications or signalling path from the Room A 100 to the Room B 102 is shown for simplicity but a duplex or multipoint communication system comprising multiple signalling paths can be implemented using the methods as described herein without significant inventive input.
[0079] The teleconference apparatus (for each site or room) 110, 120 is further configured to communicate with each other to implement a teleconference function.
[0080] As shown in Fig. 1, the apparatus 110, 120 and server 111 can comprise suitable encoder and decoder functionality. For example the apparatus 110 is shown comprising an (IVAS) encoder/packetizer function 105 which comprises an IVAS stream encoder (encoder 1) 101, the server 111 is shown comprising (IVAS) decoder and encoder functions 121, and the apparatus 120 is shown comprising an (IVAS) depacketizer/decoder function 141 which comprises an IVAS stream decoder (decoder 1) 131. In such a manner a first audio stream 104 (the audio signals representing the user or talker TX 103) can be encoded by the encoder 1 101, which generates a single RTP payload 106 that comprises an IVAS bitstream from the encoder 1 101 to be passed to a server 111. The server 111 can then decode (optionally then mix with other objects and otherwise process the audio signals) and encode them to generate the single RTP payload 108 with the IVAS bitstream to be passed to the apparatus 120. In some embodiments the server 111 decoder and encoder 121 comprises multiple stream encoder and decoder instances. The apparatus 120 can then decode the audio signals and present them to the user or talker 'Talker RX' 141. The apparatus 120 thus can comprise a decoder function 131 configured to receive the RTP payload 108. In some embodiments the decoder function 131 comprises a first decoder (decoder 1) 131 configured to decode the bit-stream.
[0081] Although this example shows a teleconference application the encoder/decoder functionality can be applied to the streaming of any suitable media, or indeed a one-to-one immersive audio call (IVAS call with two participants).
[0082] The IVAS decoder/renderer for each of the teleconference apparatus 102 can be furthermore configured to handle one or multiple input streams that may each originate from a different encoder.
[0083] Similarly, although the example shown in Figure 1 has a server 111 between the apparatus 110, 120, in some embodiments the communication can be direct without any intermediate (IVAS) decoder/encoder.
[0084] The different apparatus or UEs 110, 120 in the interactive communication session can have different computational capabilities (or computational complexities or codec capability level or functionality tier or any suitable label). The codec capability level or computational capability of the apparatus or UEs 110, 120 can be defined in terms of a level or a tier or any equivalent way. The different levels enable encoding and decoding/rendering functionalities with different complexity and memory requirements. For most devices and use cases, levels can be considered to particularly control the encoding capability. In general, tiers of functionality enable a codec to be implementable on a wide range of devices with different capabilities, balancing user experience and implementation complexity or cost.
[0085] For instance, in the apparatus or UEs 110, 120 that use an IVAS codec, the different levels can be built on each other so that a higher level comprises all features and functionalities of lower levels but with some additional features. For example, level one could be a core level, level two can comprise all of the features of level one and some additional features. Level three can comprise all of the features of level two and some additional features. Level three can be the highest level. The highest level can comprise the full set of IVAS codec features and functionalities. An apparatus or UE 110, 120 or any other type of device configured for IVAS shall support a codec with a basic functionality (or in other words a coding format with at least a level one functionality, where the level one functionality is the most basic of the defined functionality associated with the coding format).
[0086] The levels (or tiers or similar) can, for example, be defined as follows. The following level-dependent limits apply for IVAS codec operations (encoder/decoder/renderer total), excluding Jitter Buffer Management and other supplementary operations:
Level 1:
o Complexity <= 3 * EVS
o RAM <= 3 * EVS
Level 2:
o Complexity <= 6 * EVS
o RAM <= 6 * EVS
Level 3:
o Complexity <= 10 * EVS
o RAM <= 10 * EVS
The highest level provides the full functionality of IVAS. At the lower levels, reduced functionality is provided. Each increasing complexity level can provide full support of all the lower levels.
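The level-dependent limits above can be expressed as a simple lookup. The following is an illustrative Python sketch (not part of any standard): the helper names are hypothetical, and the baseline EVS complexity figure of 128.86 WMOPS is taken from the example table discussed later in this description.

```python
# Illustrative sketch of the level-dependent complexity/RAM limits,
# expressed as multiples of the standard baseline EVS implementation.
EVS_COMPLEXITY_WMOPS = 128.86  # example baseline figure (see Fig 10 discussion)

# Level -> (complexity multiplier, RAM multiplier) relative to EVS
LEVEL_LIMITS = {
    1: (3, 3),
    2: (6, 6),
    3: (10, 10),
}

def fits_level(level: int, complexity_wmops: float, ram_ratio: float) -> bool:
    """Check whether a codec operating point fits within a level's limits."""
    c_mult, r_mult = LEVEL_LIMITS[level]
    return (complexity_wmops <= c_mult * EVS_COMPLEXITY_WMOPS
            and ram_ratio <= r_mult)

# A 350 WMOPS operating point fits Level 1 (limit 3 * 128.86 = 386.58 WMOPS)
assert fits_level(1, 350.0, 2.5)
# ...but a 400 WMOPS operating point needs Level 2
assert not fits_level(1, 400.0, 2.5)
assert fits_level(2, 400.0, 2.5)
```

Because the levels are nested, any operating point that fits a lower level also fits every higher level.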
[0087] The following level-independent ROM and PROM constraints apply:
o ROM, PROM <= 10 * EVS
In the above level indications, "EVS" stands for the standard baseline functional implementation of the 3GPP EVS codec described in the TS 26.445 specification. For example, the phrase "Complexity <= 3 * EVS" indicates that the measured or evaluated computational complexity of the IVAS codec should be less than or equal to three times the computational complexity of the standard baseline EVS implementation. The standard fixed point EVS implementations are described, for example, in the 3GPP TS 26.442 and TS 26.452 specifications.
[0088] Other definitions or indications of levels or general complexity-based device capabilities are possible. In this example three different levels of computational capabilities are defined. Other numbers of levels could be used in other examples.
[0089] The complexity/memory for the UE can be evaluated in any suitable manner, for example using the WMC automated tool based on ITU-T G.191 for both the codec under test (CuT) and the reference in a consistent way for the worst case.
[0090] The complexity level can in examples be provided to encoder / decoder / renderer during codec initialization.
[0091] The decoder/renderer at all levels shall be able to decode any IVAS bitstream and render it to an output format that may be level dependent.
[0092] Thus, examples of the disclosure enable an immersive codec operating point to be selected for a communication session where the selection of the operating point accounts for the computational capabilities of at least one of the UEs 110, 120 in the communication session.
[0093] As discussed previously RTP is intended for an end-to-end, real-time transfer of streaming media and provides facilities for jitter compensation and detection of packet loss and out-of-order delivery. RTP is furthermore designed to carry a multitude of multimedia formats, which permits the transport of new formats without revising the RTP standard. To this end, the information required by a specific application of the protocol is not included in the generic RTP header. For a class of applications (e.g., audio, video), an RTP profile may be defined. For a media format (e.g., a specific video coding format), an associated RTP payload format may be defined. Every instantiation of RTP in a particular application may therefore require profile and payload format specifications.
[0094] The profile is configured to define the codec used to encode the payload data and the mapping to payload format codes in the protocol field Payload Type (PT) of the RTP header.
[0095] For example, the RTP profile for audio and video conferences with minimal control is defined in RFC 3551. The profile defines a set of static payload type assignments and a dynamic mechanism for mapping between a payload format and a PT value using Session Description Protocol (SDP). The latter mechanism is used for newer video codecs such as the RTP payload format for H.264 Video defined in RFC 6184 or the RTP Payload Format for High Efficiency Video Coding (HEVC) defined in RFC 7798.
[0096] An RTP session can be established for each multimedia stream. Audio and video streams may be implemented which use separate RTP sessions, enabling a receiver to selectively receive components of a particular stream. The RTP specification can furthermore be configured to recommend port numbers for RTP, and furthermore to recommend the use of the next odd port number for the associated RTCP session. A single port can be used for RTP and RTCP in applications that multiplex the protocols.
[0097] Each RTP stream can comprise RTP packets, and the RTP packet in turn can comprise a RTP header and payload pair.
[0098] Enhanced Voice Services (EVS) is a mono voice codec standardized in 3GPP and described in the TS 26.445 specification document. The codec can have two operating modes: EVS Primary and EVS AMR-WB IO (Adaptive Multi Rate Wideband Inter-Operable).
[0099] The IVAS codec is an extension to the EVS codec and as such the IVAS and EVS codecs can have some similarities in terms of design and implementation. The IVAS RTP payload format is extended from the EVS payload format.
[0100] The RTP payload format of EVS is described in 3GPP TS 26.445 Annex A. In EVS, the RTP payload format is divided into two different embodiments: a Compact format and a Header-Full format. In the EVS Compact payload format, an RTP packet includes a single EVS speech frame for EVS Primary mode. For EVS AMR-WB IO mode, the compact RTP packet also includes a 3-bit Codec Mode Request (CMR) field in front of the speech frame. In the EVS Compact format, the different modes and bitrates for the speech frames are identified by the size of the RTP payload. For example, an RTP packet of size 328 bits is assigned for EVS Primary mode with 16.4 kbps bitrate, as is shown in Table A.1 in TS 26.445 Annex A.
An IVAS codec capability on a device (UE) can be defined based on the device's computational capability or codec capability level, i.e., how complex a codec operation the UE can employ. In the following examples the terms computational capability and codec capability level can be interchanged. In some embodiments, this capability may be controlled or determined or adapted, e.g., based on battery level status, power consumption, current consumption or other requirements. The capability can be called or relate to, e.g., IVAS Level, IVAS tier, IVAS core set, IVAS features, etc. IVAS supports several encoder input formats and encoding of several coded modes, for example mono (using EVS), stereo, SBA, MASA, etc. Bitrate support (similarly or in addition audio bandwidth / sampling rate support) for codec capability (e.g., IVAS Level 1) can be format specific. For example, Level 1 support for SBA could be limited to 80 kbps FB operation, while for MASA the support could go to 512 kbps FB operation.
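The Compact format principle, where the payload size alone identifies the operating mode, can be sketched as follows. This is an illustrative Python sketch assuming the standard 20 ms EVS frame duration; only the 328-bit / 16.4 kbps pairing is taken from the text above, and the remaining rates are standard EVS Primary bitrates listed purely for illustration.

```python
# Sketch of the EVS Compact format principle: with 20 ms frames, the
# RTP payload size in bits identifies the EVS Primary mode bitrate.
FRAME_DURATION_MS = 20

# Standard EVS Primary bitrates (kbps), used here for illustration.
EVS_PRIMARY_KBPS = [7.2, 8.0, 9.6, 13.2, 16.4, 24.4, 32.0, 48.0, 64.0, 96.0, 128.0]

def payload_bits(bitrate_kbps: float) -> int:
    """Bits per 20 ms speech frame at the given bitrate."""
    return round(bitrate_kbps * FRAME_DURATION_MS)

# Reverse lookup: payload size (bits) -> bitrate, as in TS 26.445 Table A.1.
SIZE_TO_BITRATE = {payload_bits(r): r for r in EVS_PRIMARY_KBPS}

assert payload_bits(16.4) == 328        # matches the Table A.1 example
assert SIZE_TO_BITRATE[328] == 16.4     # the size alone identifies the mode
```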
[0101] There are currently issues associated with codec negotiation when several coded formats (based on various encoder input formats) are offered and the apparatus or UEs have a limited computational capability for coding (and therefore have to conform to a lower IVAS Level). For example, where the apparatus or UE is a Level 1 or Level 2 IVAS device.
[0102] Current approaches for IVAS RTP payload and SDP parameter definition provide a single common parameter for all supported coded formats. This parameter can be, for example, for immersive mode bitrates (ibr) or immersive mode bandwidth (ibw). In addition, similar parameters (br, bw) can be available for mono (EVS) operation, where IVAS provides encoding operation using EVS Primary and AMR-WB IO modes.
[0103] However, these latter parameters are not suitable for IVAS Immersive operation.
[0104] Additional parameters could be introduced for bitrate (and bandwidth) negotiation for the Immersive operations, which can be numerous on a single device (including core set or Level 1 devices): SBA (FOA, HOA2, HOA3), MASA (MASA1, MASA2), ISM1-4, MC (5.1, 7.1, 5.1+2, 5.1+4, 7.1+4), OSBA (with 1-4 ISMs), and OMASA (with 1-4 ISMs). However, adding separate ibr and ibw parameters for all (minor) coded formats would significantly overload the SDP negotiation. Thus, adding a parameter for each of the supported coded or coding formats would dramatically increase the signalling requirements during any negotiation. In other words, the negotiation would become too complex in terms of size and implementation.
[0105] Alternatively, rather than implementing a single negotiation with multiple parameters, several negotiation attempts could be performed. This would complicate the negotiation process and require more time to complete the negotiation attempts. This results in the situation where establishing an IVAS call or immersive call becomes more complex and slower than establishing a mono call.
[0106] This issue can furthermore arise in the situation where the offered bitrate is higher than the bitrate which is actually supported. For example, an ambisonics mode (SBA) supports 80 kbps and below for Level 1 operation, however 512 kbps is offered to establish a call with highest-quality MASA encoding. In such an example a fallback to SBA then requires a new negotiation.
[0107] Furthermore, a similar negotiation issue can exist when the apparatus or UE supports a number of coded formats that it can offer, for example where there are various audio inputs (encoder input formats) and audio capture modes that the UE supports.
[0108] The concept as discussed in the following embodiments relates to encoder coded format selection and includes an associated codec negotiation (e.g., using SDP offer-answer) in the context of immersive voice and audio services and related codecs (e.g., the 3GPP IVAS codec). In the following examples apparatus (for example means configured to perform the following) and methods are provided to control, adapt or otherwise determine a codec capability list or profile (e.g., in terms of maximum supported bitrate and/or maximum supported audio bandwidth for a bitrate) of at least one coded format based on offering at least one other coded format supporting a different or lower capability (e.g., in terms of maximum bitrate and/or lower maximum supported audio bandwidth for a bitrate), to achieve an offering of a simplified or unified capability list or profile (e.g., consisting of a single maximum bitrate and/or audio bandwidth for a bitrate) by combining the capabilities of different coded formats on the basis of selecting the lower of the at least two coded format capabilities.
[0109] In other words there is an adaptation, control or otherwise determination of the codec capability level or the computational capability for coding. Basically, the level can determine what can be implemented by the codec. For example in level 1, the codec can support features x, y and z, whereas in level 2, the codec can support features x, y and z plus a, b and c. Then level 3 further extends this and so on.
[0110] Although this can be considered to represent a capability level (level 1, 2 etc) or a capability for bitrate and/or bandwidth, the examples described herein can cover both in the form of a capability level based on bitrate limits. For example, where SBA has three capability levels based on bitrate (1: 13.2-80 kbps, 2: 96-192 kbps, 3: 256-512 kbps), MASA has one (1: 13.2-512 kbps).
[0111] For example the apparatus or methods provided in the following embodiments disclose codec capability in terms of bitrate and/or audio bandwidth support which can be adaptively selected for a UE with a given capability level (e.g., IVAS Level 1 or core set) based on set of coded formats being offered (or being supported by the device).
[0112] For example, in some embodiments the offer to the UE can include coded formats SBA and MASA.
[0113] Also, for example, the offer to the UE can include bitrates up to 256 kbps.
[0114] In these embodiments there can, for example, be a UE which supports, at its capability level, SBA up to 80 kbps.
[0115] For example, since SBA is part of the offer and has a constraint, the MASA capability is adaptively updated to include bitrates up to 80 kbps (although it would be supported up to 512 kbps and would thus support offered bitrates up to 256 kbps).
[0116] For example, UE answers SBA and/or MASA with bitrates up to 80 kbps.
[0117] In order to additionally incorporate other constraints, for example, battery or current or power constraints, the adaptation mechanism (or a suitable control mechanism or determination) as discussed herein can be extended by employing a virtual codec capability configuration, which is considered together with the regular codec capability selection.
[0118] For example, the offer to UE can include codec formats SBA and MASA. For example, the offer to UE can include bitrates up to 256 kbps.
[0119] For example, the UE supports SBA and MASA at capability levels all the way up to 512 kbps. In other words, the UE is a Level 3 device (as SBA is not limited).
[0120] For example, UE battery level can be determined to be low (below a determined threshold voltage or similar) and UE is configured to conserve power. In such embodiments a virtual codec capability configuration is implemented in operating mode selection and codec negotiation.
[0121] For example, the virtual configuration on a UE can correspond with an SBA capability level up to 80 kbps. The SBA capability is thus updated accordingly, although, as a Level 3 device, the UE may support SBA all the way up to 512 kbps and would thus support offered bitrates up to 256 kbps.
[0122] In some embodiments, since SBA (with the virtual codec capability configuration adaptation or control) is part of the offer consideration and comprises a constraint, the MASA capability is adaptively updated (or suitably determined or controlled) to include bitrates up to 80 kbps (although the MASA configuration would be supported up to 512 kbps and would thus support offered bitrates up to 256 kbps).
[0123] For example, UE is configured to answer with SBA and/or MASA coded formats with bitrates up to 80 kbps. In case of SBA operation, in particular, the UE conserves power and battery life based on this adaptation.
[0124] As such the embodiments as discussed herein relate to immersive voice and audio communications systems and services. The methods and apparatus as discussed herein can be implemented on an IVAS device or, e.g., a suitable network or service element (e.g., media gateway, teleconferencing bridge, etc.).
[0125] The apparatus and the methods discussed herein relate to codec operating mode selection and codec negotiation (for example employing SDP offer-answer negotiations) aspects on a complexity-constrained device. The adaptation, control or determining methods and mechanisms described here thus aim to address the issues associated with the capabilities of a complexity-constrained device; however the control, determination or adaptation can in some scenarios, e.g., due to the two-way nature of a communications session, also be implemented on a device (e.g., a server) other than the complexity-constrained device (3GPP IVAS UE).
[0126] The examples described here follow a nested operating point codec design, or the 'onion' principle of codec capabilities and operating points.
[0127] For example, a Level 2 capability codec includes Level 1 capabilities (and any additional capabilities/functionalities). For example, a level 2 capability with support for a bitrate of 64 kbps includes support for the (level 1) lower bitrates 13.2-48 kbps.
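The nesting principle above can be sketched as a simple superset relation. The following Python fragment is illustrative only: the 64 kbps figure is taken from the example above, while the specific lower bitrates below 48 kbps are assumed for demonstration.

```python
# Sketch of the nested ('onion') capability principle: each level's
# supported bitrate set is a superset of the levels below it.
LEVEL1_KBPS = {13.2, 16.4, 24.4, 32, 48}   # illustrative Level 1 rates
LEVEL2_KBPS = LEVEL1_KBPS | {64}           # Level 2 adds 64 kbps support

assert LEVEL1_KBPS <= LEVEL2_KBPS          # Level 2 includes all of Level 1
assert 64 in LEVEL2_KBPS and 64 not in LEVEL1_KBPS
```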
[0128] The discussion and examples in the following embodiments consider IVAS Level 1, 2, and 3 capability or functionality tiers according to the principles of the IVAS Design Constraints (i.e., 3 x EVS, 6 x EVS, 10 x EVS). However, these are simply examples demonstrating the wider concept. In some embodiments there can therefore also be implemented, e.g., an IVAS core set of functionalities/capabilities, where, for example, individual capabilities above the core set (but, for example, not the full capability set corresponding to Level 3) are indicated. An indication can refer to a negotiated parameter. This can mean that there can be employed a system comprising IVAS Level 1 and IVAS Level 3, but no (specific) IVAS Level 2, and considerable flexibility between Level 1 (core set) and Level 3 (full IVAS) functionality.
[0129] Thus in general, the capability levels can be understood as functionality tiers with increasing complexity/memory requirements. Each additional level is based on a lowest level or core set (core level) and can be configured to add at least one functionality which generally entails an increase in complexity/memory requirements of the implementation and its runtime operation.
[0130] As discussed these capability levels are typically nested (or considered according to the onion principle), for example as presented in Fig 2 where the codec comprises a level 1 (core set) functionality 201, a level 2 functionality 203 which incorporates and builds on the core set and level 3 functionality 205 which incorporates and builds on the level 2 functionality.
[0131] In some situations the additional functionality can be considered or implemented individually or in some combination, and thus entail additional complexity/memory requirements in different ways, for example as shown in Figs 3a and 3b.
[0132] Fig 3a, shows a three layer codec where the functionality is not nested but where the level 1 (core set) functionality 301 can be augmented by the level 2 functionality 303 which further then can be augmented by the level 3 functionality 305.
[0133] Whereas Fig 3b shows three layers of codec functionalities where the functionality is not nested and furthermore where one of the levels is further subdivided into more than one functionality sub-level. Thus the level 1 (core set) functionality 301 can be augmented by the level 2 functionality 303 which further then can be augmented by the level 3 functionality 305. However the level 2 functionality 303 can comprise additional functionality aspects such as level 2a 303a functionality, level 2b 303b functionality which comprises at least some different functionality aspects which require differing additional complexity/memory requirements, level 2c 303c functionality which comprises at least some different functionality aspects which require differing additional complexity/memory requirements to level 2a and 2b, level 2d 303d functionality which effectively comprises functionality aspects which span between the level 1 and 3 functionality aspects and a combination of level 2a and 2c 303e functionality.
[0134] The above approach for subdividing a functionality sub-level can be applied to any functionality level. For example, level 3 could be subdivided to provide multiple level 3 sub-levels. Also the core level 1 can be subdivided to provide multiple different core sub-levels. The device can then choose a suitable core sub-level based on, for example, the input or coded format the device is using or providing as the encoded bitstream format.
[0135] Based on IVAS complexity profile estimates (where final complexity profile estimates will be available when a fixed-point reference implementation is available), the following complexity-motivated Level definitions can be made. For SBA (Scene Based Audio) implementations, and as shown in Fig 4, the three levels can be as follows:
401 - SBA level 1 with operating bit-rates of between 13.2 and 80 kbps;
403 - SBA level 2 with operating bit-rates of between 96 and 192 kbps; and
405 - SBA level 3 with operating bit-rates of between 256 and 512 kbps.
Fig 5 shows the MASA implementation which can comprise:
501 - MASA level 1 with operating bit-rates of between 13.2 and 512 kbps.
[0136] (where MASA level 2 and level 3 implementations include the core set of level 1).
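The Fig 4 and Fig 5 bitrate ranges can be represented as a lookup that maps an operating bitrate to the lowest level able to support it. This is an illustrative Python sketch using the example figures above; the function name and data structures are hypothetical.

```python
# SBA level boundaries from Fig 4 and the MASA Level 1 span from Fig 5,
# each as level -> (minimum kbps, maximum kbps).
SBA_LEVELS = {1: (13.2, 80), 2: (96, 192), 3: (256, 512)}
MASA_LEVELS = {1: (13.2, 512)}

def level_for(levels, bitrate_kbps):
    """Return the lowest level whose range covers the bitrate, else None."""
    for level in sorted(levels):
        lo, hi = levels[level]
        if bitrate_kbps <= hi:
            return level
    return None

assert level_for(SBA_LEVELS, 80) == 1     # 80 kbps SBA fits Level 1
assert level_for(SBA_LEVELS, 128) == 2    # 128 kbps SBA needs Level 2
assert level_for(MASA_LEVELS, 512) == 1   # MASA covers 512 kbps at Level 1
```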
[0137] In some embodiments both of the coded formats described above are offered and supported by the UE. The UE can adapt the MASA Level 1 support according to the constraints by the other format (SBA) in order to at least provide a simplified interface and set of parameter values for efficient codec negotiation. In the following examples the term adapt or adaptation can be replaced with control or determine as a more generic term.
[0138] The adaptation can be as shown in Fig 6, where the SBA levels as shown in Fig 4 are adapted into the MASA levels as shown in Fig 5 to define:
601 - adapted MASA level 1 with operating bit-rates of between 13.2 and 80 kbps;
603 - adapted MASA level 2 with operating bit-rates of between 96 and 192 kbps; and
605 - adapted MASA level 3 with operating bit-rates of between 256 and 512 kbps.
The above adaptation can be applied to the offered bit-rate and also to other relevant parameters, such as bandwidth or any other suitable negotiated parameter. According to the adaptation principle, the higher capability is adapted down to correspond with the lower capability as shown in the example.
[0139] The adaptation of a first coding format (MASA level 1 support) with another format (SBA) such as described above can in some embodiments be implemented by a suitable means according to the flow diagram as shown in Fig 7.
[0140] For example as shown by Fig 7 in 701 there is the operation of obtaining an offer or list with at least a first and a second coded format associated with at least one codec capability level or mode set or profile.
[0141] Then, as shown by 703 in Fig 7, is the operation of determining at least one parameter value associated with the codec capability level or mode set or profile for the at least first (for example MASA) and second (SBA) coded formats, where the at least one parameter can comprise: bitrate or audio bandwidth. In some embodiments the determining may be or comprise obtaining or comparing the two coding formats.
[0142] Furthermore, as shown by 705 in Fig 7 is the operation of adapting at least one parameter value associated with the codec capability level or mode set or profile for the first coded format to correspond to the parameter value of the second coded format at least for a codec negotiation or duration of a communication session, where at least one of the coded formats from the obtained offer or list is selected. The selection is made as part of the answer, with the selected coded format being used as the codec operating mode.
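The Fig 7 flow can be sketched as follows. This is a minimal, illustrative Python sketch; the data structures and the function name are hypothetical, and the harmonisation simply takes the lower of the per-format maximum bitrates, as the adaptation principle describes.

```python
# Minimal sketch of the Fig 7 flow: each offered coded format's maximum
# bitrate is adapted down to that of the most constrained offered format.
def adapt_capability(offer):
    """offer: dict mapping coded format name -> max supported bitrate (kbps)."""
    # Step 703: determine the parameter values for the offered formats
    limits = list(offer.values())
    # Step 705: adapt every format down to the most constrained one
    harmonised = min(limits)
    return {fmt: harmonised for fmt in offer}

# SBA Level 1 caps at 80 kbps, so MASA is adapted down from 512 kbps
answer = adapt_capability({"SBA": 80, "MASA": 512})
assert answer == {"SBA": 80, "MASA": 80}
```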
[0143] In some embodiments, with respect to an IVAS codec the method for adaptation can furthermore comprise the operations as shown with respect to Fig 8.
[0144] For example, as shown by Fig 8 in 801 there is the operation of obtaining an offer or list with at least a first and a second coded format associated with IVAS codec capability level or (core) feature set.
[0145] Then as shown by 803 in Fig 8 is the operation of determining at least one parameter value associated with the IVAS codec capability level or (core) feature set for the at least first and second coded formats, where the at least one parameter can comprise: immersive mode bitrate (ibr) or immersive mode audio bandwidth (ibw).
[0146] Furthermore as shown by 805 in Fig 8 is that of adapting at least one parameter value associated with the IVAS codec capability level or (core) feature set for the first coded format to correspond to the parameter value of the second coded format at least for a codec negotiation or duration of an IVAS session, where at least one of the coded formats from the obtained offer or list is selected and where the at least one parameter value adaptation can comprise: setting the maximum supported bitrate for the first coded format to a maximum supported bitrate of the second coded format; and/or setting the maximum supported bandwidth for the first coded format to a maximum supported bandwidth of the second coded format.
[0147] Following this, as shown by 807 in Fig 8 is the operation of creating an answer with at least one coded format, or selecting a coded format, according to the adapted IVAS codec capability level or (core) feature set for the at least first and second coded formats, where at least one parameter value associated with the IVAS codec capability level or (core) feature set is adapted.
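The Fig 8 variant, where both the immersive mode bitrate (ibr) and the immersive mode bandwidth (ibw) are harmonised, can be sketched as below. The parameter names mirror the SDP parameters mentioned in the text, but the data structures and the numeric bandwidth codes are illustrative assumptions.

```python
# Sketch of the Fig 8 adaptation: harmonise both ibr and ibw across the
# offered coded formats by taking the lower supported value of each.
def harmonise(caps, formats):
    """caps: format -> {'ibr': max kbps, 'ibw': max bandwidth code}."""
    ibr = min(caps[f]["ibr"] for f in formats)
    ibw = min(caps[f]["ibw"] for f in formats)
    return {f: {"ibr": ibr, "ibw": ibw} for f in formats}

caps = {
    "SBA":  {"ibr": 80,  "ibw": 3},   # hypothetical bandwidth code
    "MASA": {"ibr": 512, "ibw": 4},   # hypothetical bandwidth code
}
answer = harmonise(caps, ["SBA", "MASA"])
assert answer["MASA"] == {"ibr": 80, "ibw": 3}   # MASA adapted down to SBA
```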
[0148] For example, an IVAS UE has the capacity to employ a core set (or Level 1) IVAS codec. This means that the UE is able to support a subset of IVAS functionalities at least in the encoding direction. In other words, the bitstream produced by the example IVAS UE is a core set (or Level 1) bitstream.
[0149] The IVAS UE can be configured to set up an immersive communications call (IVAS session). The UE can be configured to receive an offer (for example from a server or other UE) including, for example, at least the coded formats (cf) SBA and MASA. The offer can furthermore comprise information such as immersive mode bitrates (ibr) up to 256 kbps.
[0150] The IVAS UE supports SBA and MASA according to example definitions shown above. For example, SBA Level 1 is up to 80 kbps. On the other hand, a MASA Level 1 can be up to 512 kbps.
[0151] Thus in some embodiments the following adaptation takes place.
[0152] The UE coded format selector (as part of the codec negotiation) obtains the offered formats of the example offer. This UE can support both coded formats. It considers the constraint for SBA and compares this against supported operating modes for MASA. In order to harmonize the codec capability (functionality) for the answer (to the offer parameters or constraints), the UE adaptively updates the MASA capability to include bitrates up to 80 kbps conforming to the SBA Level 1 constraint.
[0153] Following the adaptation, the IVAS UE can provide an answer including, e.g., both SBA and MASA as part of list of coded formats (cf) with offered immersive mode bitrate (ibr) up to 80 kbps.
[0154] In such a manner the UE is able to provide a simple, harmonized offer answer to make the negotiation more efficient.
[0155] It is noted that in some embodiments where the second format (e.g., SBA) is not part of the offer (or any similar consideration), there is no need to constrain the first coded format (for example the MASA operation configuration). Thus, in such embodiments if MASA alone were considered, the IVAS UE of the above example could answer with bitrates (ibr) up to 256 kbps (based on the offer).
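Both cases above (SBA offered, and MASA alone) can be captured in one sketch. This is an illustrative Python fragment with hypothetical names: the answer bitrate is the minimum of the offered ibr and the most constrained supported format among those actually offered.

```python
# Sketch of the offer-answer outcome: the harmonised answer ibr is
# constrained only by the formats that are actually part of the offer.
UE_SUPPORT = {"SBA": 80, "MASA": 512}   # UE's max bitrate per format (kbps)

def answer_ibr(offered_formats, offered_ibr):
    supported = [UE_SUPPORT[f] for f in offered_formats if f in UE_SUPPORT]
    return min([offered_ibr] + supported)

# SBA in the offer constrains the harmonised answer to 80 kbps
assert answer_ibr(["SBA", "MASA"], 256) == 80
# Without SBA, MASA alone allows answering with the full offered 256 kbps
assert answer_ibr(["MASA"], 256) == 256
```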
[0156] In such a manner the adaptations aim to solve the problems associated with codec mode selection and codec negotiation under IVAS capability level constraints. This enables efficient and flexible implementation of functionality tiers based on device capabilities and current session requirements.
[0157] Furthermore, the adaptation (or control) methods such as shown in the example above can be further adapted based on other (temporary) constraints that relate to computational complexity and therefore the capabilities of the codec. In other words a coded format can be a coded format adapted or controlled by a virtual computational capability based on an apparatus constraint. For example, a determination of high current consumption, or a determination of a low battery level (for example a battery voltage below a determined threshold value), can be employed as the apparatus constraint to further adapt or constrain the codec operation.
[0158] The (further and/or temporary) adaptation can, for example, be implemented according to the following flow diagram operations as shown in Fig 9.
[0159] For example, as shown by Fig 9 in 901 there is the operation of obtaining an offer or list with at least a first coded format associated with a first codec capability level or mode set or profile.
[0160] Then as shown by 903 in Fig 9 is the operation of obtaining a virtual offer or list with at least a second coded format associated with a second lower codec capability level or mode set or profile. Although this example is a 'lower' codec capability level or mode set or profile it can be generalized to a 'different' capability level or mode set or profile, but not necessarily a lower one.
[0161] Furthermore as shown by 905 in Fig 9 is that of determining at least one parameter value associated with the codec capability level or mode set or profile for the at least first and second coded formats, where the at least one parameter can comprise: bitrate or audio bandwidth.
[0162] This then furthermore can result as shown by 907 in Fig 9 in adapting at least one parameter value associated with the codec capability level or mode set or profile for the first coded format to correspond to parameter value of the second coded format at least for a codec negotiation or duration of a communication session, where at least one of the coded formats from the obtained offer or list is selected.
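The Fig 9 flow extends the earlier harmonisation with the virtual codec capability configuration. The following Python sketch is illustrative only: the data structures and the function name are hypothetical, and the virtual configuration simply caps the regular capabilities before harmonisation.

```python
# Sketch of the Fig 9 flow: a 'virtual' codec capability configuration
# (e.g. triggered by a low battery level) is considered together with
# the regular codec capability when harmonising the answer.
def adapt_with_virtual(offer, regular, virtual=None):
    """Return per-format answer bitrates, capped by the virtual config if set.

    offer/regular/virtual: dicts mapping coded format -> max bitrate (kbps).
    """
    effective = dict(regular)
    if virtual is not None:
        # The virtual configuration caps the regular capability per format
        effective = {f: min(regular[f], virtual.get(f, regular[f]))
                     for f in regular}
    # Harmonise down to the most constrained offered format
    limit = min(effective[f] for f in offer)
    return {f: min(limit, offer[f]) for f in offer}

regular = {"SBA": 512, "MASA": 512}     # a Level 3 device (no SBA limit)
virtual = {"SBA": 80}                   # power-saving core set (Level 1) cap
answer = adapt_with_virtual({"SBA": 256, "MASA": 256}, regular, virtual)
assert answer == {"SBA": 80, "MASA": 80}
# Without the virtual configuration the full offered rate is usable
assert adapt_with_virtual({"SBA": 256, "MASA": 256}, regular) \
    == {"SBA": 256, "MASA": 256}
```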
[0163] The adaptation mechanism described above is extended by use of a 'virtual' codec capability configuration, which is considered together with the regular codec capability selection when provided.
[0164] For example, an IVAS UE of Level 3 which is low on battery, can limit codec capability according to core set (Level 1) in order to conserve power.
[0165] This Level 3 device supports all IVAS functionalities (in terms of complexity; it may not have, e.g., audio inputs corresponding with all encoder input formats or coded formats). In other words, the bitstream produced by the example IVAS UE is a Level 3 bitstream.
[0166] For the sake of clarity, it can be noted that the Level 3 functionalities include the Level 2 and Level 1 functionalities according to the nested codec or onion principle. Thus for example the IVAS UE can be configured to set up an immersive call (IVAS session). The UE receives an offer including, e.g., at least the coded formats (cf) SBA and MASA. The offer also includes, e.g., immersive mode bitrates (ibr) up to 256 kbps.
[0167] The IVAS UE supports SBA and MASA according to example definitions shown above. For example, SBA Level 3 is up to 512 kbps. On the other hand, already MASA Level 1 is up to 512 kbps and is thus similarly fully supported.
[0168] Now, according to the embodiments as described above, the following temporary adaptation takes place.
[0169] The IVAS UE coded format selector (as part of the codec negotiation) obtains the offered formats as in the above example offer. As said above, this UE can support both coded formats with full capabilities. Thus, it could normally, e.g., answer with immersive mode bitrates (ibr) up to 256 kbps as offered.
[0170] As indicated by this example, the IVAS UE is low on battery. Based, for example, on a power saving mode, it obtains or configures a virtual codec capability configuration. This corresponds, e.g., to Level 1 core set. The UE then considers the virtual codec capability configuration as part of the codec mode selection and codec negotiation according to the procedures above.
[0171] The virtual codec capability configuration includes, e.g., SBA Level 1, which is limited to 80 kbps and below. In order to harmonize the codec capability (functionality) for the answer, the IVAS UE coded format selector (as part of the codec negotiation) provides a temporary adaptive update for the MASA capability to include bitrates up to 80 kbps conforming to the SBA Level 1 virtual codec capability configuration constraint.
[0172] Following the adaptation, the IVAS UE can provide an answer including, e.g., both SBA and MASA as part of the list of coded formats (cf) with an offered immersive mode bitrate (ibr) up to 80 kbps. This provides a simple, harmonized answer to the offer, makes the negotiation more efficient, and can allow a significant power consumption reduction and battery saving for the associated IVAS UE.
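The battery-saving negotiation described in paragraphs [0166]–[0172] can be sketched as follows. The level-to-bitrate mapping and function names are illustrative assumptions drawn from the example values in the text, not a definitive implementation of the negotiation procedure.

```python
# Hypothetical sketch of answer construction with a 'virtual' codec
# capability configuration (e.g. applied when the UE is low on battery).

LEVEL_MAX_KBPS = {1: 80, 2: 192, 3: 512}  # example level definitions

def build_answer(offered_formats, offered_max_kbps, virtual_level=None):
    """Return a harmonized per-format bitrate cap for the answer: the
    offered cap, further limited by a virtual codec capability level."""
    cap = offered_max_kbps
    if virtual_level is not None:
        cap = min(cap, LEVEL_MAX_KBPS[virtual_level])
    # All offered coded formats are harmonized to the same cap.
    return {fmt: cap for fmt in offered_formats}

# Offer: SBA and MASA, immersive mode bitrates up to 256 kbps.
# UE low on battery -> Level 1 virtual configuration (80 kbps) applied.
answer = build_answer(["SBA", "MASA"], 256, virtual_level=1)
# answer == {"SBA": 80, "MASA": 80}
```

Without the virtual configuration the same call would answer with the offered 256 kbps cap, matching the "full capability" behaviour of paragraph [0169].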
[0173] Fig 10 shows an example table which presents example complexity numbers in weighted millions of operations per second (WMOPS) for different IVAS coded formats at different bitrates. The shading follows a scheme where: light shading shows complexities below Level 1 (complexity < Level 1); medium shading shows complexities above Level 1 and below Level 2 (Level 1 < complexity < Level 2); dark shading shows complexities above Level 2 (complexity > Level 2). In Fig 10, a base level complexity (EVS_comp) is set to 128.86 WMOPS, which is comparable to the complexity of a standard EVS implementation. Level 1 is set to 3 * EVS_comp, Level 2 is set to 6 * EVS_comp and Level 3 is set to 10 * EVS_comp. Each row in the table represents an IVAS coded format and each column represents a bitrate. The numbers indicate the total complexity (encoding + decoding and rendering to binaural) of the format and bitrate combination.
[0174] Additional levels are defined based on bitrate limits. Level 1 based on bitrate is set to 13.2-80 kbps, Level 2 is set to 96-192 kbps and Level 3 is set to 256-512 kbps.
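The bitrate-based level definitions quoted above can be expressed as a simple classifier. The behaviour for bitrates between the listed ranges (e.g. 90 kbps) is an assumption here; the text does not specify it, so such bitrates return no level in this sketch.

```python
# Sketch of the example bitrate-based level definitions of Fig 10.

def bitrate_based_level(kbps: float):
    """Map a bitrate to its example level, or None outside the ranges."""
    if 13.2 <= kbps <= 80:
        return 1
    if 96 <= kbps <= 192:
        return 2
    if 256 <= kbps <= 512:
        return 3
    return None

assert bitrate_based_level(64) == 1
assert bitrate_based_level(128) == 2
assert bitrate_based_level(512) == 3
```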
[0175] As can be seen from the table, some coded formats have Level 1 complexities (in terms of actual complexity in WMOPS) across all bitrates. For example, MASA 2TC has Level 1 complexity at all bitrates, which is indicated with the light shading. However, this is not the case for all coded formats. For example, the OSBA_ISM4_HOA3 combined format has some bitrates (96-192 kbps) in Level 2 (medium shading) and some even in Level 3 (256-512 kbps, dark shading).
[0176] If both MASA 2TC and OSBA_ISM4_HOA3 are offered in the session, the complexity level can be adjusted based on the more complex format (OSBA_ISM4_HOA3), according to the embodiments as described above. In this case, the Level 1 for MASA 2TC would be adjusted to match the Level 1 for OSBA_ISM4_HOA3.
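This adjustment can be sketched as restricting one format's Level 1 bitrate set to the intersection with the more complex format's Level 1 bitrate set. The EVS_comp base and the Level 1 multiplier follow the example values of Fig 10; the per-bitrate WMOPS figures below are made-up placeholders that merely follow the pattern described for the two formats.

```python
# Illustrative harmonization of Level 1 across offered coded formats.

EVS_COMP = 128.86                 # base complexity in WMOPS (example value)
LEVEL1_WMOPS = 3 * EVS_COMP      # Level 1 complexity limit = 386.58 WMOPS

def level1_bitrates(wmops_by_bitrate: dict) -> set:
    """Bitrates whose total complexity stays within the Level 1 limit."""
    return {br for br, w in wmops_by_bitrate.items() if w <= LEVEL1_WMOPS}

masa_2tc = {80: 200.0, 128: 210.0, 256: 230.0}        # Level 1 everywhere
osba_ism4_hoa3 = {80: 300.0, 128: 500.0, 256: 900.0}  # Level 1 only at 80

# When both are offered, MASA 2TC's Level 1 bitrate set is restricted to
# the intersection with the more complex format's Level 1 bitrate set.
adjusted = level1_bitrates(masa_2tc) & level1_bitrates(osba_ism4_hoa3)
# adjusted == {80}
```

The intersection expresses the "adjusted to match" behaviour: the less complex format inherits the tighter Level 1 boundary of the more complex one.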
[0177] With respect to Fig 11, an example electronic device suitable for implementing the apparatus, functions and therefore the embodiments described above is shown. The device may be any suitable electronics device or apparatus. For example in some embodiments the device 1900 is a mobile device, user equipment, tablet computer, computer, audio playback apparatus, etc. In some embodiments the device 1900 comprises at least one processor or central processing unit 1907. The processor 1907 can be configured to execute various program codes such as the methods such as described herein.
[0178] In some embodiments the device 1900 comprises a memory 1911. In some embodiments the at least one processor 1907 is coupled to the memory 1911. The memory 1911 can be any suitable storage means. In some embodiments the memory 1911 comprises a program code section for storing program codes implementable upon the processor 1907. Furthermore, in some embodiments the memory 1911 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 1907 whenever needed via the memory-processor coupling.
[0179] In some embodiments the device 1900 comprises a user interface 1905. The user interface 1905 can be coupled in some embodiments to the processor 1907.
[0180] In some embodiments the processor 1907 can control the operation of the user interface 1905 and receive inputs from the user interface 1905. In some embodiments the user interface 1905 can enable a user to input commands to the device 1900, for example via a keypad. In some embodiments the user interface 1905 can enable the user to obtain information from the device 1900. For example, the user interface 1905 may comprise a display configured to display information from the device 1900 to the user. The user interface 1905 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 1900 and further displaying information to the user of the device 1900.
[0181] In some embodiments the device 1900 comprises an input/output port 1909. The input/output port 1909 in some embodiments comprises a transceiver. The transceiver in such embodiments can be coupled to the processor 1907 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network. The transceiver or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
[0182] The transceiver can communicate with further apparatus by any suitable known communications protocol. For example, in some embodiments the transceiver can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or infrared data communication pathway (IRDA).
[0183] The transceiver input/output port 1909 may be configured to receive the signals and in some embodiments obtain the focus parameters as described herein.
[0184] In some embodiments the device 1900 may be employed to generate a suitable audio signal using the processor 1907 executing suitable code. The input/output port 1909 may be coupled to any suitable audio output for example to a multichannel speaker system and/or headphones (which may be headtracked or non-tracked headphones) or similar.
[0185] In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
[0186] The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disks or floppy disks, and optical media such as for example DVD and the data variants thereof, and CD.
[0187] The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
[0188] Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
[0189] Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
[0190] The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.
[0191] FOA First order Ambisonics
HOA Higher order Ambisonics
ISM Independent stream with metadata
IVAS Immersive Voice and Audio Services
kbps kilobits per second
MASA Metadata-Assisted Spatial Audio
MC Multichannel
OMASA Object-based audio with MASA (combined input format)
OSBA Object-based audio with SBA (combined input format)
SBA Scene-Based Audio
[0192] SDP Session Description Protocol
[0193] UE User equipment

Claims (28)

CLAIMS: 1. An apparatus for conversational immersive audio coding, the apparatus comprising means configured to: support at least two different coded formats; obtain one or more first codec capability level for a first coded format of the supported at least two different coded formats; obtain one or more second codec capability level for a second coded format of the supported at least two different coded formats, the one or more second codec capability level being different to the one or more first codec capability level; control the one or more first codec capability level based on the one or more second codec capability level; and determine coding parameters for supporting coding the conversational immersive audio based on the controlled one or more first codec capability level.
2. The apparatus as claimed in claim 1, wherein the one or more first codec capability level is at least one of: one or more codec capability level; one or more codec mode set; one or more codec profile; one or more capability for bitrate; and one or more capability for bandwidth.
3. The apparatus as claimed in any of claims 1 or 2, wherein the one or more second codec capability level is at least one of: one or more codec capability level; one or more codec mode set; one or more codec profile; one or more capability for bitrate; and one or more capability for bandwidth.
4. The apparatus as claimed in any of claims 1 to 3, wherein the controlled one or more first codec capability level is at least one of: one or more codec capability level; one or more codec mode set; one or more codec profile; one or more capability for bitrate; and one or more capability for bandwidth.
5. The apparatus as claimed in any of claims 1 to 4, wherein the one or more first codec capability level is a first codec capability list or profile defining at least one of: a first supported bitrate; and a first supported audio bandwidth.
6. The apparatus as claimed in any of claims 1 to 5, wherein the one or more second codec capability level is a second codec capability list or profile defining at least one of: a second supported bitrate; and a second supported audio bandwidth.
7. The apparatus as claimed in any of claims 1 to 6, wherein the one or more second codec capability level has a lower capability than the one or more first codec capability level, wherein the means configured to control the one or more first codec capability level based on the one or more second codec capability level is configured to adapt to the one or more second codec capability level.
8. The apparatus as claimed in any of claims 1 to 7, wherein the one or more first codec capability level and the one or more second codec capability level is one or more IVAS codec capability level.
9. The apparatus as claimed in claim 7, wherein the means configured to control the one or more first codec capability level based on the one or more second codec capability level is configured to perform at least one of: set a maximum supported bitrate for the first coded format to a maximum supported bitrate of the second coded format; and set a maximum supported bandwidth for the first coded format to a maximum supported bandwidth of the second coded format.
10. The apparatus as claimed in claim 9, wherein the means configured to determine the coding parameters for supporting coding the conversational immersive audio based on the controlled one or more first codec capability level is configured to negotiate with at least one further apparatus the controlled one or more first codec capability level, for supporting coding the conversational immersive audio.
11. The apparatus as claimed in claim 10, wherein the means configured to negotiate with at least one further apparatus the controlled one or more first codec capability level, for supporting coding the conversational immersive audio is configured to perform a SDP negotiation.
12. The apparatus as claimed in any of claims 1 to 11, wherein the second coded format is a coded format adapted by a virtual codec capability level based on an apparatus constraint.
13. The apparatus as claimed in claim 12, wherein the apparatus constraint comprises at least one of: an apparatus battery level; and an apparatus current consumption level.
14. The apparatus as claimed in any of claims 1 to 13, wherein the controlled one or more first codec capability level is one or more adapted codec capability level.
15. A method for an apparatus for conversational immersive audio coding, the method comprising: supporting at least two different coded formats; obtaining one or more first codec capability level for a first coded format of the supported at least two different coded formats; obtaining one or more second codec capability level for a second coded format of the supported at least two different coded formats, the one or more second codec capability level being different to the one or more first codec capability level; controlling the one or more first codec capability level based on the one or more second codec capability level; and determining coding parameters for supporting coding the conversational immersive audio based on the controlled one or more first codec capability level.
16. The method as claimed in claim 15, wherein the one or more first codec capability level is at least one of: one or more codec capability level; one or more codec mode set; one or more codec profile; one or more capability for bitrate; and one or more capability for bandwidth.
17. The method as claimed in any of claims 15 or 16, wherein the one or more second codec capability level is at least one of: one or more codec capability level; one or more codec mode set; one or more codec profile; one or more capability for bitrate; and one or more capability for bandwidth.
18. The method as claimed in any of claims 15 to 17, wherein the controlled one or more first codec capability level is at least one of: one or more codec capability level; one or more codec mode set; one or more codec profile; one or more capability for bitrate; and one or more capability for bandwidth.
19. The method as claimed in any of claims 15 to 18, wherein the one or more first codec capability level is a first codec capability list or profile defining at least one of: a first supported bitrate; and a first supported audio bandwidth.
20. The method as claimed in any of claims 15 to 19, wherein the one or more second codec capability level is a second codec capability list or profile defining at least one of: a second supported bitrate; and a second supported audio bandwidth.
21. The method as claimed in any of claims 15 to 20, wherein the one or more second codec capability level has a lower capability than the one or more first codec capability level, wherein controlling the one or more first codec capability level based on the one or more second codec capability level comprises adapting to the one or more second codec capability level.
22. The method as claimed in any of claims 15 to 21, wherein the one or more first codec capability level and the one or more second codec capability level is one or more IVAS codec capability level.
23. The method as claimed in claim 22, wherein controlling the one or more first codec capability level based on the one or more second codec capability level comprises at least one of: setting a maximum supported bitrate for the first coded format to a maximum supported bitrate of the second coded format; and setting a maximum supported bandwidth for the first coded format to a maximum supported bandwidth of the second coded format.
24. The method as claimed in claim 23, wherein determining the coding parameters for supporting coding the conversational immersive audio based on the controlled one or more first codec capability level comprises negotiating with at least one further apparatus the controlled one or more first codec capability level, for supporting coding the conversational immersive audio.
25. The method as claimed in claim 24, wherein negotiating with at least one further apparatus the controlled one or more first codec capability level, for supporting coding the conversational immersive audio comprises performing a SDP negotiation.
26. The method as claimed in any of claims 15 to 25, wherein the second coded format is a coded format adapted by a virtual codec capability level based on an apparatus constraint.
27. The method as claimed in claim 26, wherein the apparatus constraint comprises at least one of: an apparatus battery level; and an apparatus current consumption level.
28. The method as claimed in any of claims 15 to 27, wherein the controlled one or more first codec capability level is one or more adapted codec capability level.
GB2407975.8A 2024-06-05 2024-06-05 An apparatus and method for controlling codec capability level Pending GB2641548A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB2407975.8A GB2641548A (en) 2024-06-05 2024-06-05 An apparatus and method for controlling codec capability level
PCT/EP2025/063364 WO2025252427A1 (en) 2024-06-05 2025-05-15 An apparatus and method for controlling codec capability level

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB2407975.8A GB2641548A (en) 2024-06-05 2024-06-05 An apparatus and method for controlling codec capability level

Publications (2)

Publication Number Publication Date
GB202407975D0 GB202407975D0 (en) 2024-07-17
GB2641548A true GB2641548A (en) 2025-12-10

Family

ID=91621053

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2407975.8A Pending GB2641548A (en) 2024-06-05 2024-06-05 An apparatus and method for controlling codec capability level

Country Status (2)

Country Link
GB (1) GB2641548A (en)
WO (1) WO2025252427A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190007464A1 (en) * 2016-03-28 2019-01-03 Panasonic Intellectual Property Corporation Of America Terminal, base station, and codec mode switching method
US20210272574A1 (en) * 2018-10-08 2021-09-02 Dolby Laboratories Licensing Corporation Transforming audio signals captured in different formats into a reduced number of formats for simplifying encoding and decoding operations
WO2024179766A1 (en) * 2023-02-27 2024-09-06 Nokia Technologies Oy A method and apparatus for negotiation of conversational immersive audio session

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190007464A1 (en) * 2016-03-28 2019-01-03 Panasonic Intellectual Property Corporation Of America Terminal, base station, and codec mode switching method
US20210272574A1 (en) * 2018-10-08 2021-09-02 Dolby Laboratories Licensing Corporation Transforming audio signals captured in different formats into a reduced number of formats for simplifying encoding and decoding operations
WO2024179766A1 (en) * 2023-02-27 2024-09-06 Nokia Technologies Oy A method and apparatus for negotiation of conversational immersive audio session

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
3GPP DRAFT, vol SA WG4, 2024, NOKIA CORPORATION, "On IVAS Levels" *

Also Published As

Publication number Publication date
GB202407975D0 (en) 2024-07-17
WO2025252427A1 (en) 2025-12-11

Similar Documents

Publication Publication Date Title
TWI753182B (en) Method, device and apparatus for multi-stream audio coding
AU2019380367B2 (en) Audio processing in immersive audio services
CN113678198A (en) Audio codec extension
US20250220384A1 (en) Method and Apparatus for Efficient Delivery of Edge Based Rendering of 6DOF MPEG-I Immersive Audio
WO2025140862A1 (en) Rendering support in immersive conversational audio
WO2025103718A1 (en) Apparatus and methods for implementing multiple immersive voice and audio services (ivas) streams within a single real-time transport protocol packet
GB2641548A (en) An apparatus and method for controlling codec capability level
US20250225988A1 (en) Delayed orientation signalling for immersive communications
WO2021191493A1 (en) Switching between audio instances
WO2025051573A1 (en) Apparatuses and methods for implementing processing information based packets
GB2635735A (en) Immersive conversational audio
GB2633769A (en) Apparatus and methods
WO2025190692A1 (en) Immersive conversational audio
GB2640667A (en) Apparatus and methods
WO2025201083A9 (en) Negotiation method for audio encoding/decoding and terminal device
GB2634887A (en) Fragmenting immersive audio payloads
WO2025201086A9 (en) Audio encoding/decoding negotiation method and terminal device
WO2025201085A9 (en) Audio encoding/decoding negotiation method and terminal device
WO2025201084A9 (en) Negotiation method for audio encoding/decoding and terminal device
GB2640555A (en) Immersive communication sessions
WO2025181267A1 (en) Backwards compatible rtp payload format, encoding and decoding for split rendering support in immersive audio
HK40102081A (en) Audio processing in immersive audio services
WO2025200771A9 (en) Audio encoding/decoding negotiation method and terminal device
WO2025201087A9 (en) Negotiation method for audio encoding/decoding and terminal device
WO2025119539A1 (en) Apparatus and method for encoding a combined input format spatial audio signal