WO2025031583A1

WO2025031583A1 - Method and apparatus for customization in radio access networks

Info

Publication number: WO2025031583A1
Application number: PCT/EP2023/072066
Authority: WO
Inventors: Bryan LIU; Alvaro VALCARCE RIAL
Original assignee: Nokia Solutions and Networks Oy
Current assignee: Nokia Solutions and Networks Oy
Priority date: 2023-08-09
Filing date: 2023-08-09
Publication date: 2025-02-13
Anticipated expiration: 2026-02-09

Abstract

Machine learning techniques are used at layer L2 in the radio interface so that the base station and the wireless devices in the radio access network can learn a customized control plane protocol adapted to the quality-of-service requirement of that radio access network. Additional control plane information is learned and sent in the control plane signaling according to a known protocol.

Description

METHOD AND APPARATUS FOR CUSTOMIZATION IN RADIO ACCESS NETWORKS

[0001] TECHNICAL FIELD

[0002] Various example embodiments relate generally to apparatus and methods for customization in a radio access network.

[0003] BACKGROUND

[0004] As cellular technology evolves, network functions grow in ability, sophistication and performance. This progress is often accompanied by novel procedures and messages that increase the signaling overhead in the network. At OSI layer 2 (L2) in the radio interface, signaling takes the form of large data structures with multiple fields, where each field conveys a piece of important information. Most devices today can easily handle such large data structures. However, current trends in mobile computing, especially in the market of wearable devices, suggest that lighter and less capable devices will abound in the next generations of cellular networks (e.g. 6G). Examples of such light devices include smart glasses, brain implants, smart and connected pacemakers, or battery-less chips. The ability of such future devices to handle large L2 signaling messages is limited, and they would benefit from networks with reduced signaling overhead.

[0005] Furthermore, 5G opened the door to tailor-made wireless services for enterprise vertical markets with specific quality of service requirements (throughput, latency, reliability, etc.). Networks will need to accommodate very different use cases with very different quality of service requirements. For example, services such as telepresence and mixed reality will require very high data rates, while industrial services such as fast motion control in industrial machines will require very low latency, and medical services such as control of medical devices implanted in the human body will require very high reliability.

[0006] Therefore, there is a need for customization of radio access networks to adapt to various environments and applications.

[0007] SUMMARY

[0008] According to some aspects, there is provided the subject matter of the independent claims. Some further aspects are defined in the dependent claims.

[0009] According to a first aspect, a first apparatus is disclosed for use in a radio access network. The first apparatus comprises: means for receiving a first learning model from a training entity, the first learning model being previously trained at the training entity together with a second learning model to be used by a second apparatus in the radio access network, based on training data obtained from a plurality of apparatus in the radio access network, comprising said first and said second apparatus, to satisfy a preset quality of service requirement; means for obtaining first apparatus observations; means for providing the first apparatus observations as input to the first learning model to generate as output additional learned control plane information; means for generating control plane signaling according to a known protocol, said control plane signaling comprising the additional learned control plane information; and means for sending the control plane signaling to the second apparatus.

[0010] According to a second aspect, a second apparatus for use in a radio access network is disclosed. The second apparatus comprises: means for receiving a second learning model from a training entity, the second learning model being previously trained at the training entity together with a first learning model to be used by a first apparatus in the radio access network, based on training data obtained from a plurality of apparatus in the radio access network, including said first and said second apparatus, to satisfy a preset quality of service requirement; means for obtaining second apparatus observations, including control plane signaling according to a known protocol received from the first apparatus, said control plane signaling comprising additional learned control plane information; means for providing the second apparatus observations as input to the second learning model to generate a control plane decision; and means for sending signals to the first apparatus based on the control plane decision.

[0011] According to a third aspect, a first method is disclosed for use in a first apparatus in a radio access network. The first method comprises: receiving a first learning model from a training entity, the first learning model being previously trained at the training entity together with a second learning model to be used by a second apparatus in the radio access network, based on training data obtained from a plurality of apparatus in the radio access network, including said first and said second apparatus, to satisfy a preset quality of service requirement; obtaining first apparatus observations; providing the first apparatus observations as input to the first learning model to generate as output additional learned control plane information; generating control plane signaling according to a known protocol, said control plane signaling comprising the additional learned control plane information; and sending the control plane signaling to the second apparatus.

[0012] According to a fourth aspect, a second method is disclosed for use in a second apparatus in a radio access network. The second method comprises: receiving a second learning model from a training entity, the second learning model being previously trained at the training entity together with a first learning model to be used by a first apparatus in the radio access network, based on training data obtained from a plurality of apparatus in the radio access network, including said first and said second apparatus, to satisfy a preset quality of service requirement; obtaining second apparatus observations, including control plane signaling according to a known protocol received from the first apparatus, said control plane signaling comprising additional learned control plane information; providing the second apparatus observations as input to the second learning model to generate a control plane decision; and sending signals to the first apparatus based on the control plane decision.

[0013] In an embodiment the additional learned control plane information is sent by a base station in the radio access network to a wireless device in additional bits of a header of a downlink control packet according to the known protocol, and the control plane decision made by the wireless device relates to sending user plane data and/or control plane signaling to the base station.

[0014] In another embodiment, the additional learned control plane information is sent by a wireless device in the radio access network to a base station in additional bits of a header of an uplink control packet according to the known protocol, and the control plane decision made by the base station relates to scheduling and/or link adaptation functions implemented by the base station.

[0015] In an embodiment, obtaining first apparatus observations includes excluding observations related to the second apparatus.

[0016] In an embodiment, the observations of a given apparatus comprise control plane signaling received by the given apparatus, contextual information relating to the given apparatus, and history of control plane decisions made by the given apparatus.

[0017] In a fifth aspect, a training entity is disclosed, comprising: means for training together a first learning model and a second learning model based on training data obtained from a plurality of apparatus of a radio access network, comprising one or more first apparatus and one or more second apparatus, wherein the first learning model is trained to generate additional learned control plane information, and the second learning model is trained to generate a control plane decision based on control plane signaling according to a known protocol, said control plane signaling comprising the additional learned control plane information, while satisfying a preset quality of service requirement; and means for sending the trained first learning model to the one or more first apparatus in the radio access network and the trained second learning model to the one or more second apparatus.

[0018] In a sixth aspect, a training method is disclosed, comprising: training together a first learning model and a second learning model based on training data obtained from a plurality of apparatus of a radio access network, comprising one or more first apparatus and one or more second apparatus, wherein the first learning model is trained to generate additional learned control plane information, and the second learning model is trained to generate a control plane decision based on control plane signaling according to a known protocol, said control plane signaling comprising the additional learned control plane information, while satisfying a preset quality of service requirement; and sending the trained first learning model to the one or more first apparatus in the radio access network and the trained second learning model to the one or more second apparatus.

[0019] In an embodiment, training the first and second learning models includes reinforcement learning, where a reward is calculated based on the preset quality of service requirement.

[0020] In an embodiment, the control plane signaling complies with a preset average number of bits.

[0021] In an embodiment, the control plane signaling complies with a preset average packet delay budget.
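As an illustration only, checking whether the signaling stays within both preset averages could look like the Python sketch below. The class `QosConstraints`, the function `meets_constraints` and the numeric limits are hypothetical; the text does not specify how the averages are computed or over which window.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class QosConstraints:
    # Hypothetical preset limits; actual values are deployment-specific.
    max_avg_control_bits: float  # preset average number of control bits per packet
    max_avg_delay_tti: float     # preset average packet delay budget, in TTIs


def meets_constraints(control_bits_per_packet: Sequence[float],
                      delays_tti: Sequence[float],
                      c: QosConstraints) -> bool:
    """Return True if both observed averages are within the preset limits."""
    return (mean(control_bits_per_packet) <= c.max_avg_control_bits
            and mean(delays_tti) <= c.max_avg_delay_tti)


c = QosConstraints(max_avg_control_bits=12.0, max_avg_delay_tti=4.0)
print(meets_constraints([10, 14, 11], [3, 5, 2], c))  # True
print(meets_constraints([10, 14, 20], [3, 5, 2], c))  # False: average bits above 12
```

In a training setting, such a check would typically feed the reward rather than act as a hard gate.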

[0022] In an embodiment, the first and the second apparatus comprise means for performing one or more or all steps of the methods as disclosed herein.

[0023] The means may include circuitry configured to perform one or more or all steps performed by the first and the second apparatus as disclosed herein. The circuitry may be dedicated circuitry. The means may also include at least one processor and at least one memory including computer program code, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the first apparatus and respectively the second apparatus to perform one or more or all steps of the first method and respectively the second method as disclosed herein.

[0024] The methods may be carried out by a wireless device and a base station in a radio access network. For example, the wireless device is a User Equipment (UE) or an Internet-of-Things (IoT) device in a radio access network (RAN). The first apparatus may be implemented in a base station and the second apparatus in a wireless device. Alternatively, the first apparatus may be implemented in a wireless device and the second apparatus in a base station. A base station may comprise both a first and a second apparatus. A wireless device may comprise both a first and a second apparatus.

[0025] According to another aspect, a computer program product is disclosed comprising a set of instructions which, when executed on an apparatus, cause the apparatus to carry out the first method, the second method and/or the training method as disclosed herein.

[0026] According to an embodiment the disclosed computer program product is embodied as a computer readable medium or directly loadable into a computer.

[0027] BRIEF DESCRIPTION OF THE DRAWINGS

[0028] Example embodiments will become more fully understood from the detailed description given herein below and the accompanying drawings, which are given by way of illustration only and thus are not limiting of this disclosure.

[0029] FIG.1 is a block diagram explaining the concept of the disclosure.

[0030] FIG.2 is a diagram explaining the benefit of the disclosure.

[0031] FIG.3 is a flowchart of an exemplary embodiment of a training method as disclosed herein.

[0032] FIG.4 is a schematic representation of a first exemplary embodiment of the disclosure.

[0033] FIG.5 is a schematic representation of a second exemplary embodiment of the disclosure.

[0034] FIG.6 is a schematic representation of an apparatus suitable for implementing various aspects of the disclosure.

[0035] DETAILED DESCRIPTION

[0036] Various example embodiments will now be described more fully with reference to the accompanying drawings in which some example embodiments are shown.

[0037] Detailed example embodiments are disclosed herein. However, specific structural and functional details disclosed herein are merely representative for purposes of describing example embodiments. The example embodiments may, however, be embodied in many alternate forms and should not be construed as limited to only the embodiments set forth herein. Accordingly, while example embodiments are capable of various modifications and alternative forms, the embodiments are shown by way of example in the drawings and will be described herein in detail. It should be understood, however, that there is no intent to limit example embodiments to the particular forms disclosed.

[0038] The exemplary embodiments disclosed herein apply to the signaling between a base station and a wireless device in a radio access network. The disclosed method can be used for uplink signaling, downlink signaling, or both.

[0039] In communication systems, a distinction is typically made between the User Plane (UP), carrying the user traffic, and the Control Plane (CP), carrying the network signaling traffic. The fewer CP bits a system devotes to running its operations, the more capacity remains for transmitting UP bits. At the same time, the CP is essential to keeping the network running. An overly lean CP may be incapable of implementing sophisticated data handling strategies and may end up a victim of its own simplicity. There is an optimal ratio between the number of CP bits and the number of UP bits in a communication system. This optimal ratio is not the same in all network deployments (urban vs rural, residential vs industrial, subnetworks vs public networks, etc.). Known signaling schemes between base stations and wireless devices in radio access networks have been designed generically to accommodate a large number of use cases, but they are not tailored to any particular network deployment.

[0040] The exemplary embodiments enable customization of the control plane signaling in a cost-effective manner to address the needs of specific vertical markets.

[0041] As will be further described below, this is achieved by using machine learning techniques at layer L2 in the radio interface. The downlink (DL) and/or the uplink (UL) control plane signaling that a base station and wireless devices need to coordinate is learned in an end-to-end manner, in such a way that any wireless device in the radio access network can use it. Typically, the UL control plane signaling comprises a scheduling request message (SR) and the DL control plane signaling comprises a scheduling grant message (SG) or an acknowledgment message (ACK).

[0042] As depicted in FIG.1, a central training entity 10 is used to train together a first learning model to be used in a first apparatus in a radio access network 11 (e.g. in a base station, respectively in a wireless device) and a second learning model to be used in a second apparatus in the radio access network (e.g. in a wireless device, respectively in a base station). Training makes use of training data 12 obtained from a plurality of apparatus 13 in the radio access network (e.g. from the base station and from all wireless devices in the radio access network) and aims at satisfying a preset quality of service requirement.

[0043] Once the first and second learning models are trained, they are sent to the first and second apparatus respectively. The transmission of the first or the second trained learning model from the training entity 10 to the plurality of apparatus 13 is illustrated by arrows 14 in FIG.1. The first learning model is used by the first apparatus to generate additional learned control plane information based on observations of the first apparatus. Control plane signaling according to a known protocol, comprising the additional learned control plane information, is sent by the first apparatus to the second apparatus. The second learning model is used by the second apparatus to generate a control plane decision based on observations of the second apparatus, which comprises the control plane signaling sent by the first apparatus, including the additional learned control plane information.

[0044] In a first embodiment, the learned control plane information is downlink control plane information. In this case the radio access network comprises a first apparatus in the base station, and a second apparatus in each wireless device. In a second embodiment, the learned control plane information is uplink control plane information. In this case, the radio access network comprises a first apparatus in each wireless device and a second apparatus in the base station. Both embodiments can be combined so that both downlink and uplink control plane information are learned.

[0045] As illustrated in FIG.2, the training entity can train several sets Za, Zb and Zc of first and second learning models for use in different radio access network environments 11a, 11b and 11c with different quality of service requirements. As a result of the training, each set Za, Zb and Zc will be specifically adapted to the corresponding network environment. This obviates the need to rely on a team of engineers to develop a customized signaling scheme for each type of environment and therefore saves time and resources.

[0046] FIG.3 is a schematic representation of an exemplary training method implemented by the training entity 10 to train a first and a second learning model for a given radio access network.

[0047] During the training phase, the training entity 10 stores in a memory buffer (also known as replay buffer) the history of features vectors associated with observations of the base station and all wireless devices in the radio access network. Observations comprise received control plane signaling, contextual information, and history of control plane decisions.
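A minimal Python sketch of such a memory buffer is given below, assuming each stored entry holds the first and second apparatus features vectors, the control plane decision and the reward of one TTI. The `Transition` and `ReplayBuffer` names are hypothetical, and the sketch samples uniformly, whereas the example of [0053] mentions a prioritized replay buffer.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Sequence


@dataclass
class Transition:
    tti: int                     # time index i
    first_obs: Sequence[float]   # features vector of the first apparatus observations
    second_obs: Sequence[float]  # features vector of the second apparatus observations
    decision: int                # control plane decision taken at TTI i
    reward: float                # reward computed for this step


class ReplayBuffer:
    """Fixed-capacity FIFO memory buffer: the oldest transitions are evicted first."""

    def __init__(self, capacity: int) -> None:
        self._buf: Deque[Transition] = deque(maxlen=capacity)

    def add(self, t: Transition) -> None:
        self._buf.append(t)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        # Uniform sampling without replacement; a prioritized buffer would weight entries.
        return rng.sample(list(self._buf), min(batch_size, len(self._buf)))

    def __len__(self) -> int:
        return len(self._buf)


buf = ReplayBuffer(capacity=3)
for i in range(5):
    buf.add(Transition(tti=i, first_obs=[0.0], second_obs=[0.0], decision=0, reward=0.0))
batch = buf.sample(2, random.Random(0))
print(len(buf))  # 3: the transitions for TTI 0 and 1 were evicted
```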

[0048] For example, observations of a wireless device include: scheduling requests sent to the base station (SR), acknowledgments received from the base station (ACK), scheduling grants received from the base station (SG), control plane decisions made by the wireless device, packet (PDU) buffer status, and additional learned control bits. Observations of the base station include: scheduling request history for all wireless devices, acknowledgments sent to all wireless devices, and scheduling grants sent to all wireless devices.

[0049] These features vectors may include network-wide features such as features relating to control plane messages, contextual information, and/or past control plane decisions. The process described below is executed for each wireless device in the radio access network.

[0050] At step S1, a features vector of the first apparatus observations at time index i (Transmission Time Interval i, also denoted TTI_i) is obtained.

[0051] At step S2, the features vector is encoded into embeddings through a neural network and the embeddings are quantized into learned control bits. The encoder is the first learning model to be trained.

[0052] At step S3, control bits are generated which include the learned control bits.

[0053] At step S4, a features vector of the second apparatus observations at time index i is obtained. This features vector includes the control bits generated at step S3, including the learned control bits. The features vector is provided as input to a reinforcement learning agent (RL agent). The RL agent is the second learning model to be trained. For example, the RL agent is implemented with a Soft Actor-Critic (SAC) algorithm with a prioritized replay buffer.

[0054] At step S5 the RL agent generates a control plane decision as output.

[0055] At step S6, the time index is incremented (i = i + 1).

[0056] At step S7 a reward is calculated.

[0057] At step S8 the memory buffer is updated with the observations of the first and second apparatus at time index i and the associated reward.

[0058] At step S9, the encoder and the RL agents are updated to take into account the rewards computed for each wireless device in the radio access network. The rewards are computed to optimize a quality-of-service requirement under preset constraints, for example an average number of control bits and/or an average packet delay budget.
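Steps S2 and S3 can be illustrated with the following Python sketch, assuming a single linear layer with a tanh activation as the encoder and a one-bit threshold quantizer. The layer sizes, the function names and the three known-protocol bits are hypothetical; the text only states that a neural network encodes the features vector into embeddings that are then quantized into learned control bits.

```python
import numpy as np


def encode_and_quantize(features: np.ndarray, weights: np.ndarray,
                        bias: np.ndarray) -> np.ndarray:
    """Step S2 sketch: one-layer encoder followed by a 1-bit quantizer."""
    embeddings = np.tanh(weights @ features + bias)  # shape (n_bits,)
    return (embeddings >= 0.0).astype(np.uint8)       # learned control bits in {0, 1}


def build_control_bits(standard_bits: np.ndarray, learned_bits: np.ndarray) -> np.ndarray:
    """Step S3 sketch: append the learned bits to the known-protocol control bits."""
    return np.concatenate([standard_bits.astype(np.uint8), learned_bits])


rng = np.random.default_rng(0)
n_features, n_bits = 8, 4                   # hypothetical sizes
W = rng.normal(size=(n_bits, n_features))   # untrained encoder weights
b = np.zeros(n_bits)
obs = rng.normal(size=n_features)           # features vector at TTI i (step S1)
learned = encode_and_quantize(obs, W, b)
control = build_control_bits(np.array([1, 0, 1]), learned)  # hypothetical SG/ACK flags
print(control.shape)  # (7,)
```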

[0059] The training method described above uses an end-to-end training scheme, where the RL agent influences the encoder. The encoder and the RL agent are updated at the same time during training. The disclosed scheme is a point-to-multipoint scheme where the base station and all wireless devices in the radio access network learn the same control plane protocol. This is necessary for the base station to be able to coordinate with any wireless device in the radio access network. The aim of the disclosed method is not to learn a protocol specific to a wireless device but rather to learn a network protocol that any wireless device (existing or future) can use.
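One common way to let the RL agent's loss update the encoder through a non-differentiable quantizer is a straight-through estimator; the text does not say which technique is used, so this is an assumption. The Python sketch below replaces the SAC agent with a hypothetical linear score `q = a · bits` and a squared-error loss, purely to show the gradient path from the agent back to the encoder weights.

```python
import numpy as np


def forward(W: np.ndarray, a: np.ndarray, x: np.ndarray):
    z = W @ x                              # encoder output (embedding)
    bits = np.where(z >= 0.0, 1.0, -1.0)   # quantized learned bits, in {-1, +1}
    q = float(a @ bits)                    # hypothetical linear stand-in for the RL agent
    return z, bits, q


def encoder_grad(W: np.ndarray, a: np.ndarray, x: np.ndarray, target: float) -> np.ndarray:
    """Gradient of L = 0.5 * (q - target)**2 w.r.t. W, via a straight-through estimator."""
    z, bits, q = forward(W, a, x)
    d_q = q - target                       # dL/dq
    d_bits = d_q * a                       # dL/dbits (q is linear in bits)
    d_z = d_bits * (np.abs(z) <= 1.0)      # STE: quantizer treated as identity on [-1, 1]
    return np.outer(d_z, x)                # dL/dW, since z = W @ x


W = np.array([[0.5, 0.0], [0.0, -0.5]])
a = np.array([1.0, 1.0])
x = np.array([1.0, 1.0])
grad = encoder_grad(W, a, x, target=1.0)
print(grad)                # every entry is -1.0
W_next = W - 0.1 * grad    # one gradient step on the encoder driven by the agent's loss
```

In a full implementation, the agent's own parameters would be updated in the same step, matching the joint update described above.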

[0060] FIG.4 illustrates a first exemplary embodiment where the first apparatus is in the base station and the second apparatus is in each wireless device. In this case, the additional learned control plane information is sent by the base station to the wireless devices in additional bits of a header of downlink control packets (MAC PDU header), and the control plane decisions made by the wireless devices relate to sending user plane data and/or control plane signaling to the base station.

[0061] The base station 40 depicted in FIG.4 comprises:

- a receiver 41 which receives data packets PDU and sends acknowledgement messages ACK to a generator of control signals 42;

- a scheduler 43 which receives scheduling requests SR from wireless devices and decides which wireless device to serve; the scheduler 43 sends scheduling grant messages SG to the generator 42;

- an encoder and quantizer 44 which includes a first learning model, trained as explained above with reference to FIG.3, to generate learned control bits Ld from the base station observations O_BTS; the encoder and quantizer 44 sends the additional learned control bits to the generator 42.

[0062] The control signals generator 42 generates downlink control plane signaling DL CP and sends it to the wireless device 46 that the scheduler 43 has decided to serve. This downlink control plane signaling includes the learned control bits.
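The sketch below illustrates how learned control bits could be appended to a downlink control header and recovered by the wireless device. The 1-byte base header layout (4-bit UE identifier, grant flag, ACK flag, two reserved bits) is entirely hypothetical and does not reflect any standardized MAC PDU subheader format; the sketch also assumes the number of learned bits is known to both sides, since both models are trained together.

```python
from typing import List, Tuple


def pack_bits(bits: List[int]) -> bytes:
    """Pack 0/1 values MSB-first into bytes, zero-padding the last byte."""
    out = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        chunk = chunk + [0] * (8 - len(chunk))
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | (bit & 1)
        out.append(byte)
    return bytes(out)


def unpack_bits(data: bytes, n_bits: int) -> List[int]:
    """Inverse of pack_bits: return the first n_bits bits, MSB-first."""
    bits = [(byte >> (7 - k)) & 1 for byte in data for k in range(8)]
    return bits[:n_bits]


def build_dl_header(ue_id: int, grant: bool, ack: bool, learned: List[int]) -> bytes:
    # Hypothetical base header: 4-bit UE id | grant | ack | 2 reserved bits.
    base = ((ue_id & 0xF) << 4) | (int(grant) << 3) | (int(ack) << 2)
    return bytes([base]) + pack_bits(learned)


def parse_dl_header(header: bytes, n_learned: int) -> Tuple[int, bool, bool, List[int]]:
    base = header[0]
    return (base >> 4, bool((base >> 3) & 1), bool((base >> 2) & 1),
            unpack_bits(header[1:], n_learned))


hdr = build_dl_header(ue_id=5, grant=True, ack=False, learned=[1, 0, 1, 1])
print(hdr.hex())                # 58b0
print(parse_dl_header(hdr, 4))  # (5, True, False, [1, 0, 1, 1])
```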

[0063] The wireless device 46 comprises an RL agent 47 including a second learning model trained together with the first learning model as explained above with reference to FIG.3. The RL agent generates control plane decisions based on the wireless device's own observations O_UE, including the control plane signaling received from the base station 40 with the learned control plane information. The control plane decisions relate to sending uplink user plane data and/or control plane signaling UL CP to the base station 40.

[0064] In an embodiment, the observations relating to the wireless device 46 are excluded from the base station observations O_BTS used by the encoder and quantizer 44 to generate the additional learned control plane information Ld. Indeed, these observations are already known to the wireless device 46, and there is no added value in including them when generating the learned control bits. More specifically, if $O^i = [O_1^i, O_2^i, \ldots, O_U^i]$ is the features vector at time index $i$, with $U$ being the number of wireless devices in the radio access network, the information used to generate the learned control bits for the $k$-th wireless device at time index $i$ is $O_{-k}^i = \{O_j^i \mid j \in \{1, 2, \ldots, U\},\ j \neq k\}$.
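In code, excluding the k-th wireless device's own observations amounts to dropping that device's row from the per-device observations before they reach the encoder. This numpy sketch assumes the observations are stored as a U x d array (one row per wireless device) and uses 0-based device indices, unlike the 1-based indexing in the text; the function name is hypothetical.

```python
import numpy as np


def observations_excluding(obs_per_ue: np.ndarray, k: int) -> np.ndarray:
    """Return the per-device observation rows with device k's row removed.

    obs_per_ue: shape (U, d), one d-dimensional observation vector per device.
    k: 0-based index of the wireless device the learned bits are generated for.
    """
    return np.delete(obs_per_ue, k, axis=0)


O_i = np.arange(12, dtype=float).reshape(4, 3)  # toy values: U = 4 devices, d = 3
O_minus_k = observations_excluding(O_i, k=2)
encoder_input = O_minus_k.ravel()               # flattened input for the encoder
print(O_minus_k.shape)  # (3, 3)
```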

[0065] An example of reward computation will now be given in relation to the embodiment described in FIG.4. In this example, the constraints taken into consideration to compute the reward relate to the average packet delay budget and the average number of uplink control bits.

[0066] In the following, $p$ is the number of time steps taken for a packet to be successfully transmitted and $q$ is the total number of control bits. $\bar{p}$ and $\bar{q}$ represent the preset constraints on $p$ and $q$ respectively. $M$ is the maximum buffer size of the wireless device, and the buffer status at time index $i$ is represented by $b^i$. The reward $r$ is defined as the sum of the following six criteria:

$r = r_1 + r_2 + r_3 + r_4 + r_5 + r_6$

where:

- $r_1$: +1 if an acknowledgment is received by the wireless device, and 0 otherwise;
- $r_2$: +1 if $q < \bar{q}$ and -1 otherwise, with 0 being assigned if the packet has not been successfully received yet;
- $r_3$: +1 if $p < \bar{p}$ and -1 otherwise, with 0 being assigned if the packet has not been successfully received yet;
- $r_4$: $M - b^{j_{\max}}$ if the last time index $j_{\max}$ is reached in one training episode, and 0 otherwise;
- $r_5$: a bonus reward $M$ if the training episode ends before the last time index $j_{\max}$;
- $r_6$: -1 if there is a collision during data transmission, and 0 otherwise.

[0067] FIG.5 illustrates a second exemplary embodiment where the first apparatus is in each wireless device and the second apparatus is in the base station. In this case, the additional learned control plane information is sent by the wireless device to the base station in additional bits of a header of an uplink control packet (MAC PDU header), and the control plane decision made by the base station relates to scheduling and/or link adaptation functions implemented by the base station.

[0068] The wireless device 50 depicted in FIG.5 comprises an encoder and quantizer 51, a decision module 52 and a communication module 53. The encoder and quantizer 51 includes a first learning model, trained as explained above with reference to FIG.3, to generate learned control bits Lu from the wireless device observations O_UE. The encoder and quantizer 51 sends the learned control bits to the communication module 53. The decision module 52 makes control plane decisions based on the wireless device observations O_UE. The control plane decisions relate to sending uplink user plane data UL UP and/or control plane signaling UL CP to the base station 55. The control plane decisions are sent to the communication module 53. The control plane signaling UL CP sent by the communication module 53 to the base station 55 includes the learned control bits Lu.

[0069] The base station 56 includes a receiver 57, a scheduler 58, a link adaptation module 59 and a control signals generator 60. The receiver 57 receives data packets PDU and sends acknowledgement messages ACK to the control signals generator 60. The scheduler 58 receives scheduling requests SR including the learned control bits Lu and decides which wireless device to serve. The scheduler 58 sends scheduling grant messages SG to the control signals generator 60. The link adaptation module 59 receives the scheduling requests SR including the learned control bits Lu and sends MCS messages to the control signals generator 60. The control signals generator 60 generates downlink control plane signaling DL CP and sends it to the wireless device that the scheduler 58 has decided to serve.

[0070] The scheduler 58 and the link adaptation element 59 each include a second learning model trained together with the first learning model as explained above with reference to FIG.3. The scheduler 58 makes control plane decisions related to scheduling based on the base station observations including the learned control bits Lu received from the wireless device 50. The link adaptation element 59 makes control plane decisions related to link adaptation based on the base station observations including the learned control bits Lu received from the wireless device 50.
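As a hypothetical illustration of how the scheduler 58 might use the learned bits Lu, the sketch below scores each requesting wireless device on a feature row made of its SR flag and its learned bits, and serves the highest score. The linear weights `policy_w` are an assumption standing in for the trained second learning model, which in the text is trained as described with reference to FIG.3.

```python
import numpy as np


def schedule(sr_flags: np.ndarray, learned_bits: np.ndarray, policy_w: np.ndarray) -> int:
    """Return the index of the device to serve, or -1 if no device sent an SR.

    sr_flags: shape (U,), 1 if device u sent a scheduling request.
    learned_bits: shape (U, n_bits), learned bits Lu received with each SR.
    policy_w: shape (1 + n_bits,), hypothetical stand-in for the trained model.
    """
    if not np.any(sr_flags):
        return -1                                          # nothing to schedule
    features = np.column_stack([sr_flags, learned_bits])   # (U, 1 + n_bits)
    scores = features @ policy_w
    scores = np.where(sr_flags == 1, scores, -np.inf)      # only requesting devices
    return int(np.argmax(scores))


sr = np.array([1, 0, 1])
Lu = np.array([[0, 1], [1, 1], [1, 0]])
w = np.array([1.0, 0.5, 2.0])
print(schedule(sr, Lu, w))  # 0
```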

[0071] The embodiments described with reference to FIG.4 and FIG.5 can be combined so that both downlink and uplink control plane information are learned.

[0072] As mentioned above in relation to the embodiment described in FIG.4, the observations relating to the base station 56 are advantageously excluded from the first apparatus control plane observations used by the encoder and quantizer 51 to generate the additional learned control plane information signaling 61. Likewise, the constraints taken into consideration to compute the reward relate to the average packet delay budget and the average number of uplink control bits.
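The six-term reward defined in [0066] can be sketched in Python as follows. The `StepInfo` fields are hypothetical names for the quantities defined there, and the condition used for r2 (total control bits q below the preset constraint on q) is an assumption made by analogy with r3.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    ack_received: bool       # an ACK reached the wireless device (r1)
    packet_delivered: bool   # the current packet has been successfully received
    p: int                   # time steps taken to deliver the packet
    q: int                   # total number of control bits used
    collision: bool          # a collision occurred during data transmission (r6)
    last_tti_reached: bool   # the episode reached the last time index j_max (r4)
    ended_early: bool        # the episode ended before j_max (r5)
    buffer_at_jmax: int      # buffer status at the last time index


def reward(s: StepInfo, p_bar: int, q_bar: int, M: int) -> float:
    r1 = 1.0 if s.ack_received else 0.0
    # Assumed condition for r2: q compared against its preset constraint q_bar.
    r2 = 0.0 if not s.packet_delivered else (1.0 if s.q < q_bar else -1.0)
    r3 = 0.0 if not s.packet_delivered else (1.0 if s.p < p_bar else -1.0)
    r4 = float(M - s.buffer_at_jmax) if s.last_tti_reached else 0.0
    r5 = float(M) if s.ended_early else 0.0
    r6 = -1.0 if s.collision else 0.0
    return r1 + r2 + r3 + r4 + r5 + r6


s = StepInfo(ack_received=True, packet_delivered=True, p=3, q=10, collision=False,
             last_tti_reached=False, ended_early=False, buffer_at_jmax=0)
print(reward(s, p_bar=4, q_bar=12, M=8))  # 3.0
```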

[0073] FIG. 6 depicts a high-level block diagram of an apparatus 600 suitable for implementing various aspects of the disclosure. Although illustrated in a single block, in other embodiments the apparatus 600 may also be implemented using parallel and distributed architectures. Thus, for example, various steps such as those illustrated in the methods described above by reference to FIG.1 to FIG.5 may be executed using apparatus 600 sequentially, in parallel, or in a different order based on particular implementations. The first apparatus, the second apparatus, the wireless devices, the base station, and the training entity can be implemented in the form of apparatus 600.

[0074] According to an exemplary embodiment, depicted in FIG.6, apparatus 600 comprises a printed circuit board 601 on which a communication bus 602 connects a processor 603 (e.g., a central processing unit "CPU"), a random access memory 604, a storage medium 611, possibly an interface 605 for connecting a display 606, a series of connectors 607 for connecting user interface devices or modules such as a mouse or trackpad 608 and a keyboard 609, a wireless network interface 610 and/or a wired network interface 612. Depending on the functionality required, the apparatus may implement only part of the above. Certain modules of FIG.6 may be internal or connected externally, in which case they do not necessarily form an integral part of the apparatus itself. E.g. display 606 may be a display that is connected to the apparatus only under specific circumstances, or the apparatus may be controlled through another device with a display, i.e. no specific display 606 and interface 605 are required for such an apparatus. Memory 611 contains software code which, when executed by processor 603, causes the apparatus to perform the methods described herein. Memory 611 can also store observations as disclosed above, as well as the first or second learning model. In an exemplary embodiment, a detachable storage medium 613 such as a USB stick may also be connected. For example, the detachable storage medium 613 can hold the software code to be uploaded to memory 611.

[0075] The processor 603 may be any type of processor such as a general purpose central processing unit ("CPU") or a dedicated microprocessor such as an embedded microcontroller or a digital signal processor ("DSP").

[0076] In addition, apparatus 600 may also include other components typically found in computing systems, such as an operating system, queue managers, device drivers, or one or more network protocols that are stored in memory 611 and executed by the processor 603.

[0077] Although aspects herein have been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present disclosure. It is therefore to be understood that numerous modifications can be made to the illustrative embodiments and that other arrangements can be devised without departing from the spirit and scope of the disclosure as determined based upon the claims and any equivalents thereof.

[0078] For example, the data disclosed herein may be stored in various types of data structures which may be accessed and manipulated by a programmable processor (e.g., CPU or FPGA) that is implemented using software, hardware, or combination thereof.

[0079] It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative circuitry embodying the principles of the disclosure. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, and the like represent various processes which may be substantially implemented by circuitry.

[0080] Each described function, engine, block, step can be implemented in hardware, software, firmware, middleware, microcode, or any suitable combination thereof. If implemented in software, the functions, engines, blocks of the block diagrams and/or flowchart illustrations can be implemented by computer program instructions/software code, which may be stored or transmitted over a computer-readable medium, or loaded onto a general purpose computer, special purpose computer or other programmable processing apparatus and/or system to produce a machine, such that the computer program instructions or software code which execute on the computer or other programmable processing apparatus, create the means for implementing the functions described herein.

[0081] In the present description, blocks denoted as "means configured to perform ..." (a certain function) shall be understood as functional blocks comprising circuitry that is adapted for performing or configured to perform a certain function. A means being configured to perform a certain function does, hence, not imply that such means necessarily is performing said function (at a given time instant). Moreover, any entity described herein as "means", may correspond to or be implemented as "one or more modules", "one or more devices", "one or more units", etc. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term "processor" or "controller" should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, network processor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), read only memory (ROM) for storing software, random access memory (RAM), and non-volatile storage. Other hardware, conventional or custom, may also be included. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.

[0082] As used herein, the term "and/or," includes any and all combinations of one or more of the associated listed items.

[0083] When an element is referred to as being "connected," or "coupled," to another element, it can be directly connected or coupled to the other element or intervening elements may be present. Other words used to describe the relationship between elements should be interpreted in a like fashion (e.g., "between," versus "directly between," "adjacent," versus "directly adjacent," etc.).

[0084] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used herein, the singular forms "a," "an," and "the," are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes," and/or "including," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

[0085] Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments of the invention. However, the benefits, advantages, solutions to problems, and any element(s) that may cause or result in such benefits, advantages, or solutions, or cause such benefits, advantages, or solutions to become more pronounced are not to be construed as a critical, required, or essential feature or element of any or all the claims.

Claims

1. A first apparatus for use in a radio access network, comprising: means for receiving a first learning model from a training entity, the first learning model being previously trained at the training entity together with a second learning model to be used by a second apparatus in the radio access network, based on training data obtained from a plurality of apparatus in the radio access network, comprising said first and said second apparatus, to satisfy a preset quality of service requirement, means for obtaining first apparatus observations, means for providing the first apparatus observations as input to the first learning model to generate as output additional learned control plane information, means for generating control plane signaling according to a known protocol, said control plane signaling comprising the additional learned control plane information, means for sending the control plane signaling to the second apparatus.

2. A second apparatus for use in a radio access network, comprising: means for receiving a second learning model from a training entity, the second learning model being previously trained at the training entity together with a first learning model to be used by a first apparatus in the radio access network, based on training data obtained from a plurality of apparatus in the radio access network, including said first and said second apparatus, to satisfy a preset quality of service requirement, means for obtaining second apparatus observations, including control plane signaling according to a known protocol received from the first apparatus, said control plane signaling comprising additional learned control plane information, means for providing the second apparatus observations as input to the second learning model to generate a control plane decision, means for sending signals to the first apparatus based on the control plane decision.

3. A first method for use in a first apparatus in a radio access network, comprising: receiving a first learning model from a training entity, the first learning model being previously trained at the training entity together with a second learning model to be used by a second apparatus in the radio access network, based on training data obtained from a plurality of apparatus in the radio access network, including said first and said second apparatus, to satisfy a preset quality of service requirement, obtaining first apparatus observations, providing the first apparatus observations as input to the first learning model to generate as output additional learned control plane information, generating control plane signaling according to a known protocol, said control plane signaling comprising the additional learned control plane information, sending the control plane signaling to the second apparatus.

4. A second method for use in a second apparatus in a radio access network, comprising: receiving a second learning model from a training entity, the second learning model being previously trained at the training entity together with a first learning model to be used by a first apparatus in the radio access network, based on training data obtained from a plurality of apparatus in the radio access network, including said first and said second apparatus, to satisfy a preset quality of service requirement, obtaining second apparatus observations, including control plane signaling according to a known protocol received from the first apparatus, said control plane signaling comprising additional learned control plane information, providing the second apparatus observations as input to the second learning model to generate a control plane decision, sending signals to the first apparatus based on the control plane decision.

5. The first apparatus as claimed in claim 1, or the first method as claimed in claim 3, for use in a base station in the radio access network, wherein the additional learned control plane information is sent to a wireless device in the radio access network in additional bits of a header of a downlink control packet according to the known protocol.

6. The second apparatus as claimed in claim 2, or the second method as claimed in claim 4, for use in a wireless device in the radio access network, wherein the additional learned control plane information is sent by a base station in the radio access network in additional bits of a header of a downlink control packet according to the known protocol, and the control plane decision relates to sending user plane data and/or control plane signaling to the base station.

7. The first apparatus as claimed in claim 1, or the first method as claimed in claim 3, for use in a wireless device in the radio access network, wherein the additional learned control plane information is sent to a base station in the radio access network in additional bits of a header of an uplink control packet according to the known protocol.

8. The second apparatus as claimed in claim 2, or the second method as claimed in claim 4, for use in a base station in the radio access network, wherein the additional learned control plane information is sent by a wireless device in the radio access network in additional bits of a header of an uplink control packet according to the known protocol, and the control plane decision relates to scheduling and/or link adaptation functions implemented by the base station.

9. The first apparatus as claimed in any of claims 1, 5, or 7, or the first method as claimed in any of claims 3, 5, or 7, wherein obtaining first apparatus observations includes excluding observations related to the second apparatus.

10. The apparatus as claimed in any of claims 1, 2, or 5 to 9, or the method as claimed in any of claims 3, 4, or 5 to 9, wherein the observations of a given apparatus comprise control plane signaling received by the given apparatus, contextual information relating to the given apparatus, and history of control plane decisions made by the given apparatus.

11. A training entity comprising: means for training together a first learning model and a second learning model based on training data obtained from a plurality of apparatus of a radio access network, comprising one or more first apparatus and one or more second apparatus, wherein: the first learning model is trained to generate additional learned control plane information, the second learning model is trained to generate a control plane decision based on control plane signaling according to a known protocol, said control plane signaling comprising the additional learned control plane information, while satisfying a preset quality of service requirement, means for sending the trained first learning model to the one or more first apparatus in the radio access network and the trained second learning model to the one or more second apparatus.

12. A training method comprising: - training together a first learning model and a second learning model based on training data obtained from a plurality of apparatus of a radio access network, comprising one or more first apparatus and one or more second apparatus, wherein: the first learning model is trained to generate additional learned control plane information, the second learning model is trained to generate a control plane decision based on control plane signaling according to a known protocol, said control plane signaling comprising the additional learned control plane information, while satisfying a preset quality of service requirement,

- sending the trained first learning model to the one or more first apparatus in the radio access network and the trained second learning model to the one or more second apparatus.

13. The training entity as claimed in claim 11, or the training method as claimed in claim 12, wherein training the first and second learning models includes reinforcement learning where a reward is calculated based on the preset quality of service requirement.

14. The apparatus or the method as claimed in any of claims 1 to 13, wherein the control plane signaling responds to a preset average number of bits.

15. The apparatus or the method as claimed in any of claims 1 to 14, wherein the control plane signaling responds to a preset average packet delay budget.

16. A computer program comprising program instructions which when executed by an apparatus cause the apparatus to execute the steps of the method claimed in any of claims 3 to 10 or 12 to 15.
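The signaling loop defined in claims 1 to 15 can be illustrated with a minimal Python sketch. It assumes, for illustration only, that the first apparatus is a wireless device and the second a base station (claims 7 and 8), that the header of the known protocol is a fixed 8-bit field followed by 2 learned bits, and that the jointly trained models are replaced by hand-written stand-ins whose bit meaning is fixed in advance rather than learned. All names (`Observations`, `first_model`, `build_header`, `second_model`, `qos_reward`), the bit sizes, the buffer occupancy feature and the reward shape are hypothetical and are not defined in this publication.

```python
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical sizes chosen for illustration only; they are not taken from
# the patent or from any 3GPP specification.
STANDARD_HEADER_BITS = 8  # fields already defined by the known protocol
LEARNED_BITS = 2          # additional learned control plane bits


@dataclass
class Observations:
    """Observations of one apparatus, as listed in claim 10."""
    received_signaling: list[int] = field(default_factory=list)
    context: dict[str, float] = field(default_factory=dict)
    decision_history: list[int] = field(default_factory=list)


def first_model(obs: Observations) -> list[int]:
    """Stand-in for the trained first learning model (wireless device).

    Quantizes an assumed normalized buffer occupancy in [0, 1] into
    LEARNED_BITS bits, most significant bit first.
    """
    level = obs.context.get("buffer_occupancy", 0.0)
    max_code = (1 << LEARNED_BITS) - 1
    code = min(max_code, max(0, round(level * max_code)))
    return [(code >> i) & 1 for i in reversed(range(LEARNED_BITS))]


def build_header(standard_fields: list[int], learned: list[int]) -> list[int]:
    """Uplink header of the known protocol with the learned bits appended."""
    if len(standard_fields) != STANDARD_HEADER_BITS:
        raise ValueError("standard header has an unexpected length")
    return standard_fields + learned


def second_model(obs: Observations) -> int:
    """Stand-in for the trained second learning model (base station).

    Reads the learned bits that follow the standard fields of the received
    header and maps them directly to a scheduling decision, here a
    hypothetical grant size index from 0 (no grant) to 3 (largest grant).
    """
    code = 0
    for bit in obs.received_signaling[STANDARD_HEADER_BITS:]:
        code = (code << 1) | bit
    return code


def qos_reward(header_sizes_bits: list[int], delays_ms: list[float],
               bit_budget: float, delay_budget_ms: float) -> float:
    """Hypothetical training reward for claims 13 to 15.

    Returns 1.0 when the average header size and the average packet delay
    are within their preset budgets, and subtracts the relative excess of
    each average otherwise.
    """
    bit_excess = max(0.0, mean(header_sizes_bits) / bit_budget - 1.0)
    delay_excess = max(0.0, mean(delays_ms) / delay_budget_ms - 1.0)
    return 1.0 - bit_excess - delay_excess


if __name__ == "__main__":
    ue_obs = Observations(context={"buffer_occupancy": 0.7})
    header = build_header([0] * STANDARD_HEADER_BITS, first_model(ue_obs))
    gnb_obs = Observations(received_signaling=header)
    print(header, second_model(gnb_obs))
```

With a buffer occupancy of 0.7, the stand-in first model emits the code 2 (bits 1 and 0) in the two extra header bits, and the base station maps it to grant index 2. In the claimed arrangement, the meaning of these bits would instead emerge from the joint training, with a reward of the kind sketched in `qos_reward` steering both models toward the preset average bit and delay budgets.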