CN117134805A - Over-the-air computation federated learning design method under a Cloud-RAN architecture

Info

Publication number: CN117134805A
Application number: CN202310907454.6A
Authority: CN
Inventors: 袁晓军; 马浩铭
Original assignee: Yangtze River Delta Research Institute of UESTC Huzhou
Current assignee: Yangtze River Delta Research Institute of UESTC Huzhou
Priority date: 2023-07-24
Filing date: 2023-07-24
Publication date: 2023-11-28

Abstract

The invention belongs to the field of information and communication technology and relates to an over-the-air computation (OA) federated learning (FL) design method under a Cloud Radio Access Network (Cloud-RAN) architecture. To overcome the limited service range of conventional over-the-air federated learning, the invention provides a communication design for a MIMO Cloud-RAN OA-FL (MIMOCROF) system, which introduces Cloud-RAN technology and MIMO technology into over-the-air federated learning. To better exploit the correlation among transmitted gradients in the MIMOCROF system, the invention further provides a new modeling perspective based on lossy distributed source coding (L-DSC) and a neural-network-based encoding and decoding scheme, referred to as the Practical L-DSC codec.

Description

Over-the-air computation federated learning design method under a Cloud-RAN architecture

Technical Field

The invention belongs to the field of information and communication technology and relates to an over-the-air computation (OA) federated learning (FL) design method under a Cloud Radio Access Network (Cloud-RAN) architecture.

Background

The rapid growth of data collection and computing power on mobile edge devices continues to drive interest in artificial intelligence services such as computer vision and natural language processing. Conventional centralized learning (CL) requires local data to be uploaded to a central node for model training, which incurs significant communication cost and raises data privacy concerns. To address these issues, federated learning (FL) has emerged as a promising framework for distributed and secure model training. In the FL framework, each edge device optimizes the model on its local data and sends its model update to a cloud server (CS); this is referred to as uplink transmission. The CS aggregates the local model updates into a global model update and then distributes the updated global model to the edge devices; this is referred to as downlink transmission. Compared with CL, FL significantly reduces the communication burden and the risk of data leakage, which makes it an attractive option for machine learning at the wireless edge.

The FL uplink involves sending model updates from the distributed edge devices to the CS, which creates a critical communication bottleneck because uplink channel resources (e.g., bandwidth, time, and space) are limited. By supporting analog transmission from a large number of edge devices, over-the-air computation (OA) has become an efficient technique for FL uplink transmission. Rather than allocating orthogonal resources to avoid interference, over-the-air federated learning (OA-FL) uses analog transmission for model aggregation, so that devices share radio resources during model upload. Pioneering work has demonstrated its superiority over conventional orthogonal multiple access (OMA) protocols in terms of noise tolerance and latency.

FL faces another challenge: a single CS with limited service coverage typically cannot gather the large amount of data needed for model training. In this regard, the Cloud Radio Access Network (Cloud-RAN) is an attractive alternative. The architecture consists of multiple access points (APs), each serving a particular set of mobile devices. The APs transmit signals to (or receive signals from) the mobile devices via the radio access network and upload data payloads to (or download them from) the CS via the fronthaul network. Cloud-RAN provides flexible network deployment and thereby significantly extends system coverage at low cost.

In this work, the concepts of Cloud-RAN and OA-FL are combined, and multiple-input multiple-output (MIMO) technology is introduced to enhance the radio link quality. A MIMO Cloud-RAN OA-FL (MIMOCROF) framework is proposed that comprises three stages in each training round. In the first stage, edge aggregation, each AP collects local updates from the edge devices and constructs an edge update using MIMO multiple access. In the second stage, global aggregation, the CS aggregates the edge updates from the APs via the fronthaul network to form the global update. In the third stage, model update and broadcast, the CS sends the updated global model parameters to the APs, which then broadcast them to the devices they serve.

It has been observed that local updates in FL are typically correlated, which makes the edge updates correlated as well. This inter-AP correlation can be exploited to significantly reduce the communication cost of global aggregation. To better exploit it, the global aggregation stage is modeled as a lossy distributed source coding (L-DSC) problem. On this basis, the performance of the MIMOCROF framework is analyzed from the perspective of rate-distortion theory. A joint communication-learning optimization problem that takes the inter-AP correlation into account is then formulated to improve system performance. To solve this problem efficiently, an algorithm based on alternating optimization (AO) is developed to improve FL learning performance.

Next, a practical L-DSC design, the Practical L-DSC codec, is proposed to exploit the inter-AP correlation, with the encoding function deployed at each AP and the decoding function deployed at the CS. At each AP, the encoder compresses the edge update with a random compression matrix and quantizes it, and the result is sent to the CS over the fronthaul network. At the CS, the decoder uses the proposed neural network structure to reconstruct the global update from the received observations, exploiting the inter-AP correlation. Numerical results indicate that the proposed practical design effectively exploits the inter-AP correlation and outperforms the baseline schemes.

The above solution is explained in detail below.

Disclosure of Invention

The invention provides a communication design for a MIMO Cloud-RAN OA-FL (MIMOCROF) system. The design introduces Cloud-RAN technology and MIMO technology into over-the-air federated learning in order to overcome its limited service range. To exploit the inter-AP correlation in the MIMOCROF system, the invention further provides a new modeling perspective based on lossy distributed source coding (L-DSC) and a neural-network-based encoding and decoding scheme, namely the Practical L-DSC codec.

The technical scheme adopted by the invention comprises the following steps:

S1. Consider an FL system in which $N_D$ edge devices and a CS cooperatively learn a shared model from the training data distributed over the $N_D$ edge devices. The goal is to minimize the global loss function, i.e.

where $\theta$ is the global model parameter vector of length $N$, and the empirical loss of device $k$ is defined as

where $l(\theta, b)$ is the sample-wise loss function and $b$ is a data sample from the local dataset of device $k$.
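The relationship between the sample-wise loss, the empirical loss of a device, and the global loss in S1 can be sketched as follows. This is a minimal illustration under stated assumptions: the squared-error `sample_loss` is a hypothetical stand-in for $l(\theta, b)$, and the global loss is assumed to weight each device by its share of the total data, which the text above does not confirm.

```python
import numpy as np

def sample_loss(theta, b):
    """Hypothetical sample-wise loss l(theta, b): squared error of a linear model."""
    x, y = b
    return 0.5 * (float(np.dot(theta, x)) - y) ** 2

def empirical_loss(theta, dataset):
    """Empirical loss of one device: average of l(theta, b) over its local dataset."""
    return sum(sample_loss(theta, b) for b in dataset) / len(dataset)

def global_loss(theta, datasets):
    """Global loss; weighting each device by its data share is an assumption."""
    total = sum(len(d) for d in datasets)
    return sum(len(d) / total * empirical_loss(theta, d) for d in datasets)

theta = np.array([1.0, -1.0])
datasets = [
    [(np.array([1.0, 0.0]), 1.0), (np.array([0.0, 1.0]), 1.0)],  # device 1
    [(np.array([1.0, 1.0]), 2.0)],                               # device 2
]
print(global_loss(theta, datasets))
```

With this weighting the global loss equals the average sample loss over all data held by all devices.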

S2. As shown in FIG. 1, the cloud server (CS) is assumed to reach the $N_D$ devices through $N_A$ APs. The CS randomly initializes the model parameters $\theta$ and broadcasts them to the $N_A$ APs over the fronthaul link. The CS initializes the transmit beamforming vector of each device, where $N_T$ is the number of transmit antennas per device, and the receive beamforming vector of each AP, where $N_T$ is the number of receive antennas per AP. The CS initializes the L-DSC parameter matrix to $\Sigma_V = O$, the L-DSC aggregation coefficient vector to $c = 0$, and the inter-AP correlation matrix to $\Sigma_S = O$.

S3. Each AP $i$ broadcasts the received model parameters $\theta$ over the MIMO channel to the devices it serves, i.e. the devices in the service set of AP $i$. The maximum number of training rounds of the FL task is denoted by $T$. In training round $t$, the following steps are performed:

S4. Each device $k$ performs gradient descent locally and computes its local update, i.e. its local gradient.

S5. Each device $k$ normalizes its local update by

where the variance and the mean are those of its local update.

S6. Each device $k$ maps the normalized update to a complex vector as

where the complex vector has length $C = N/2$.
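Steps S5 and S6 can be illustrated with a short NumPy sketch. It assumes that the mean and variance are scalars computed over the entries of the local gradient, and that the first half of the normalized vector forms the real parts and the second half the imaginary parts; the exact pairing is not recoverable from the text, so `to_complex` is a hypothetical choice that only respects $C = N/2$.

```python
import numpy as np

def normalize(g):
    """Return the zero-mean, unit-variance gradient plus its scalar mean and variance."""
    mean = g.mean()
    var = g.var()
    return (g - mean) / np.sqrt(var), mean, var

def to_complex(g_norm):
    """Map a real vector of even length N to C = N/2 complex symbols.
    Assumed pairing: first half -> real parts, second half -> imaginary parts."""
    c = g_norm.size // 2
    return g_norm[:c] + 1j * g_norm[c:]

def from_complex(r):
    """Inverse of to_complex."""
    return np.concatenate([r.real, r.imag])

g = np.array([0.5, -1.5, 2.0, 3.0])          # toy local gradient, N = 4
g_norm, mean, var = normalize(g)
r = to_complex(g_norm)                        # C = 2 complex symbols
g_back = from_complex(r) * np.sqrt(var) + mean
```

Because the mean and variance are sent losslessly in S7, the receiver side can undo the normalization exactly as in the last line.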

S7. Each device $k$ measures its per-symbol transmit power budget and sends it losslessly to its serving AP. Each device $k$ also sends its gradient mean and gradient variance losslessly to its serving AP.

S8. The channel is assumed to remain unchanged while the gradients are uploaded in each training round. Each AP $i$ estimates the channel state information (CSI) between itself and the devices it serves and uploads the estimated channel matrices to the CS. Each AP $i$ also uploads the parameters it has received from its devices to the CS. Each AP $i$ estimates its receiver noise power with an expectation-maximization algorithm and uploads it to the CS.

S9. For each $k \in [N_D]$, the CS optimizes the transmit beamforming vector of device $k$. With the transmit beamforming vectors of the other devices, the receive beamforming vectors, and the L-DSC aggregation vector $c^{(t)}$ held fixed, the following problem is solved:

where $c_i^{(t)}$ is the $i$-th element of the vector $c^{(t)}$. This is a convex quadratically constrained quadratic programming (QCQP) problem that can be solved with existing tools.

S10. For each $i \in [N_A]$, the CS optimizes the receive beamforming vector of AP $i$. With the receive beamforming vectors of the other APs, the device transmit beamforming vectors, and the L-DSC aggregation vector $c^{(t)}$ held fixed, the following problem is solved:

This problem is convex. Its closed-form solution (with the superscript $t$ omitted) is

If the optimization has converged, go to S11; otherwise, return to S9.
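The loop formed by S9, S10 and the convergence check follows the usual alternating optimization pattern: update one block of variables with the others fixed, then repeat until the objective stops improving. The sketch below shows only that pattern on a toy two-variable convex problem. The objective, the block names `a` and `b`, and the closed-form block updates are illustrative assumptions and are not the beamforming problems of the invention.

```python
def alternating_optimization(objective, updates, x0, tol=1e-12, max_iter=100):
    """Cycle through the block updates until the objective decrease falls below tol."""
    x = dict(x0)
    prev = objective(x)
    for it in range(1, max_iter + 1):
        for name, update in updates.items():
            x[name] = update(x)      # each block sees the latest values of the others
        cur = objective(x)
        if prev - cur <= tol:        # converged: leave the loop (cf. "go to S11")
            return x, it
        prev = cur                   # not converged: sweep again (cf. "return to S9")
    return x, max_iter

def objective(x):
    """Toy jointly convex objective coupling the two blocks."""
    a, b = x["a"], x["b"]
    return (a - 1.0) ** 2 + (b + 2.0) ** 2 + a * b

updates = {
    "a": lambda x: 1.0 - x["b"] / 2.0,    # exact minimizer over a with b fixed
    "b": lambda x: -2.0 - x["a"] / 2.0,   # exact minimizer over b with a fixed
}
solution, n_iter = alternating_optimization(objective, updates, {"a": 0.0, "b": 0.0})
```

Each block update is an exact minimizer, so the objective never increases from one sweep to the next, which is what makes the simple stopping rule meaningful.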

S11. The CS sends the device transmit beamforming vectors $\alpha_k$ of the served devices and the receive beamforming vector $\beta_i$ to each AP $i$.

S12. Each AP $i$ forwards the received device transmit beamforming vectors $\alpha_k$ to the devices it serves.

S13. Each device $k$ then transmits its complex vector over $C$ channel uses, i.e. the transmit signal matrix of device $k$ is

Consider the $c$-th element of the complex vector and the $c$-th column of the transmit signal matrix. Device $k$ transmits its update signal in the $c$-th channel use such that the power constraint is met:

S14. All devices transmit their signal matrices synchronously over the MIMO channel. The signal matrix observed at each AP $i$ is

where the noise term is an additive white Gaussian noise matrix whose elements are independent and identically distributed. Synchronization can be achieved with existing techniques, such as the timing advance mechanism used for uplink synchronization in 4G LTE.

S15. Each AP $i$ combines its observation matrix using its receive beamforming vector as

S16. Each AP $i$ constructs its edge update as
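Steps S13 to S15 can be pictured with a small simulation of one AP. The sketch assumes a common over-the-air model that the text does not spell out: device $k$ sends the rank-one matrix formed by its beamforming vector and its symbol vector, the AP observes the superposition of all device signals through their channel matrices plus noise, and it combines the observation with the conjugate of its receive beamforming vector. All array names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dev, n_t, n_r, c = 3, 2, 4, 5   # devices, tx antennas, rx antennas, channel uses

def crandn(*shape):
    """Complex standard normal samples (illustrative channel and signal model)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

r = crandn(n_dev, c)              # complex update symbols of each device
alpha = crandn(n_dev, n_t)        # device transmit beamforming vectors
h = crandn(n_dev, n_r, n_t)       # channel matrix from each device to this AP
beta = crandn(n_r)                # receive beamforming vector of this AP

def received_matrix(h, alpha, r, noise_std):
    """Superposition of all device signals at the AP plus white Gaussian noise."""
    x = alpha[:, :, None] * r[:, None, :]            # X_k = alpha_k r_k^T, shape (n_dev, n_t, c)
    y = np.einsum("krt,ktc->rc", h, x)               # sum over devices of H_k X_k
    z = noise_std * crandn(*y.shape) / np.sqrt(2.0)  # per-entry variance noise_std**2
    return y + z

def combine(beta, y):
    """Receive combining beta^H Y, giving one value per channel use."""
    return beta.conj() @ y

y = received_matrix(h, alpha, r, noise_std=0.0)
edge_signal = combine(beta, y)
```

In the noiseless case the combined signal is a weighted sum of the device symbol vectors, with weight $\beta^H H_k \alpha_k$ for device $k$; the beamforming vectors are what shape these weights.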

S17. As shown in FIG. 3, each AP $i$ calls the proposed Practical L-DSC encoder, which first compresses the edge update into a low-dimensional vector as follows:

where the compression matrix is a random matrix with compression ratio $\sigma \in (0, 1]$.

S18. At each AP $i$, the Practical L-DSC encoder adds an error accumulation term to the compressed vector, giving:

where the error accumulation term comes from round $t-1$.
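Steps S17 and S18 can be sketched as a random linear projection followed by adding the error carried over from the previous round. The Gaussian matrix, its scaling, the real-valued toy data, and the choice to keep the error term in the compressed domain are all assumptions made for illustration.

```python
import numpy as np

def make_compression_matrix(c, ratio, rng):
    """Random Gaussian matrix with ceil(ratio * c) rows and c columns (assumed design)."""
    m = int(np.ceil(ratio * c))
    return rng.standard_normal((m, c)) / np.sqrt(m)

def compress_with_memory(a, edge_update, error_prev):
    """Project the edge update to the low dimension and add last round's error term."""
    return a @ edge_update + error_prev

rng = np.random.default_rng(1)
c, ratio = 8, 0.5                       # edge update length and compression ratio
a = make_compression_matrix(c, ratio, rng)
edge_update = rng.standard_normal(c)
error_prev = np.zeros(a.shape[0])       # no accumulated error in the first round
u = compress_with_memory(a, edge_update, error_prev)
```

Setting `ratio` to 1 keeps the dimension unchanged, which corresponds to the uncompressed case with $\sigma = 1$.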

S19. At each AP $i$, the Practical L-DSC encoder quantizes the resulting vector, giving

where the codebook size of the encoding function satisfies the transmission rate constraint, $\epsilon$ is a small predetermined constant, the rate from AP $i$ to the CS has a preset upper limit in bits per symbol, and the uniform quantizer discretizes its input into quantization indices.

S20. At each AP $i$, the Practical L-DSC encoder computes the error accumulation vector as follows:
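The quantization and error accumulation of S19 and S20 resemble a standard error-feedback loop, sketched below. The mid-rise uniform quantizer, the clipping range `v_max`, the real-valued input, and the rule that the carried error is the difference between the quantizer input and output are assumptions; the original formulas are not available in the text.

```python
import numpy as np

def uniform_quantize(v, n_bits, v_max):
    """Mid-rise uniform quantizer with 2**n_bits levels on [-v_max, v_max]."""
    levels = 2 ** n_bits
    step = 2.0 * v_max / levels
    idx = np.clip(np.floor((v + v_max) / step), 0, levels - 1)
    return (idx + 0.5) * step - v_max

def encode_round(u, n_bits, v_max):
    """Quantize u and return the quantized vector plus the error to carry forward."""
    q = uniform_quantize(u, n_bits, v_max)
    return q, u - q

u = np.array([-0.9, -0.1, 0.3, 2.0])
q, error_next = encode_round(u, n_bits=2, v_max=1.0)
```

With `n_bits` bits per entry the codebook has `2**n_bits` levels, so a rate limit in bits per symbol directly bounds `n_bits`. The last entry of `u` lies outside the range and is clipped, and the resulting large error is carried into the next round instead of being lost.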

S21. Each AP $i$ transmits the corresponding vector to the CS. Each AP $i$ also sets the compression ratio to $\sigma = 1$, obtains the vector for compression ratio $\sigma = 1$ through steps S11-S12, and sends a sub-vector of it to the CS.

S22. From the received data and the received sub-vectors, the CS estimates and updates the L-DSC parameter matrix $\Sigma_V$ and the inter-AP correlation matrix $\Sigma_S$.

S23. The CS optimizes the L-DSC aggregation vector $c^{(t)}$. With the device transmit beamforming vectors and the receive beamforming vectors held fixed, the following problem is solved:

where the matrix involved is positive semidefinite. The problem is a convex quadratic problem, and its closed-form solution (with the superscript $t$ omitted) is as follows

where the matrix involved is diagonal and its element in row $j$ is defined as

S24. Each AP $i$ sends its encoder output to the CS over the fronthaul link.

S25. After all encoder outputs have been received, the CS calls the proposed Practical L-DSC decoder. As shown in FIG. 3, the decoder has $K$ layers, each consisting of $N_A$ neural networks and a side information module. The CS initializes the side information vector of the first layer and sets a counter $k = 1$.

S26. The CS calls the neural networks of the $k$-th layer of the Practical L-DSC decoder, which compute

where the side information comes from the side information module of layer $k-1$. The $k$-th layer then collects the network outputs to generate a side information vector as follows:

The CS increments the counter $k$ by 1.

S27. If $k \le K$, return to S26.
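The layered structure of the decoder in S25 to S27 can be sketched as a loop over $K$ layers, each holding one small network per AP and a side information module that summarizes the layer outputs for the next layer. Everything concrete here is a hypothetical placeholder: the networks are untrained two-layer perceptrons with random weights, the first side information vector is set to zero, and the side information module simply averages the per-AP outputs.

```python
import numpy as np

rng = np.random.default_rng(2)
n_ap, n_layers, m, c = 3, 2, 4, 6   # APs, decoder layers K, input length, output length

def make_net(in_dim, out_dim, hidden=8):
    """One small perceptron with random, untrained weights (placeholder network)."""
    return (0.1 * rng.standard_normal((hidden, in_dim)),
            0.1 * rng.standard_normal((out_dim, hidden)))

def run_net(net, x):
    """Forward pass: linear layer, ReLU, linear layer."""
    w1, w2 = net
    return w2 @ np.maximum(w1 @ x, 0.0)

# nets[k][i] is the network of AP i in decoder layer k
nets = [[make_net(m + c, c) for _ in range(n_ap)] for _ in range(n_layers)]

def decode(received, nets):
    """Run the layers in order, feeding each layer the previous side information."""
    side = np.zeros(c)                       # assumed initial side information
    outputs = []
    for layer in nets:
        outputs = [run_net(net, np.concatenate([received[i], side]))
                   for i, net in enumerate(layer)]
        side = np.mean(outputs, axis=0)      # assumed side information module
    return outputs

received = [rng.standard_normal(m) for _ in range(n_ap)]
estimates = decode(received, nets)
```

The side information vector is the only path by which the network of one AP sees what the networks of the other APs produced, which is how such a structure could exploit inter-AP correlation.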

S28. The CS constructs the estimator as

S29. The CS constructs the global update as

S30. The CS updates the global model parameters as

where $\eta$ is the learning rate.
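The model update in S30 can be sketched as a plain gradient step. The sketch assumes that the global update is an aggregated gradient that is subtracted from the current parameters, and it uses the decaying learning rate 1.5/(1+t/10) stated in the Detailed Description; the function names are hypothetical.

```python
import numpy as np

def learning_rate(t):
    """Decaying server learning rate 1.5 / (1 + t / 10) for training round t."""
    return 1.5 / (1.0 + t / 10.0)

def update_model(theta, global_update, t):
    """Assumed descent step: subtract the scaled global update from the parameters."""
    return theta - learning_rate(t) * global_update

theta = np.array([1.0, 2.0])
global_update = np.array([0.5, -0.5])
theta_next = update_model(theta, global_update, t=10)
```

At round 10 the learning rate has halved from its initial value of 1.5 to 0.75.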

S31. The CS broadcasts the updated global model $\theta$ to the $N_A$ APs over the fronthaul link.

S32. Each AP $i$ broadcasts the received model parameters $\theta$ over the MIMO channel to the devices it serves. The errors of the downlink fronthaul and downlink MIMO wireless transmissions are assumed to be negligible.

S33. If the communication round satisfies $t > T$, the procedure ends; otherwise, return to S4. The above flow is illustrated in FIG. 2.

The improvements of the present invention can be summarized as follows. First, the invention combines the concepts of Cloud-RAN and OA-FL and introduces multiple-input multiple-output (MIMO) technology, thereby overcoming the limited service range of conventional OA-FL. To this end, the invention proposes a MIMO Cloud-RAN OA-FL (MIMOCROF) framework comprising three stages: edge aggregation, global aggregation, and model update and broadcast. Second, to exploit the correlation between gradients under MIMOCROF and thereby enhance learning performance, the invention models the global aggregation stage as a lossy distributed source coding (L-DSC) problem and analyzes the performance of MIMOCROF from the perspective of rate-distortion theory. Based on this analysis, the invention formulates a joint communication-learning optimization problem and an alternating optimization (AO) algorithm to improve system performance. Finally, the invention proposes a novel neural-network-based codec as a communication scheme for the L-DSC problem. Numerical results indicate that the proposed practical design makes efficient use of the inter-AP correlation and outperforms the baseline schemes.

Drawings

Fig. 1: schematic diagram of MIMO Cloud-RAN architecture

Fig. 2: flow diagram of MIMOCROF system

Fig. 3: schematic diagram of a neural network-based codec

Fig. 4: Performance of the MIMOCROF system under different dataset heterogeneity settings

Fig. 5: Performance of the MIMOCROF system on different datasets

Detailed Description

The following describes in further detail the embodiments of the present invention with reference to the drawings and examples.

The parameters of the specific method are set as follows:

The FL task is tested on three datasets: the MNIST handwritten digit dataset, the Fashion-MNIST clothing dataset, and the CIFAR-10 dataset. A DNN is trained on each device and at the CS. It comprises two 5 x 5 convolutional layers (the first with 10 channels and the second with 20 channels, each followed by a 2 x 2 max pooling operation), a fully connected layer with 50 units and a ReLU activation function, and a final softmax output layer. The model parameter vector has length N = 21840 for MNIST and Fashion-MNIST and N = 31340 for CIFAR-10. Each device performs 5 local updates using stochastic gradient descent with a learning rate of 0.01 and a local batch size of 1200. The CS updates the global model with a learning rate of 1.5/(1+t/10), where t is the training round. The total number of training rounds is set to T = 200.
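The stated parameter vector lengths can be checked by counting the weights and biases of the described network. The sketch assumes unpadded convolutions with stride 1, a 10-class output layer, and input sizes of 28 x 28 x 1 for MNIST and Fashion-MNIST and 32 x 32 x 3 for CIFAR-10; under these assumptions the counts match the figures given above.

```python
def conv_out(size, kernel=5, pool=2):
    """Spatial size after an unpadded kernel x kernel convolution and pool x pool max pooling."""
    return (size - kernel + 1) // pool

def count_parameters(in_channels, image_size, n_classes=10):
    """Total number of weights and biases of the two-conv, two-dense network."""
    conv1 = in_channels * 10 * 5 * 5 + 10      # first convolution, 10 output channels
    conv2 = 10 * 20 * 5 * 5 + 20               # second convolution, 20 output channels
    side = conv_out(conv_out(image_size))      # feature map side after both conv+pool stages
    fc1 = 20 * side * side * 50 + 50           # fully connected layer with 50 units
    fc2 = 50 * n_classes + n_classes           # softmax output layer
    return conv1 + conv2 + fc1 + fc2

print(count_parameters(1, 28))  # MNIST / Fashion-MNIST -> 21840
print(count_parameters(3, 32))  # CIFAR-10 -> 31340
```

The difference between the two totals comes from the three input channels of CIFAR-10 in the first convolution and its larger 5 x 5 feature map entering the fully connected layer.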

According to the above parameter settings, the simulation follows steps S1 to S33 as set out in the Disclosure of Invention above.

In the experiments, several types of codecs are deployed. One baseline scheme uses DNN-based encoders and decoders to exploit the gradient correlation between APs. Another baseline, called simple quantization, has each AP quantize its gradient and send it to the CS, which then linearly aggregates the received quantized vectors to obtain the global update. The Practical L-DSC encoder provided by the invention corresponds to steps S17-S20, and the decoder corresponds to steps S26-S27.

In FIG. 4, various types of codecs are deployed under the MIMOCROF framework, and different heterogeneity settings of the MNIST dataset are considered. Dataset heterogeneity refers to the dissimilarity of the data distributions across devices. The results show that, under the MIMOCROF framework, the learning performance of all schemes decreases as the degree of dataset heterogeneity increases. Under all heterogeneity settings, the proposed Practical L-DSC codec outperforms the other compared schemes, which demonstrates the advantage of the proposed scheme. Moreover, when the degree of heterogeneity is low, the learning performance of the proposed Practical L-DSC codec is comparable to that of the theoretically optimal L-DSC codec and the error-free bound.

In FIG. 5, various types of codecs are deployed under the MIMOCROF framework, and more complex scenarios are considered, namely the Fashion-MNIST and CIFAR-10 datasets. The results show that, under the MIMOCROF framework, the learning performance of the proposed Practical L-DSC codec is comparable to that of the theoretically optimal L-DSC codec and the error-free bound. These findings indicate that the proposed MIMOCROF framework with the Practical L-DSC codec improves learning performance even on complex datasets.

Claims

1. An over-the-air computation federated learning design method under a Cloud-RAN architecture, comprising the following steps:

S1. considering an FL system in which $N_D$ edge devices and a CS cooperatively learn a shared model from the training data distributed over the $N_D$ edge devices, with the goal of minimizing the global loss function, i.e.

where $l(\theta, b)$ is the sample-wise loss function and $b$ is a data sample from the local dataset of device $k$;

S2. assuming that the cloud server (CS) reaches the $N_D$ devices through $N_A$ APs; the CS randomly initializes the model parameters $\theta$ and broadcasts them to the $N_A$ APs over the fronthaul link; the CS initializes the transmit beamforming vector of each device, where $N_T$ is the number of transmit antennas per device, and the receive beamforming vector of each AP, where $N_T$ is the number of receive antennas per AP; the CS initializes the L-DSC parameter matrix to $\Sigma_V = O$, the L-DSC aggregation coefficient vector to $c = 0$, and the inter-AP correlation matrix to $\Sigma_S = O$;

S3. each AP $i$ broadcasts the received model parameters $\theta$ over the MIMO channel to the devices it serves, i.e. the devices in the service set of AP $i$; the maximum number of training rounds of the FL task is denoted by $T$, and in training round $t$ the following steps are performed:

S5. each device $k$ normalizes its local update by

where the variance and the mean are those of its local update;

S6. each device $k$ maps the normalized update to a complex vector as

where the complex vector has length $C = N/2$;

S7. each device $k$ measures its per-symbol transmit power budget and sends it losslessly to its serving AP, and each device $k$ also sends its gradient mean and gradient variance losslessly to its serving AP;

S8. assuming that the channel remains unchanged while the gradients are uploaded in each training round, each AP $i$ estimates the channel state information (CSI) between itself and the devices it serves and uploads the estimated channel matrices to the CS; each AP $i$ also uploads the parameters it has received from its devices to the CS; each AP $i$ estimates its receiver noise power with an expectation-maximization algorithm and uploads it to the CS;

S9. for each $k \in [N_D]$, the CS optimizes the transmit beamforming vector of device $k$; with the transmit beamforming vectors of the other devices, the receive beamforming vectors, and the L-DSC aggregation vector $c^{(t)}$ held fixed, the following problem is solved:

where $c_i^{(t)}$ is the $i$-th element of the vector $c^{(t)}$; this is a convex QCQP problem that can be solved with existing tools;

the problem is convex, and its closed-form solution is

if the optimization has converged, go to S11; otherwise, return to S9;

S11. the CS sends the device transmit beamforming vectors $\alpha_k$ of the served devices and the receive beamforming vector $\beta_i$ to each AP $i$;

S12. each AP $i$ forwards the received device transmit beamforming vectors $\alpha_k$ to the devices it serves;

considering the $c$-th element of the complex vector and the $c$-th column of the transmit signal matrix, device $k$ transmits its update signal in the $c$-th channel use such that the power constraint is met:

S14. all devices transmit their signal matrices synchronously over the MIMO channel, and the signal matrix observed at each AP $i$ is

where the noise term is an additive white Gaussian noise matrix whose elements are independent and identically distributed; synchronization can be achieved with the timing advance mechanism used for uplink synchronization in 4G LTE;

S16. each AP $i$ constructs its edge update as

S17. each AP $i$ calls the Practical L-DSC encoder, which first compresses the edge update into a low-dimensional vector as follows:

where the compression matrix is a random matrix with compression ratio $\sigma \in (0, 1]$;

where the error accumulation term comes from round $t-1$;

where the codebook size of the encoding function satisfies the transmission rate constraint, $\epsilon$ is a small predetermined constant, the rate from AP $i$ to the CS has a preset upper limit in bits per symbol, and the uniform quantizer discretizes its input into quantization indices;

S21. each AP $i$ transmits the corresponding vector to the CS; each AP $i$ also sets the compression ratio to $\sigma = 1$, obtains the vector for compression ratio $\sigma = 1$ through steps S11-S12, and sends a sub-vector of it to the CS;

S23. the CS optimizes the L-DSC aggregation vector $c^{(t)}$; with the device transmit beamforming vectors and the receive beamforming vectors held fixed, the following problem is solved:

where the matrix involved is positive semidefinite; the problem is a convex quadratic problem, and its closed-form solution is as follows

where the matrix involved is diagonal and its element in row $j$ is defined as

S24. each AP $i$ sends its encoder output to the CS over the fronthaul link;

S25. after all encoder outputs have been received, the CS calls the Practical L-DSC decoder, which has $K$ layers, each consisting of $N_A$ neural networks and a side information module; the CS initializes the side information vector of the first layer and sets a counter $k = 1$;

where the side information comes from the side information module of layer $k-1$, and the $k$-th layer collects the network outputs to generate a side information vector as follows:

the CS increments the counter $k$ by 1;

S27. if $k \le K$, return to S26;

S28. the CS constructs the estimator as

S29. the CS constructs the global update as

S30. the CS updates the global model parameters as

where $\eta$ is the learning rate;

S31. the CS broadcasts the updated global model $\theta$ to the $N_A$ APs over the fronthaul link;

S32. each AP $i$ broadcasts the received model parameters $\theta$ over the MIMO channel to the devices it serves, the errors of the downlink fronthaul and downlink MIMO wireless transmissions being assumed negligible;

S33. if the communication round satisfies $t > T$, the procedure ends; otherwise, return to S4.