
CN111212438A - A resource allocation method for wireless energy-carrying communication technology - Google Patents

A resource allocation method for wireless energy-carrying communication technology Download PDF

Info

Publication number
CN111212438A
CN111212438A
Authority
CN
China
Prior art keywords
user
resource allocation
decision process
markov decision
energy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010113438.6A
Other languages
Chinese (zh)
Other versions
CN111212438B (en)
Inventor
李立欣
马慧
王大伟
李旭
程岳
杨富程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202010113438.6A priority Critical patent/CN111212438B/en
Publication of CN111212438A publication Critical patent/CN111212438A/en
Application granted granted Critical
Publication of CN111212438B publication Critical patent/CN111212438B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/02Arrangements for optimising operational condition
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. Transmission Power Control [TPC] or power classes
    • H04W52/04Transmission power control [TPC]
    • H04W52/06TPC algorithms
    • H04W52/14Separate analysis of uplink or downlink
    • H04W52/143Downlink power control
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. Transmission Power Control [TPC] or power classes
    • H04W52/04Transmission power control [TPC]
    • H04W52/18TPC being performed according to specific parameters
    • H04W52/24TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters
    • H04W52/241TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters taking into account channel quality metrics, e.g. SIR, SNR, CIR or Eb/Io
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. Transmission Power Control [TPC] or power classes
    • H04W52/04Transmission power control [TPC]
    • H04W52/18TPC being performed according to specific parameters
    • H04W52/26TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service]
    • H04W52/265TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service] taking into account the quality of service QoS
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. Transmission Power Control [TPC] or power classes
    • H04W52/04Transmission power control [TPC]
    • H04W52/18TPC being performed according to specific parameters
    • H04W52/26TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service]
    • H04W52/267TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service] taking into account the information rate
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. Transmission Power Control [TPC] or power classes
    • H04W52/04Transmission power control [TPC]
    • H04W52/30Transmission power control [TPC] using constraints in the total amount of available transmission power
    • H04W52/34TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract



The invention discloses a resource allocation method for a wireless energy-carrying downlink communication scenario oriented to the mode division multiple access technology. By proposing a Q-learning algorithm based on a constrained Markov decision process, it solves the problem of minimizing the total transmission power of the transmitting end while guaranteeing the quality of service of all users, where the quality of service includes each user's minimum received-energy requirement and minimum data-rate requirement. It has been verified that the proposed resource allocation strategy can significantly reduce the total transmission power of the transmitting end.


Description

Resource allocation method of wireless energy-carrying communication technology
[ technical field ]
The invention belongs to the field of wireless energy-carrying communication, and particularly relates to a resource allocation method in a wireless energy-carrying downlink communication scene facing a mode division multiple access technology.
[ background of the invention ]
The wireless energy-carrying communication technology is a novel type of wireless communication that combines wireless power transfer with wireless signal transmission, delivering energy while realizing reliable information interaction. With its rapid development, many drawbacks of traditional power supply, such as power lines that age easily and batteries that are difficult to replace in time, can be overcome. However, solving the power-saving and spectrum-utilization problems in wireless energy-carrying communication remains challenging at present.
In addition, non-orthogonal multiple access is a promising 5G technology that can meet the low-power, high-throughput, low-latency and wide-coverage requirements of next-generation mobile communication systems. Its advantages of high spectral efficiency and massive connectivity match the explosive data growth and access demands of the 5G era. The mode division multiple access technology within non-orthogonal multiple access can fully exploit multi-dimensional domain processing, and has the advantages of high coding flexibility, a wide application range and low complexity. Applying the mode division multiple access technology to wireless energy-carrying communication can effectively improve spectrum utilization and energy efficiency. User quality of service, as referred to herein, includes the minimum received-energy requirement and the minimum data-rate requirement of the receiving-end user. Therefore, an effective tool is needed to address these serious challenges.
In recent years, there has been increasing discussion of how to design a reasonable and efficient resource allocation method in a wireless energy-carrying communication system. Existing approaches are general and can guarantee user quality of service, but they cannot minimize the power consumption of the transmitting end. Traditional methods also suffer from high computational complexity and many constraints when minimizing the total transmission power of the transmitting end in a wireless energy-carrying downlink scenario based on the mode division multiple access technology, especially when the receiving end has multiple users whose quality of service must all be satisfied.
[ summary of the invention ]
The invention aims to provide a resource allocation method for a wireless energy-carrying downlink communication scenario oriented to the mode division multiple access technology, addressing the high computational complexity of minimizing the total transmission power of the transmitting end while satisfying the quality of service of the receiving-end users.
The technical scheme adopted by the invention is that the resource allocation method in the wireless energy-carrying downlink communication scene facing the mode division multiple access technology is implemented according to the following steps:
step one, making a constraint Markov decision process:
describing a resource allocation problem in a wireless energy-carrying communication scene facing a mode division multiple access technology as a constraint Markov decision process, and converting the problem into an unconstrained Markov decision process by using a Lagrangian dual method;
step two, solving the unconstrained Markov decision process in the step one by using a reinforcement learning method to finally obtain an optimal resource allocation strategy; the objective of this strategy is to minimize the total power of transmission at the transmitting end while satisfying the quality of service for each user at the receiving end.
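The two-step scheme above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `q_learn` callable and its returned constraint-violation values are hypothetical stand-ins for the reinforcement-learning inner loop described in step two.

```python
def solve_resource_allocation(q_learn, n_outer=50, step=0.01):
    """Alternate between solving the Lagrangian-relaxed MDP (step two)
    and projected-subgradient updates of the multipliers (step one).

    q_learn(lam, mu) -> (policy, energy_gap, rate_gap), where the gaps
    are constraint violations in the style of E_req - E_t and
    R_req - R_t (an assumption about the interface)."""
    lam = mu = 0.0
    policy = None
    for _ in range(n_outer):
        policy, e_gap, r_gap = q_learn(lam, mu)   # reinforcement-learning inner loop
        lam = max(0.0, lam + step * e_gap)        # raise the price of a violated energy constraint
        mu = max(0.0, mu + step * r_gap)          # raise the price of a violated rate constraint
    return policy, lam, mu
```

A multiplier only grows while its constraint is violated; once the learned policy is feasible, the corresponding gap is negative and the projection at zero keeps the multiplier from going negative.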
Further, the wireless energy-carrying downlink communication scenario is constructed as a system model, and the system model specifically includes:
a base station carries out wireless transmission of data and energy to T users in a specific area through K subcarriers, wherein a transmitting end adopts superposition coding, a receiving end adopts a serial interference elimination technology, and the base station of the transmitting end and the users of the receiving end are matched with a single antenna; the users are randomly distributed within a circle with radius r centered at the base station.
Further, the first step specifically comprises:
1) according to the system model, defining a state space and an action space of the system:
the state space of the system is specifically as follows:
s = (SINR_{k,t}, k = 0,1,...,K, t = 0,1,...,T) ∈ S = SINR (1),
where SINR_{k,t} is the signal-to-interference-plus-noise ratio when the k-th subcarrier is loaded to the t-th user, and the state set SINR is a finite set of SINR values;
the action space of the system is specifically as follows:
a = (α, G_PDMA, P_PDMA) ∈ A = α × G × P (2),
where α = (α_1, α_2, ..., α_T) is the vector of transmission-time ratios allocated to information decoding by the T users, P_PDMA is the power allocation matrix, and G_PDMA is the subcarrier mapping matrix; α ∈ α, G_PDMA ∈ G and P_PDMA ∈ P indicate that the vector and the matrices belong, respectively, to the finite sets of transmission-time ratios allocated to information decoding, subcarrier mappings and power allocations;
2) The constrained Markov decision process is detailed as follows:
min_Π P_total (3)
s.t. E_t ≥ E_req, ∀t ∈ T (4)
R_t ≥ R_req, ∀t ∈ T (5)
where P_total is the total transmission power of the transmitting end; equations (4) and (5) represent the quality-of-service constraints for each user, i.e. the energy E_t and the data rate R_t received by each user must satisfy the minimum energy requirement E_req and the data rate requirement R_req, respectively; the Markov decision process is described as adjusting the action (α, G_PDMA, P_PDMA) to minimize the total transmission power of the transmitting end under the constraint of satisfying each user's quality of service;
the markov decision process can be relaxed to an unconstrained markov process, i.e.:
Π* = arg min_Π max_{λ,μ ≥ 0} L(λ, μ, Π) (6)
L(λ, μ, Π) = P_total + Σ_{t=1}^{T} λ_t (E_req − E_t) + Σ_{t=1}^{T} μ_t (R_req − R_t) (7)
where λ = {λ_1, ..., λ_T} and μ = {μ_1, ..., μ_T} are two sets of Lagrange multipliers; finding the optimal resource allocation strategy Π* is converted into solving for a saddle point of the function L(λ, μ, Π).
Further, in the second step, the updating formula of the Q value in reinforcement learning is specifically as follows:
Q(s_k, a_k) ← Q(s_k, a_k) + ρ[r_{k+1} + γ max_a Q(s_{k+1}, a) − Q(s_k, a_k)] (8)
where r_{k+1}, γ and ρ (0 < ρ < 1) are, respectively, the reward obtained at time k+1, the reward discount coefficient and the learning rate;
the optimum function is expressed as follows:
Figure BDA0002390761150000045
wherein Q is*(sAnd a) is the Q value given when the optimal strategy is followed for state s and action a.
The beneficial effects of the invention are:
1. The invention provides a resource allocation method for a wireless energy-carrying downlink communication scenario oriented to the mode division multiple access technology. Taking the time-switching receiver as an example, the minimum total transmission power at the transmitting end is obtained by jointly optimizing the time-slot ratio the receiver allocates to energy reception and information decoding, the subcarrier mapping matrix and the power allocation matrix.
2. To overcome the difficulty of solving the constrained Markov decision process directly, Lagrangian dual theory is used to convert it into an unconstrained Markov decision process. Finally, a Q-learning algorithm in reinforcement learning is applied to obtain the optimal strategy of the Markov decision process.
3. The effectiveness of the method is verified through experiments; compared with other methods, it achieves a lower total transmission power at the transmitting end.
[ description of the drawings ]
Fig. 1 is a diagram of a system model in a wireless energy-carrying downlink communication scenario for a mode-division multiple access technology according to the present invention;
FIG. 2 is a schematic diagram illustrating a variation of total transmission power at different iterations in the embodiment;
FIG. 3 is a comparison of performance of the embodiment using the DBN algorithm and the proposed Q learning algorithm under different user data rate requirements;
FIG. 4 is a comparison of performance of the embodiment using the DBN algorithm and the proposed Q learning algorithm under different user received energy requirements;
fig. 5 is a comparison of minimum total transmit power at the transmitting end for different qos requirements of users and different numbers of users in the embodiment.
[ detailed description ]
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
In order to ensure that the total transmission power of a transmitting end in a wireless energy-carrying downlink communication scene facing to a mode division multiple access technology is minimum, the invention researches a resource allocation method based on a constraint Markov decision process. Specifically, the resource allocation problem in the wireless energy-carrying communication scene of the mode division multiple access technology is described as a constrained Markov decision process, and the constrained Markov decision problem is converted into an unconstrained Markov decision process by utilizing a Lagrangian duality theory. Finally, a Q-learning algorithm is proposed to solve the optimal solution of the unconstrained Markov decision process. Take the time-switched receiver as an example: the power allocation matrix, the subcarrier mapping matrix and the slot ratio allocated to information decoding and energy collection in the above-described scenario are adjusted to optimal values to minimize the total transmission power of the transmitter while satisfying the quality of service of each user.
Step one, constructing a system model: the system model is a wireless energy-carrying downlink communication system model based on a mode division multiple access technology and consists of a base station and a plurality of users;
the specific mode of the first step is as follows:
as shown in FIG. 1, assume that there is a base station that wirelessly transmits data and energy to T users in a particular area over K subcarriers, where
Figure BDA0002390761150000061
And
Figure BDA0002390761150000062
respectively user index and subcarrier index. In addition to this, superposition coding is employed at the transmitter and the subcarrier mapping matrix G is satisfiedPDMA∈NK×TIn which K isk={n|gk,t1 (K ∈ K) and
Figure BDA0002390761150000063
respectively the set and number of users to which the k-th sub-carrier is mapped. The mapping matrix with 3 sub-carriers and 5 users is shown in fig. 1, where K 11, 2, 3, 4 and | K1And 4. In addition, the time-switched receiver is taken as an example to solve the optimizationAnd (4) resource allocation strategies. User UtBy subcarrier HkThe received signals are:
y_{k,t} = Σ_{i∈K_k} h_{k,t} √(P_{k,i}) x_{k,i} + w_{k,t} (1)
where h_{k,t} = r_{k,t} d_t^{−β/2} is the channel gain from the base station to user U_t over subcarrier H_k, r_{k,t} is small-scale fading satisfying the Rayleigh distribution, and d_t^{−β/2} is the large-scale fading related to the distance d_t between the base station and the user; in addition, P_{k,t} and x_{k,t} are the power and the signal loaded to user U_t when transmitting over subcarrier H_k, and w_{k,t} ~ CN(0, σ_k²) is additive white Gaussian noise.
The receiving end adopts the successive interference cancellation technique and decodes in the order CNR_{k,1} ≥ CNR_{k,2} ≥ ... ≥ CNR_{k,|K_k|}, where CNR_{k,t} = |h_{k,t}|²/σ_k² is the channel-to-noise ratio. Then, the normalized interference is:
I_{k,t} = Σ_{i=t+1}^{|K_k|} P_{k,i} CNR_{k,t} (2)
thus, the snr when the kth subcarrier is loaded to the tth user is:
Figure BDA00023907611500000610
wherein,
Figure BDA00023907611500000611
it is ensured that the decoding process is not interrupted. User UtBased on subcarrier HkThe information rate and energy obtained are respectively:
R_{k,t} = B_k log_2(1 + SINR_{k,t}) (4)
E_{k,t} = η(1 − α_t) Σ_{i∈K_k} P_{k,i} |h_{k,t}|² (5)
where η is the energy-harvesting efficiency; in addition, α_t and 1 − α_t are the transmission slot ratios assigned to information decoding and energy collection, respectively. It can be deduced that the information and the energy collected by each user are:
R_t = α_t Σ_{k=1}^{K} g_{k,t} R_{k,t} (6)
E_t = Σ_{k=1}^{K} g_{k,t} E_{k,t} (7)
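The SIC decoding and the per-user rate and harvested-energy computation described above can be illustrated with a small sketch. All numeric values are hypothetical, and the interference convention (users sorted by decreasing CNR, residual interference from signals decoded after the current user) is an assumption about the model:

```python
import math

def sic_sinrs(powers, cnrs):
    """Per-user SINR on one shared subcarrier under successive
    interference cancellation. powers[i], cnrs[i] are for users
    sorted by decreasing CNR (assumption)."""
    n = len(powers)
    out = []
    for t in range(n):
        # residual interference from users decoded after user t
        interference = sum(powers[i] * cnrs[t] for i in range(t + 1, n))
        out.append(powers[t] * cnrs[t] / (1.0 + interference))
    return out

def rate_and_energy(alpha_t, bandwidth, sinr, powers, gain_sq, eta=0.3):
    """Rate (bit/s) earned in the alpha_t information-decoding slot and
    energy (W) harvested in the remaining 1 - alpha_t slot."""
    rate = alpha_t * bandwidth * math.log2(1.0 + sinr)
    energy = eta * (1.0 - alpha_t) * sum(powers) * gain_sq
    return rate, energy

powers, cnrs = [0.4, 0.1], [5.0, 20.0]      # hypothetical watts / ratios
sinrs = sic_sinrs(powers, cnrs)
r1, e1 = rate_and_energy(0.5, 1e6, sinrs[0], powers, 0.01)
```

Note that the last decoded user sees no residual interference, so its SINR reduces to P·CNR, matching the interference-free case.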
step two, formulation of a constraint Markov decision problem: the resource allocation problem in the wireless energy-carrying communication system is converted into a constraint Markov decision problem, and the constraint Markov decision problem is converted into the unconstrained Markov decision problem by using Lagrangian dual theory.
The specific implementation manner of the second step is as follows:
the decision maker minimizes the total power of transmission at the transmitting end while meeting the energy requirements and data rate requirements received by each user at the receiving end. The resource allocation problem with user quality of service constraints is denoted as a constrained markov decision problem, which provides a corresponding resource allocation policy for each state. Next, the state space, the action space, the targets, and the constraints of the system will be described separately.
1) State space: to characterize the energy and signal received by the user, we define the state space as:
s = (SINR_{k,t}, k = 0,1,...,K, t = 0,1,...,T) ∈ S = SINR (8)
where the state set SINR is a finite set of SINR values.
2) An action space: the transmitter minimizes the total power of transmission by controlling power allocation and subcarrier mapping, and the receiver by controlling the ratio of time slots allocated for information decoding and energy collection. Thus, the motion space is:
a = (α, G_PDMA, P_PDMA) ∈ A = α × G × P (9)
where α = (α_1, α_2, ..., α_T) and P_PDMA are, respectively, the slot-ratio vector that all user receivers allocate to information decoding and the power allocation matrix. In addition, α ∈ α, G_PDMA ∈ G and P_PDMA ∈ P are discrete in the system, and the sets α, G and P are, respectively, finite sets of the slot ratios allocated to information decoding by all receivers, the subcarrier mappings and the power allocations.
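Because the sets of slot ratios, subcarrier mappings and power allocations are finite, the action space can be enumerated directly as their Cartesian product. A minimal sketch, with purely hypothetical candidate values for a 2-subcarrier, 2-user system:

```python
from itertools import product

# Hypothetical discretization of the action space A = alpha x G x P.
ALPHAS = (0.25, 0.5, 0.75)                        # slot ratios for information decoding
MAPPINGS = (((1, 0), (0, 1)),                     # candidate G_PDMA matrices
            ((1, 1), (0, 1)))                     # (rows = subcarriers, cols = users)
POWERS = (((0.1, 0.0), (0.0, 0.2)),               # candidate P_PDMA matrices (watts)
          ((0.2, 0.0), (0.0, 0.4)))

# Every (alpha, G, P) triple the learner may choose from.
ACTION_SPACE = list(product(ALPHAS, MAPPINGS, POWERS))
```

The tabular Q-learning described later indexes its Q table by pairs of a discretized state and one of these action triples, which is why the finiteness of the three sets matters.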
3) Objectives and constraints: the goal is to find the optimal strategy Π such that the total transmission power P_total at the transmitting end is minimized; the constraints are to meet the minimum energy and data-rate requirements of each user. This resource allocation problem can be translated into a constrained Markov decision process, i.e. P1:
min_Π P_total (10)
s.t. E_t ≥ E_req, ∀t ∈ T (11)
R_t ≥ R_req, ∀t ∈ T (12)
α ∈ α (13)
G_PDMA ∈ G (14)
P_PDMA ∈ P (15)
the problem is that the total transmission power of a transmitting end is minimized by adopting a strategy pi to adaptively adjust the time slot ratio allocated to information decoding by all receivers, the subcarrier mapping and the power allocation of the transmitting end while meeting the service quality constraint of each user. In order to solve the problem of constrained Markov, the Lagrangian dual theory converts the constrained Markov problem into an unconstrained Markov process. The generalized Lagrangian function will be introduced below:
L(λ, μ, Π) = P_total + Σ_{t=1}^{T} λ_t (E_req − E_t) + Σ_{t=1}^{T} μ_t (R_req − R_t) (16)
where λ = {λ_1, λ_2, ..., λ_T} and μ = {μ_1, μ_2, ..., μ_T} are sets of Lagrange multipliers whose elements correspond, respectively, to the constraints on the energy harvested and on the data rate received by each user. Maximizing L(λ, μ, Π) over λ and μ defines:
θ(Π) = max_{λ,μ ≥ 0} L(λ, μ, Π) (17)
the value of θ (Π) is P when the receiver satisfies the user quality of service constrainttotal. When the constraint is not satisfied, the two sets of Lagrangian operators are made positiveInfinite, the value of θ (Π) tends to be infinite, resulting in a function with no solution. Thus, the θ (Π) function can be described as:
Figure BDA0002390761150000091
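This behavior of the generalized Lagrangian can be checked with a small numeric sketch. The penalty form and all values below are hypothetical illustrations, not the patent's exact expression:

```python
# Penalized objective: transmit power plus multiplier-weighted
# constraint violations (E_req - E_t and R_req - R_t).
def lagrangian(p_total, E, R, E_req, R_req, lam, mu):
    penalty = sum(l * (E_req - e) for l, e in zip(lam, E))
    penalty += sum(m * (R_req - r) for m, r in zip(mu, R))
    return p_total + penalty

# Feasible policy: every penalty term is <= 0, so the max over
# nonnegative multipliers is attained at lam = mu = 0 -> theta = P_total.
feasible = lagrangian(0.35, E=[0.2, 0.15], R=[1.2, 1.1],
                      E_req=0.1, R_req=1.0, lam=[0.0, 0.0], mu=[0.0, 0.0])

# Infeasible policy (user 1 harvests too little energy): the violated
# term grows without bound as its multiplier grows.
infeasible = lagrangian(0.35, E=[0.05, 0.15], R=[1.2, 1.1],
                        E_req=0.1, R_req=1.0, lam=[100.0, 0.0], mu=[0.0, 0.0])
```

With `lam[0] = 100` the infeasible policy is already penalized far above its raw transmit power, which is exactly why the inner maximization forces the learner toward feasible policies.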
thus, the constrained markov decision process can be relaxed to an unconstrained markov decision process, i.e.:
Figure BDA0002390761150000092
wherein,
Figure BDA0002390761150000093
and
Figure BDA0002390761150000094
additionally, pi*Is the optimal strategy. Thus, the optimal resource allocation strategy translates into a saddle point solving the function L (Π, λ, μ). Namely, (II)***) It should satisfy:
L(Π, λ*, μ*) ≥ L(Π*, λ*, μ*) ≥ L(Π*, λ, μ) (21)
since the channel transition probability is difficult to estimate, a Q learning algorithm is proposed to solve the optimal solution of the unconstrained markov decision process.
And thirdly, acquiring an optimal strategy of resource allocation based on a constraint Markov decision process in a wireless energy-carrying communication scene of a mode division multiple access technology by using a reinforcement learning method.
The specific implementation manner of the third step is as follows:
the reinforcement learning algorithm is widely applied to learning of an optimal control strategy of a model-free MDP problem, which means that environmental models such as channel conversion do not need to be considered. Therefore, the Q learning algorithm in reinforcement learning is proposed to solve the above resource allocation problem. The Q value calculation formula, the update formula, the epsilon-greedy strategy and the reward function of the Q learning algorithm will be given below respectively. For policy π, the Q value calculation formula when action a is performed at state s is:
Q^π(s, a) = E_π[r_{k+1} + γ Q^π(s_{k+1}, a_{k+1}) | s_k = s, a_k = a] (22)
where r_{k+1} and γ are, respectively, the reward obtained at time k+1 and the reward discount coefficient. In the Q-learning algorithm, the update formula of the Q value is:
Q(s_k, a_k) ← Q(s_k, a_k) + ρ[r_{k+1} + γ max_a Q(s_{k+1}, a) − Q(s_k, a_k)] (23)
where 0 < ρ < 1 is the learning rate. In state s, action a is chosen according to the ε-greedy strategy in order to balance exploration against exploitation. Thus, the selection of actions follows:
a = arg max_{a∈A} Q(s, a) with probability 1 − ε, and a ~ U(A) with probability ε (24)
where U(A) denotes choosing an action uniformly at random from the action space. To directly reflect the objective value, the reward function is defined as the negative of the penalized objective, i.e.:
r_{k+1} = −L(λ, μ, Π) (25)
In addition, the Lagrange multipliers are calculated and updated using a subgradient method. After the Q value is calculated and updated, the control strategy for problem (P2) can be described as:
π*(s) = arg max_a Q*(s, a) (26)
where Q*(s, a) is the Q value given by following the optimal strategy for state s and action a.
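The update rule and the ε-greedy selection described above can be sketched in tabular form. The state/action encoding and the single example transition are hypothetical; the constants γ = 0.8, ε = 0.1 and ρ = 0.6 follow the parameter values used later in the embodiment:

```python
import random
from collections import defaultdict

GAMMA, RHO, EPSILON = 0.8, 0.6, 0.1    # discount, learning rate, exploration
ACTIONS = (0, 1, 2)                    # indices into a discretized (alpha, G, P) set
Q = defaultdict(float)                 # Q[(state, action)], zero-initialized

def choose_action(state, rng=random):
    """Epsilon-greedy selection: explore uniformly with probability
    epsilon, otherwise exploit the current Q estimates."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(s, a, reward, s_next):
    """One tabular Q-learning step: move Q(s, a) toward the bootstrapped
    target reward + gamma * max_a' Q(s_next, a')."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += RHO * (reward + GAMMA * best_next - Q[(s, a)])

# Example transition: a negative reward (the penalized transmit power,
# negated) lowers the estimate for the chosen action.
q_update(s=0, a=1, reward=-0.35, s_next=0)
```

Starting from an all-zero table, this single update moves Q(0, 1) to ρ·(−0.35) = −0.21, so the greedy rule would now prefer the untried actions in state 0.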
Example (b):
the diagrams provided in the following examples and the setting of specific parameter values in the models are mainly for explaining the basic idea of the present invention and performing simulation verification on the present invention, and can be appropriately adjusted according to the actual scene and requirements in the specific application environment.
The invention relates to a wireless energy-carrying communication scene oriented to a mode division multiple access technology, wherein a transmitter and a receiver are both provided with a single antenna. The effectiveness of the proposed method is demonstrated by simulation: (1) the convergence performance of the algorithm under different learning rates is compared; (2) the total transmission power of the transmitting end varies with different algorithms as the receiving energy requirement of the user varies. Here, the proposed constrained markov process based Q learning algorithm is compared with the genetic algorithm based DBN algorithm; (3) the total transmission power at the transmitting end varies with different algorithms as the data rate requirements of the users vary. Here, the proposed constrained markov process based Q learning algorithm is compared with the genetic algorithm based DBN algorithm; (4) with the change of the number of users at the receiving end, the minimum total transmission power at the transmitting end changes with the difference of the requirements of the service quality of the users.
In the simulation, we assume that all users are distributed within a circle with a radius of 300 meters centered at the base station, and the path-loss coefficient β is set to 3.76. In order to meet the energy requirement of the receiving end, the power conversion efficiency of the receiver for energy harvesting is assumed to be η = 30%, and the noise power is σ_k² = 0.01 W. To learn the Q value, an action set satisfying the constraints (13), (14) and (15) is set; thus, the state space is a finite set corresponding to the action space. In addition, the other parameters are set as: k_max = 2500, ε = 0.1 and γ = 0.8. In the simulation process, three performance indexes are used: the total transmission power at the transmitting end, the energy harvested at the receiving end, and the data rate. The merits of a resource allocation strategy are characterized by these performance indicators.
As shown in fig. 2, the convergence of the total transmission power at different learning rates is studied to determine a suitable learning rate, where ρ is set to 0.4, 0.5 and 0.6, respectively. In addition, the number of users and the number of subcarriers are both set to 2. Furthermore, the harvested-energy constraint and the data-rate constraint are set to E_req = 0.1 W and R_req = 1 Mbit/s, respectively. It can be observed that the total transmission power converges to 0.35 W at all learning rates, although the convergence speed and stability differ. Considering both factors, a learning rate of 0.6 is adopted. Since the algorithm adopts a greedy strategy, the total transmission power of the resource allocation scheme based on the constrained Markov process fluctuates slightly as the number of iterations increases, but the overall trend is not affected.
As shown in fig. 3 and 4, the effectiveness of the algorithm is studied by comparing the performance of the proposed Q-learning-based algorithm with that of the DBN algorithm under different user quality-of-service requirements. In the simulation, the number of receiving-end users is set to 3. The results show that the algorithm is efficient and can significantly reduce the total transmission power.
Finally, fig. 5 shows the minimum total transmission power at the transmitting end estimated by the proposed Q-learning algorithm under different numbers of users and different user quality-of-service constraints, where the number of users at the receiver is set to 2, 3 and 4, respectively. As shown in fig. 5, the total transmission power at the transmitting end increases as the user quality-of-service requirements increase. In addition, as the number of users grows, the minimum total transmission power of the transmitting end shows a gradually increasing trend. The above results verify the effectiveness and reasonableness of the algorithm.

Claims (4)

1. A resource allocation method in a wireless energy-carrying downlink communication scene facing to a mode division multiple access technology is characterized by comprising the following steps:
step one, making a constraint Markov decision process:
describing a resource allocation problem in a wireless energy-carrying communication scene facing a mode division multiple access technology as a constraint Markov decision process, and converting the problem into an unconstrained Markov decision process by using a Lagrangian dual method;
step two, solving the unconstrained Markov decision process in the step one by using a reinforcement learning method to finally obtain an optimal resource allocation strategy; the objective of this strategy is to minimize the total power of transmission at the transmitting end while satisfying the quality of service for each user at the receiving end.
2. The method according to claim 1, wherein the wireless energy-carrying downlink communication scenario is constructed as a system model, the system model specifically comprising:
a base station wirelessly transmits data and energy to T users in a specific area over K subcarriers, wherein the transmitting end adopts superposition coding, the receiving end adopts successive interference cancellation, and both the base station at the transmitting end and the users at the receiving end are equipped with a single antenna; the users are randomly distributed within a circle of radius r centered at the base station.
3. The method according to claim 2, wherein the first step is specifically:
1) according to the system model, defining a state space and an action space of the system:
the state space of the system is specifically:

$s=(\mathrm{SINR}_{k,t},\;k=0,1,\ldots,K,\;t=0,1,\ldots,T)\in S=\mathbf{SINR}$ (1),

wherein $\mathrm{SINR}_{k,t}$ is the signal-to-interference-plus-noise ratio when the kth subcarrier is loaded to the tth user, and the state set $\mathbf{SINR}$ is a finite set of SINR values;
the action space of the system is specifically:

$a=(\boldsymbol{\tau},\,G_{\mathrm{PDMA}},\,P_{\mathrm{PDMA}})\in A$ (2),

wherein $\boldsymbol{\tau}=(\tau_1,\tau_2,\ldots,\tau_T)$ is the vector of transmission-time ratios allocated to information decoding for the T users, $P_{\mathrm{PDMA}}$ is the power allocation matrix, and $G_{\mathrm{PDMA}}$ is the subcarrier mapping matrix; $\boldsymbol{\tau}\in\mathcal{T}$, $G_{\mathrm{PDMA}}\in G$ and $P_{\mathrm{PDMA}}\in P$ indicate that the vector and the matrices belong to finite sets of transmission-time ratios allocated to information decoding, subcarrier mappings and power allocations, respectively;
2) the constrained Markov decision process is detailed as follows:

$\min_{\boldsymbol{\tau},\,G_{\mathrm{PDMA}},\,P_{\mathrm{PDMA}}}\;P_{\mathrm{total}}$ (3)

$\mathrm{s.t.}\quad E_t\ge E_{\mathrm{req}},\quad\forall t$ (4)

$\qquad\;\; R_t\ge R_{\mathrm{req}},\quad\forall t$ (5)

wherein $P_{\mathrm{total}}$ is the total transmission power at the transmitting end; constraints (4) and (5) represent the quality-of-service requirements of each user, i.e., the energy $E_t$ received by each user and its data rate $R_t$ must satisfy the minimum energy requirement $E_{\mathrm{req}}$ and the minimum data-rate requirement $R_{\mathrm{req}}$, respectively; the Markov decision process is thus described as adjusting the actions $\boldsymbol{\tau}$, $G_{\mathrm{PDMA}}$ and $P_{\mathrm{PDMA}}$ so as to minimize the total transmission power at the transmitting end under the constraint of satisfying the quality of service of each user;
the constrained Markov decision process can be relaxed into an unconstrained Markov decision process, i.e.:

$L(\boldsymbol{\lambda},\boldsymbol{\mu},\Pi)=P_{\mathrm{total}}+\sum_{t=1}^{T}\lambda_t\,(E_{\mathrm{req}}-E_t)+\sum_{t=1}^{T}\mu_t\,(R_{\mathrm{req}}-R_t)$ (6)

$\Pi^{*}=\arg\min_{\Pi}\;\max_{\boldsymbol{\lambda},\boldsymbol{\mu}}\;L(\boldsymbol{\lambda},\boldsymbol{\mu},\Pi)$ (7)

wherein $\boldsymbol{\lambda}=(\lambda_1,\ldots,\lambda_T)$ and $\boldsymbol{\mu}=(\mu_1,\ldots,\mu_T)$ are the two sets of Lagrange multipliers; solving for the optimal resource allocation strategy $\Pi^{*}$ is thereby converted into finding a saddle point of the function $L(\boldsymbol{\lambda},\boldsymbol{\mu},\Pi)$.
4. The method as claimed in claim 2 or 3, wherein in step two, the Q value in the reinforcement learning is updated as follows:

$Q(s_k,a_k)\leftarrow Q(s_k,a_k)+\rho\left[r_{k+1}+\gamma\,\max_{a}Q(s_{k+1},a)-Q(s_k,a_k)\right]$ (8)

wherein $r_{k+1}$, $\gamma$ and $\rho$ are the reward obtained at time $k+1$, the reward discount factor and the learning rate, respectively, with $0<\gamma\le 1$ and $0<\rho\le 1$;
the optimal value function is expressed as follows:

$Q^{*}(s,a)=\max_{\pi}Q^{\pi}(s,a)$ (9)

wherein $Q^{*}(s,a)$ is the Q value obtained for state s and action a when the optimal policy is followed.
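The two-step method of the claims — relax the QoS-constrained power-minimization problem via Lagrange multipliers, then learn with the tabular Q update of eq. (8) — can be prototyped with a minimal sketch. This is an illustration only: the state/action discretization, the toy environment, the multiplier values and the reward scaling below are assumptions invented for the sketch, not the patent's actual simulation model or parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes and constants -- assumptions for this sketch only.
n_states, n_actions, n_users = 6, 4, 2
gamma, rho, eps = 0.9, 0.1, 0.2      # discount factor, learning rate, exploration
E_req, R_req = 1.0, 1.0              # per-user QoS thresholds, cf. eqs. (4)-(5)
lam = np.full(n_users, 2.0)          # Lagrange multipliers, energy constraint
mu = np.full(n_users, 2.0)           # Lagrange multipliers, rate constraint

def environment(s, a):
    """Toy stand-in for the SWIPT downlink: action index a maps to a
    transmit-power level; more power yields more harvested energy and rate."""
    power = a + 1.0
    energy = np.full(n_users, 0.6 * power)   # harvested energy per user (toy)
    rate = np.full(n_users, 0.6 * power)     # data rate per user (toy)
    return rng.integers(n_states), power, energy, rate

Q = np.zeros((n_states, n_actions))
s = rng.integers(n_states)
for _ in range(8000):
    # epsilon-greedy selection over the discrete action set
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
    s_next, P_total, E, R = environment(s, a)
    # Lagrangian of the constrained problem: total power plus penalties
    # for any quality-of-service shortfall (cf. eqs. (6)-(7)).
    L = P_total + lam @ np.maximum(E_req - E, 0.0) + mu @ np.maximum(R_req - R, 0.0)
    r = -L  # minimizing the Lagrangian == maximizing its negative as a reward
    # Q-value update of eq. (8)
    Q[s, a] += rho * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(np.argmax(Q, axis=1).tolist())
```

In this toy model the learned greedy policy settles on the lowest power level that still meets both QoS thresholds in every state, which mirrors the claimed objective of minimizing total transmission power subject to per-user QoS.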
CN202010113438.6A 2020-02-24 2020-02-24 A resource allocation method for wireless energy-carrying communication technology Active CN111212438B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010113438.6A CN111212438B (en) 2020-02-24 2020-02-24 A resource allocation method for wireless energy-carrying communication technology

Publications (2)

Publication Number Publication Date
CN111212438A true CN111212438A (en) 2020-05-29
CN111212438B CN111212438B (en) 2021-07-16

Family

ID=70789128

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010113438.6A Active CN111212438B (en) 2020-02-24 2020-02-24 A resource allocation method for wireless energy-carrying communication technology

Country Status (1)

Country Link
CN (1) CN111212438B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105407535A (en) * 2015-10-22 2016-03-16 东南大学 High energy efficiency resource optimization method based on constrained Markov decision process
CN110113179A (en) * 2019-02-22 2019-08-09 华南理工大学 A kind of resource allocation methods for taking energy NOMA system based on deep learning
CN110602730A (en) * 2019-09-19 2019-12-20 重庆邮电大学 Resource allocation method of NOMA (non-orthogonal multiple access) heterogeneous network based on wireless energy carrying

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JINGCI LUO ET AL.: "A Deep Learning-Based Approach to Power Minimization in Multi-Carrier NOMA With SWIPT", 《IEEE ACCESS》 *
LIXIN LI ET AL.: "Learning-Aided Resource Allocation for Pattern Division Multiple Access Based SWIPT Systems", 《IEEE WIRELESS COMMUNICATIONS LETTERS(EARLY ACCESS)》 *
彭明遥 (PENG Mingyao): "China Excellent Master's Theses Full-text Database, Information Science and Technology Series", 15 December 2019 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113542124A (en) * 2021-06-25 2021-10-22 西安交通大学 A Credit-Driven Cooperative Transmission Method in D2D Cache Networks
CN113938917A (en) * 2021-08-30 2022-01-14 北京工业大学 Heterogeneous B5G/RFID intelligent resource distribution system applied to industrial Internet of things
CN114222368A (en) * 2021-11-30 2022-03-22 中山大学·深圳 Multi-agent reinforcement learning method for data unloading of Internet of things
CN114222368B (en) * 2021-11-30 2025-06-27 中山大学·深圳 A multi-agent reinforcement learning method for IoT data offloading
CN115085237A (en) * 2022-07-05 2022-09-20 南通大学 Hybrid energy storage self-adaptive power distribution method considering battery response speed
TWI812371B (en) * 2022-07-28 2023-08-11 國立成功大學 Resource allocation method in downlink pattern division multiple access system based on artificial intelligence

Also Published As

Publication number Publication date
CN111212438B (en) 2021-07-16

Similar Documents

Publication Publication Date Title
CN111212438A (en) A resource allocation method for wireless energy-carrying communication technology
CN109600828B (en) Self-adaptive transmission power distribution method for downlink of unmanned aerial vehicle base station
CN111446992B (en) Method for allocating resources with maximized minimum energy efficiency in wireless power supply large-scale MIMO network
CN112702792B (en) Wireless energy-carrying network uplink and downlink resource joint allocation method based on GFDM
CN108183733B (en) Beamforming optimization method based on online NOMA multi-antenna system
CN104703270B (en) User&#39;s access suitable for isomery wireless cellular network and power distribution method
CN117560043B (en) Non-cellular network power control method based on graph neural network
CN110460556B (en) Design method of wireless data and energy integrated transmission signal in orthogonal multi-carrier system
CN116321186B (en) IRS (inter-range request System) auxiliary cognition SWIPT (SWIPT) system maximum and rate resource optimization method
CN110113179A (en) A kind of resource allocation methods for taking energy NOMA system based on deep learning
CN109714806A (en) A kind of wireless power junction network optimization method of non-orthogonal multiple access
CN102571179B (en) Based on the cross-layer optimizing method for designing of incomplete channel condition information in mimo system
CN110418360B (en) Joint allocation method of multi-user subcarrier bits in wireless energy-carrying network
Li et al. Learning-aided resource allocation for pattern division multiple access-based SWIPT systems
CN111917444B (en) Resource allocation method suitable for millimeter wave MIMO-NOMA system
CN104158572B (en) A kind of green distributing antenna system communication means based on smart antenna
CN107071881B (en) A Game Theory-Based Distributed Energy Allocation Method for Small Cell Networks
CN115633402A (en) Resource scheduling method for mixed service throughput optimization
CN109787737A (en) A kind of ofdm system downlink multiuser method for optimizing resources based on mixed tensor acquisition
CN114051205A (en) Edge optimization method based on reinforcement learning dynamic multi-user wireless communication scene
CN111565394B (en) Channel binding and access method and system for Mesh network of dynamic unmanned aerial vehicle cluster
CN111654920B (en) Distributed energy efficiency subcarrier power distribution method
CN108471621B (en) Communication method based on electromagnetic wave energy supply
CN114980163A (en) Large-scale MIMO system energy efficiency optimization method based on nonlinear energy acquisition
CN114389784A (en) Migration learning-based downlink MISO-OFDMA cooperative transmission method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant