
CN115086903B - Adaptive Duty Cycle Adjustment Method for Energy Harvesting Wireless Sensors Based on Fuzzy Q-learning - Google Patents


Info

Publication number: CN115086903B (application CN202210663594.9A)
Authority: CN (China)
Prior art keywords: fuzzy, energy, ENO, node, learning
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN115086903A
Inventors: 葛永琪, 魏佳圆, 袁振博, 刘瑞
Current assignee: Ningxia University
Original assignee: Ningxia University
Events:
Application filed by Ningxia University
Priority to CN202210663594.9A
Publication of CN115086903A
Application granted
Publication of CN115086903B
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W4/00: Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30: Services specially adapted for particular environments, situations or purposes
    • H04W4/38: Services specially adapted for particular environments, situations or purposes for collecting sensor information
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W52/00: Power management, e.g. Transmission Power Control [TPC] or power classes
    • H04W52/02: Power saving arrangements
    • H04W52/0209: Power saving arrangements in terminal devices
    • H04W52/0212: Power saving arrangements in terminal devices managed by the network, e.g. network or access point is leader and terminal is follower
    • H04W52/0219: Power saving arrangements in terminal devices managed by the network, e.g. network or access point is leader and terminal is follower, where the power saving management affects multiple terminals
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Feedback Control In General (AREA)

Abstract


The invention provides a method for adaptively adjusting the duty cycle of an energy-harvesting wireless sensor based on fuzzy Q-learning, belonging to the technical field of wireless sensors. The method comprises: establishing a wireless sensor energy management model <S, A, P_sa, R>; establishing a Q table whose entries are denoted q(s_ki, a_j); obtaining the state space S_t of the node at time t, S_t = [E_h(t), S_v(t)]; using a fuzzy inference system to calculate the trigger strength ω_ki with which S_t triggers fuzzy rule k; selecting from A, according to an ε-greedy strategy, the action a_j activated by fuzzy rule k; calculating the environmental reward R(s_i, a_j) for S_t performing action a_j based on the reward function R, and updating q(s_ki, a_j) in the Q table accordingly; calculating the duty-cycle replacement value d_c(t) of the node at time t from a_j and the trigger strength ω_ki; setting the duty cycle of the node to d_c(t) and advancing to time t+1 to obtain a new state space S_{t+1}; and performing the duty-cycle adjustment with the new state space S_{t+1} as input, repeating the above steps until the learning time reaches the learning duration T_total.

Description

Adaptive duty-cycle adjustment method for energy-harvesting wireless sensors based on fuzzy Q-learning
Technical Field
The invention relates to the technical field of wireless sensors, and in particular to an adaptive duty-cycle adjustment method for energy-harvesting wireless sensors based on fuzzy Q-learning.
Background
A wireless sensor network is formed by the self-organizing connection of many sensor nodes over wireless links. Wireless sensor nodes with sensing capability serve as the communicating entities of the network and, owing to their low cost, low power consumption, and versatility, play a major role in environmental monitoring, healthcare, smart homes, industrial control, national defense, and other fields. Wireless sensors are generally battery-powered; when nodes are deployed in harsh environments, replacing or recharging each device's battery may be very expensive or impossible, so the lifetime of the wireless sensor network must be extended as far as possible. Harvesting energy from the surrounding environment (solar, wind, vibration, etc.) and converting it into electrical energy is therefore a more feasible way to power wireless sensor nodes continuously.
Although a wireless sensor can continuously harvest energy from the environment for storage and supply, the external energy source fluctuates randomly or periodically over time, so the node's energy cannot remain in a stable state. If the energy harvesting rate is too high, the energy actually consumed by the node falls far below the energy collected, and the harvesting and storage hardware may be damaged prematurely; if the harvesting rate is too low, the node may die from energy exhaustion, compromising the data security of the whole wireless sensor network.
Disclosure of Invention
In view of the above, the invention provides a fuzzy Q-learning-based adaptive duty-cycle adjustment method for energy-harvesting wireless sensors, which adjusts the duty cycle to match the energy harvesting rate and thereby addresses both the premature damage of harvesting and energy storage equipment caused by an excessively high harvesting rate and the energy exhaustion of wireless sensor nodes caused by an excessively low one.
The technical scheme adopted by the embodiment of the invention for solving the technical problems is as follows:
An adaptive duty-cycle adjustment method for energy-harvesting wireless sensors based on fuzzy Q-learning, comprising:
Step S1: establish a wireless sensor energy management model <S, A, P_sa, R>, where S is the state space set, A is the node sleep action space set, P_sa is the set of probability distributions with which each state s_i in S transitions to the next state s_i' through an action a_j, and R is the reward function; s_i ∈ S, s_i' ∈ S, i ∈ [1, I], a_j ∈ A, j ∈ [1, M];
Step S2: establish a Q table whose entries are denoted q(s_ki, a_j) and initialize it, where the Q-learning duration is T_total, the single-episode duration is T_episode, the update interval is Δt, and s_ki denotes s_i after it has been input to the fuzzy inference system and fuzzy rule k has been applied;
Step S3: obtain the state space S_t of the node at time t, S_t = [E_h(t), S_v(t)], S_t ∈ S, S_t = s_i, where E_h(t) is the energy collected by the node's energy harvesting unit at time t and S_v(t) is the supercapacitor voltage of the wireless sensor at time t;
Step S4: calculate, with the fuzzy inference system, the trigger strength ω_ki with which S_t triggers fuzzy rule k, k ∈ [1, N];
Step S5: select from A, according to an ε-greedy strategy, the action a_j activated by fuzzy rule k;
Step S6: calculate the environmental reward R(s_i, a_j) for S_t performing action a_j based on the reward function R, and update q(s_ki, a_j) in the Q table accordingly;
Step S7: calculate the duty-cycle replacement value d_c(t) of the node at time t from a_j and the trigger strength ω_ki;
Step S8: set the duty cycle of the node to d_c(t) and advance to time t+1, obtaining a new state space S_{t+1}, S_{t+1} = [E_h(t+1), S_v(t+1)], S_{t+1} ∈ S, S_{t+1} = s_i';
Step S9: return to step S4, perform the duty-cycle adjustment with the new state space S_{t+1} as input, and repeat steps S4-S8 until the learning time reaches the learning duration T_total.
Preferably, each probability element in P_sa is the probability that the node transitions from state s_i to the next state s_i' through action a_j.
Preferably, the step S4 of calculating, with the fuzzy inference system, the trigger strength ω_ki with which S_t triggers fuzzy rule k comprises:
Step S41: formulate N fuzzy rules and membership functions, define E_h(t) in the state space S_t by triangular membership functions and S_v(t) in the state space S_t by trapezoidal membership functions, with fuzzy rule k ∈ [1, N];
Step S42: find the s_i in S identical to the state space S_t, s_i = [E_h(s_i), S_v(s_i)], input s_i as the input variable to the fuzzy inference system, and calculate the trigger strength ω_ki of fuzzy rule k, where μ_k(E_h(s_i)) denotes the membership value of the component E_h(s_i) of input variable s_i under fuzzy rule k and μ_k(S_v(s_i)) the membership value of the component S_v(s_i) under fuzzy rule k, both computed from the membership functions.
Preferably, the step S6 of calculating the environmental reward R(s_i, a_j) for S_t performing action a_j based on the reward function R, and updating q(s_ki, a_j) in the Q table accordingly, comprises:
Step S61: divide the supercapacitor voltage into low, medium, and high states by a threshold classification method;
Step S62: issue real-time environmental rewards according to the state of S_v(t): one reward expression applies when S_v(t) is in the low state, another when it is in the medium state, and another when it is in the high state, where β and θ are calculation parameters, ENO_c is the energy-neutral threshold, ENO_s is the energy-neutral state of the node, and ENO_s and ENO_c are iterated as:
ENO_s(t+1) = ENO_s(t) + E_neu(t)
ENO_c(t+1) = ENO_c(t) + μ × (ENO_ave(t) − ENO_c(t))
E_neu(t) = E_h(t) − E_c(t)
where E_c(t) is the energy consumed by the node's energy consumption unit at time t, ENO_ave(t) is the average of the energy-neutral values over the previous episode, and μ is the energy-neutral-threshold update parameter;
Step S63: update q(s_ki, a_j) in the Q table according to the environmental reward R(s_i, a_j):
q(s_ki, a_j) ← q(s_ki, a_j) + α·Δq(s_ki, a_j)
where q(s_ki, a_j) is the Q value of s_i performing action a_j under fuzzy rule k, q(s'_ki, a_j) is the Q value of the action in the next state s_i', max_a q(s'_ki, a) is the Q value of the optimal action in the next state s_i', α is the learning rate, and γ is the discount factor.
Preferably, the duty-cycle replacement value d_c(t) in step S7 is computed from a_j and the trigger strengths ω_ki.
Preferably, A contains 4 sleep actions of different durations, A = [a_1, a_2, a_3, a_4], where a_1 represents 15 seconds of sleep, a_2 represents 60 seconds, a_3 represents 300 seconds, and a_4 represents 900 seconds.
Preferably, β has a value of 4, θ has a value of 2, T_episode is defined as 24 hours, and Δt is set to 0.25 h.
Preferably, the update frequency of ENO_ave(t) is T_episode, with:
ENO_ave(t) = 0, t ∈ T_episode
According to the above technical scheme, the fuzzy Q-learning-based adaptive duty-cycle adjustment method for energy-harvesting wireless sensors provided by the embodiment of the invention first establishes the wireless sensor energy management model <S, A, P_sa, R>; then establishes a Q table whose entries are denoted q(s_ki, a_j); obtains the state space S_t = [E_h(t), S_v(t)] of the node at time t; uses the fuzzy inference system to calculate the trigger strength ω_ki with which S_t triggers fuzzy rule k; selects from A, according to an ε-greedy strategy, the action a_j activated by fuzzy rule k; calculates the environmental reward R(s_i, a_j) for S_t performing action a_j based on the reward function R, and updates q(s_ki, a_j) in the Q table accordingly; calculates the duty-cycle replacement value d_c(t) of the node at time t from a_j and the trigger strength ω_ki; sets the duty cycle of the node to d_c(t) and advances to time t+1 to obtain a new state space S_{t+1}; then performs the duty-cycle adjustment with S_{t+1} as input, repeating the foregoing steps until the learning time reaches the learning duration T_total. By adjusting the duty cycle to match the energy harvesting rate during Q-learning, the method addresses both the premature damage of harvesting and energy storage equipment caused by an excessively high harvesting rate and the energy exhaustion of wireless sensor nodes caused by an excessively low one.
Drawings
FIG. 1 is a flow chart of a method for adaptively adjusting the duty cycle of an energy harvesting wireless sensor based on fuzzy Q-learning.
FIG. 2 is a block diagram of a method for adaptively adjusting the duty cycle of an energy harvesting wireless sensor based on fuzzy Q-learning.
Detailed Description
The technical scheme and technical effects of the present invention are further elaborated below in conjunction with the drawings of the present invention.
Maintaining the wireless sensor node in an energy-neutral state over the long term is an effective way to achieve sustainable node operation. A node is said to be in the energy-neutral state if the energy it collects remains greater than or equal to the energy it consumes, as shown in formula (1); ideally, when the difference between consumed and collected energy approaches zero, the node is said to be in energy-neutral operation:
E_neu = E_h(t) − E_c(t) (1)
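As a minimal illustration of formula (1), the energy-neutral balance can be computed from per-slot harvested and consumed energy (the function name and sample values here are illustrative, not from the patent):

```python
def energy_neutral_balance(e_harvested: float, e_consumed: float) -> float:
    """Return E_neu = E_h(t) - E_c(t); a value >= 0 means the node is
    energy-neutral in this slot."""
    return e_harvested - e_consumed

# A node that harvests 1.2 J and consumes 0.9 J in a slot is energy-neutral:
balance = energy_neutral_balance(1.2, 0.9)
is_neutral = balance >= 0
```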
As shown in FIGS. 1 and 2, the invention provides a fuzzy Q-learning-based adaptive duty-cycle adjustment method for energy-harvesting wireless sensors, which adjusts the working duty cycle of an energy-harvesting wireless sensor node to influence the node's energy storage rate, dynamically adjusting that rate so that the node stays in the energy-neutral state. An energy-harvesting wireless sensor node mainly comprises an energy harvester, an energy store, and an energy consumption unit, where E_h(t) denotes the energy collected by the energy harvesting unit at time t, E_r(t) the energy remaining in the energy storage unit at time t, and E_c(t) the energy consumed by the energy consumption unit at time t. The method of the invention comprises the following specific implementation steps:
Step S1: establish a wireless sensor energy management model <S, A, P_sa, R>, where S is the state space set, A is the node sleep action space set, P_sa is the set of probability distributions with which each state s_i in S transitions to the next state s_i' through an action a_j, and R is the reward function; s_i ∈ S, s_i' ∈ S, i ∈ [1, I], a_j ∈ A, j ∈ [1, M];
Step S2: establish a Q table whose entries are denoted q(s_ki, a_j) and initialize it, where the Q-learning duration is T_total, the single-episode duration is T_episode, the update interval is Δt, and s_ki denotes s_i after it has been input to the fuzzy inference system and fuzzy rule k has been applied;
Step S3: obtain the state space S_t of the node at time t, S_t = [E_h(t), S_v(t)], S_t ∈ S, S_t = s_i, where S_v(t) is the supercapacitor voltage of the wireless sensor at time t;
Step S4: calculate, with the fuzzy inference system, the trigger strength ω_ki with which S_t triggers fuzzy rule k, k ∈ [1, N];
Step S5: select from A, according to an ε-greedy strategy, the action a_j activated by fuzzy rule k. In the current state the agent selects an action from the action space according to the ε-greedy strategy: with probability ε the action with the largest Q value is selected from the Q table, and with probability 1 − ε an action is selected at random; the action chosen by the agent affects the computed duty-cycle output;
Step S6: calculate the environmental reward R(s_i, a_j) for S_t performing action a_j according to S_v(t), based on the reward function R, and update q(s_ki, a_j) in the Q table accordingly;
Step S7: calculate the duty-cycle replacement value d_c(t) of the node at time t from a_j and the trigger strength ω_ki;
Step S8: set the duty cycle of the node to d_c(t) and advance to time t+1, obtaining a new state space S_{t+1}, S_{t+1} = [E_h(t+1), S_v(t+1)], S_{t+1} ∈ S, S_{t+1} = s_i';
Step S9: return to step S4, perform the duty-cycle adjustment with the new state space S_{t+1} as input, and repeat steps S4-S8 until the learning time reaches the learning duration T_total.
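Steps S1-S9 can be sketched as a single control loop. The Python below is a schematic reconstruction, not the patent's reference implementation: the fuzzification, environment, and reward function are supplied by the caller; the Q table is indexed by (rule, action), matching q(s_ki, a_j) where the rule index carries the fuzzified state; the same rule index is reused for the next state as a simplification of q(s'_ki, a); and the weighted-average defuzzification for d_c(t) is an assumption, since the patent's d_c(t) formula appears only as an image.

```python
import random

def run_fql_duty_cycle(rules, actions, fire_strengths, env_step, reward_fn,
                       init_state, t_total, epsilon=0.9, alpha=0.1, gamma=0.9):
    """Schematic fuzzy Q-learning loop for steps S3-S9.

    rules          : fuzzy rule ids 1..N
    actions        : sleep durations in seconds, e.g. [15, 60, 300, 900]
    fire_strengths : state -> {rule k: trigger strength w_ki}   (step S4)
    env_step       : duty cycle -> next state [E_h, S_v]        (step S8)
    reward_fn      : (state, action) -> reward scalar            (step S6)
    t_total        : number of update intervals (time slotted by dt)
    """
    q = {(k, a): 0.0 for k in rules for a in actions}   # step S2: Q table
    state = init_state                                  # step S3
    for _ in range(t_total):
        w = fire_strengths(state)                       # step S4
        chosen = {}
        for k in w:                                     # step S5: eps-greedy
            # NOTE: eps is the exploitation probability here, matching the
            # patent's convention (greedy with probability eps).
            if random.random() < epsilon:
                chosen[k] = max(actions, key=lambda a: q[(k, a)])
            else:
                chosen[k] = random.choice(actions)
        total_w = sum(w.values())                       # step S7 (assumed
        dc = (sum(w[k] * chosen[k] for k in w) / total_w  # defuzzification)
              if total_w else 0.0)
        next_state = env_step(dc)                       # step S8
        for k, a in chosen.items():                     # step S6: Q update
            best_next = max(q[(k, b)] for b in actions)
            r = reward_fn(state, a)
            q[(k, a)] += alpha * (r + gamma * best_next - q[(k, a)])
        state = next_state                              # step S9: repeat
    return q

# Demo: one rule, two actions, constant environment; epsilon=1.0 makes the
# agent purely greedy, so the run is deterministic.
q_demo = run_fql_duty_cycle(rules=[1], actions=[15, 60],
                            fire_strengths=lambda s: {1: 1.0},
                            env_step=lambda dc: [1.0, 2.0],
                            reward_fn=lambda s, a: 1.0,
                            init_state=[1.0, 2.0], t_total=10, epsilon=1.0)
```

With a constant positive reward, only the greedily chosen action accumulates Q value, while unchosen actions keep their initial value of zero.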
In the present invention, the state spaces S_t and S_{t+1} can each be matched to a state with the same element values in the state space set S; s_i' denotes the next state after s_i, and since S contains a state with the same element values as S_{t+1}, S_{t+1} can be regarded as s_i'.
In an embodiment, T_episode is defined as 24 hours and Δt is set to 0.25 h.
Step S1 defines the energy-harvesting wireless sensor energy management model, a <S, A, P_sa, R> quadruple based on a Markov decision process (MDP), wherein:
(1) S denotes the state space; the state of the node at time unit t is defined by the collected energy E_h(t) and the supercapacitor voltage S_v(t), so the state space S is expressed as:
S = [E_h(t), S_v(t)] (3)
(2) A denotes the action space; the actions executed in state S_t are designed as node sleep actions of different durations, and in an embodiment the sleep actions are divided into 4 kinds, expressed as
A = [a_1, a_2, a_3, a_4] (4)
where, for example, a_1 represents 15 seconds of sleep, a_2 represents 60 seconds, a_3 represents 300 seconds, and a_4 represents 900 seconds;
(3) P_sa denotes the probability distribution with which the node transitions to other states after action a_j under the current state S_t ∈ S; the probability that the node reaches s_i' by taking action a_j in state S_t is the corresponding element of P_sa;
(4) R denotes the reward function: in response to the rationality of the node performing action a_j in state S_t, the environment provides a reward scalar R(s_i, a_j) used to evaluate the quality of the action. The core idea of the reward function is to constrain the node's current energy-neutral state ENO_s with an energy-neutral threshold ENO_c.
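The quadruple just defined can be collected into a small container type. This is a schematic only: the transition and reward callables are placeholders (the patent gives the elements of P_sa and the reward expressions as formula images), and the default action durations are the ones from the embodiment.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A state is the pair (E_h(t), S_v(t)): harvested energy and supercap voltage.
State = Tuple[float, float]

@dataclass
class EnergyManagementModel:
    """Schematic container for the <S, A, P_sa, R> quadruple of step S1."""
    # A: sleep actions in seconds (durations from the embodiment).
    actions: List[int] = field(default_factory=lambda: [15, 60, 300, 900])
    # P_sa: probability of reaching s2 from s under action a (placeholder).
    transition: Callable[[State, int, State], float] = lambda s, a, s2: 0.0
    # R: reward scalar for taking action a in state s (placeholder).
    reward: Callable[[State, int], float] = lambda s, a: 0.0

model = EnergyManagementModel()
```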
The fuzzy inference system provides a mapping from input to output based on a set of fuzzy rules and their associated fuzzy membership functions. The rule base of the fuzzy inference system generally consists of several preset rules; a fuzzy inference rule R_j that associates the state vector with an action has the following form:
R_j: IF s is in s_j THEN the action is a_1 with q(s_1i, a_1)
or ...
or the action is a_k with q(s_ki, a_j)
R_k denotes the k-th rule, where q(s_ki, a_j) is the Q value of the state-action pair (s_ki, a_j) in the Q table.
The input linguistic variables of the IF statement are divided into different sets: E_h(t) = {poor, fair, good} and S_v(t) = {low, medium, high}; for example, the "fair" set of E_h(t) represents weaker collected energy, and the "low" set of S_v(t) represents less remaining energy. The membership functions are responsible for fuzzifying the crisp input variables and computing the membership degrees of the variables in the different sets, producing a membership value under each rule. Thus, the specific implementation of step S4, calculating with the fuzzy inference system the trigger strength ω_ki with which S_t triggers fuzzy rule k, comprises:
Step S41: formulate N fuzzy rules and membership functions, define E_h(t) in the state space S_t by triangular membership functions and S_v(t) in the state space S_t by trapezoidal membership functions, with fuzzy rule k ∈ [1, N];
Step S42: find the s_i in S identical to the state space S_t, s_i = [E_h(s_i), S_v(s_i)], input s_i as the input variable to the fuzzy inference system, and calculate the trigger strength ω_ki of fuzzy rule k, where μ_k(E_h(s_i)) denotes the membership value of the component E_h(s_i) of input variable s_i under fuzzy rule k and μ_k(S_v(s_i)) the membership value of the component S_v(s_i) under fuzzy rule k, both computed from the membership functions.
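A sketch of step S42 follows. Since the patent's trigger-strength formula is reproduced only as an image, the sketch assumes the trigger strength is the product t-norm of the two membership values (a common choice in fuzzy inference; min is also used), and the breakpoints of the triangular and trapezoidal sets are purely illustrative.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c, peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def trap(x, a, b, c, d):
    """Trapezoidal membership function with shoulders on [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

def trigger_strength(e_h, s_v, rule):
    """w_ki for one rule: product of the memberships of E_h(s_i) (triangular
    set) and S_v(s_i) (trapezoidal set). Product t-norm is an assumption."""
    mu_e = tri(e_h, *rule["eh"])
    mu_v = trap(s_v, *rule["sv"])
    return mu_e * mu_v

# Illustrative rule: E_h "fair" as a triangle over [0, 5, 10] mJ, and
# S_v "medium" as a trapezoid over [2.0, 2.4, 2.8, 3.2] V.
rule = {"eh": (0.0, 5.0, 10.0), "sv": (2.0, 2.4, 2.8, 3.2)}
w = trigger_strength(5.0, 2.6, rule)
```

At E_h = 5.0 and S_v = 2.6 both memberships are 1, so this rule fires at full strength.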
E_h(t) and S_v(t) serve both as fuzzy input variables and as the agent's state space. After the input variables are fuzzified, the agent selects an action a_j under rule k and receives a reward R(s_i, a_j) from the environment at time t+1. The reward function gives negative rewards to actions that deviate from the energy-neutral state and positive rewards to actions that satisfy the energy-neutral requirement, so this reward mechanism helps maintain the node's energy-neutral state. The specific implementation of step S6, calculating the environmental reward R(s_i, a_j) for S_t performing action a_j and updating q(s_ki, a_j) in the Q table accordingly, comprises:
Step S61: divide the supercapacitor voltage into low, medium, and high states by a threshold classification method;
Step S62: issue real-time environmental rewards according to the state of S_v(t): one reward expression applies when S_v(t) is in the low state, another when it is in the medium state, and another when it is in the high state. In the embodiment, β takes the value 4 and θ the value 2; ENO_c is the energy-neutral threshold and ENO_s is the energy-neutral state of the node, a negative ENO_s indicating that the node currently consumes more energy than it collects. ENO_s and ENO_c are iterated as:
ENO_s(t+1) = ENO_s(t) + E_neu(t) (10)
ENO_c(t+1) = ENO_c(t) + μ × (ENO_ave(t) − ENO_c(t)) (11)
where E_c(t) is the energy consumed by the node's energy consumption unit at time t, ENO_ave(t) is the average of the energy-neutral values over the previous episode, and μ is the energy-neutral-threshold update parameter. After obtaining the reward, the agent reaches the next new state and updates the action selected under each rule; in the new state space the fuzzy inference system fuzzifies the state vector again, the agent selects an action again, and this cycle repeats until the process ends;
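The iterative formulas (10) and (11), together with the energy-neutral balance E_neu(t) = E_h(t) − E_c(t), can be sketched as follows (the value of μ and the sample inputs are illustrative, not from the patent):

```python
def update_eno(eno_s, eno_c, eno_ave, e_h, e_c, mu=0.5):
    """One step of formulas (10)-(11): accumulate the energy-neutral state
    ENO_s and move the energy-neutral threshold ENO_c toward the previous
    episode's average ENO_ave."""
    e_neu = e_h - e_c                              # E_neu(t) = E_h(t) - E_c(t)
    eno_s_next = eno_s + e_neu                     # ENO_s(t+1), formula (10)
    eno_c_next = eno_c + mu * (eno_ave - eno_c)    # ENO_c(t+1), formula (11)
    return eno_s_next, eno_c_next

# A node that consumes more than it harvests drives ENO_s negative,
# signalling an energy-deficit condition:
s_next, c_next = update_eno(eno_s=0.0, eno_c=1.0, eno_ave=0.5,
                            e_h=0.8, e_c=1.0)
```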
Step S63: update q(s_ki, a_j) in the Q table according to the environmental reward R(s_i, a_j):
q(s_ki, a_j) ← q(s_ki, a_j) + α·Δq(s_ki, a_j) (12)
where q(s_ki, a_j) is the Q value of s_i performing action a_j under fuzzy rule k, q(s'_ki, a_j) is the Q value of the action in the next state s_i', max_a q(s'_ki, a) is the Q value of the optimal action in the next state s_i', α is the learning rate, and γ is the discount factor.
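Update (12) has the shape of the standard Q-learning rule. The sketch below writes the increment Δq out as the conventional temporal-difference error; this matches the symbols enumerated above (the reward, γ, the optimal next-state action, and α), but since the printed form of Δq appears only as an image in the source, this expansion is a reconstruction, not a quotation.

```python
def q_update(q, rule_k, a_j, reward, next_rule_k, actions,
             alpha=0.1, gamma=0.9):
    """Apply q(s_ki, a_j) <- q(s_ki, a_j) + alpha * dq, with the assumed
    TD error dq = R(s_i, a_j) + gamma * max_a q(s'_ki, a) - q(s_ki, a_j).
    The Q table q is a dict keyed by (rule, action)."""
    best_next = max(q[(next_rule_k, a)] for a in actions)
    dq = reward + gamma * best_next - q[(rule_k, a_j)]
    q[(rule_k, a_j)] += alpha * dq
    return q[(rule_k, a_j)]

actions = [15, 60, 300, 900]          # sleep durations from the embodiment
q = {(k, a): 0.0 for k in (1, 2) for a in actions}
new_q = q_update(q, rule_k=1, a_j=60, reward=1.0, next_rule_k=2,
                 actions=actions)     # 0 + 0.1 * (1.0 + 0.9*0 - 0) = 0.1
```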
In this embodiment, the update frequency of ENO_ave(t) is T_episode, i.e., a new ENO_ave(t) is computed once at the end of each episode, with:
ENO_ave(t) = 0, t ∈ T_episode (17)
Through this scheme, setting the energy-neutral threshold allows the node to operate continuously while its energy-neutral state is maintained.
Referring also to FIG. 2, the system structure shown there may operate as follows:
In the fuzzy Q-learning-based adaptive duty-cycle adjustment method, the node duty cycle is the concrete object of adjustment: the duty-cycle-adaptive strategy keeps the node's energy-neutral state floating around the energy-neutral threshold, thereby regulating the energy harvesting rate so that the node remains energy-neutral and operates sustainably. The reinforcement-learning agent acts as the director of the node's decision process, deciding which action the node executes in the current state. Because the harvested energy changes continuously and is hard to predict, it can mislead the agent's action judgment; fuzzy logic is therefore combined with reinforcement learning, and fuzzy rules are formulated to constrain the agent's action selection. Formulating fuzzy rules helps reduce the agent's trial-and-error actions and accelerates the convergence of the node's energy-neutral state.
Sustainable node operation is achieved by setting the energy-neutral threshold while the node's energy-neutral state is maintained. The invention has the following two advantages: (1) during sensor operation, both node energy consumption and the energy-neutral threshold constraint on the energy storage unit are considered, avoiding node death caused by poor energy neutrality of the node's energy store; (2) the fuzzy inference system is combined with reinforcement learning to provide prior knowledge for the agent's action exploration, raising the speed at which the node converges to the energy-neutral threshold.
The foregoing disclosure is illustrative of the preferred embodiments of the present invention, and is not to be construed as limiting the scope of the invention, as it is understood by those skilled in the art that all or part of the above-described embodiments may be practiced with equivalents thereof, which fall within the scope of the invention as defined by the appended claims.

Claims (5)

1. The energy collection wireless sensor duty cycle self-adaptive adjustment method based on fuzzy Q-learning is characterized by comprising the following steps of:
Step S1, a wireless sensor energy management model < S, A, P sa, R > is established, wherein S is a state space set, A is a node sleep action space set, P sa is a probability distribution set that each state S i in S is transferred to the next state S' i through an action a j, R is a reward function, S i∈S,s′i∈S,i∈[1,I],aj epsilon A, j epsilon [1, M ];
Step S2, a Q table is established, the value in the Q table is recorded as Q (S ki,aj), the Q table is initialized, wherein the Q-learning duration is specified as T total, the single-round duration is specified as T episode, the updating interval duration is delta T, and S ki is that a fuzzy rule k is adopted after the S i input fuzzy inference system;
Step S3, a state space S t,St=[Eh(t),Sv(t)],St∈S,St=si of a node at a time t is obtained, wherein E h (t) represents energy collected by an energy collecting unit of the node at the time t, and S v (t) represents super capacitor voltage of a wireless sensor at the time t;
S4, calculating triggering strength omega ki of triggering the fuzzy rule k by the S t through the fuzzy inference system, wherein k is E [1, N ];
S5, selecting an action a j which is activated correspondingly by the fuzzy rule k from the A according to an epsilon-greedy strategy;
step S6, calculating an environmental reward R (S i,aj) for the S t to execute the action a j based on the reward function R, and further updating the Q (S ki,aj) in the Q table according to the environmental reward R (S i,aj);
Step S7, calculating a duty ratio replacement value d c (t) of the node at the time t based on the a j and the trigger intensity ω ki;
Step S8, modifying the duty ratio of the node to d c (t) and entering the time t+1 to obtain a new state space S t+1,St+1=[Eh(t+1),Sv(t+1)],St+1∈S,St+1=s′i;
Step S9, returning to the execution step S4, executing the duty ratio adjustment operation according to the new state space S t+1 as input, and repeatedly executing the steps S4-S8 until the learning time reaches the learning duration T total; the step S4 of calculating, by using the fuzzy inference system, the trigger intensity ω ki of triggering the fuzzy rule k by the S t includes:
Step S41, formulating N fuzzy rules and membership functions, defining the E h (t) in the state space S t as a triangle membership function, defining the S v (t) in the state space S t as a trapezoid membership function, and enabling the fuzzy rules k to be E [1, N ];
Step S42, in which the same S i,si=[Eh(si),Sv(si as the state space S t is found, the S i is input as an input variable to the fuzzy inference system, and the triggering strength ω ki of the fuzzy rule k is calculated:
Wherein, Representing the input variable under the fuzzy rule k the E h(si in s i) by membership function calculationRepresenting the S v(si) of the input variable in S i under the fuzzy rule k through membership function calculation;
The step S6 of calculating an environmental benefit R (S i,aj) for the step S t to perform the action a j based on the benefit function R, and further updating the Q (S ki,aj) in the Q table according to the environmental benefit R (S i,aj) includes:
Step S61, dividing the super capacitor voltage into low, medium, high states by a threshold value classification method;
Step S62, carrying out real-time environment rewards according to the state of S v (t), when S v (t) is in the low state,
When the S v (t) is in the medium state,
When the S v (t) is in the high state,
Wherein, the symbol beta and the symbol theta are calculation parameters, the ENO c is an energy neutral threshold, the ENO s is an energy neutral state of the node, and an iterative formula of the ENO s、ENOc is as follows:
ENO_s(t+1) = ENO_s(t) + E_neu(t)
ENO_c(t+1) = ENO_c(t) + μ × (ENO_ave(t) − ENO_c(t))
E_neu(t) = E_h(t) − E_c(t)
wherein E_c(t) denotes the energy consumed by the node's energy-consuming unit at time t, ENO_ave(t) is the mean of the energy-neutral values over the previous period, and μ is the energy-neutral threshold update parameter;
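The three iterative formulas above transcribe directly into code: the node's energy-neutral state ENO_s accumulates the per-slot energy balance E_neu(t) = E_h(t) − E_c(t), while the threshold ENO_c tracks the previous-period mean ENO_ave at rate μ. The value of μ and all numbers below are illustrative.

```python
# Direct transcription of the ENO_s / ENO_c iteration (illustrative mu).

def update_eno(eno_s, eno_c, e_h, e_c, eno_ave, mu=0.1):
    """One time-step update of the energy-neutral state and threshold."""
    e_neu = e_h - e_c                            # E_neu(t) = E_h(t) - E_c(t)
    eno_s_next = eno_s + e_neu                   # ENO_s(t+1) = ENO_s(t) + E_neu(t)
    eno_c_next = eno_c + mu * (eno_ave - eno_c)  # ENO_c(t+1) = ENO_c(t) + mu*(ENO_ave - ENO_c)
    return eno_s_next, eno_c_next
```

When harvested energy exceeds consumption, ENO_s grows; ENO_c converges geometrically toward ENO_ave, since each step closes a fraction μ of the remaining gap.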
Step S63, updating Q(s_ki, a_j) in the Q-table according to the environment reward R(s_i, a_j):
Q(s_ki, a_j) ← Q(s_ki, a_j) + α · ΔQ(s_ki, a_j)
wherein Q(s_ki, a_j) is the Q value of s_i executing action a_j under fuzzy rule k, Q(s'_ki, a_j) is the Q value of the action taken in the next state s'_i, the optimal action of the next state s'_i enters the computation of ΔQ(s_ki, a_j), α is the learning rate, and γ is the discount factor;
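The Step S63 update can be sketched as below, assuming the standard one-step Q-learning increment ΔQ = R + γ · max_a' Q(s', a') − Q(s, a), which matches the symbols named in the claim (reward R(s_i, a_j), discount γ, optimal next-state action); the patent's exact ΔQ formula is not reproduced here, and the table keys and values are illustrative.

```python
# Sketch of the Step S63 Q-table update (assumed one-step Q-learning delta).
# q is a nested dict: q[state_key][action] -> Q value; keys are illustrative.

def q_update(q, ski, a, r, s_next_ki, alpha=0.1, gamma=0.9):
    """In-place update of Q(s_ki, a_j) toward the bootstrapped target."""
    best_next = max(q[s_next_ki].values())     # Q of optimal action of s'_i
    delta = r + gamma * best_next - q[ski][a]  # assumed dQ(s_ki, a_j)
    q[ski][a] += alpha * delta                 # Q <- Q + alpha * dQ
    return q[ski][a]
```

For example, with a next-state best Q of 2.0, a reward of 1.0, and a current Q of 0.0, the target is 1.0 + 0.9 × 2.0 = 2.8 and the table entry moves a fraction α toward it.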
The duty cycle replacement value d_c(t) in step S7 is calculated as:
2. The energy harvesting wireless sensor duty cycle adaptive adjustment method based on fuzzy Q-learning of claim 1, wherein the probability elements in P_sa are:
3. The energy harvesting wireless sensor duty cycle adaptive adjustment method based on fuzzy Q-learning of claim 2, wherein A comprises 4 sleep actions of different durations, A = [a_1, a_2, a_3, a_4], wherein a_1 represents sleeping for 15 seconds, a_2 for 60 seconds, a_3 for 300 seconds, and a_4 for 900 seconds.
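The four sleep actions of claim 3 can be related to duty cycles as below. The claim only fixes the sleep durations; the active interval t_active and the formula duty cycle = t_active / (t_active + t_sleep) are illustrative assumptions, not taken from the patent.

```python
# Illustrative mapping of the claim-3 action set to duty cycles.
# Assumption: a hypothetical fixed active interval t_active per wake-up.

SLEEP_S = {"a1": 15, "a2": 60, "a3": 300, "a4": 900}  # sleep durations from claim 3

def duty_cycle(action, t_active=5.0):
    """Fraction of time awake, assuming a fixed active interval (hypothetical)."""
    t_sleep = SLEEP_S[action]
    return t_active / (t_active + t_sleep)
```

Under this assumption, a_1 (15 s sleep) gives the highest duty cycle and a_4 (900 s sleep) the lowest, so choosing among the four actions spans roughly two orders of magnitude of node activity.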
4. The energy harvesting wireless sensor duty cycle adaptive adjustment method based on fuzzy Q-learning of claim 3, wherein ENO_ave(t) is updated once per period T_episode, and its calculation formula is:
ENO_ave(t) = 0, t ∈ T_episode
5. The energy harvesting wireless sensor duty cycle adaptive adjustment method based on fuzzy Q-learning of claim 4, wherein β is 4, θ is 2, T_episode is defined as 24 hours, and Δt is set to 0.25 hours.
CN202210663594.9A 2022-06-10 2022-06-10 Adaptive Duty Cycle Adjustment Method for Energy Harvesting Wireless Sensors Based on Fuzzy Q-learning Active CN115086903B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210663594.9A CN115086903B (en) 2022-06-10 2022-06-10 Adaptive Duty Cycle Adjustment Method for Energy Harvesting Wireless Sensors Based on Fuzzy Q-learning


Publications (2)

Publication Number Publication Date
CN115086903A CN115086903A (en) 2022-09-20
CN115086903B true CN115086903B (en) 2024-06-14

Family

ID=83251144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210663594.9A Active CN115086903B (en) 2022-06-10 2022-06-10 Adaptive Duty Cycle Adjustment Method for Energy Harvesting Wireless Sensors Based on Fuzzy Q-learning

Country Status (1)

Country Link
CN (1) CN115086903B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117412367B (en) * 2023-09-18 2024-10-01 宁夏大学 Energy management method for energy harvesting wireless sensor
CN119966048B (en) * 2025-01-14 2025-10-10 杭州电子科技大学 Energy control method for vibration energy self-powered wireless sensor

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN108319286B (en) * 2018-03-12 2020-09-22 西北工业大学 Unmanned aerial vehicle air combat maneuver decision method based on reinforcement learning
US11665777B2 (en) * 2018-09-28 2023-05-30 Intel Corporation System and method using collaborative learning of interference environment and network topology for autonomous spectrum sharing
WO2020069534A1 (en) * 2018-09-29 2020-04-02 Brainworks Data representations and architectures, systems, and methods for multi-sensory fusion, computing, and cross-domain generalization
CN112822781B (en) * 2021-01-20 2022-04-12 重庆邮电大学 A resource allocation method based on Q-learning
CN114374977B (en) * 2022-01-13 2025-02-07 广州致为网络科技有限公司 A coexistence method based on Q-learning in non-cooperative environment

Non-Patent Citations (2)

Title
A Fuzzy Q-learning Based Power Management for Energy Harvest Wireless Sensor Node; Roy Chaoming Hsu et al.; 2018 International Conference on High Performance Computing & Simulation (HPCS); 2018-11-01; full text *
A survey of energy optimization strategies for wireless sensor networks; Yang Guangyou; Huang Senmao; Ma Zhiyan; Xu Xianjin; Journal of Hubei University of Technology; 2013-04-15 (02); full text *

Also Published As

Publication number Publication date
CN115086903A (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN111884213B (en) Power distribution network voltage adjusting method based on deep reinforcement learning algorithm
CN115086903B (en) Adaptive Duty Cycle Adjustment Method for Energy Harvesting Wireless Sensors Based on Fuzzy Q-learning
CN114217524B (en) A real-time adaptive decision-making method for power grid based on deep reinforcement learning
CN114370698A (en) Indoor thermal environment learning efficiency improvement optimization control method based on reinforcement learning
CN114048576B (en) Intelligent control method for energy storage system for stabilizing power transmission section tide of power grid
CN117412367B (en) Energy management method for energy harvesting wireless sensor
CN109548044B DDPG (deep deterministic policy gradient)-based bit rate optimization method for energy-harvesting communication
CN118523318B Black-start wind power prediction method based on supercapacitor energy storage
Hsu et al. A fuzzy q-learning based power management for energy harvest wireless sensor node
CN111246438B (en) Method for selecting relay node in M2M communication based on reinforcement learning
CN114566971A (en) Real-time optimal power flow calculation method based on near-end strategy optimization algorithm
CN110191432B (en) IoT-based intelligent monitoring system
CN117833316A (en) A method for dynamic optimization operation of energy storage on user side
Rioual et al. Design and comparison of reward functions in reinforcement learning for energy management of sensor nodes
CN113191487A (en) Self-adaptive continuous power control method based on distributed PPO algorithm
CN110337082A (en) Transmission rate adjustment method of wireless sensor network for poultry breeding monitoring based on environment perception learning strategy
CN115663931A (en) Real-time Optimal Scheduling Method for AC-DC Distribution Network Based on Dual-Agent Reinforcement Learning
Krömer et al. Harvesting-aware control of wireless sensor nodes using fuzzy logic and differential evolution
Ardiansyah et al. Q-Learning Energy Management System (Q-EMS) in Wireless Sensor Network
Rioual et al. Reinforcement-learning approach guidelines for energy management
CN118739283A (en) A low-carbon economic dispatching method for controllable flexible load of photovoltaic storage direct-flexible system
CN117974185A (en) Economic dispatch decision-making method based on the fusion of physical model and value neural network
CN109121221B (en) Method for wireless energy distribution and user scheduling
CN117724370A (en) Control method and device for electric equipment
CN112383893B (en) Time-sharing-based wireless power transmission method for chargeable sensing network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant