CN118801756A - Intelligent control method of permanent magnet synchronous motor servo system in semi-closed loop scenario - Google Patents
- Publication number: CN118801756A (application CN202410774423.2A)
- Authority
- CN
- China
- Prior art keywords
- control
- output
- motor
- controller
- feedback
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H02P21/22 — Current control, e.g. using a current control loop
- G06N3/092 — Reinforcement learning
- H02P21/0003 — Control strategies in general, e.g. linear type, e.g. P, PI, PID, using robust control
- H02P21/05 — Vector control specially adapted for damping motor oscillations, e.g. for reducing hunting
- H02P21/13 — Observer control, e.g. using Luenberger observers or Kalman filters
- H02P21/14 — Estimation or adaptation of machine parameters, e.g. flux, current or voltage
- H02P25/024 — Synchronous motors controlled by supply frequency
- H02P27/085 — Control using variable-frequency supply via DC-AC converters with pulse width modulation, wherein the PWM mode is adapted to the running conditions of the motor, e.g. the switching frequency
- H02P2207/055 — Surface mounted magnet motors
Abstract
The invention discloses a control method for a permanent magnet synchronous motor (PMSM) servo system in a semi-closed-loop scenario, in which only motor-side feedback signals such as the rotor angle can be measured and the actual position of the load mechanism cannot. Under the field-oriented control (FOC) framework of the PMSM, a high-precision signal-tracking method is provided by an edge intelligent control algorithm driven by deep reinforcement learning: a proximal policy optimization (PPO) algorithm trains a tuning policy network that is fused with the conventional three-loop control strategy, observes the three-loop commands and feedback quantities of the PMSM, and outputs the motor control voltage. This improves the control accuracy of conventional three-loop control when facing a high-order nonlinear load model and ensures safety during operation.
Description
Technical Field
The invention belongs to the field of permanent magnet synchronous motor control and computer control system technology, relates in particular to position control algorithms for permanent magnet synchronous motors in the industrial Internet, and specifically to an intelligent control method for a PMSM servo system in a semi-closed-loop scenario.
Background
The permanent magnet synchronous motor has the advantages of compact structure, high efficiency and power density, and good speed-regulation performance, and is widely applied in the industrial Internet, electric transportation, industrial robotics, aerospace and other fields. The common field-oriented control (FOC) strategy in a PMSM position servo system typically controls the rotation angle of the motor with a three-loop PID controller. While the low complexity and stability of this algorithm have earned it wide use in industrial practice, the three-loop PID controller also faces a number of challenges and disadvantages in complex and special drive scenarios:
First, when a servo motor drives certain high-order nonlinear load mechanisms, the parameters of the PID controller are difficult to determine from a model and performance indices, so trial-and-error tuning based on manual experience is required, and good dynamic response is hard to obtain.

Second, the motor system limits the rotational speed, current and inverter output voltage; when the controller receives highly dynamic position, speed and current commands, the limiting of these commands further degrades controller performance.

Most challenging, the special equipment addressed by the invention cannot be fitted with an external sensor, or cannot reliably measure the actual position of the load mechanism because of its special working environment; the only feedback available to the controller is the motor shaft angle detected by a magnetic encoder, i.e., a semi-closed-loop control system. Compared with closed-loop control that monitors the final actuator, the semi-closed-loop scenario causes significant performance degradation for conventional PID controllers that act on feedback error. In the invention, for example, the motor shaft drives a screw through a gear to feed or retract, deflecting the swing mechanism connected to the triangular link structure by a certain angle. The swing angle of the swing mechanism cannot be measured by a sensor, and the rotation angle of the motor and the swing angle of the swing mechanism, both affected by the elastic motion of the screw, are related by high-order nonlinear dynamic equations that are difficult to express as a simple function. The control algorithm therefore cannot obtain actual angle feedback to form a closed loop.
The prior art also includes an electric servo position-feedback dynamic tuning method based on deep reinforcement learning in a semi-closed-loop scenario (publication number CN117335700A). On the basis of three-loop PID control, that technique determines the state space and action settings of a reinforcement learning agent; taking improvement of the tracking accuracy and response speed of the load swing angle toward the commanded target angle under semi-closed-loop control as the objective, it models the system in simulation software and obtains, via deep reinforcement learning, a tuning policy network for motor position feedback that outputs an optimal tuning value. This scheme illustrates a limitation: three-loop control schemes usually add limiting at the output of each loop to ensure control safety in extreme states, but this also caps the agent's effect, since once one of the three loops saturates, the agent can no longer optimize the original three-loop control scheme by tuning.
Disclosure of Invention
The invention aims to: address the defects and problems of the three-loop PID control method when a permanent magnet synchronous motor in the industrial Internet operates semi-closed-loop against a high-order nonlinear load; to this end the invention provides a control method for an industrial Internet servo system driven by deep reinforcement learning in a semi-closed-loop scenario.
The technical scheme is as follows: the intelligent control method of the permanent magnet synchronous motor servo system in the semi-closed-loop scenario comprises constructing a PMSM operation model based on the FOC control framework together with a mathematical model of the load and its transmission mechanism; then determining the state space and action settings of a reinforcement learning agent based on the three-loop structure, with the goal of improving the tracking accuracy and response speed of the load swing angle to the commanded target angle in the semi-closed-loop scenario; modeling the system in simulation software; and obtaining, with a deep reinforcement learning method, a tuning policy network for motor position feedback so as to output the optimal control voltage;
the method further comprises pre-training the model with empirical data acquired by sensors during testing, and then optimizing the policy network with a twin-delayed deep deterministic policy gradient (TD3) algorithm, so that the policy network can take real-time observable feedback values or computable values of the motor as state quantities and calculate the optimal PMSM control voltage, with the permanent magnet synchronous motor driving the transmission to make the servo system respond to commands;
In determining the state space and action settings of the reinforcement learning agent, an active disturbance rejection control (ADRC) algorithm is used at the position loop to improve the disturbance rejection of the control system, PI control is used at the speed loop, and a policy network trained by reinforcement learning is used at the current loop; by observing the feedback position input to the position loop and its output reference speed, the feedback speed input to the speed loop and its output reference current, and the feedback current, the agent decides the voltage applied to the PMSM, thereby improving system performance; the PPO reinforcement learning algorithm is used to optimize the agent;
according to the method, a PPO reinforcement learning algorithm with continuous state and action spaces is used to optimize the agent, so that the agent observes the gap between the current PMSM state and the given reference value, predicts the deviation between its only available input, the approximate screw precession length, and the actual screw precession length θ_loc(t), and outputs the voltage controlling the PMSM, thereby reducing the error between the system and the given command during motor control and alleviating the insufficient control accuracy caused by position feedback errors.
Further, the method comprises the implementation steps of:
S1, constructing a PMSM operation model based on the FOC control framework, wherein the FOC strategy of the PMSM decomposes the phase variables of the motor into a magnetic-field component and a torque component and controls them independently, realizing accurate control of the motor's field and torque;
Under FOC control, the motor system takes the d- and q-axis voltage command values from the controller as inputs and outputs torque to the load mechanism, expressed as the rotation angle of the motor rotor;
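As a concrete illustration of how the phase quantities are decomposed into field and torque components, the following sketch applies the standard amplitude-invariant Clarke transform and the Park rotation to measured three-phase currents; this is textbook FOC machinery, not code from the patent.

```python
import math

def clarke_park(i_a, i_b, i_c, theta_e):
    """Map three-phase stator currents into the rotor-aligned d-q frame.

    theta_e: electrical rotor angle in radians (p * theta_m from the encoder).
    Returns (i_d, i_q): the field-producing and torque-producing components.
    """
    # Amplitude-invariant Clarke transform: a-b-c -> alpha-beta
    i_alpha = (2.0 / 3.0) * (i_a - 0.5 * i_b - 0.5 * i_c)
    i_beta = (1.0 / math.sqrt(3)) * (i_b - i_c)
    # Park rotation: alpha-beta -> d-q
    i_d = i_alpha * math.cos(theta_e) + i_beta * math.sin(theta_e)
    i_q = -i_alpha * math.sin(theta_e) + i_beta * math.cos(theta_e)
    return i_d, i_q
```

For a balanced sinusoidal current set aligned with theta_e, i_d recovers the current amplitude and i_q vanishes, which is what allows the field and torque components to be regulated independently.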
S2, constructing a mathematical model of the load and its transmission mechanism, wherein the motor load model takes into account the dynamics of the transmission mechanism, such as the elastic deformation of the screw and the motion of the swing mechanism, as well as nonlinear factors including Coulomb friction and torque transmission in the triangular link mechanism;
S3, considering the PMSM and the high-order nonlinear load model established above, constructing an edge intelligent control scheme driven by deep reinforcement learning on the FOC control framework;
In this control scheme, the ADRC controller receives the externally input reference value of the controlled object and, through a differential controller, outputs the position-control reference value and its rate of change. Meanwhile, the ADRC controller receives feedback from the measuring element of the controlled object; because the measured feedback contains noise and time-domain oscillation that would degrade the accuracy of the downstream deep reinforcement learning agent's output, the ADRC controller first feeds the feedback signal into a state-expansion observer to estimate the noise it contains and removes the noise with a corresponding feedforward mechanism. Finally, the ADRC controller takes the difference between the outputs of the differential controller and the state-expansion observer and outputs the reference value for the next loop;
the PI controller takes the difference between the ADRC reference value and the feedback value of the measuring element, and linearly superimposes this difference with its integral over time as the control quantity output to the next actuator;
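A minimal discrete sketch of the PI law just described — error between reference and feedback, proportional term plus time integral, clamped at the loop output; the gains and limit below are illustrative placeholders, not values from the patent.

```python
class PIController:
    """Discrete PI controller with output limiting, as used per loop."""

    def __init__(self, kp, ki, dt, limit):
        self.kp, self.ki, self.dt, self.limit = kp, ki, dt, limit
        self.integral = 0.0  # running time-integral of the error

    def step(self, reference, feedback):
        e = reference - feedback
        self.integral += e * self.dt
        u = self.kp * e + self.ki * self.integral
        # Per-loop clipping for safety in extreme states
        return max(-self.limit, min(self.limit, u))
```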
S4, constructing a deep-reinforcement-learning-driven control scheme for the PMSM, training it with PPO, and having the agent output the PMSM control voltage to drive the transmission and move the servo system, so as to minimize the long-term error between the reference command and the servo system;
The observations of the agent are determined as: the outputs θ_v1(t), θ_v2(t) of the ADRC differential controller; the outputs e_1(t), e_2(t) of the reference signal generator; the commanded screw precession length θ*_loc(t) converted from the command deflection angle α*(t), its feedback value and the error e_θ(t); the reference speed ω*(t) and reference current i*_q(t); and the corresponding speed feedback ω(t) and current feedback i_q(t).
Although the ADRC controller can estimate the noise present in the feedback signal and compensate its effect by feedforward, the agent observations above still oscillate to varying degrees in the time domain, mainly because of the fast-varying speed ω and current i_q, which can cause the agent's output to oscillate. Moreover, even when the inputs themselves do not oscillate, the output of the reinforcement learning agent itself oscillates in the time domain, since the agent aims only to maximize the reward function and ignores the safety hazards that may arise when its output is applied as the PMSM control voltage. Therefore, a Kalman filter is added to the feedback quantities ω and i_q, and a low-pass filter is added to the agent output to smooth the control voltage of the PMSM.
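The smoothing stage on the agent output could be as simple as a first-order low-pass filter; the time constant tau below is an assumed tuning parameter, since the patent does not publish one.

```python
class LowPassFilter:
    """First-order IIR low-pass: y[k] = y[k-1] + a * (x[k] - y[k-1]),
    with smoothing factor a = dt / (tau + dt)."""

    def __init__(self, tau, dt, y0=0.0):
        self.alpha = dt / (tau + dt)
        self.y = y0

    def step(self, x):
        self.y += self.alpha * (x - self.y)
        return self.y
```

Applied to the raw voltage command sequence, the filter suppresses step-to-step oscillation at the cost of a small lag, the usual trade-off for this kind of safety smoothing.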
Based on the scheme, the method comprises the following optimization process:
To make the training algorithm easier to optimize, the input values are normalized and suitably limited, a Kalman filter is added to the two feedback quantities ω and i_q, and a low-pass filter is added to the output of the agent;
To comprehensively improve controller performance, the tasks faced by the reinforcement learning controller should be rich enough to contain all possible command patterns, including random step commands, low-frequency sinusoidal commands and high-frequency sinusoidal commands;
To reduce the error between the swing angle α and the command angle α_ref, the square of the difference between the reference value and the feedback value of each of the three loops is taken, its opposite is used as the reward value of each time step, and fixed coefficients are multiplied in to balance the weights among the three loops:

r(t) = −[λ_1 e_θ(t)² + λ_2 (ω*(t) − ω(t))² + λ_3 (i*_q(t) − i_q(t))²]
The optimized policy function maps the state s to the agent's output action; for this purpose, two Q networks are established to evaluate the output of the policy network, and the evaluation values are used to perform gradient optimization on the actor network.
Further, in step S1, under FOC control the motor system takes the d- and q-axis voltage command values from the controller as inputs and outputs torque to the load mechanism, expressed as the rotation angle of the motor rotor; the specific operations are as follows:
S11, acquiring the rotor position and speed information of the motor for the transformation between the phase variables in the stationary coordinate system and the rotating d-q coordinate variables;
In the d-q coordinate system, the motor equations can be described as:

v_d = R_s i_d + L_d (d i_d / dt) − ω_r L_q i_q
v_q = R_s i_q + L_q (d i_q / dt) + ω_r (L_d i_d + λ_m)
T_e = (3/2) p [λ_m i_q + (L_d − L_q) i_d i_q]
J (d ω_m / dt) = T_e − T_L − B ω_m

Wherein: R_s is the stator resistance; v_d, v_q are the d-axis and q-axis voltages; i_d, i_q are the d-axis and q-axis currents; L_d, L_q are the d-axis and q-axis inductances; λ_m is the d-axis flux linkage of the permanent magnet; T_e, T_L are the output torque and load torque; B is the viscous coefficient of the bearing; J is the total rotational inertia of motor and load; ω_m, ω_r are the mechanical and electrical angular velocities of the rotor, and p is the number of permanent-magnet pole pairs, satisfying ω_r = p·ω_m. Given the input d-q axis voltages and the external load torque T_L, the model outputs the motor angle θ_m and angular velocity ω_r; the d-q axis currents are obtained by the current detector;
S12, according to actual requirements, the motor type is determined to be a surface-mounted PMSM; meanwhile, considering that a permanent magnet synchronous motor running in steady state must respect certain operating limits, the method requires that the rotor speed and stator current remain within threshold ranges:

L_q = L_d
|ω_r| ≤ ω_limit
√(i_d² + i_q²) ≤ i_limit

Wherein ω_limit is the maximum rotor speed and i_limit is the maximum stator current.
Further, step S2 specifically includes:
For a high-order load mechanism with nonlinear factors, the torque of the motor shaft is transmitted to the screw through a gear, and the stroke of the screw pushes and pulls the triangular link mechanism, forming a moment arm that deflects the swing mechanism by an angle. To match the mechanical properties of the actual load, the modeling considers the elastic motion of the screw, the equation of motion of the swing mechanism, and the nonlinear Coulomb friction during motion;
The dynamics of this load couple the following quantities: θ_m is the rotor mechanical angle, g_r is the gear reduction ratio, n_r is the screw reduction ratio, L is the screw retraction/extension amount, K_s is the combined rotational stiffness, and F is the force exerted by the screw under the driving torque; m_e is the mass of the screw, B_e is the elastic damping, and ΔL is the compression stroke of the screw; T_L is the motor shaft load torque and effi is the transmission efficiency; M is the swing moment, K_p is the combined precession stiffness, and r is the arm length; α is the swing angle of the swing mechanism, J_b is the swing inertia, B_b is the swing damping, K_δ is the position resistance moment, and M_f is the friction moment, modeled as Coulomb friction:

M_f(t) = f_c · sgn(dα/dt)

where f_c is the Coulomb friction coefficient.
The load model considers the elastic deformation of the screw. Starting only from the geometry of the triangular link mechanism, an equivalent triangle is formed between the screw edge, the swing mechanism and the fixed fulcrum; neglecting the elastic deformation of the screw, the relationship between the retraction/extension amount L of the screw and the deflection angle α is approximately:

L(α) ≈ √(a² + b² − 2ab·cos(α₀ + α)) − L₀

Wherein a and b are the two sides OA and OB adjacent to the swing angle in the triangular link mechanism, and α₀, L₀ are the central angle ∠AOB and the length of edge AB when the load deflection angle is 0.
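Under the rigid-screw approximation, this law-of-cosines relation translates directly into code; the geometry values used in the test are placeholders, since the patent does not publish its dimensions.

```python
import math

def screw_stroke(alpha, a, b, alpha0):
    """Screw extension needed for a swing deflection alpha.

    a, b: lengths of sides OA and OB of the triangular link mechanism;
    alpha0: central angle AOB at zero deflection. The stroke is the change
    in the length of edge AB given by the law of cosines.
    """
    L0 = math.sqrt(a * a + b * b - 2.0 * a * b * math.cos(alpha0))
    AB = math.sqrt(a * a + b * b - 2.0 * a * b * math.cos(alpha0 + alpha))
    return AB - L0
```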
Further, in step S3, for the semi-closed-loop electric servo system under the FOC control framework, three-loop control performs the basic control of the final position angle to ensure the stability and robustness of system operation, wherein:
S31, the current loop controlling torque is constructed as a decoupling current controller, which decomposes the originally cross-coupled d- and q-axis voltage terms into a linear term and a nonlinear term:

v_d = v_d1 + v_d0,  v_q = v_q1 + v_q0

Wherein v_d1 and v_q1 are generated by a linear PID current controller, and the nonlinear decoupling terms v_d0 and v_q0 are calculated from the rotor speed value of the encoder:

v_d0 = −ω_r L_q i_q,  v_q0 = ω_r (L_d i_d + λ_m)
S32, the relation between the input error e(t) and the output control value u(t) of the PID controller is:

u(t) = K_p e(t) + K_i ∫₀ᵗ e(τ) dτ + K_d de(t)/dt
S33, in the ADRC controller, the differential controller (tracking differentiator) takes the input θ(t) and outputs the tracking signal θ_v1(t) and its derivative θ_v2(t), satisfying

dθ_v1(t)/dt = θ_v2(t)

wherein dθ_v1(t)/dt denotes the derivative of θ_v1(t) with respect to time t;
S34, in the ADRC controller, the state-expansion observer takes the input θ(t) and produces the outputs θ_z1(t), θ_z2(t), θ_z3(t); in the standard third-order form,

e(t) = θ_z1(t) − θ(t)
dθ_z1(t)/dt = θ_z2(t) − β_1 e(t)
dθ_z2(t)/dt = θ_z3(t) − β_2 fal(e(t), α_1, δ) + b_0 u(t)
dθ_z3(t)/dt = −β_3 fal(e(t), α_2, δ)

wherein fal(e(t), α, δ) is an error filter whose relation between the inputs e(t), α, δ and the output is:

fal(e, α, δ) = e / δ^(1−α),      |e| ≤ δ
fal(e, α, δ) = |e|^α · sgn(e),   |e| > δ
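The fal error filter is a standard construct from Han's active disturbance rejection control; below is a direct implementation of the piecewise definition, together with the saturation function used by the reference signal generator.

```python
def fal(e, alpha, delta):
    """Nonlinear error gain: linear with high gain inside |e| <= delta,
    fractional-power attenuation outside."""
    if abs(e) <= delta:
        return e / (delta ** (1.0 - alpha))
    return (abs(e) ** alpha) * (1.0 if e > 0 else -1.0)

def sat(x, x_max):
    """Saturation: clamp x to the interval [-x_max, x_max]."""
    return max(-x_max, min(x_max, x))
```

Note that fal is continuous at |e| = delta, since delta / delta^(1−alpha) = delta^alpha; the linear inner segment avoids the high-gain chattering a pure power law would cause near zero error.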
S35, in the ADRC controller, the reference signal generator combines the outputs θ_v1(t), θ_v2(t) of the differential controller with the outputs θ_z1(t), θ_z2(t), θ_z3(t) of the state-expansion observer and saturates the result;

wherein sat(x, x_max) denotes the saturation function: the output is x_max when the input x is greater than x_max, −x_max when x is less than −x_max, and x otherwise;
S36, the difference between the fed-back motor speed ω(t) and the speed command ω*(t) is taken as the input of the speed PI controller, and the output of this controller is taken as the reference value i*_q(t) of the q-axis current;
S37, a position control loop is constructed on top of the speed controller. Specifically, based on the triangular link structure of the load, the command deflection angle α_ref is approximately converted into a commanded screw precession length θ*_loc(t) and input to the ADRC differential controller; the rotor angle feedback signal θ(t) is likewise converted to a precession length θ_loc(t) and input to the ADRC state-expansion observer; the reference speed output by the ADRC reference signal generator serves as the output of the position loop.
Further, the specific steps of the method for alleviating the insufficient control accuracy caused by position feedback errors comprise:

S41, the observations of the agent are determined as the outputs θ_v1(t), θ_v2(t) of the ADRC differential controller, the outputs e_1(t), e_2(t) of the reference signal generator, the commanded screw precession length θ*_loc(t) converted from the command deflection angle α_ref, its feedback value θ_loc(t) and the error e_θ(t), the reference speed ω*(t), the reference current i*_q(t), and the corresponding speed feedback ω(t) and current feedback i_q(t); the state space s of the agent is accordingly determined as:

s = [θ_v1(t), θ_v2(t), e_1(t), e_2(t), θ*_loc(t), θ_loc(t), e_θ(t), ω*(t), ω(t), i*_q(t), i_q(t)]
S42, the continuous action a output by the agent is used as the q-axis voltage input of the PMSM, so that the PMSM generates electromagnetic torque under the given voltage and drives the servo system to respond to the given command;
S43, to improve the comprehensive performance of the controller, training mixes random step commands, low-frequency sinusoidal commands and high-frequency sinusoidal commands; each episode selects one task as the command with equal probability and lasts 10 seconds;
S44, to reduce the error between the swing mechanism angle α and the command angle α_ref, the negative square of the difference between the command and the feedback of each loop in the three-loop control, weighted by fixed coefficients, is used as the reward for each time step:

r(t) = −[λ_1 e_θ(t)² + λ_2 (ω*(t) − ω(t))² + λ_3 (i*_q(t) − i_q(t))²]
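The per-step reward can be sketched as the negative weighted sum of the squared three-loop tracking errors; the weights here are illustrative, since the patent fixes balancing coefficients without publishing their values.

```python
def step_reward(e_pos, e_spd, e_cur, weights=(1.0, 0.01, 0.001)):
    """Reward for one time step: larger tracking errors in the position,
    speed and current loops give a more negative reward."""
    w1, w2, w3 = weights
    return -(w1 * e_pos ** 2 + w2 * e_spd ** 2 + w3 * e_cur ** 2)
```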
S45, considering the deployment requirements on an embedded chip, the actor network is set as an MLP with 2 hidden layers and the critic network as an MLP with 4 hidden layers; during training, the agent sampling time is set to 0.01 s.
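The actor/critic sizing of S45 can be sketched with a plain NumPy multilayer perceptron; the hidden width (64) and the state dimension are assumptions for illustration, matching only the hidden-layer counts stated in the text.

```python
import numpy as np

def mlp(sizes, rng):
    """Build a tanh MLP as a list of (weight, bias) layers."""
    return [(0.1 * rng.standard_normal((m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    """Forward pass: tanh on hidden layers, linear output."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.tanh(x)
    return x

rng = np.random.default_rng(0)
STATE_DIM = 11  # assumed size of the S41 observation vector
actor = mlp([STATE_DIM, 64, 64, 1], rng)            # 2 hidden layers -> voltage
critic = mlp([STATE_DIM, 64, 64, 64, 64, 1], rng)   # 4 hidden layers -> value
```

The small actor (two hidden layers) keeps the inference cost low enough for an embedded chip, while the deeper critic is only needed during training.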
The beneficial effects are that: for the situation in which only motor feedback signals such as the rotation angle of the permanent magnet synchronous motor can be measured while the actual position of the load mechanism cannot, the method of the invention accounts for the high-order nonlinear characteristics of the load model. Taking the conventional three-loop PID controller as a basis, a twin-delayed deep deterministic policy gradient (TD3) algorithm trains a tuning policy network that observes the feedback quantities of the PMSM and outputs the feedback-position tuning value of the position loop, improving the control accuracy and response speed of conventional three-loop PID control when facing a high-order nonlinear load model.
Drawings
FIG. 1 is a diagram of the overall control flow model architecture in the present invention;
FIG. 2 is a schematic modeling diagram of a permanent magnet synchronous motor according to the present invention;
FIG. 3 is a diagram of the equivalent triangle formed between the lead screw edge, the swing mechanism and the fulcrum in the present invention;
FIG. 4 is a schematic representation of the load mechanism of the present invention;
FIG. 5 is a block diagram of a decoupled current controller based on PID control in accordance with the invention;
FIG. 6 is a diagram of a speed loop and position loop architecture based on a PID controller in accordance with the invention;
FIG. 7 is a graph of the q-axis current versus step current command controlled by the current controller without consideration of voltage clipping (FIG. 7 (a)) and with use of voltage clipping (FIG. 7 (b)) in the present invention;
FIG. 8 is a schematic diagram showing the phenomenon that the error between the command angle precession distance and the approximate precession distance of the rotating shaft position in the position ring converges (FIG. 8 (a)) and the actual angle of the swinging mechanism and the command angle differ greatly (FIG. 8 (b));
- FIG. 9 is a schematic diagram of the control logic with the agent acting as the PMSM input-voltage controller in the present invention, where U_q is the voltage command the agent applies to the control system;
FIG. 10 is a schematic diagram showing the phenomenon of the oscillation of the output value of the agent (FIG. 10 (a)) and the oscillation of the state quantity of the motor observed by the agent (FIG. 10 (b)) in the present invention;
FIG. 11 is a comparison of the effects of PID scheme, ADRC scheme and reinforcement learning scheme of the present invention at different orders (FIGS. 11 (a) - (d)).
Detailed Description
For a detailed description of the disclosed embodiments of the present invention, the present invention is further described below with reference to the accompanying drawings and detailed description.
First, the key problem this method solves: in three-loop position control of a permanent magnet synchronous motor, when only motor-side parameters such as the rotor angle can be measured and the actual position of the high-order nonlinear load mechanism cannot, how should the control algorithm be improved to raise the tracking accuracy of the actual load position with respect to the commanded position?
The main design idea of the invention is to use data-driven deep reinforcement learning: empirical data that can be acquired by sensors during testing but cannot be observed during actual operation are used for model pre-training, and a twin-delayed deep deterministic policy gradient algorithm is used to optimize the policy network, so that the policy network can take the motor's real-time observable feedback values or computable values as state quantities and calculate the optimal PMSM control voltage, with which the permanent magnet synchronous motor drives the transmission device to make the servo system respond to commands. The overall control flow provided by this method is shown in fig. 1.
The construction and training process of the intelligent control method of the permanent magnet synchronous motor servo system in the semi-closed loop scene comprises the following steps:
Step 1: constructing mathematical model under FOC frame of permanent magnet synchronous motor
The rotor in a permanent magnet synchronous motor is composed of a rotor core and permanent magnets arranged around the core. Regardless of the rotor arrangement, the magnetic flux density (magnetic induction intensity) distribution generated in the air gap by each pole pair is similar, and in physical modeling it is generally assumed that the flux density distribution generated in the motor air gap by permanent magnet poles mounted on the rotor surface or embedded in the core is sinusoidal. Therefore, the fundamental wave of the flux density curve is regarded as the ideal flux density distribution. Meanwhile, for the sinusoidal flux density signal, its coordinate axis is defined as the magnetic field angle θr, and the number of pole pairs of the permanent magnets arranged on the rotor core is defined as p; the relation between the mechanical angle θm of rotor rotation and the corresponding magnetic field angle is:
θr = p·θm
meanwhile, the axis of one permanent magnet pole (the extreme point of the sine wave) is defined as the d axis. The axis lying between two magnetic poles, whose magnetic field angle differs from that of the d axis by 90 degrees and where the flux density is 0, is the q axis.
The configuration of the rotor divides permanent magnet synchronous motors into two categories: salient pole machines and non-salient pole machines, where the salient pole motor has interior (built-in) magnets and the non-salient pole motor has surface-mounted magnets. The distinction between salient pole and non-salient pole machines matters because the permeability of permanent magnets is almost the same as that of free air, while the permeability of the iron core (ferromagnetic) far exceeds that of air.
According to Ampere's law, the magnetic induction intensity at a point in space is directly proportional to the permeability at that point. Thus, consider the magnetic field of constant strength generated by an energized solenoid: when the rotor of a surface-mounted motor rotates, whatever direction it turns to, the radial length of iron core traversed by the field lines is the same, i.e. the reluctance of the magnetic path is constant; for a salient pole motor, when the rotor turns to the d axis the magnetic path contains the least iron and the reluctance is maximal, while at the q axis the iron is greatest and the reluctance minimal, so the magnetic air gap is uneven. This phenomenon is called magnetic saliency.
The stator windings of the motor are essentially energized solenoids with different positions and directions; the coils are distributed in the stator slots around the periphery of the stator core with 120-degree displacement and are named the A, B, C phase windings. In practical circuits, the tails of the A, B, C three-phase windings are connected to a common point to form a star (Y) connection, in which case:
ia+ib+ic=0
for the three-phase motor windings, three-phase alternating currents with 120-degree phase differences are respectively applied, and a rotating magnetic field is synthesized in space from the three time-varying sinusoids.
For a rotating rotor, if the rotating magnetic field is guaranteed to be consistent with the rotating speed of the rotor and the magnetic phase is constant, the interaction of the magnetic fields can generate constant torque, namely magnetic torque. For salient pole machines, another type of torque, reluctance torque, is also generated that pushes the rotor to rotate with the load.
An a-b-c reference coordinate system is defined by taking the directions of the magnetic fields generated by the a-b-c three-phase windings as the coordinate axis directions. In this coordinate system, the phase variables in the time domain may be denoted fa, fb, fc, where f may represent phase voltage, phase current, or flux linkage. Considering Faraday's law of electromagnetic induction and Ohm's law, the three-phase voltages can be expressed as:
va = Rs·ia + dλa/dt, vb = Rs·ib + dλb/dt, vc = Rs·ic + dλc/dt
According to the above discussion of the rotor, for a surface-mounted rotor motor the self-inductance and mutual-inductance permeances of each stator coil are unchanged, while for an arbitrarily configured rotor the self-inductance and mutual-inductance coefficients vary with the magnetic angle:
wherein, for a surface mount motor, L 2 =0. Therefore, consider the flux linkage value of the three-phase winding of abc as the flux linkage of self-inductance and mutual inductance plus leakage of the permanent magnet into the coil:
wherein lambda m is the maximum flux linkage of the N pole of the permanent magnet to one coil, and the two formulas are substituted to obtain:
the above-described motor model in the stationary reference frame has parameters that vary with time, which complicates the control system design. This control complexity due to rotation can be removed by projecting the phase variables of the model onto a rotating reference frame. The d-q coordinate system has two orthogonal axes fixed on the rotor, namely the d axis along the rotor permanent magnet pole and the q axis orthogonal to the d axis.
Consider transforming the motor model from the three-phase stationary a-b-c reference frame to the two-phase rotating frame: it is first necessary to know the angle θr of the d-q axes relative to the stationary a-b-c coordinate system; the three-phase variables are then projected onto the d axis to obtain fd and onto the q axis to obtain fq. Mathematically, the transformation can be performed with the Park transform:
fd = (2/3)·[fa·cos θr + fb·cos(θr − 2π/3) + fc·cos(θr + 2π/3)]
fq = −(2/3)·[fa·sin θr + fb·sin(θr − 2π/3) + fc·sin(θr + 2π/3)]
wherein the coefficient 2/3 ensures that the transformed amplitude remains equal to the phase amplitude. Again, because for the phase variables:
fa+fb+fc=0
the transformation matrix is made invertible by adding the 0 component as a constraint, and the transformation from the d-q reference frame back to the a-b-c reference frame can then be achieved by:
Where f o is the 0 component. FOC control converts the three-phase rotating magnetic field, through the Park transform, into quantities rotating with the rotor d-q axes. When the three-phase a-b-c variables are unbalanced sinusoidal signals, as at motor starting or under sudden load, the d-q axis variables are generally time-varying; when the motor runs in steady state, the rotating magnetic field created by the phase variables remains relatively static with respect to the rotor, and the d-q axis variables become DC signals. In this case, controlling the d-q axis variables is equivalent to controlling two equivalent solenoids that are always aligned with or perpendicular to the magnetic axis, and the corresponding motor control algorithm becomes comparatively simple.
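As an illustration of the amplitude-invariant transform described above, the following Python sketch implements the Park transform and its inverse (the function names are the author's own; the 2/3 coefficient matches the amplitude-invariant convention used here):

```python
import math

def park(fa, fb, fc, theta):
    """Amplitude-invariant Park transform: a-b-c -> d-q-0 at magnetic angle theta (rad)."""
    k = 2.0 / 3.0
    fd = k * (fa * math.cos(theta)
              + fb * math.cos(theta - 2.0 * math.pi / 3.0)
              + fc * math.cos(theta + 2.0 * math.pi / 3.0))
    fq = -k * (fa * math.sin(theta)
               + fb * math.sin(theta - 2.0 * math.pi / 3.0)
               + fc * math.sin(theta + 2.0 * math.pi / 3.0))
    f0 = (fa + fb + fc) / 3.0
    return fd, fq, f0

def inverse_park(fd, fq, f0, theta):
    """Inverse transform: d-q-0 -> a-b-c."""
    fa = fd * math.cos(theta) - fq * math.sin(theta) + f0
    fb = (fd * math.cos(theta - 2.0 * math.pi / 3.0)
          - fq * math.sin(theta - 2.0 * math.pi / 3.0) + f0)
    fc = (fd * math.cos(theta + 2.0 * math.pi / 3.0)
          - fq * math.sin(theta + 2.0 * math.pi / 3.0) + f0)
    return fa, fb, fc
```

For a balanced three-phase set aligned with the measured angle, `park` returns the DC values (F, 0, 0), and `inverse_park` recovers the original phases — the steady-state behavior described above.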
Applying the Park transform to the above equations yields the voltage equations in the rotor reference frame:
vd = Rs·id + dλd/dt − ωr·λq
vq = Rs·iq + dλq/dt + ωr·λd
Wherein vd and vq are the stator voltages of the d axis and q axis, respectively; id and iq are the stator currents of the d axis and q axis, respectively; λd and λq are the stator flux linkages of the d axis and q axis, respectively, whose values are:
λd = Ld·id + λm, λq = Lq·iq
Wherein Ld and Lq are the d-axis and q-axis inductances, respectively, and λm is the flux linkage contributed along the d axis by the permanent magnet pole pairs:
in the d-q coordinate system, the mutual inductance coefficient of each phase, which is transformed with the rotation, becomes constant. Combining the above two equations gives the motor current-voltage equations:
vd = Rs·id + Ld·did/dt − ωr·Lq·iq
vq = Rs·iq + Lq·diq/dt + ωr·(Ld·id + λm)
Where vd, vq are the system inputs (control quantities), whose values determine the current and the torque. In the rotor coordinate system, the instantaneous input power of the permanent magnet synchronous motor during operation is:
Pin = (3/2)·(vd·id + vq·iq)
Wherein the factor 3/2 compensates for the coefficient multiplied in the amplitude-invariant Park transform. Note that the Rs·i² terms are resistive voltage-drop (loss) terms and do not contribute to the final motor output power; the dλ/dt terms are field-variation terms, whose electrical power is stored in the magnetic field and therefore also does not contribute to the final motor output power. The electrical power actually converted into mechanical power is therefore:
Pem = (3/2)·ωr·(λd·iq − λq·id)
According to the torque theorem:
M=P/ω
The electromagnetic torque is:
Te = Pem/ωm = (3/2)·p·[λm·iq + (Ld − Lq)·id·iq]
the permanent magnet synchronous motor is connected with a mechanical load, and the dynamics of the mechanical part of the motor are described by:
J·dωm/dt = Te − TL − B·ωm, dθm/dt = ωm
Wherein TL is the load torque, B is the motor bearing viscosity coefficient, and J is the total rotational inertia of the motor and the load. Finally, the whole permanent magnet synchronous motor can be built into a Simulink sub-module for convenient calling, as shown in fig. 2. The module takes the d-q axis voltages as input, outputs the angle and angular speed of the motor according to the external load torque TL, and detects the d-q axis currents through a current detector.
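The Simulink sub-module of fig. 2 cannot be reproduced in text; as a rough equivalent, the following Python sketch integrates the d-q current-voltage and mechanical equations above with an explicit Euler step (all parameter values in the usage below are illustrative, not taken from the patent):

```python
def pmsm_step(state, vd, vq, TL, par, dt):
    """One explicit-Euler step of the d-q PMSM model; state = (id, iq, omega_m, theta_m).

    par holds Rs, Ld, Lq, lam (permanent-magnet flux linkage), p (pole pairs), J, B.
    """
    i_d, i_q, w_m, th_m = state
    w_r = par["p"] * w_m  # electrical angular speed
    did = (vd - par["Rs"] * i_d + w_r * par["Lq"] * i_q) / par["Ld"]
    diq = (vq - par["Rs"] * i_q - w_r * (par["Ld"] * i_d + par["lam"])) / par["Lq"]
    Te = 1.5 * par["p"] * (par["lam"] * i_q + (par["Ld"] - par["Lq"]) * i_d * i_q)
    dwm = (Te - TL - par["B"] * w_m) / par["J"]
    return (i_d + did * dt, i_q + diq * dt, w_m + dwm * dt, th_m + w_m * dt)
```

Applying a constant q-axis voltage with zero load from rest makes the simulated speed rise, as expected from the torque equation; with zero input the state stays at rest.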
Based on the analysis, aiming at an actual motor model, the invention constructs a permanent magnet synchronous motor operation mathematical model based on the FOC control frame and is realized in a Simulink.
Under FOC control, the motor system takes as input the voltage command values of the d and q axes from the controller, and inputs the torque to the load mechanism, which is expressed as the motor rotor rotation angle.
The FOC strategy of the permanent magnet synchronous motor decomposes the phase variables of the motor into a magnetic field component and a torque component and controls them independently, thereby realizing accurate control of the motor's magnetic field and torque. It requires acquisition of the rotor position and speed information of the motor for the transformation calculation between the phase variables in the stationary coordinate system and the rotating d-q coordinate variables. In the d-q coordinate system, the motor equations can be described as:
vd = Rs·id + Ld·did/dt − ωr·Lq·iq
vq = Rs·iq + Lq·diq/dt + ωr·(Ld·id + λm)
Te = (3/2)·p·[λm·iq + (Ld − Lq)·id·iq]
J·dωm/dt = Te − TL − B·ωm
Wherein: r s is the stator resistance; v d,vq is d-axis and q-axis voltage, respectively; i d,iq is d-axis and q-axis current, respectively; l d,Lq is d-axis and q-axis inductance respectively; lambda m is the d-axis magnetic flux of the permanent magnet; t e,TL is output torque and load torque respectively; b is the viscosity coefficient of the bearing; j is the total rotational inertia of the motor and the load; omega m,ωr is the mechanical angular velocity of the rotor and the electromagnetic angular velocity of the rotor, and p is the number of pairs of permanent magnets, satisfying omega r=p×ωm. The module responds to the input d-q two-axis voltage, outputs a motor angle theta m and an angular speed omega r according to an external load moment T L, and detects the d-q axis current through a current detector.
The motor type is determined as a surface-mounted PMSM according to actual requirements. Considering that a permanent magnet synchronous motor operating in steady state must be kept within certain operating limits, the invention requires that the rotor speed and stator current stay within threshold values, i.e.:
Lq = Ld
|ωr| ≤ ωlimit
√(id² + iq²) ≤ ilimit
Wherein ωlimit is the maximum value of the rotor speed; ilimit is the stator current maximum.
Step 2: and constructing a mathematical model of the load and the transmission mechanism of the load.
In combination with the actual demand, a high-order load mechanism with nonlinear factors is determined. The mechanism transmits the torque of the motor rotating shaft to the screw rod by using a gear, and the triangular connecting rod mechanism is pushed and pulled through the stroke of the screw rod to form a force arm to push the swinging mechanism to perform angle deflection. In order to fit the mechanical property of the actual load in the modeling process, the elastic movement of the screw rod, the movement equation of the swinging mechanism and the nonlinear coulomb friction in the movement process are considered. The kinetic equation for this load can be described as:
Wherein θ m is the rotor mechanical angle, g r is the gear reduction coefficient, n r is the screw rod reduction coefficient, L is the screw rod retraction/extension amount, K s is the rotation combined stiffness, and F is the acting force of the screw rod under the pushing of torque; m e is the mass of the screw rod, B e is elastic damping, and DeltaL is the compression stroke of the screw rod; t L is motor shaft load moment, effi is transmission efficiency; m is swing moment, K p is precession combined rigidity, and r is arm length; alpha is the swing angle of the swing mechanism, J b is swing inertia, B b is swing damping, K delta is position resistance moment, M f is friction moment, and the model is coulomb friction, and the expression is as follows:
The load model considers the elastic deformation of the screw rod. If only the geometric relationship of the triangular connecting-rod mechanism is considered, the equivalent triangular structure formed among the screw-rod side, the swing mechanism and the fixed pivot point can be represented by fig. 3. The relationship between the retraction/extension amount L of the screw and the deflection angle α is approximately expressed by the law of cosines as
L(α) = √(a² + b² − 2ab·cos(α0 + α)) − L0
Wherein a and b are respectively the two adjacent sides OA and OB of the swing angle in the triangular connecting-rod mechanism; α0 and L0 are the swing center angle AOB and the length of side AB when the load deflection angle is 0.
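Assuming the law-of-cosines geometry above, stroke and swing angle can be converted in both directions numerically. The sketch below (side lengths and neutral angle are invented placeholders, not patent values) inverts the relation by bisection, which works because the stroke is monotonic in the deflection angle over the feasible range:

```python
import math

A_SIDE = 0.12                 # hypothetical side OA (m)
B_SIDE = 0.10                 # hypothetical side OB (m)
ALPHA0 = math.radians(35.0)   # hypothetical neutral angle AOB

def stroke_from_angle(alpha):
    """Screw stroke L(alpha) = |AB| - L0 for deflection alpha (rad), law of cosines."""
    l0 = math.sqrt(A_SIDE**2 + B_SIDE**2 - 2.0 * A_SIDE * B_SIDE * math.cos(ALPHA0))
    l = math.sqrt(A_SIDE**2 + B_SIDE**2
                  - 2.0 * A_SIDE * B_SIDE * math.cos(ALPHA0 + alpha))
    return l - l0

def angle_from_stroke(dl, tol=1e-12):
    """Invert stroke_from_angle by bisection (the stroke is monotonic in alpha)."""
    lo, hi = -ALPHA0 + 1e-9, math.pi - ALPHA0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if stroke_from_angle(mid) < dl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Monotonicity follows from d(L²)/dα = 2ab·sin(α0 + α) > 0 for α0 + α in (0, π), so the numerical inverse is well defined on that interval.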
The overall load is modeled as a high-order nonlinear system, and a complex differential-equation relationship exists between the load swing angle and the observable motor shaft angle, so that a traditional error-based closed-loop controller can hardly obtain good accuracy and response-speed performance in this scenario, leaving an optimization space for the data-based artificial intelligence algorithm.
And constructing an equivalent load module in Matlab/Simulink to realize the equation set, as shown in figure 4. The module takes motor rotor angle θ m as input, and outputs feedback to motor load torque T L and control quantity α.
Step 3: Building basic three-loop controller and scheme
In the invention, the PID controller takes the difference between the external reference input of the controlled object and the feedback value of the measuring element, and outputs the linear superposition of this difference, its integral over time, and its derivative as the control quantity to the next actuator. The relationship between the input error e(t) and the output control value u(t) is:
u(t) = Kp·e(t) + Ki·∫e(τ)dτ + Kd·de(t)/dt
wherein Kp, Ki, Kd are the coefficients of the three terms, which can be adjusted manually or automatically by means of parameter tuning or other optimization algorithms.
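The PID law above, in its discrete form with the optional output clipping used later in the three-loop structure, can be sketched as follows (gains and the first-order plant in the usage note are illustrative only):

```python
class PID:
    """Discrete PID: u = Kp*e + Ki*integral(e) + Kd*de/dt, with optional output clamp."""

    def __init__(self, kp, ki, kd, dt, u_min=None, u_max=None):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_e = 0.0
        self.u_min, self.u_max = u_min, u_max

    def update(self, e):
        self.integral += e * self.dt                 # rectangular integration
        de = (e - self.prev_e) / self.dt             # backward-difference derivative
        self.prev_e = e
        u = self.kp * e + self.ki * self.integral + self.kd * de
        if self.u_max is not None and u > self.u_max:
            u = self.u_max
        if self.u_min is not None and u < self.u_min:
            u = self.u_min
        return u
```

Regulating a first-order plant x' = −x + u to a unit reference drives the steady-state error to zero thanks to the integral term, while the clamp reproduces the clipping behavior discussed for the three-loop scheme.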
In the FOC framework, vector control is a method of controlling torque by controlling the id and iq currents. Thus, vector control constitutes the innermost control loop in the motor drive system, and the subsequent speed and position control are performed on the basis of current control. For the surface-mounted permanent magnet synchronous motor considered by the invention, Ld = Lq, so the torque reduces to:
Te = (3/2)·p·λm·iq
The electromagnetic torque is thus a linear function of iq, id has no influence on the torque, and any nonzero d-axis current wastes input power (mainly dissipated in the resistance and the magnetic field). Therefore, controlling iq controls the torque, and maintaining id = 0 achieves maximum torque-per-ampere (MTPA) control, i.e., maximum torque output for any stator current.
In the rotor reference frame, the motor model is affected by the cross-coupling of the speed voltage terms (i.e., ω rLqiq and ω rLdid+ωrλm). This term may dominate the voltage equation, especially at high speeds. This in practice impairs the performance of the PI controller and therefore requires a decoupling circuit as a current control scheme for vector control. To linearize the control of i d and i q, the d-axis voltage and the q-axis voltage can be provided by a combination of two signals, respectively:
Wherein vd1 and vq1 can be governed by linear PI current control, and the nonlinear terms vd0 and vq0 can be calculated from the rotor speed value of the encoder:
vd0 = −ωr·Lq·iq
vq0 = ωr·(Ld·id + λm)
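A minimal sketch of the decoupling (feedforward) terms above: substituting vd = vd1 + vd0 and vq = vq1 + vq0 into the d-q voltage equations cancels the speed-voltage cross-coupling, leaving first-order linear dynamics for the PI controllers (symbols follow the text; the function itself is the author's illustration):

```python
def decoupling_terms(omega_r, i_d, i_q, Ld, Lq, lam_m):
    """Feedforward voltages v_d0, v_q0 cancelling the speed-voltage cross-coupling."""
    v_d0 = -omega_r * Lq * i_q
    v_q0 = omega_r * (Ld * i_d + lam_m)
    return v_d0, v_q0
```

With these terms added, the current derivatives reduce to did/dt = (vd1 − Rs·id)/Ld and diq/dt = (vq1 − Rs·iq)/Lq, i.e., two decoupled first-order plants.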
On the basis of the decoupled current controller, the difference between the fed-back motor speed ωfb and the speed command ωref is used as the input of the speed PI controller, whose output serves as the reference value of the q-axis current. At the same time, noting the system limitation on speed, the motor speed reference value ωref is limited to the range (−ωlimit, ωlimit). A position control loop is then built on top of the speed controller. Considering the triangular connecting-rod structure of the load, the commanded deflection angle αref of the swing mechanism can be approximately converted into a screw precession length using the law of cosines, the feedback precession length Lfb is calculated from the rotor angle feedback signal θfb, both are input into the position-loop PI controller, and the speed-loop reference value is output.
The Simulink modeling of the decoupled current control in the above three-loop control is shown in fig. 5, and the modeling of the position loop and the speed loop is shown in fig. 6. This PMSM three-loop control model is a linear control of the actual swing-mechanism angle; it can meet certain accuracy and response-speed requirements while guaranteeing system robustness, and it ensures that the motor current and speed values do not exceed the system limits. However, the final control performance of the above three-loop control flow is limited, mainly because of the following problems:
On the one hand, the equivalent voltage output by the SVPWM in an actual system cannot exceed the inverter limit (220 V), so the output value of the decoupled current controller shown in fig. 5 must be clipped, and the clipping function breaks the linearity of the linear terms of the decoupled current controller. Fig. 7 shows the response curves of the q-axis current controlled by this controller to a step current command reference value, before the threshold limit is considered and after it is added. It can be observed that, under the same controller parameter configuration, the q-axis current without the threshold limit tracks the command magnitude at a higher speed, while after the threshold limit is added the q-axis current curve is highly irregular, with large overshoot and long convergence time. This nonlinearity reduces the response speed of the current loop to a certain extent, increases the risk that the actual current exceeds the system limits, and reduces the performance and reliability of the controller.
On the other hand, the system cannot observe the actual angle of the swing mechanism; only the command angle αref and the rotor position feedback θfb can be approximately converted into screw precession lengths. This disregards the complex differential-equation relationship between the motor rotor angle and the swing-mechanism angle, with the result that, when facing low-frequency signals, the position-loop error can converge while the actual swing angle still deviates greatly from the command, as shown in fig. 8.
Step 4: permanent magnet synchronous motor control scheme for constructing deep reinforcement learning drive
PPO is used for training, and the agent directly outputs the PMSM control voltage to drive the transmission device so that the servo system moves to minimize the long-term error between the reference curve and the actual state of the servo system. Using the reinforcement learning agent directly as the PMSM voltage controller exploits the adaptability of the intelligent model in the control process to the greatest extent and avoids the agent's output being limited by the three-loop control: a three-loop control scheme typically adds clipping at the output of each loop to ensure control safety in extreme states, but this also limits the effect of an agent, since once a certain loop reaches its clipping value the agent can no longer optimize the original three-loop control output by tuning. When the agent acts as the voltage controller of the PMSM, the only limitation is that its output should be within the voltage range the PMSM can withstand, which gives greater flexibility and adaptability than using the agent as a feedforward compensator. However, since the reinforcement learning agent itself targets maximization of the reward function, its output is subject to oscillation in the time domain; in addition, since the state quantities of the agent contain rapidly changing speed and current, unfiltered noise and oscillation further aggravate the oscillation of the agent, leading to unsatisfactory control and even control-safety problems. Therefore, when the agent is used directly as the PMSM voltage controller, it is necessary not only to use an ADRC controller with noise-interference resistance in the position loop to suppress noise in the feedback signal, but also to apply a Kalman filter to the rapidly changing speed and current and a low-pass filter to the agent output, so as to maximally suppress the oscillation of the agent output in the time domain.
The observations of the agent are determined as: the outputs θv1(t), θv2(t) of the ADRC differential controller; the outputs e1(t), e2(t) of the reference signal generator; the error eθ(t) between the screw precession length converted from the command deflection angle α*(t) and its feedback; the reference speed ω*(t) and reference current i*q(t); and the corresponding speed feedback ω(t) and current feedback iq(t).
In order to better optimize the training algorithm, the input values are nondimensionalized and clipped to a certain amplitude; meanwhile, a Kalman filter is added to the two feedback quantities ω and iq, and a low-pass filter is added to the output of the agent.
In order to improve the performance of the controller in a comprehensive way, the task faced by the reinforcement learning controller should be as rich as possible to contain all possible instruction modes, including random step instructions, low-frequency sine instructions and high-frequency sine instructions.
To reduce the error between the swing angle α and the command angle αref, the negatives of the squared differences between the reference and feedback values of each of the three loops are used as the reward of each time step, each multiplied by a fixed coefficient to balance the weights among the three loops:
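A plain-Python sketch of such a per-step reward (the weighting coefficients and the clipping bound are illustrative assumptions; the text only states that coefficients are tuned so the reward stays within [−200, 200]):

```python
def step_reward(e_pos, e_speed, e_cur, c_pos=100.0, c_speed=1.0, c_cur=0.1):
    """Negative weighted squared three-loop errors; coefficients are hypothetical."""
    r = -(c_pos * e_pos ** 2 + c_speed * e_speed ** 2 + c_cur * e_cur ** 2)
    return max(r, -200.0)  # keep the reward within a bounded range
```

The reward is dense (nonzero whenever any loop has tracking error), peaks at 0 for perfect tracking, and decreases monotonically as any loop's error grows.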
For the state s, an optimized policy function π should enable the agent to output the optimal action a = π(s). For this purpose, two Q networks need to be established to evaluate the output of the policy network, and the evaluation value is used to perform gradient optimization on the actor network.
Meanwhile, the agent sampling time was set to 0.01s.
In particular, the invention includes the following considerations:
1) The invention installs the agent at the voltage input of the permanent magnet synchronous motor; the reinforcement-learning-driven agent directly controls the input voltage, and since the input voltage is the q-axis voltage, the output clipping of the reinforcement learning agent is 220×2/3 V.
2) The state-space variables for reinforcement learning should be values that can be collected or calculated by the controller. Since the d-axis current id and the q-axis current iq are already managed by the decoupled current controller, id remains close to 0.
3) In order to better optimize the training algorithm, the input values are nondimensionalized so that the input quantities remain within 10^0 to 10^2 in the various working states. At the same time, for abrupt position commands (e.g., steps) the derivative value becomes very large, producing pathological empirical samples; the derivative value therefore needs to be limited to a certain range.
Because of the high nonlinearity of neural networks, at the beginning of training the output of the network tends to take a boundary value and oscillate back and forth between the upper and lower boundaries. In the motor model, oscillation of the speed command causes the motor feedback values to oscillate, which in turn causes the reinforcement learning controller to oscillate, eventually placing the motor in a highly unstable state, as shown in fig. 10. An agent using such pathological data as experience often learns nothing from it; in many cases, even after many rounds of training, the resulting output still oscillates repeatedly, which is not permissible in actual motor control and even risks damaging the motor. To alleviate this, a Kalman filter can be added to the two feedback quantities ω and iq of the motor. After this operation, even if system oscillation occurs at the beginning of training, the Kalman filter filters the input state into a low-frequency signal, the output gradually becomes a low-frequency signal during training, and training stability is ensured to a certain extent.
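The feedback filtering described above can be illustrated with a one-dimensional Kalman filter under a random-walk state model (a simplification: the text does not specify the filter's state model or noise covariances, so q and r below are assumptions):

```python
class ScalarKalman:
    """1-D Kalman filter (random-walk state model) for noisy speed/current feedback."""

    def __init__(self, q, r, x0=0.0, p0=1.0):
        self.q, self.r = q, r      # process / measurement noise variances
        self.x, self.p = x0, p0    # state estimate and its variance

    def update(self, z):
        self.p += self.q                      # predict: variance grows by process noise
        k = self.p / (self.p + self.r)        # Kalman gain
        self.x += k * (z - self.x)            # correct with measurement z
        self.p *= (1.0 - k)
        return self.x
```

With a small process-noise variance the steady-state gain is small, so rapid measurement oscillation is strongly attenuated while the slowly varying component passes through, which is exactly the low-pass behavior relied on during early training.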
4) In order to comprehensively improve the performance of the controller, the tasks faced by the reinforcement learning controller should be rich enough to contain all possible command patterns. According to automatic control principles, conventional performance analysis of a linear system examines the response to a step command in the time domain, or the open-loop amplitude-frequency and logarithmic phase-frequency characteristics in the frequency domain. Combined with the finally adopted performance evaluation scheme, the training tasks are divided into the following three types:
a. Random step command: zero steady state (every derivative equal to 0) is used as the initial state, and a new random step target is generated every 2.5 seconds, with the step range d ∈ [−3.5°, 3.5°].
b. Low-frequency sinusoidal command: with zero steady state as the initial state, a low-frequency sinusoidal command δc = δcm·sin(ωt) is generated, with amplitude δcm ∈ [1.5°, 4°] and circular frequency ω ∈ [2π×0.05, 2π×0.1] (rad/s).
c. High-frequency sinusoidal command: with zero steady state as the initial state, a high-frequency sinusoidal command δc = δcm·sin(ωt) is generated, with amplitude δcm ∈ [0.3°, 1.2°] and circular frequency ω ∈ [1, 20] (rad/s).
As training proceeds, each episode selects one task as the command with equal probability and lasts 10 seconds.
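The three task types can be sketched as a reference-command generator; the plain-Python version below follows the ranges listed above (the function name and structure are the author's own):

```python
import math
import random

def make_command(task, duration=10.0, dt=0.01, rng=None):
    """One episode of reference angles (degrees) for the three training task types."""
    rng = rng or random.Random(0)
    n = round(duration / dt)
    if task == "step":
        cmd, target, hold = [], 0.0, round(2.5 / dt)
        for i in range(n):
            if i % hold == 0:
                target = rng.uniform(-3.5, 3.5)  # new random step every 2.5 s
            cmd.append(target)
        return cmd
    if task == "low_sine":
        amp = rng.uniform(1.5, 4.0)
        w = rng.uniform(2.0 * math.pi * 0.05, 2.0 * math.pi * 0.1)
    elif task == "high_sine":
        amp = rng.uniform(0.3, 1.2)
        w = rng.uniform(1.0, 20.0)
    else:
        raise ValueError(task)
    return [amp * math.sin(w * i * dt) for i in range(n)]
```

Per episode, one of the three task names would be drawn with equal probability and the resulting 10-second sequence used as the command input.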
5) The setting of the reinforcement learning reward function directly affects the optimization of the algorithm. Since the invention aims to optimize the error between the swing-mechanism angle α and the command angle αref, the negative of the squared difference between the reference and feedback values of each loop in the three-loop control is used as the reward value of each time step. Such a reward is not sparse, and through adjustment of the coefficients its range can be stabilized within [−200, 200], which is beneficial to training.
6) During training, for an input agent state value s, it is desired to train an optimal policy function π such that the optimal control value a = π(s) is output for s.
the PPO training algorithm improves on the policy gradient method by limiting the update amplitude of the policy, making the training process more stable and efficient. The specific training process is as follows:
for each time-step index i ∈ [1, 2, 3, …, T/Ts], where Ts is the agent sampling period and T is the episode duration, the agent takes the state si and generates an action with the policy network:
The next state si+1 caused by the action, together with the state si of this step, the action ai and the step reward value ri calculated by the environment, is stored as a training sample [ai, si, ri, si+1] in the experience replay buffer. When the number of samples in the buffer reaches the mini-batch size N, a batch is sampled from the buffer at each step, and the temporal-difference error or generalized advantage estimation is computed to obtain the advantage function.
Next, PPO optimizes the objective function. First, PPO defines the probability ratio of the new policy to the old policy,
rt(θ) = πθ(at|st) / πθold(at|st)
and defines the clipped objective function:
LCLIP(θ) = E[min(rt(θ)·Ât, clip(rt(θ), 1 − ε, 1 + ε)·Ât)]
Where ε is the hyper-parameter of the clipping range.
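The clipped surrogate objective can be illustrated in plain Python (per-sample log-probabilities and advantages are assumed to be supplied by the policy networks and the advantage estimator, which are not shown):

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Mean clipped surrogate objective L^CLIP (to be maximized by gradient ascent)."""
    total = 0.0
    for lp_n, lp_o, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_n - lp_o)                      # r_t(theta)
        unclipped = ratio * adv
        clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * adv
        total += min(unclipped, clipped)                   # pessimistic bound
    return total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the ratio outside [1 − ε, 1 + ε], which is exactly the update-amplitude limitation described above.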
PPO then updates the policy network parameters using stochastic gradient ascent (Adam optimizer) to maximize the clipped objective function LCLIP(θ).
At the same time, PPO updates the value network parameters using stochastic gradient descent (Adam optimizer) to minimize the loss function of the value network:
LV(φ) = E[(Vφ(st) − Rt)²]
finally, the parameters of the target network are updated according to the smoothing proportion τ, and the above steps are repeated until the training termination condition is reached:
θ′ ← τθ + (1 − τ)θ′
7) Considering the deployment requirements on an embedded chip, the actor network should not be set too large; the invention sets it as an MLP with 2 hidden layers, while the critic network is set as an MLP with 4 hidden layers. Meanwhile, since the sinusoidal command frequency input to the controller can reach 20 rad/s and the training speed should not be too slow, the agent sampling time is set to 0.01 s. Other training parameters are set as follows:
to comprehensively evaluate the performance improvement of the reinforcement learning-based position loop tuning scheme over the conventional PID method, the performance is evaluated by the following three indexes.
1) Load position characteristics
Under the maximum-load condition, the angle command sequence αref is used as the command input, and the actual swing-angle sequence αfb of the swing mechanism over the time range t ∈ (11, 31) is acquired, wherein:
The position-loop curve is plotted with αref as the abscissa and αfb as the ordinate. The nominal position curve is the line connecting the midpoints of the position-loop curve along the horizontal axis, the nominal position baseline is a first-order linear fit of the nominal position curve, and the time-domain tracking accuracy is obtained through loop-width and zero-offset analysis algorithms. The maximum swing angles in the positive and negative directions represent the ability to track close to the commanded limit positions; the maximum loop width Δδmax characterizes the maximum value of the tracking error; and the zero bias δ0 measures the symmetry of the control algorithm in the positive and negative directions when facing a symmetric signal such as a low-frequency sinusoid.
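A possible plain-Python reading of the loop-width and zero-offset analysis (the text does not give the exact algorithm, so the binning scheme below is an assumption): the command range is split into bins, the loop width in each bin is the spread of feedback values sharing approximately the same command value, and the zero bias is the mean feedback near zero command:

```python
import math

def loop_metrics(alpha_ref, alpha_fb, n_bins=50):
    """Maximum loop width and zero bias of a command/feedback hysteresis loop."""
    lo, hi = min(alpha_ref), max(alpha_ref)
    bins = [[] for _ in range(n_bins)]
    for r, f in zip(alpha_ref, alpha_fb):
        idx = min(int((r - lo) / (hi - lo) * n_bins), n_bins - 1)
        bins[idx].append(f)
    max_width = max(max(b) - min(b) for b in bins if b)
    near_zero = [f for r, f in zip(alpha_ref, alpha_fb) if abs(r) < (hi - lo) / n_bins]
    zero_bias = sum(near_zero) / len(near_zero) if near_zero else 0.0
    return max_width, zero_bias
```

Perfect tracking collapses the loop to the diagonal (width and bias near zero), while a phase lag between command and feedback opens the loop and increases the measured width.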
Combining fig. 11(a) with the table below, the DRL method is superior to the PID and ADRC methods in maximum swing angle, maximum loop width, and zero offset during control, which shows that through data-driven experience learning the DRL control method achieves better tracking accuracy.
2) Speed characteristic experiment
Under the maximum-load condition, a step command with amplitude 3° is taken as the system input α_ref, the actual load position sequence is collected over the period in which the swing angle lies in the range (0.5°, 1.5°), and the average swing angular speed over that interval is used to analyze the time-domain response speed of each algorithm; the experimental results are shown in the table below. Combining fig. 11(b) with the table, the PID control method has the fastest rise but also the largest overshoot, showing that it is the least stable despite its fast response; the ADRC control method is the opposite, being the most stable despite the slowest response; the DRL method lies between the two, with relatively fast response and relatively stable behavior.
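The average-speed index above can be sketched as the band width divided by the time the step response spends crossing the (0.5°, 1.5°) band. The first-order plant below is an illustrative stand-in for the real servo, and a monotone rise through the band is assumed:

```python
import numpy as np

def avg_speed_in_band(t, pos, lo=0.5, hi=1.5):
    """Average swing speed over the band (lo, hi) degrees of a step response:
    (hi - lo) divided by the band-crossing time."""
    t_lo = t[np.searchsorted(pos, lo)]   # first sample at/above lo
    t_hi = t[np.searchsorted(pos, hi)]   # first sample at/above hi
    return (hi - lo) / (t_hi - t_lo)

# illustrative first-order response toward the 3-degree step target
t = np.linspace(0.0, 2.0, 20001)
pos = 3.0 * (1.0 - np.exp(-5.0 * t))
speed = avg_speed_in_band(t, pos)        # deg/s over the band
```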
3) Frequency characteristic experiment
Under the maximum-load condition, a sinusoidal command α_ref = δ·sin(ωt) is applied, where δ = 0.5°, 0.8°, 1.1° and ω = 2, 4, 8, 16 rad/s; each frequency-amplitude combination is simulated for 6 command cycles. The frequency-domain response of each algorithm is analyzed by measuring the amplitude attenuation L and the phase lag φ of the swing-angle output relative to the input command. Orthogonally decomposing α_ref and α_fb onto sine and cosine bases at the command frequency yields their sine and cosine components a_ref, b_ref and a_fb, b_fb; from these, the relative gain L = 20·lg(√(a_fb² + b_fb²) / √(a_ref² + b_ref²)) (dB) and the phase angle φ = arctan(b_ref/a_ref) − arctan(b_fb/a_fb) are calculated.
The gain L (dB) and the phase lag φ of the actual angle signal relative to the command signal are calculated as above, and the results are recorded below.
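The quadrature decomposition above can be sketched as follows; discrete means over whole command cycles stand in for the integrals, and the attenuated, lagging feedback signal is illustrative:

```python
import numpy as np

def gain_phase(t, ref, fb, omega):
    """Decompose command and feedback onto sin/cos bases at the command
    frequency, then return relative gain (dB) and phase lag (rad)."""
    s, c = np.sin(omega * t), np.cos(omega * t)
    a_ref, b_ref = 2 * np.mean(ref * s), 2 * np.mean(ref * c)
    a_fb, b_fb = 2 * np.mean(fb * s), 2 * np.mean(fb * c)
    L = 20 * np.log10(np.hypot(a_fb, b_fb) / np.hypot(a_ref, b_ref))  # gain, dB
    phi = np.arctan2(b_ref, a_ref) - np.arctan2(b_fb, a_fb)           # phase lag
    return L, phi

omega, delta = 4.0, 0.5
t = np.linspace(0.0, 6 * 2 * np.pi / omega, 6000, endpoint=False)  # 6 cycles
ref = delta * np.sin(omega * t)
fb = 0.45 * np.sin(omega * t - 0.2)   # attenuated, lagging output (toy data)
L, phi = gain_phase(t, ref, fb, omega)
```

With these toy signals the recovered gain is 20·lg(0.45/0.5) ≈ −0.92 dB and the recovered lag is 0.2 rad, matching the signal construction.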
Combining fig. 11(c), fig. 11(d), and the table below, the controller under the DRL method is generally superior to the PID method in terms of the phase-attenuation index.
The PID, ADRC, and DRL methods are quantitatively measured and compared through the above three test methods; the comparison shows that the control performance of the DRL method is superior to the PID and ADRC methods on most indices, and the system achieves better tracking accuracy and response speed.
Claims (7)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410774423.2A CN118801756B (en) | 2024-06-17 | 2024-06-17 | Intelligent control method of permanent magnet synchronous motor servo system in semi-closed loop scenario |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN118801756A true CN118801756A (en) | 2024-10-18 |
| CN118801756B CN118801756B (en) | 2025-03-28 |
Family
ID=93028918
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410774423.2A Active CN118801756B (en) | 2024-06-17 | 2024-06-17 | Intelligent control method of permanent magnet synchronous motor servo system in semi-closed loop scenario |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118801756B (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119362946A (en) * | 2024-12-23 | 2025-01-24 | 浙江嘉宏运动器材有限公司 | A permanent magnet synchronous motor speed stabilization control method based on reinforcement learning |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH0968085A (en) * | 1995-09-04 | 1997-03-11 | Unisia Jecs Corp | Engine idle speed controller |
| US6841969B1 (en) * | 2003-09-24 | 2005-01-11 | General Motors Corporation | Flux observer in a sensorless controller for permanent magnet motors |
| US20170111000A1 (en) * | 2015-10-19 | 2017-04-20 | Fanuc Corporation | Machine learning apparatus and method for learning correction value in motor current control, correction value computation apparatus including machine learning apparatus and motor driving apparatus |
| CN111342720A (en) * | 2020-03-06 | 2020-06-26 | 南京理工大学 | Adaptive Continuous Sliding Mode Control Method for Permanent Magnet Synchronous Motor Based on Load Torque Observation |
| CN115001334A (en) * | 2022-07-19 | 2022-09-02 | 北京理工华创电动车技术有限公司 | Rotation speed control method and system of position-sensor-free ultra-high-speed permanent magnet synchronous motor based on active disturbance rejection |
| CN117335700A (en) * | 2023-09-14 | 2024-01-02 | 南京航空航天大学 | Dynamic optimization method of electric servo position feedback based on deep reinforcement learning in semi-closed loop scenario |
| CN118034129A (en) * | 2024-02-26 | 2024-05-14 | 南京航空航天大学 | A servo motor control parameter optimization method based on evolutionary reinforcement learning |
Non-Patent Citations (1)
| Title |
|---|
| Cheng Guoqing et al., "Integrated robust control of the speed and position loops of an AC servo motor", Computing Technology and Automation, vol. 32, no. 4, 15 December 2013 (2013-12-15), pages 23-27 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN118801756B (en) | 2025-03-28 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN112701968B (en) | A Robust Performance Improvement Method for Model Predictive Control of Permanent Magnet Synchronous Motors | |
| CN107070341B (en) | Torque Ripple Suppression Method for Permanent Magnet Synchronous Motor Based on Robust Iterative Learning Control | |
| CN104242769B (en) | Permanent magnet synchronous motor speed composite control method based on continuous terminal slip form technology | |
| CN105827168B (en) | Method for controlling permanent magnet synchronous motor and system based on sliding formwork observation | |
| CN106655938B (en) | Control system for permanent-magnet synchronous motor and control method based on High-Order Sliding Mode method | |
| CN113364377B (en) | A method of automatic disturbance rejection position servo control of permanent magnet synchronous motor | |
| Zhao et al. | Back EMF-based dynamic position estimation in the whole speed range for precision sensorless control of PMLSM | |
| CN117335700A (en) | Dynamic optimization method of electric servo position feedback based on deep reinforcement learning in semi-closed loop scenario | |
| CN110995102A (en) | Direct torque control method and system for permanent magnet synchronous motor | |
| CN114710080A (en) | Permanent magnet synchronous motor sliding mode control method based on improved variable gain approximation law | |
| CN113726240B (en) | A permanent magnet synchronous motor control method and system based on second-order active disturbance rejection control | |
| CN113067520B (en) | Sensorless Response Adaptive Motor Control Method Based on Optimization Residuals | |
| CN118034129A (en) | A servo motor control parameter optimization method based on evolutionary reinforcement learning | |
| CN118801756B (en) | Intelligent control method of permanent magnet synchronous motor servo system in semi-closed loop scenario | |
| CN113708684B (en) | Permanent magnet synchronous motor control method and device based on extended potential observer | |
| CN117614333A (en) | A permanent magnet synchronous motor position control method and system based on sliding mode control | |
| CN118199453A (en) | Sensorless torque ripple suppression method for stepping motor based on extended Kalman filter | |
| CN117895851A (en) | A full-speed control method for surface-mounted permanent magnet synchronous motor | |
| CN112436774A (en) | Control method of asynchronous motor driven by non-speed sensor | |
| Badini et al. | MRAS-based speed and parameter estimation for a vector-controlled PMSM drive | |
| CN119483031B (en) | Decoupling method of torque system and suspension system of single-winding magnetic levitation permanent magnet synchronous motor | |
| Gao et al. | Sensorless Control of PMSM via ADRC and SMC with Super-Twisting Observer | |
| Zhao et al. | Disturbance rejection enhancement of vector controlled PMSM using second-order nonlinear ADRC | |
| CN114564053B (en) | Control method of control moment gyro frame system based on induction synchronizer error compensation | |
| Liu et al. | A sensorless control method for PMSM Based on FSMO optimized by QIO |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||