
CN118801756A - Intelligent control method of permanent magnet synchronous motor servo system in semi-closed loop scenario - Google Patents


Info

Publication number
CN118801756A
Authority
CN
China
Prior art keywords
control
output
motor
controller
feedback
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410774423.2A
Other languages
Chinese (zh)
Other versions
CN118801756B (en
Inventor
朱海峰
易畅言
郑好
吴昊
祝可可
戴兴安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Nanjing Chenguang Group Co Ltd
Original Assignee
Nanjing University of Aeronautics and Astronautics
Nanjing Chenguang Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics and Nanjing Chenguang Group Co Ltd
Priority to CN202410774423.2A
Publication of CN118801756A
Application granted
Publication of CN118801756B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02P CONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
    • H02P21/00 Arrangements or methods for the control of electric machines by vector control, e.g. by control of field orientation
    • H02P21/22 Current control, e.g. using a current control loop
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02P CONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
    • H02P21/00 Arrangements or methods for the control of electric machines by vector control, e.g. by control of field orientation
    • H02P21/0003 Control strategies in general, e.g. linear type, e.g. P, PI, PID, using robust control
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02P CONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
    • H02P21/00 Arrangements or methods for the control of electric machines by vector control, e.g. by control of field orientation
    • H02P21/05 Arrangements or methods for the control of electric machines by vector control, e.g. by control of field orientation specially adapted for damping motor oscillations, e.g. for reducing hunting
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02P CONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
    • H02P21/00 Arrangements or methods for the control of electric machines by vector control, e.g. by control of field orientation
    • H02P21/13 Observer control, e.g. using Luenberger observers or Kalman filters
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02P CONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
    • H02P21/00 Arrangements or methods for the control of electric machines by vector control, e.g. by control of field orientation
    • H02P21/14 Estimation or adaptation of machine parameters, e.g. flux, current or voltage
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02P CONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
    • H02P25/00 Arrangements or methods for the control of AC motors characterised by the kind of AC motor or by structural details
    • H02P25/02 Arrangements or methods for the control of AC motors characterised by the kind of AC motor or by structural details characterised by the kind of motor
    • H02P25/022 Synchronous motors
    • H02P25/024 Synchronous motors controlled by supply frequency
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02P CONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
    • H02P27/00 Arrangements or methods for the control of AC motors characterised by the kind of supply voltage
    • H02P27/04 Arrangements or methods for the control of AC motors characterised by the kind of supply voltage using variable-frequency supply voltage, e.g. inverter or converter supply voltage
    • H02P27/06 Arrangements or methods for the control of AC motors characterised by the kind of supply voltage using variable-frequency supply voltage, e.g. inverter or converter supply voltage using DC to AC converters or inverters
    • H02P27/08 Arrangements or methods for the control of AC motors characterised by the kind of supply voltage using variable-frequency supply voltage, e.g. inverter or converter supply voltage using DC to AC converters or inverters with pulse width modulation
    • H02P27/085 Arrangements or methods for the control of AC motors characterised by the kind of supply voltage using variable-frequency supply voltage, e.g. inverter or converter supply voltage using DC to AC converters or inverters with pulse width modulation wherein the PWM mode is adapted on the running conditions of the motor, e.g. the switching frequency
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02P CONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
    • H02P2207/00 Indexing scheme relating to controlling arrangements characterised by the type of motor
    • H02P2207/05 Synchronous machines, e.g. with permanent magnets or DC excitation
    • H02P2207/055 Surface mounted magnet motors

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Control Of Ac Motors In General (AREA)

Abstract

The invention discloses a control method for a servo system driven by a permanent magnet synchronous motor (PMSM) in a semi-closed-loop scenario, in which only motor feedback signals such as the rotor angle can be measured and the actual position of the load mechanism cannot. Within the field-oriented control (FOC) framework of the PMSM, the method provides high-precision signal tracking using an edge intelligent control algorithm driven by deep reinforcement learning. A tuning policy network is trained with the proximal policy optimization (PPO) algorithm and fused with the conventional three-loop control strategy: it observes the commands and feedback quantities of the three loops of the PMSM and outputs the motor control voltage. This improves the control precision of conventional three-loop control for high-order nonlinear load models while ensuring safety during operation.

Description

Intelligent control method for permanent magnet synchronous motor servo system in semi-closed loop scene
Technical Field
The invention belongs to the field of permanent magnet synchronous motor control and relates to computer control system technology, in particular to position control algorithms for permanent magnet synchronous motors in the industrial Internet, and specifically to an intelligent control method for a permanent magnet synchronous motor servo system in a semi-closed-loop scenario.
Background
Permanent magnet synchronous motors (PMSMs) offer a compact structure, high efficiency and power density, and good speed regulation performance, and are widely used in the industrial Internet, electric transportation, industrial robots, aerospace and other fields. The field-oriented control (FOC) strategy commonly used in PMSM position servo systems usually relies on a three-loop PID controller to control the rotation angle of the motor. Although its low complexity and stability have led to wide use in industrial practice, the three-loop PID controller faces a number of challenges and disadvantages in complex or special driving scenarios:
First, when the servo motor drives certain high-order nonlinear load mechanisms, the PID parameters are difficult to determine from a model and performance indices, so they must be tuned by trial and error based on manual experience, and good dynamic response is hard to obtain. Second, the motor system imposes limits on rotational speed, current and inverter output voltage; when the controller receives highly dynamic position, speed or current commands, clipping of these commands further degrades controller performance. Most challenging of all, the special equipment addressed by the invention either cannot be fitted with an external sensor or, because of its working environment, cannot reliably measure the actual position of the load mechanism. The only feedback available to the controller is the motor shaft angle detected by a magnetic encoder, which makes the system a semi-closed-loop control system. Compared with closed-loop control that monitors the final actuator, the semi-closed-loop scenario can significantly degrade conventional PID controllers, which act on the feedback error. For example, in the invention the motor shaft drives a lead screw through a gear to extend or retract, so that a swing mechanism connected to a triangular linkage deflects by a certain angle. The swing angle cannot be measured by a sensor, and the relationship between the motor rotation angle and the swing angle is affected by the elastic motion of the lead screw; it involves high-order nonlinear dynamics and cannot be expressed by a simple function. The control algorithm therefore cannot obtain actual angle feedback to form a closed loop.
The prior art also includes a deep-reinforcement-learning-based dynamic tuning method for electric servo position feedback in a semi-closed-loop scenario (publication number CN 117335700A). On the basis of three-loop PID control, that technique determines the state space and action settings of a reinforcement learning agent, takes as its goal the improvement of the tracking precision and response speed of the load swing angle with respect to the commanded target angle under semi-closed-loop control, models the system in simulation software, and uses deep reinforcement learning to obtain a tuning policy network for motor position feedback that outputs an optimal tuning value. That scheme illustrates a limitation: a three-loop control scheme usually clips the output of each loop to ensure safety in extreme states, but this also limits the effect of the agent, because once any of the three loops reaches its clipping limit the agent can no longer improve the original three-loop control through tuning.
Disclosure of Invention
Object of the invention: to address the shortcomings of the three-loop PID control method when a permanent magnet synchronous motor in the industrial Internet operates in a semi-closed loop and drives a high-order nonlinear load, the invention provides a control method, driven by deep reinforcement learning, for an industrial Internet servo system in a semi-closed-loop scenario.
Technical scheme: the intelligent control method of the permanent magnet synchronous motor servo system in the semi-closed-loop scenario comprises constructing a PMSM operation model based on the FOC framework together with a mathematical model of the load and its transmission mechanism; determining the state space and action settings of a reinforcement learning agent based on the three-loop structure; taking as the goal the improvement of the tracking precision and response speed of the load swing angle with respect to the commanded target angle in the semi-closed-loop scenario; modeling the system in simulation software; and using deep reinforcement learning to obtain a tuning policy network for motor position feedback that outputs the optimal control voltage;
the method further comprises pre-training the model on empirical data acquired by sensors during testing, and then optimizing the policy network with the twin delayed deep deterministic policy gradient (TD3) algorithm, so that the policy network takes the real-time observable feedback values or computable values of the motor as state quantities and calculates the optimal PMSM control voltage, whereby the permanent magnet synchronous motor drives the transmission and the servo system responds to the command;
in determining the state space and action settings of the reinforcement learning agent, an active disturbance rejection control (ADRC) algorithm is used in the position loop to improve the disturbance rejection of the control system, PI control is used in the speed loop, and a policy network trained by reinforcement learning is used in the current loop. The policy network decides the voltage applied to the PMSM by observing the feedback position input to the position loop and its output reference speed, the feedback speed input to the speed loop and its output reference current, and the feedback current, thereby improving system performance; the agent is optimized with the PPO reinforcement learning algorithm;
the method optimizes the agent with the PPO reinforcement learning algorithm, which supports continuous state and action spaces, so that the agent observes the gap between the current PMSM state and the given reference values, predicts the deviation from the actual lead screw precession length θ_loc(t) when only the approximate lead screw precession length is available as input, and outputs the voltage that controls the PMSM. This reduces the error between the system and the given command during motor control and alleviates the loss of control precision caused by position feedback errors.
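The patent names PPO but does not reproduce its objective. As a minimal sketch, the widely used clipped surrogate objective of PPO can be written as below; the function name, the clipping parameter `clip_eps = 0.2` and the plain-list interface are assumptions for illustration, not values taken from the patent.

```python
def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to be maximized).

    ratios:     new_policy_prob / old_policy_prob for each sampled action
    advantages: advantage estimate for each sampled action
    clip_eps:   clipping range (0.2 is a common default, assumed here)
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Clip the probability ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, r))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * a, clipped * a)
    return total / len(ratios)
```

The clipping keeps each policy update close to the policy that collected the data, which is one reason PPO is often chosen for continuous-action control tasks such as voltage commands.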
Further, the method comprises the implementation steps of:
S1, constructing a PMSM operation model based on the FOC framework, wherein the FOC strategy decomposes the phase variables of the motor into a magnetic field (flux) component and a torque component and controls them independently, achieving precise control of the motor's magnetic field and torque;
Under FOC control, the motor system takes the d-axis and q-axis voltage command values from the controller as input and delivers drive torque to the load mechanism, which appears as the rotation angle of the motor rotor;
S2, constructing a mathematical model of the load and its transmission mechanism, wherein the motor load model accounts for the dynamics of the transmission mechanism, such as elastic deformation and the swing mechanism, as well as nonlinear factors including Coulomb friction and torque transmission in the triangular linkage;
S3, taking into account the PMSM and the high-order nonlinear load model established above, constructing an edge intelligent control scheme driven by deep reinforcement learning within the FOC framework;
In the control scheme, the ADRC controller receives the externally supplied reference value of the controlled object and, through a tracking differentiator, outputs the position reference value and its rate of change. The ADRC controller also receives feedback from the measuring element of the controlled object. Because this feedback contains noise and time-domain oscillation that can impair the output of the downstream deep reinforcement learning agent, the ADRC controller first feeds the feedback signal into an extended state observer to estimate the noise it contains and removes that noise with a corresponding feedforward mechanism. Finally, the ADRC controller takes the difference between the outputs of the tracking differentiator and the extended state observer to produce the reference value for the next loop;
the PI controller takes the difference between the reference value from the ADRC and the feedback value of the measuring element, and outputs the linear combination of this difference and its time integral as the control quantity for the next stage;
S4, constructing a deep-reinforcement-learning-driven control scheme for the PMSM and training it with PPO, wherein the agent outputs the PMSM control voltage, and the motor drives the transmission and moves the servo system so as to minimize the long-term error between the reference command and the servo system;
The observations of the agent are: the outputs θ_v1(t), θ_v2(t) of the ADRC tracking differentiator; the outputs e_1(t), e_2(t) of the reference signal generator; the lead screw precession length converted from the command deflection angle α*(t), the precession length converted from the feedback, and the error e_θ(t); the reference speed ω*(t) and the reference current; and the corresponding speed feedback ω(t) and current feedback i_q(t).
Although the ADRC controller can estimate the noise in the feedback signal and compensate for it by feedforward, the agent observations listed above still oscillate to varying degrees in the time domain, mainly because of the fast-varying speed ω and current i_q, and this can cause the agent output to oscillate. Moreover, even if the inputs do not oscillate, the output of the reinforcement learning agent itself oscillates in the time domain, because the agent only seeks to maximize the reward function and ignores the safety hazards that may arise when its output is applied as the PMSM control voltage. It is therefore necessary to add a Kalman filter to the feedback quantities ω and i_q, and a low-pass filter to the agent output, to smooth the PMSM control voltage.
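The two filters described above can be illustrated with a minimal sketch. The patent does not give the filter structure or tuning, so the scalar random-walk Kalman filter, the first-order low-pass filter, and every parameter name below (`process_var`, `meas_var`, `alpha`) are assumptions chosen for illustration only.

```python
class ScalarKalmanFilter:
    """One-dimensional Kalman filter with a random-walk state model (assumed)."""

    def __init__(self, process_var, meas_var, x0=0.0, p0=1.0):
        self.q = process_var  # process noise variance (hypothetical tuning value)
        self.r = meas_var     # measurement noise variance (hypothetical tuning value)
        self.x = x0           # state estimate, e.g. filtered speed or q-axis current
        self.p = p0           # estimate variance

    def update(self, z):
        # Predict: the state is modelled as constant plus process noise.
        self.p += self.q
        # Correct: blend prediction and measurement with the Kalman gain.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= 1.0 - k
        return self.x


class LowPassFilter:
    """Discrete first-order low-pass filter: y += alpha * (u - y)."""

    def __init__(self, alpha, y0=0.0):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.y = y0

    def update(self, u):
        # A smaller alpha gives stronger smoothing of the agent's voltage output.
        self.y += self.alpha * (u - self.y)
        return self.y
```

In use, one `ScalarKalmanFilter` instance would be applied to each of ω and i_q before they enter the observation vector, and one `LowPassFilter` to the agent's voltage output before it reaches the PMSM.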
Based on the scheme, the method comprises the following optimization process:
To better condition the training algorithm, the input values are nondimensionalized (normalized) and clipped to a bounded range; a Kalman filter is added to the two feedback quantities ω and i_q, and a low-pass filter is added to the agent output;
To improve controller performance across the board, the tasks presented to the reinforcement learning controller should be as varied as possible and cover all possible command modes, including random step commands, low-frequency sinusoidal commands and high-frequency sinusoidal commands;
To reduce the error between the swing angle α and the command angle α_ref, the reward at each time step is the negative of the squared difference between the reference value and the feedback value of each of the three loops, multiplied by fixed coefficients that balance the weights of the three loops:
For a given state, the policy function is optimized so that the agent outputs a tuning value. To this end, two Q networks are established to evaluate the output of the policy network, and their evaluation values are used for gradient optimization of the actor network.
Further, in step S1, under FOC control the motor system takes the d-axis and q-axis voltage command values from the controller as input and delivers torque to the load mechanism, which appears as the rotation angle of the motor rotor. The specific operations are as follows:
S11, acquiring the rotor position and speed of the motor for the transformation between the phase variables in the stationary coordinate system and the variables in the rotating d-q coordinate system;
In the d-q coordinate system, the motor equations can be described as:
Where: R_s is the stator resistance; v_d, v_q are the d-axis and q-axis voltages; i_d, i_q are the d-axis and q-axis currents; L_d, L_q are the d-axis and q-axis inductances; λ_m is the d-axis flux linkage of the permanent magnet; T_e, T_L are the output torque and load torque; B is the viscous friction coefficient of the bearing; J is the total moment of inertia of the motor and load; ω_m and ω_r are the mechanical and electrical angular velocities of the rotor; and p is the number of pole pairs, satisfying ω_r = p × ω_m. Given the d-q axis voltages as input, the motor outputs the angle θ_m and angular velocity ω_r according to the external load torque T_L, and the d-q axis currents are obtained from a current sensor;
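The motor equations themselves are not reproduced in this text. For orientation, the textbook d-q model of a PMSM, written with the symbols defined above, is sketched below. This is an assumed standard form rather than the patent's own equations; in particular, the factor 3/2 in the torque expression depends on the scaling of the coordinate transformation (it corresponds to the amplitude-invariant convention).

```latex
\begin{aligned}
v_d &= R_s i_d + L_d \frac{\mathrm{d} i_d}{\mathrm{d} t} - \omega_r L_q i_q \\
v_q &= R_s i_q + L_q \frac{\mathrm{d} i_q}{\mathrm{d} t} + \omega_r \left( L_d i_d + \lambda_m \right) \\
T_e &= \tfrac{3}{2}\, p \left[ \lambda_m i_q + \left( L_d - L_q \right) i_d i_q \right] \\
J \frac{\mathrm{d} \omega_m}{\mathrm{d} t} &= T_e - T_L - B\,\omega_m, \qquad \omega_r = p\,\omega_m
\end{aligned}
```

For a surface-mounted machine with L_d = L_q, the reluctance term vanishes and the torque reduces to T_e = (3/2) p λ_m i_q.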
S12, according to actual requirements, the motor type is a surface-mounted PMSM. Because a PMSM running in steady state must stay within certain operating limits, the method requires the rotor speed and stator current to remain within their thresholds, satisfying:
L_q = L_d
|ω_r| ≤ ω_limit
where ω_limit is the maximum rotor speed and I_limit is the maximum stator current.
Further, step S2 specifically includes:
For a high-order load mechanism with nonlinear factors, a gear transmits the torque of the motor shaft to a lead screw, and the travel of the lead screw pushes and pulls a triangular linkage, which forms a moment arm that deflects the swing mechanism. To match the mechanical properties of the actual load, the model accounts for the elastic motion of the lead screw, the equation of motion of the swing mechanism, and the nonlinear Coulomb friction during motion;
The dynamic equations of this load can be described as:
where θ_m is the rotor mechanical angle, g_r is the gear reduction ratio, n_r is the lead screw reduction ratio, L is the lead screw retraction/extension, K_s is the combined rotational stiffness, and F is the force on the lead screw under the applied torque; m_e is the mass of the lead screw, B_e is the elastic damping, and ΔL is the compression stroke of the lead screw; T_L is the motor shaft load torque and effi is the transmission efficiency; M is the swing moment, K_p is the combined precession stiffness, and r is the arm length; α is the swing angle of the swing mechanism, J_b is the swing inertia, B_b is the swing damping, K_δ is the position resistance moment, and M_f is the friction moment, modeled as Coulomb friction with the expression:
The load model accounts for the elastic deformation of the lead screw. Considering only the geometry of the triangular linkage, the lead screw edge, the swing mechanism and the fixed fulcrum form an equivalent triangle. Neglecting the elastic deformation of the lead screw, the relationship between the lead screw retraction/extension L and the deflection angle α is approximately:
where a and b are the two sides OA and OB adjacent to the swing angle in the triangular linkage, and α_0 and L_0 are the swing center angle AOB and the length of side AB when the load deflection angle is 0.
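The geometric relationship described above follows from the law of cosines applied to triangle OAB. The sketch below is one plausible reading, not the patent's own formula: it assumes that the deflection α adds to the center angle α_0 and that L is the change in length of side AB relative to L_0. The function name and sign convention are hypothetical.

```python
import math


def screw_travel_from_angle(alpha, a, b, alpha0):
    """Approximate lead screw travel L for a swing deflection alpha (radians).

    a, b:    lengths of sides OA and OB of the equivalent triangle
    alpha0:  center angle AOB at zero deflection (radians)
    Assumption: the deflection increases the center angle, and travel is
    measured as the change in length of side AB from its zero-deflection value.
    """
    # Side AB at zero deflection (L_0) from the law of cosines.
    l0 = math.sqrt(a * a + b * b - 2.0 * a * b * math.cos(alpha0))
    # Side AB at deflection alpha.
    l_ab = math.sqrt(a * a + b * b - 2.0 * a * b * math.cos(alpha0 + alpha))
    return l_ab - l0
```

Because elastic deformation is neglected, this mapping only approximates the true relationship, which is why feedback converted through it carries the position error the agent is trained to compensate.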
Further, in step S3, for the semi-closed-loop electric servo system under the FOC framework, three-loop control provides the basic control of the final position angle, ensuring stable and robust system operation, wherein:
S31, constructing the current loop that controls torque as a decoupled current controller, in which the originally coupled d-axis and q-axis voltage terms are each decomposed into a linear term and a nonlinear term:
where v_d1 and v_q1 can be controlled by a linear PID current controller, and the nonlinear terms v_d0 and v_q0 can be calculated from the rotor speed measured by the encoder:
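The decoupling described in S31 can be sketched as follows. The feed-forward expressions are the standard cross-coupling and back-EMF terms of the textbook d-q model and are assumed here, since the patent's own expressions are not reproduced in this text; the function names are hypothetical.

```python
def decoupling_feedforward(omega_r, i_d, i_q, L_d, L_q, lambda_m):
    """Nonlinear (speed-dependent) voltage terms v_d0 and v_q0.

    Assumed standard form:
        v_d0 = -omega_r * L_q * i_q
        v_q0 =  omega_r * (L_d * i_d + lambda_m)
    """
    v_d0 = -omega_r * L_q * i_q
    v_q0 = omega_r * (L_d * i_d + lambda_m)
    return v_d0, v_q0


def total_voltage(v_d1, v_q1, omega_r, i_d, i_q, L_d, L_q, lambda_m):
    """Combine the linear controller outputs with the feed-forward terms."""
    v_d0, v_q0 = decoupling_feedforward(omega_r, i_d, i_q, L_d, L_q, lambda_m)
    return v_d1 + v_d0, v_q1 + v_q0
```

With the speed-dependent terms cancelled by feed-forward, each axis seen by the linear controller behaves like an independent first-order R-L circuit.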
S32, the relationship between the input error e(t) and the output control value u(t) in the PID controller is:
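A discrete-time sketch of the usual PID law, u(t) = K_p e(t) + K_i ∫e dt + K_d de/dt, is given below. The gains, the sample time `dt` and the optional output limit `u_max` are hypothetical parameters; the output limit mirrors the per-loop clipping mentioned in the Background, and no anti-windup logic is included.

```python
class DiscretePID:
    """Textbook discrete PID controller with optional symmetric output clipping."""

    def __init__(self, kp, ki, kd, dt, u_max=None):
        if dt <= 0.0:
            raise ValueError("dt must be positive")
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt = dt
        self.u_max = u_max        # hypothetical per-loop output limit
        self.integral = 0.0
        self.prev_error = None

    def update(self, error):
        # Rectangular integration of the error.
        self.integral += error * self.dt
        # Backward-difference derivative; zero on the first call to avoid a kick.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        if self.u_max is not None:
            u = max(-self.u_max, min(self.u_max, u))
        return u
```

Setting `kd = 0` gives the PI form used for the speed loop.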
S33, in the ADRC controller, the relationship between the input θ(t) and the outputs θ_v1(t), θ_v2(t) of the tracking differentiator is:
where θ̇_v1(t) denotes the derivative of θ_v1(t) with respect to time t;
S34, in the ADRC controller, the relationship between the input θ(t) of the extended state observer and its outputs θ_z1(t), θ_z2(t), θ_z3(t) is:
where fal(e(t), α, δ) is an error filter whose output is related to its inputs e(t), α and δ by:
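The expression for fal is not reproduced in this text. The sketch below uses the definition that is standard in the ADRC literature, which is assumed, not confirmed, to be the one intended here: a power law outside a linear zone of half-width δ, and a linear segment inside it.

```python
import math


def fal(e, alpha, delta):
    """Standard ADRC nonlinear function (assumed form).

    fal(e, alpha, delta) = |e|**alpha * sign(e)      if |e| > delta
                         = e / delta**(1 - alpha)    otherwise
    """
    if delta <= 0.0:
        raise ValueError("delta must be positive")
    if abs(e) > delta:
        return math.copysign(abs(e) ** alpha, e)
    # The linear zone avoids an unbounded gain near e = 0 when alpha < 1.
    return e / delta ** (1.0 - alpha)
```

For 0 < alpha < 1 the function applies high gain to small errors and lower gain to large ones, and the two branches meet continuously at |e| = δ.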
S35, in the ADRC controller, the reference signal generator computes its output from the outputs θ_v1(t), θ_v2(t) of the tracking differentiator and the outputs θ_z1(t), θ_z2(t), θ_z3(t) of the extended state observer as follows:
where sat(x, x_max) is the saturation function: its output is x_max when the input x is greater than x_max, −x_max when x is less than −x_max, and x otherwise;
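The saturation function defined above translates directly into code. The sketch follows the stated definition; the check that `x_max` is non-negative is an added assumption.

```python
def sat(x, x_max):
    """Symmetric saturation: clamp x to the interval [-x_max, x_max]."""
    if x_max < 0.0:
        raise ValueError("x_max must be non-negative")
    return max(-x_max, min(x_max, x))
```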
S36, the difference between the fed-back motor speed ω(t) and the speed command ω*(t) is the input of the speed PI controller, and the controller output is the reference value of the q-axis current;
S37, constructing the position control loop on top of the speed controller: based on the triangular linkage of the load, the command deflection angle α_ref is approximately converted into a lead screw precession length and fed to the tracking differentiator of the ADRC; the rotor angle feedback signal θ(t) is converted into a precession length and fed to the extended state observer of the ADRC; and the reference speed output by the ADRC reference signal generator serves as the output of the position loop.
Further, the specific steps for alleviating the insufficient control precision caused by position feedback errors comprise:
S41, the observations of the agent are: the outputs θ_v1(t), θ_v2(t) of the ADRC tracking differentiator; the outputs e_1(t), e_2(t) of the reference signal generator; the lead screw precession length converted from the command deflection angle α_ref, the precession length converted from the feedback, and the error e_θ(t); the reference speed ω*(t) and the reference current; and the corresponding speed feedback ω(t) and current feedback i_q(t). From these values, the state space s of the agent is determined as:
S42, the continuous action a output by the agent is used as the q-axis voltage input of the PMSM, so that the PMSM generates electromagnetic torque under the given voltage and drives the servo system to respond to the given command;
S43, to improve the overall performance of the controller, training mixes random step commands, low-frequency sinusoidal commands and high-frequency sinusoidal commands; each episode selects one task as its command with equal probability and lasts 10 seconds;
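The mixed-task sampling in S43 can be sketched as below. The 10-second episode comes from S43 and the 0.01 s sample time from S45; the amplitude, the two frequencies, and the choice of a single random level held for the whole step episode are hypothetical.

```python
import math
import random


def make_episode_command(rng, duration=10.0, dt=0.01,
                         amplitude=1.0, low_hz=0.5, high_hz=5.0):
    """Return (task_name, command_samples) for one training episode.

    amplitude, low_hz and high_hz are illustrative values, not from the patent.
    """
    # Each task type is selected with equal probability.
    task = rng.choice(["step", "low_sine", "high_sine"])
    n = int(round(duration / dt))
    if task == "step":
        # Random step: one random level held for the whole episode (assumed).
        level = rng.uniform(-amplitude, amplitude)
        command = [level] * n
    else:
        freq = low_hz if task == "low_sine" else high_hz
        command = [amplitude * math.sin(2.0 * math.pi * freq * k * dt)
                   for k in range(n)]
    return task, command
```

A training loop would call `make_episode_command(random.Random(seed))` at each reset and feed the samples to the controller as the command angle.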
S44, to reduce the error between the swing mechanism angle α and the command angle α_ref, the reward at each time step is formed from the squared difference between the reference value and the feedback value of each loop of the three-loop control:
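The reward expression is not reproduced in this text. A minimal sketch consistent with the description (the negative weighted sum of squared reference-minus-feedback errors of the position, speed and current loops) is given below; the weight values are hypothetical placeholders, since the patent's fixed coefficients are not available here.

```python
def three_loop_reward(pos_ref, pos_fb, speed_ref, speed_fb, cur_ref, cur_fb,
                      weights=(1.0, 0.1, 0.01)):
    """Per-step reward: negative weighted sum of squared loop errors.

    weights = (position, speed, current) are illustrative values only.
    """
    w_pos, w_speed, w_cur = weights
    cost = (w_pos * (pos_ref - pos_fb) ** 2
            + w_speed * (speed_ref - speed_fb) ** 2
            + w_cur * (cur_ref - cur_fb) ** 2)
    return -cost
```

The reward is zero only when all three loops track their references exactly and is negative otherwise, so maximizing it drives the tracking errors toward zero.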
S45, considering deployment on an embedded chip, the actor is set as an MLP with 2 hidden layers and the critic network as an MLP with 4 hidden layers; during training, the agent sampling time is set to 0.01 s.
Beneficial effects: for the case in which only motor feedback signals such as the rotation angle of the permanent magnet synchronous motor can be measured and the actual position of the load mechanism cannot, the method takes the high-order nonlinear characteristics of the load model into account. Building on the conventional three-loop PID controller, it trains a tuning policy network with the twin delayed deep deterministic policy gradient (TD3) algorithm to observe the feedback quantities of the permanent magnet synchronous motor and output the feedback position tuning value of the position loop, thereby improving the control precision and response speed of conventional three-loop PID control for high-order nonlinear load models.
Drawings
FIG. 1 is a diagram of the overall control flow model architecture in the present invention;
FIG. 2 is a schematic modeling diagram of a permanent magnet synchronous motor according to the present invention;
FIG. 3 is a diagram of the equivalent triangle formed between the lead screw edge, the swing mechanism and the fulcrum in the present invention;
FIG. 4 is a schematic representation of the load mechanism of the present invention;
FIG. 5 is a block diagram of a decoupled current controller based on PID control in accordance with the invention;
FIG. 6 is a diagram of a speed loop and position loop architecture based on a PID controller in accordance with the invention;
FIG. 7 is a graph of the q-axis current versus step current command controlled by the current controller without consideration of voltage clipping (FIG. 7 (a)) and with use of voltage clipping (FIG. 7 (b)) in the present invention;
FIG. 8 is a schematic diagram showing that, in the position loop, the error between the precession distance converted from the command angle and the approximate precession distance converted from the shaft position converges (FIG. 8 (a)), while the actual angle of the swing mechanism differs greatly from the command angle (FIG. 8 (b));
FIG. 9 is a schematic diagram of the control logic with the agent acting as the PMSM input voltage controller according to the present invention, where Uq is the voltage command that the agent outputs to the control system;
FIG. 10 is a schematic diagram showing the phenomenon of the oscillation of the output value of the agent (FIG. 10 (a)) and the oscillation of the state quantity of the motor observed by the agent (FIG. 10 (b)) in the present invention;
FIG. 11 is a comparison of the effects of PID scheme, ADRC scheme and reinforcement learning scheme of the present invention at different orders (FIGS. 11 (a) - (d)).
Detailed Description
For a detailed description of the disclosed embodiments of the present invention, the present invention is further described below with reference to the accompanying drawings and detailed description.
First, the key problem the method addresses is how to improve the control algorithm so that the actual load position tracks the command position more accurately during three-loop position control of a permanent magnet synchronous motor, when only motor quantities such as the rotation angle can be measured and the actual position of the high-order nonlinear load mechanism cannot.
The main design idea of the invention is to use data-driven deep reinforcement learning. Empirical data that sensors can acquire during testing, but that cannot be observed during actual operation, are used to pre-train the model, and the twin delayed deep deterministic policy gradient (TD3) algorithm is used to optimize a policy network. The policy network takes motor feedback values that are observable in real time, or values computable from them, as its state and computes the optimal PMSM control voltage, so that the permanent magnet synchronous motor drives the transmission device and the servo system responds to commands. The overall control flow of the method is shown in FIG. 1.
The construction and training process of the intelligent control method of the permanent magnet synchronous motor servo system in the semi-closed loop scene comprises the following steps:
Step 1: Construct a mathematical model of the permanent magnet synchronous motor under the FOC framework
The rotor of a permanent magnet synchronous motor consists of a rotor core and permanent magnets arranged around the core. Whatever the rotor arrangement, the flux density distributions that the magnetic pole pairs produce in the air gap are similar. In physical modeling it is generally assumed that the flux density produced in the air gap by the permanent magnet poles, whether mounted on the rotor surface or embedded in the core, is sinusoidal, so the fundamental component of the flux density curve is taken as the ideal flux density distribution. For this sinusoidal flux density signal, the coordinate axis is defined as the magnetic field angle θ_r, and the number of permanent magnet pole pairs on the rotor core is defined as p. The mechanical rotation angle θ_m of the rotor and the corresponding magnetic field angle are then related by:
θ_r = p·θ_m
Meanwhile, the axis of a permanent magnet pole (an extreme point of the sine wave) is defined as the d-axis. The q-axis lies between two adjacent poles, 90 degrees of magnetic field angle away from the d-axis, at the position where the flux density is 0.
The rotor configuration divides permanent magnet synchronous motors into two categories: salient-pole machines and non-salient-pole machines. A salient-pole motor has interior (embedded) magnets; a non-salient-pole motor has surface-mounted magnets. The distinction matters because the permeability of a permanent magnet is almost the same as that of air, whereas the permeability of the ferromagnetic core is far higher.
By Ampère's law, the magnetic flux density at a point in space is proportional to the permeability at that point. Consider a field of constant magnetic field strength generated by an energized solenoid. For a surface-mounted motor, the radial length of iron core that the field lines pass through is the same whatever the rotor position, so the reluctance of the magnetic path is constant. For a salient-pole motor, the magnetic path contains the least iron and has the largest reluctance when the rotor is aligned with the d-axis, and contains the most iron and has the smallest reluctance when it is aligned with the q-axis. The effective magnetic air gap is therefore non-uniform, a phenomenon called magnetic saliency.
The stator windings of the motor are essentially energized solenoids with different positions and orientations. The coils are distributed in the stator slots around the stator core, displaced by 120 degrees from one another, and are named the A, B, and C phase windings. In practical circuits the tail ends of the three windings are tied together at a common point, forming a star (wye) connection, in which case:
ia+ib+ic=0
When three-phase alternating currents with 120-degree phase differences are applied to the three-phase windings, these time-varying sinusoidal currents combine in space into a rotating magnetic field.
For a rotating rotor, if the rotating magnetic field keeps the same speed as the rotor and a constant phase relative to it, the interaction of the fields produces a constant torque, the magnetic torque. Salient-pole machines also produce a second type of torque, reluctance torque, which likewise drives the rotor and the load.
An a-b-c reference frame is defined by taking the positive directions of the magnetic fields produced by the a, b, and c phase windings as the coordinate axes. In this frame, the time-domain phase variables are denoted f_a, f_b, f_c, where f may stand for phase voltage, phase current, or flux linkage. From Faraday's law of electromagnetic induction and Ohm's law, the three-phase voltages can be expressed as:
v_x = R_s·i_x + dλ_x/dt, x ∈ {a, b, c}
As discussed above for the rotor, in a surface-mounted motor the self-inductance and mutual inductance of each stator coil do not vary with rotor position. For a rotor of arbitrary configuration, the self-inductance and mutual-inductance coefficients depend on the magnetic angle as follows:
where L_2 = 0 for a surface-mounted motor. The flux linkage of each of the a, b, c windings is then the sum of the self-inductance and mutual-inductance flux linkages plus the permanent magnet flux linking the coil:
where λ_m is the maximum flux linkage that the N pole of the permanent magnet produces in one coil. Substituting the two expressions gives:
The motor model above, expressed in the stationary reference frame, has parameters that vary with time, which complicates control system design. This rotation-induced complexity can be removed by projecting the phase variables of the model onto the two axes of a rotating reference frame. The d-q frame has two orthogonal axes fixed to the rotor: the d-axis along the rotor permanent magnet pole and the q-axis orthogonal to it.
To transform the motor model from the three-phase stationary a-b-c frame to the two-phase rotating frame, the angle θ_r of the d-q axes relative to the stationary a-b-c frame must first be known; the three-phase variables are then projected onto the d-axis to give f_d and onto the q-axis to give f_q. Mathematically, this projection is the Park transformation, whose matrix form is:
where the coefficient 2/3 keeps the amplitude unchanged by the transformation. In addition, the phase variables satisfy:
fa+fb+fc=0
With this constraint added, the transformation matrix is invertible, and the transformation from the d-q frame back to the a-b-c frame is:
where f_o is the zero component. Through the Park transformation, FOC control converts the three-phase rotating magnetic field into quantities expressed relative to the rotor d-q axes. When the three-phase variables a-b-c are unbalanced sinusoidal signals, for example during motor start-up or a sudden load change, the d-q variables are generally time-varying. In steady-state operation, the rotating magnetic field created by the phase variables is stationary relative to the rotor, and the d-q variables become DC signals. Controlling the d-q variables is then equivalent to controlling two equivalent solenoids that are always parallel or perpendicular to the magnetic axis, which makes the motor control algorithm comparatively simple.
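As a minimal sketch of the Park transformation and its inverse described above, the following Python functions assume the amplitude-invariant form (coefficient 2/3), a d-axis aligned with the phase-a axis at θ_r = 0, and a q-axis leading the d-axis by 90 degrees. The function names and this sign convention are illustrative assumptions, since the original matrices are not reproduced in the text.

```python
import math

TWO_PI_3 = 2.0 * math.pi / 3.0


def park(fa, fb, fc, theta_r):
    """abc -> dq0, amplitude-invariant form (assumed convention)."""
    k = 2.0 / 3.0
    fd = k * (fa * math.cos(theta_r)
              + fb * math.cos(theta_r - TWO_PI_3)
              + fc * math.cos(theta_r + TWO_PI_3))
    fq = -k * (fa * math.sin(theta_r)
               + fb * math.sin(theta_r - TWO_PI_3)
               + fc * math.sin(theta_r + TWO_PI_3))
    f0 = (fa + fb + fc) / 3.0  # zero component
    return fd, fq, f0


def inverse_park(fd, fq, f0, theta_r):
    """dq0 -> abc, inverse of park() under the same convention."""
    fa = fd * math.cos(theta_r) - fq * math.sin(theta_r) + f0
    fb = (fd * math.cos(theta_r - TWO_PI_3)
          - fq * math.sin(theta_r - TWO_PI_3) + f0)
    fc = (fd * math.cos(theta_r + TWO_PI_3)
          - fq * math.sin(theta_r + TWO_PI_3) + f0)
    return fa, fb, fc
```

Under this convention, a balanced three-phase set of amplitude A whose phase angle equals θ_r maps to f_d = A and f_q = 0, which matches the statement that the d-q variables become DC signals in steady state.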
Applying the Park transformation to the equations above gives the voltage equations in the rotor reference frame:
where v_d and v_q are the d-axis and q-axis stator voltages; i_d and i_q are the d-axis and q-axis stator currents; and λ_d and λ_q are the d-axis and q-axis stator flux linkages, given by:
where L_d and L_q are the d-axis and q-axis inductances, and λ_m is the d-axis flux linkage of the permanent magnet:
In the d-q frame, the inductance coefficients, which vary with rotation for each phase in the a-b-c frame, become constants. Combining the two sets of equations gives the motor current-voltage equations:
where v_d and v_q are the system inputs (control quantities), which determine the currents and the torque. In the rotor frame, the instantaneous input power of the permanent magnet synchronous motor during operation is:
where the coefficient 3/2 compensates for the coefficient applied in the Park transformation. The term containing R_s is the resistive voltage drop term and does not contribute to the motor output power. The flux linkage derivative term is the magnetic field term: this electrical power is stored in the magnetic field and likewise does not contribute to the motor output power. The electrical power actually converted into mechanical power is therefore:
According to the torque theorem:
M=P/ω
The electromagnetic torque is:
The permanent magnet synchronous motor is connected to a mechanical load, and the dynamics of the mechanical part of the motor are described by:
where T_L is the load torque, B is the viscous friction coefficient of the motor bearing, and J is the total moment of inertia of the motor and the load. The complete permanent magnet synchronous motor can then be built as a Simulink sub-module for convenient reuse, as shown in FIG. 2. The module takes the d-q axis voltages as input, outputs the motor angle and angular speed given the external load torque T_L, and measures the d-q axis currents through a current sensor.
Based on the above analysis, the invention constructs, for the actual motor, a mathematical model of permanent magnet synchronous motor operation under the FOC control framework and implements it in Simulink.
Under FOC control, the motor system takes the d-axis and q-axis voltage command values from the controller as input and delivers torque to the load mechanism, which appears as rotation of the motor rotor.
The FOC strategy decomposes the motor phase variables into a magnetic field component and a torque component and controls them independently, achieving accurate control of the motor field and torque. It requires the rotor position and speed in order to transform between the phase variables in the stationary frame and the d-q variables in the rotating frame. In the d-q frame, the motor equations can be written as:
where R_s is the stator resistance; v_d, v_q are the d-axis and q-axis voltages; i_d, i_q are the d-axis and q-axis currents; L_d, L_q are the d-axis and q-axis inductances; λ_m is the d-axis flux linkage of the permanent magnet; T_e, T_L are the output torque and the load torque; B is the viscous friction coefficient of the bearing; J is the total moment of inertia of the motor and the load; ω_m and ω_r are the mechanical and electrical angular velocities of the rotor; and p is the number of permanent magnet pole pairs, with ω_r = p·ω_m. The module responds to the input d-q axis voltages, outputs the motor angle θ_m and the angular speed ω_r given the external load torque T_L, and measures the d-q axis currents through a current sensor.
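The d-q model described above can be sketched as a simple time-stepping simulation. The sketch below assumes the standard textbook form of the PMSM equations (voltage equations with the cross-coupling terms ω_r·L_q·i_q and ω_r·(L_d·i_d + λ_m), torque T_e = 1.5·p·[λ_m·i_q + (L_d − L_q)·i_d·i_q], and J·dω_m/dt = T_e − T_L − B·ω_m), since the original equations are not reproduced in the text. The class and function names, the explicit Euler integration, and every numeric parameter value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PMSMParams:
    # All numeric values are illustrative assumptions, not from the source.
    Rs: float = 0.5      # stator resistance [ohm]
    Ld: float = 1e-3     # d-axis inductance [H]
    Lq: float = 1e-3     # q-axis inductance [H]; Ld == Lq (surface-mounted)
    lam_m: float = 0.05  # permanent magnet flux linkage [Wb]
    p: int = 4           # number of pole pairs
    J: float = 1e-3      # total moment of inertia [kg*m^2]
    B: float = 1e-4      # viscous friction coefficient [N*m*s/rad]


def pmsm_step(state, vd, vq, TL, prm, dt):
    """Advance (i_d, i_q, omega_m, theta_m) by one explicit Euler step."""
    i_d, i_q, w_m, th_m = state
    w_r = prm.p * w_m  # electrical angular velocity
    did = (vd - prm.Rs * i_d + w_r * prm.Lq * i_q) / prm.Ld
    diq = (vq - prm.Rs * i_q - w_r * (prm.Ld * i_d + prm.lam_m)) / prm.Lq
    # Electromagnetic torque computed from the currents before the step.
    Te = 1.5 * prm.p * (prm.lam_m * i_q + (prm.Ld - prm.Lq) * i_d * i_q)
    dwm = (Te - TL - prm.B * w_m) / prm.J
    new_state = (i_d + dt * did, i_q + dt * diq,
                 w_m + dt * dwm, th_m + dt * w_m)
    return new_state, Te
```

With a constant q-axis voltage and no load, the simulated speed settles near v_q / (p·λ_m), the point where the back-EMF balances the applied voltage.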
According to actual requirements, the motor is a surface-mounted PMSM. A permanent magnet synchronous motor in steady-state operation must also stay within certain operating limits, so the invention requires the rotor speed and the stator current to remain within their thresholds, i.e.:
L_q = L_d
|ω_r| ≤ ω_limit
where ω_limit is the maximum rotor speed and i_limit is the maximum stator current.
Step 2: Construct a mathematical model of the load and its transmission mechanism
Based on actual requirements, a high-order load mechanism with nonlinear factors is selected. The mechanism uses gears to transmit the torque of the motor shaft to the lead screw; the stroke of the lead screw pushes and pulls a triangular linkage, forming a moment arm that deflects the swing mechanism. To match the mechanical behavior of the actual load, the model accounts for the elastic motion of the lead screw, the equation of motion of the swing mechanism, and the nonlinear Coulomb friction during motion. The dynamic equations of this load can be written as:
where θ_m is the rotor mechanical angle, g_r is the gear reduction coefficient, n_r is the lead screw reduction coefficient, L is the lead screw retraction/extension, K_s is the combined rotational stiffness, and F is the force on the lead screw produced by the torque; m_e is the lead screw mass, B_e is the elastic damping, and ΔL is the lead screw compression stroke; T_L is the load torque on the motor shaft and effi is the transmission efficiency; M is the swing moment, K_p is the combined precession stiffness, and r is the moment arm length; α is the swing angle of the swing mechanism, J_b is the swing inertia, B_b is the swing damping, K_δ is the position resistance moment, and M_f is the friction moment, modeled as Coulomb friction with the expression:
The load model accounts for the elastic deformation of the lead screw. If only the geometry of the triangular linkage is considered, the equivalent triangle formed by the lead screw side, the swing mechanism, and the fixed pivot is as shown in FIG. 3. The lead screw retraction/extension L and the deflection angle α are then approximately related by:
where a and b are the two sides OA and OB adjacent to the swing angle in the triangular linkage, and α_0 and L_0 are the central angle AOB and the length of side AB when the load deflection angle is 0.
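The geometric relation between the deflection angle and the lead screw travel can be illustrated with the law of cosines. The sketch below assumes that the angle AOB equals α_0 + α and that L is the change in the length of side AB relative to L_0; the sign convention and the function names are assumptions, since the original expression is not reproduced in the text.

```python
import math


def swing_angle_to_screw_travel(alpha, a, b, alpha0):
    """Lead screw travel L for deflection alpha (radians), law of cosines."""
    L0 = math.sqrt(a * a + b * b - 2.0 * a * b * math.cos(alpha0))
    L_ab = math.sqrt(a * a + b * b - 2.0 * a * b * math.cos(alpha0 + alpha))
    return L_ab - L0


def screw_travel_to_swing_angle(L, a, b, alpha0):
    """Inverse relation: deflection alpha (radians) for screw travel L."""
    L0 = math.sqrt(a * a + b * b - 2.0 * a * b * math.cos(alpha0))
    cos_angle = (a * a + b * b - (L0 + L) ** 2) / (2.0 * a * b)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.acos(cos_angle) - alpha0
```

This purely geometric conversion ignores the elastic deformation of the lead screw, which is why it is only an approximation of the true relation between the rotor angle and the swing angle.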
The overall load is modeled as a high-order nonlinear system, and the load swing angle is related to the observable motor shaft angle through complex differential equations. A conventional error-based closed-loop controller therefore struggles to achieve good precision and response speed in this scenario, which leaves room for optimization by data-driven artificial intelligence algorithms.
An equivalent load module implementing the above equations is built in Matlab/Simulink, as shown in FIG. 4. The module takes the motor rotor angle θ_m as input and outputs the load torque T_L fed back to the motor and the controlled quantity α.
Step 3: Construct the basic three-loop controller scheme
In the invention, the PID controller takes the difference between the externally supplied reference value for the controlled object and the feedback value from the measuring element, and outputs a linear combination of this error, its time integral, and its time derivative as the control quantity for the next actuator stage. The input error e(t) and the output control value u(t) are related by:
u(t) = K_p·e(t) + K_i·∫e(τ)dτ + K_d·de(t)/dt
where K_p, K_i, K_d are the gains of the proportional, integral, and derivative terms, which can be tuned manually or automatically by parameter tuning or other optimization algorithms.
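A discrete-time version of the PID law described above can be sketched as follows. The class name, the rectangular integration, the backward-difference derivative, and the optional output limit are illustrative choices and are not taken from the source.

```python
class PID:
    """Minimal discrete PID controller (illustrative sketch)."""

    def __init__(self, kp, ki, kd, dt, out_limit=None):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt = dt
        self.out_limit = out_limit  # symmetric output limit, or None
        self.integral = 0.0
        self.prev_error = None

    def update(self, reference, feedback):
        error = reference - feedback
        self.integral += error * self.dt
        # No derivative action on the first sample (no previous error yet).
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        if self.out_limit is not None:
            u = max(-self.out_limit, min(self.out_limit, u))
        return u
```

Setting kd to 0 gives the PI controllers used in the current, speed, and position loops; the output limit corresponds to the clipping that each loop applies for safety.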
In the FOC framework, vector control regulates torque by controlling the currents i_d and i_q. Vector control is thus the innermost loop of the motor drive system, and the subsequent speed and position control are built on top of current control. For the surface-mounted permanent magnet synchronous motor considered in the invention:
The electromagnetic torque is a linear function of i_q and is unaffected by i_d; any d-axis current only wastes input power (mainly dissipated in the resistance and the magnetic field). Controlling i_q to set the torque while holding i_d = 0 therefore achieves maximum torque per ampere (MTPA) control, i.e., maximum torque output for any given stator current.
In the rotor reference frame, the motor model is affected by the cross-coupling speed-voltage terms (ω_r·L_q·i_q and ω_r·L_d·i_d + ω_r·λ_m). These terms can dominate the voltage equations, especially at high speed, and in practice they degrade the performance of the PI controllers, so the current control scheme for vector control requires decoupling. To linearize the control of i_d and i_q, the d-axis voltage and the q-axis voltage are each formed as the sum of two signals:
where v_d1 and v_q1 are produced by linear PI current controllers, and the nonlinear terms v_d0 and v_q0 are computed from the rotor speed measured by the encoder:
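The decoupling described above can be sketched in a few lines. The signs below assume the standard d-q voltage equations, in which −ω_r·L_q·i_q appears in the d-axis equation and +ω_r·(L_d·i_d + λ_m) in the q-axis equation; the function names are hypothetical.

```python
def decoupling_terms(i_d, i_q, w_r, Ld, Lq, lam_m):
    """Feedforward terms v_d0, v_q0 that cancel the speed-voltage coupling."""
    vd0 = -w_r * Lq * i_q
    vq0 = w_r * (Ld * i_d + lam_m)
    return vd0, vq0


def decoupled_voltages(vd1, vq1, i_d, i_q, w_r, Ld, Lq, lam_m):
    """Total commands: linear PI outputs plus the decoupling terms."""
    vd0, vq0 = decoupling_terms(i_d, i_q, w_r, Ld, Lq, lam_m)
    return vd1 + vd0, vq1 + vq0
```

With these terms added, each axis reduces to a first-order linear plant, L·di/dt = v_1 − R_s·i, which is what allows independent PI controllers to be used for i_d and i_q.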
On top of the decoupled current controller, the difference between the speed command ω_ref and the fed-back motor speed ω_fb is the input to the speed PI controller, and the controller output is the reference value of the q-axis current. Because the system limits the speed, the motor speed reference ω_ref is limited to the range (−ω_limit, ω_limit). A position control loop is built on top of the speed controller. Given the triangular linkage structure of the load, the commanded deflection angle α_ref of the swing mechanism is approximately converted into a lead screw precession length by the law of cosines; this length and the precession length L_fb calculated from the rotor angle feedback signal θ_fb are input to the position loop PI controller, which outputs the speed loop reference value.
The Simulink model of the decoupled current control in the above three-loop control is shown in FIG. 5, and the models of the position loop and the speed loop are shown in FIG. 6. The PMSM three-loop control model controls the angle of the actual swing mechanism linearly and, while keeping the system robust, meets certain requirements on precision and response speed. It also keeps the motor current and speed within the system limits. However, the final control performance of this three-loop control flow is poor, mainly for the following reasons:
On the one hand, the equivalent voltage output by SVPWM in an actual system cannot exceed the inverter limit (220 V), so the output of the decoupled current controller shown in FIG. 5 must be limited, and this limiting breaks the linearity of the controller's linear term. FIG. 7 shows the response of the q-axis current to a step current command reference before and after the threshold limit is added. With the same controller parameters, the q-axis current without the limit tracks the command quickly, whereas with the limit the q-axis current curve is highly irregular, with a large overshoot and a long convergence time. This nonlinearity reduces the response speed of the current loop to a certain extent, increases the risk that the actual current exceeds the system limit, and lowers the performance and reliability of the controller.
On the other hand, the system cannot observe the actual angle of the swing mechanism. It can only convert the command angle α_ref and the rotor position feedback θ approximately into lead screw precession lengths, ignoring the complex differential equation relationship between the motor rotor angle and the swing mechanism angle. As a result, when the system faces low-frequency signals, the position loop error converges while the actual swing angle still differs greatly from the command angle, as shown in FIG. 8.
Step 4: permanent magnet synchronous motor control scheme for constructing deep reinforcement learning drive
The PPO is used for training, and the intelligent body directly outputs PMSM control voltage to drive the transmission device to drive the servo system to move so as to minimize the long-term errors of the reference curve and the actual state of the servo system. The reinforcement learning intelligent body is directly used as the PMSM voltage controller, so that the self-adaptability of the intelligent model in the control process can be exerted to the greatest extent, and the limitation of the output of the intelligent body by three-loop control is avoided; the three-loop control scheme typically adds clipping at the output position of each loop to ensure control safety in extreme states, however this also limits the effect of the agent, since the agent cannot optimize the original three-loop control output by tuning once a certain loop of the three loops reaches clipping. While the agent acts as a voltage controller for the PMSM, the only limitation is that its output should be within the voltage range that the PMSM can withstand, which has greater flexibility and adaptation than the agent acts as a feedforward compensator. However, since the reinforcement learning agent itself targets the maximization of the reward function, its output is subject to oscillations in the time domain; in addition, since the state quantity of the agent contains rapidly changing rotation speed and current, if the process of adding no noise and oscillation further aggravates the oscillation of the agent, which results in unsatisfactory control effect and even causes control safety problem, when the agent is directly used as a PMSM voltage controller, it is necessary to use not only an ADRC controller with noise interference resistance in a position loop to suppress noise in a feedback signal, but also a kalman filter for rapidly changing rotation speed and current and a low pass filter for the agent output to maximally suppress the oscillation of the agent output in a time domain.
The observation of the agent consists of: the outputs θ_v1(t), θ_v2(t) of the ADRC differentiator; the outputs e_1(t), e_2(t) of the reference signal generator; the error e_θ(t) between the lead screw precession length converted from the command deflection angle α*(t) and its feedback; the reference speed ω*(t) and the reference current; and the corresponding speed feedback ω(t) and current feedback i_q(t).
To better condition the training algorithm, the input values are nondimensionalized and clipped to a limited range; in addition, a Kalman filter is applied to the two feedback quantities ω and i_q, and a low-pass filter is applied to the agent output.
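The scaling and clipping of the observation can be sketched as below. The nominal scales and limits are hypothetical placeholders; the source does not state the values it uses.

```python
def preprocess(value, scale, limit):
    """Divide by a nominal scale, then clip the result to [-limit, limit]."""
    x = value / scale
    return max(-limit, min(limit, x))


def build_observation(raw, scales, limits):
    """Apply preprocess() element-wise to a raw observation vector."""
    return [preprocess(v, s, l) for v, s, l in zip(raw, scales, limits)]
```

For example, `build_observation([300.0, -1e6], [10.0, 1.0], [100.0, 100.0])` scales the first entry to 30 and clips the second, an abnormally large derivative-like value, to −100.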
To improve the overall performance of the controller, the tasks presented to the reinforcement learning controller should be as varied as possible and cover all likely command patterns, including random step commands, low-frequency sinusoidal commands, and high-frequency sinusoidal commands.
To reduce the error between the swing angle α and the command angle α_ref, the reward at each time step is the negative of the squared difference between the reference value and the feedback value of each of the three loops, each multiplied by a fixed coefficient that balances the weights of the three loops:
For the current state, an optimized policy function is used so that the agent outputs a tuning value. For this purpose, two Q networks are established to evaluate the output of the policy network, and their evaluations are used for gradient optimization of the actor network.
Meanwhile, the agent sampling time was set to 0.01s.
In particular, the invention includes the following considerations:
1) The agent is placed at the voltage input of the permanent magnet synchronous motor, so the reinforcement-learning-driven agent controls the input voltage directly. Because the input voltage is the q-axis voltage, the output limit of the reinforcement learning agent is 220 × 2/3 V.
2) The state space variables for reinforcement learning should be values that the controller can collect or calculate. Because the d-axis current i_d and the q-axis current i_q are already regulated by the decoupled current controller, i_d stays close to 0.
3) To better condition the training algorithm, the input values are nondimensionalized so that the inputs stay in the range 10^0 to 10^2 across the various operating states. In addition, for abrupt position commands (e.g., steps) the derivative values become very large and produce ill-conditioned experience samples, so the derivative values must be limited to a certain range.
Because neural networks are highly nonlinear, at the start of training the network output tends to sit at a boundary value and to jump back and forth between the upper and lower bounds. In the motor model, oscillation of the speed command makes the motor feedback oscillate, which in turn makes the reinforcement learning controller oscillate, and the motor ends up in a highly unstable state, as shown in FIG. 10. An agent usually learns nothing from such ill-conditioned experience data; in many cases, even after many rounds of training and whatever control quality is reached, the output still oscillates repeatedly, which is unacceptable in actual motor control and may even damage the motor. To mitigate this, a Kalman filter is applied to the two motor feedback quantities ω and i_q. Then, even if the system oscillates at the start of training, the Kalman filter reduces the input state to a low-frequency signal, the output gradually becomes a low-frequency signal as training proceeds, and training stability is improved to a certain extent.
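The filtering described above can be illustrated with a scalar Kalman filter for a feedback quantity and a first-order low-pass filter for the agent output. The random-walk signal model, the noise variances q and r, and the time constant tau are assumptions made for this sketch; the source does not specify the filter designs.

```python
class ScalarKalman:
    """Scalar Kalman filter with a random-walk state model (assumed)."""

    def __init__(self, q, r, x0=0.0, p0=1.0):
        self.q = q   # process noise variance
        self.r = r   # measurement noise variance
        self.x = x0  # state estimate
        self.p = p0  # estimate variance

    def update(self, z):
        p_pred = self.p + self.q           # predict (state unchanged)
        k = p_pred / (p_pred + self.r)     # Kalman gain
        self.x += k * (z - self.x)         # correct with measurement z
        self.p = (1.0 - k) * p_pred
        return self.x


class LowPass:
    """First-order low-pass filter, y += alpha * (u - y)."""

    def __init__(self, tau, dt, y0=0.0):
        self.alpha = dt / (tau + dt)
        self.y = y0

    def update(self, u):
        self.y += self.alpha * (u - self.y)
        return self.y
```

With a small q relative to r, the Kalman filter strongly attenuates a measurement that alternates between two extremes, which is the bang-bang behavior the text describes at the start of training.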
4) To improve the overall performance of the controller, the tasks presented to the reinforcement learning controller should be as varied as possible and cover all likely command patterns. In automatic control theory, the conventional performance analysis of a linear system examines either its response to a step command in the time domain or its open-loop magnitude-frequency and phase-frequency characteristics in the frequency domain. In line with the performance evaluation scheme finally adopted, the training tasks are divided into the following three types:
a. Random step command: starting from the zero steady state (all derivatives equal to 0), a new random step target is generated every 2.5 seconds, with the step value d ∈ [−3.5°, 3.5°].
b. Low-frequency sinusoidal command: starting from the zero steady state, a low-frequency sinusoidal command δ_c = δ_cm·sin(ωt) is generated with amplitude δ_cm ∈ [1.5°, 4°] and angular frequency ω ∈ [2π×0.05, 2π×0.1] rad/s.
c. High-frequency sinusoidal command: starting from the zero steady state, a high-frequency sinusoidal command δ_c = δ_cm·sin(ωt) is generated with amplitude δ_cm ∈ [0.3°, 1.2°] and angular frequency ω ∈ [1, 20] rad/s.
During training, each episode selects one of these tasks with equal probability as the command and lasts 10 seconds.
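The three task types above can be sketched as a command generator for one episode. The function name, the uniform sampling of each range, and the choice to draw the first step target at t = 0 are assumptions; the ranges, the 2.5 s step interval, the 10 s episode length, and the 0.01 s sampling time come from the text.

```python
import math
import random

TASKS = ("step", "low_sine", "high_sine")


def make_episode_command(rng, duration=10.0, ts=0.01):
    """Return (task, command sequence in degrees) for one episode."""
    n = int(round(duration / ts))
    task = rng.choice(TASKS)  # equal probability for each task type
    if task == "step":
        hold = int(round(2.5 / ts))  # samples between new step targets
        cmd, target = [], 0.0
        for k in range(n):
            if k % hold == 0:
                target = rng.uniform(-3.5, 3.5)
            cmd.append(target)
        return task, cmd
    if task == "low_sine":
        amp = rng.uniform(1.5, 4.0)
        w = rng.uniform(2.0 * math.pi * 0.05, 2.0 * math.pi * 0.1)
    else:
        amp = rng.uniform(0.3, 1.2)
        w = rng.uniform(1.0, 20.0)
    return task, [amp * math.sin(w * k * ts) for k in range(n)]
```

A typical call is `task, cmd = make_episode_command(random.Random(0))`, which yields 1000 samples; passing a seeded `random.Random` keeps the episodes reproducible.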
5) The design of the reinforcement learning reward function directly affects the optimization of the algorithm. Since the invention aims to reduce the error between the swing mechanism angle α and the command angle α_ref, the negative of the squared difference between the reference value and the feedback value of each loop in the three-loop control can be used as the reward at each time step. Such rewards are not sparse, and by adjusting the coefficients the reward can be kept within [−200, 200], which may benefit training.
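A reward of the kind described above can be sketched as a weighted sum of squared loop errors with the sign reversed. The function name and the default coefficient values are hypothetical; in practice the coefficients would be chosen so that the reward stays within the stated range.

```python
def step_reward(pos_err, speed_err, current_err,
                c_pos=1.0, c_speed=1.0, c_cur=1.0):
    """Negative weighted sum of the squared errors of the three loops."""
    return -(c_pos * pos_err ** 2
             + c_speed * speed_err ** 2
             + c_cur * current_err ** 2)
```

Each argument is the difference between the reference value and the feedback value of one loop, so perfect tracking in all three loops gives the maximum reward of 0.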
6) During training, for an input agent state value:
it is desirable to train out an optimal strategy function pi so that the optimal compensation value can be output for s:
the PPO training algorithm improves the strategy gradient method by limiting the updating amplitude of the strategy, so that the training process is more stable and efficient, and the specific training process is as follows:
for each step subscript i e [1,2,3 … T/Ts ] of the agent time step size Ts, which is often T, the agent takes the state s i of pattern 4.4, and generates an action a with the policy network:
The next state s i+1 caused by the action is taken as a training sample [ a i,si,ri,si+1 ] experience playback buffer memory together with the state s i of the step, the action a i and the step reward value r i calculated by the environment Is a kind of medium. When (when)The number of samples in (a) reaches a mini-batch of size N, each step is taken fromSamples of a batch are sampled, and a time sequence difference error or generalized dominance estimation is calculated to calculate a dominance function.
Next, PPO optimizes the objective function. First, PPO defines the probability ratio between the new policy and the old policy, and then defines the clipped objective function:
where ε is the hyperparameter that sets the clipping range.
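The equations for this step are not reproduced in the text. The sketch below uses the standard PPO form, in which the ratio is r(θ) = πθ(a|s) / πθ_old(a|s) and the objective is the mean of min(r·A, clip(r, 1−ε, 1+ε)·A); it is assumed, not confirmed, that the patent uses this form. The function name and the default ε = 0.2 are hypothetical.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    logp_new / logp_old are log-probabilities of the taken actions under
    the new and old policies; eps is the clipping range.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)               # probability ratio
        clipped = max(1.0 - eps, min(1.0 + eps, ratio)) # clip to [1-eps, 1+eps]
        total += min(ratio * adv, clipped * adv)        # pessimistic bound
    return total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the ratio outside [1−ε, 1+ε], which is how the update magnitude is limited.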
PPO then updates the policy network parameters by stochastic gradient ascent (Adam optimizer) to maximize the clipped objective function LCLIP(θ).
At the same time, PPO updates the value network parameters by stochastic gradient descent (Adam optimizer) to minimize the loss function of the value network:
Finally, the parameters of the target network are updated with the smoothing factor τ, and the above steps are repeated until the training termination condition is reached:
θ' ← τθ + (1 − τ)θ'
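The smoothed target update θ' ← τθ + (1 − τ)θ' can be illustrated on flat parameter lists. The function name is hypothetical, and representing network parameters as plain lists of floats is a simplification for illustration.

```python
def soft_update(target_params, online_params, tau):
    """Return new target parameters: tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```

A small τ makes the target network follow the trained network slowly, while τ = 1 copies the trained parameters outright.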
7) Considering the deployment requirements on an embedded chip, the actor network should not be too large, so the invention sets it as an MLP with 2 hidden layers, whereas the critic network is set as an MLP with 4 hidden layers. Since the sinusoidal command frequency input to the controller can reach 20 rad/s and training should not be too slow, the agent sampling time is set to 0.01 s. Other training parameters were set as follows:
To comprehensively evaluate the performance improvement of the reinforcement-learning-based position loop tuning scheme over the conventional PID method, performance is evaluated with the following three indexes.
1) Load position characteristics
Under the maximum load condition, the angle command sequence αref is used as the command input, and the actual swing angle sequence αfb of the swing mechanism is collected over the time range t ∈ (11, 31), where:
The position loop curve is plotted with αref as the abscissa and αfb as the ordinate. The nominal position curve is the line connecting the midpoints of the position loop curve along the horizontal axis, and the nominal position baseline is a first-order linear fit of the nominal position curve. Tracking accuracy in the time domain is assessed with a loop width and zero offset analysis algorithm. The maximum swing angles in the positive and negative directions characterize the ability to track commands near the limit positions; the maximum loop width Δδmax characterizes the maximum tracking error; and the zero offset δ0 measures the symmetry of the control algorithm in the positive and negative directions when facing a symmetric signal such as a low-frequency sinusoid.
From Fig. 11(a) and the following table, the DRL method outperforms the PID and ADRC methods in maximum swing angle, maximum loop width, and zero offset, which indicates that the DRL control method achieves better tracking accuracy through data-driven empirical learning.
2) Speed characteristic experiment
Under the maximum load condition, a step command with an amplitude of 3° is used as the system input αref. The actual load position sequence is collected over the period in which the swing angle lies in the range (0.5°, 1.5°), and the response speed of the algorithm in the time domain is analyzed through the average swing angular velocity over that period. The experimental results are shown in the following table. From Fig. 11(b) and the following table, the PID control method has the fastest rise but also the highest overshoot, which shows that it is the least stable despite its fast response; the ADRC control method is the opposite, being the most stable despite the slowest response; the DRL method lies between the two, with a relatively fast response and relatively stable performance.
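One way to compute the average swing angular velocity of the speed characteristic experiment is sketched below. It is an interpretation under stated assumptions: the function name is hypothetical, the response is assumed to rise through both thresholds, and the crossing instants are found by linear interpolation between samples.

```python
def average_swing_rate(times, angles, low=0.5, high=1.5):
    """Average angular velocity (deg/s) while the angle rises from low to high.

    times are in seconds and angles in degrees, sampled from a step response.
    """
    def crossing_time(level):
        # First upward crossing of `level`, linearly interpolated
        for k in range(1, len(angles)):
            a0, a1 = angles[k - 1], angles[k]
            if a0 < level <= a1:
                frac = (level - a0) / (a1 - a0)
                return times[k - 1] + frac * (times[k] - times[k - 1])
        raise ValueError("angle never crosses the requested level")

    t_low = crossing_time(low)
    t_high = crossing_time(high)
    return (high - low) / (t_high - t_low)
```

Using the fixed window (0.5°, 1.5°) for every controller makes the rise rates directly comparable, since the start-up transient and the overshoot near 3° fall outside the window.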
3) Frequency characteristic experiment
Under the maximum load condition, the sinusoidal command αref = δ·sin(ωt) is used as the input, where δ = 0.5°, 0.8°, 1.1° and ω = 2, 4, 8, 16 rad/s. Six command cycles are simulated for each frequency and amplitude. The response characteristics of the algorithm in the frequency domain are analyzed by measuring the phase lag and the amplitude attenuation L of the swing angle output relative to the input command. αref and αfb are orthogonally decomposed on sine and cosine bases at the same frequency as αref, which yields the sine and cosine components aref, bref of αref and afb, bfb of αfb by the following equation. The relative standard excitation amplitude and phase angle are then calculated:
The gain L (dB) and the phase lag of the actual angle signal relative to the command signal are calculated by the above method, and the results are recorded below.
From Fig. 11(c), Fig. 11(d), and the following table, the controller under the DRL method is generally superior to the PID method in terms of the phase lag index.
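The orthogonal decomposition used in the frequency characteristic experiment can be sketched as follows. The patent's own equations are not reproduced in the text, so this is a standard single-frequency projection under assumptions: the function names are hypothetical, the record covers a whole number of command periods, and a signal A·sin(ωt + φ) is written as a·sin(ωt) + b·cos(ωt) with a = A·cos φ and b = A·sin φ.

```python
import math


def sine_cosine_components(samples, omega, ts):
    """Project a sampled signal onto sin(omega*t) and cos(omega*t)."""
    n = len(samples)
    a = 2.0 / n * sum(x * math.sin(omega * k * ts) for k, x in enumerate(samples))
    b = 2.0 / n * sum(x * math.cos(omega * k * ts) for k, x in enumerate(samples))
    return a, b


def gain_and_phase(ref, fb, omega, ts):
    """Gain in dB and phase in degrees of fb relative to ref.

    A negative phase means the feedback lags the command.
    """
    a_r, b_r = sine_cosine_components(ref, omega, ts)
    a_f, b_f = sine_cosine_components(fb, omega, ts)
    gain_db = 20.0 * math.log10(math.hypot(a_f, b_f) / math.hypot(a_r, b_r))
    phase_deg = math.degrees(math.atan2(b_f, a_f) - math.atan2(b_r, a_r))
    # Wrap the phase difference into [-180, 180)
    phase_deg = (phase_deg + 180.0) % 360.0 - 180.0
    return gain_db, phase_deg
```

Simulating six full command cycles, as the experiment does, satisfies the whole-period assumption, so the sine and cosine projections do not leak into each other.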
The PID, ADRC, and DRL methods were measured quantitatively through the three tests above. The comparison shows that the DRL method is superior to the PID and ADRC methods on most indexes, and the system obtains better tracking accuracy and response speed.

Claims (7)

1. An intelligent control method for a permanent magnet synchronous motor servo system in a semi-closed loop scenario, characterized in that the method comprises: constructing a permanent magnet synchronous motor (PMSM) operation model based on an FOC control framework and a mathematical model of the load and its transmission mechanism; then, on the basis of a three-loop structure, using a reinforcement learning agent to determine the state space and action settings, with the goal of improving the tracking accuracy and response speed of the load swing angle with respect to the commanded target angle in the semi-closed loop scenario; and modeling the system in simulation software and using a deep reinforcement learning method to obtain a tuning policy network for motor position feedback, so as to output the optimal control voltage;
the method further pre-trains the model on empirical data collected by sensors during testing, and then optimizes the policy network based on the twin delayed deep deterministic policy gradient algorithm, so that the policy network takes the real-time observable feedback values or computable values of the motor as state quantities and from them calculates the optimal PMSM control voltage, and the permanent magnet synchronous motor drives the transmission device so that the servo system responds to commands;
in the above process of using the reinforcement learning agent to determine the state space and action settings, the active disturbance rejection algorithm is used in the position loop to improve the disturbance rejection capability of the control system, PI control is used in the speed loop, and a reinforcement learning algorithm is used in the current loop to train the policy network; by observing the feedback position input to and the reference speed output by the position loop, the feedback speed input to and the reference current output by the speed loop, and the feedback current, the agent decides the voltage applied to the PMSM so as to improve system performance, and the PPO reinforcement learning algorithm is used for agent optimization;
the method uses the PPO reinforcement learning algorithm, whose state and action spaces are both continuous, to optimize the agent, so that it observes the gap between the current PMSM state and the given reference value and, relying only on the approximate lead screw precession length as input, predicts the deviation between it and the actual lead screw precession length θloc, and outputs the voltage controlling the PMSM, so as to reduce the error between the system and the given command during motor control and alleviate the insufficient control accuracy caused by position feedback error.
2. The intelligent control method for a permanent magnet synchronous motor servo system in a semi-closed loop scenario according to claim 1, characterized in that the implementation steps of the method comprise:
S1. Constructing a PMSM operation model based on the FOC control framework. The FOC strategy of the PMSM decomposes the phase variables of the motor into a magnetic field component and a torque component and controls them independently, achieving precise control of the motor magnetic field and torque; under FOC control, the motor system takes the d-axis and q-axis voltage command values from the controller as input and outputs a driving torque to the load mechanism, manifested as the rotation angle of the motor rotor;
S2. Constructing a mathematical model of the load and its transmission mechanism. The motor load model considers the elastic deformation of the transmission mechanism and the dynamic equations of the swing mechanism, as well as nonlinear factors including Coulomb friction and torque transmission in the triangular linkage mechanism;
S3. Considering the PMSM and the high-order nonlinear load model from the above modeling, constructing a deep-reinforcement-learning-driven edge intelligent control scheme on the FOC control framework;
in the control scheme, the ADRC controller receives the external input reference value of the controlled object and, through the differential controller, outputs the reference value for position control and its rate of change; at the same time, the ADRC controller receives feedback from the measuring element of the controlled object. Because the measured feedback contains a certain amount of noise and time-domain oscillation, which would affect the correct output of the subsequent deep reinforcement learning agent, the ADRC controller first feeds the feedback signal into the extended state observer to predict the noise it contains and removes it with the corresponding feedforward mechanism; finally, the ADRC controller takes the difference between the outputs of the differential controller and the extended state observer to output the reference value for the next loop; the PI controller takes the difference between the ADRC reference value and the feedback value of the measuring element, and linearly combines the difference and its time integral as the output control quantity for the next actuator;
S4. Constructing a deep-reinforcement-learning-driven PMSM control scheme trained with PPO, in which the agent outputs the PMSM control voltage that drives the transmission device to move the servo system, so as to minimize the long-term error between the reference command and the servo system;
the values observed by the agent are: the differential controller outputs θv1(t), θv2(t) of the ADRC; the outputs e1(t), e2(t) of the reference signal generator; the error eθ(t) between the lead screw precession length converted from the command deflection angle α*(t) and the feedback; the reference speed ω*(t) and the reference current; and the corresponding speed feedback ω(t) and current feedback iq(t). Further, although the ADRC controller can predict the noise in the feedback signal and compensate for its effect by feedforward, the above observations still oscillate to varying degrees in the time domain. This oscillation comes mainly from the rapidly changing speed ω and current iq and causes the agent output to oscillate. Moreover, even if the input itself does not oscillate, the output of the reinforcement learning agent also oscillates in the time domain, because the agent aims to optimize the reward function and ignores the safety hazards that may arise when its output is used as the PMSM control voltage. Therefore, a Kalman filter is added to the feedback quantities ω and iq, and a low-pass filter is added to the agent output to smooth the PMSM control voltage.
3. The intelligent control method for a permanent magnet synchronous motor servo system in a semi-closed loop scenario according to claim 1 or 2, characterized in that the method comprises the following optimization process:
to help the training algorithm optimize, the input values are normalized and limited in amplitude; a Kalman filter is added to the two feedback quantities ω and iq, and a low-pass filter is added to the agent output;
to improve the overall performance of the controller, the tasks faced by the reinforcement learning controller should be as rich as possible so as to cover all possible command patterns, including random step commands, low-frequency sinusoidal commands, and high-frequency sinusoidal commands;
to reduce the error between the swing mechanism angle α and the command angle αref, the negative of the square of the difference between the reference value and the feedback value of each of the three loops is used as the reward at each time step, multiplied by fixed coefficients to balance the weights among the three loops:
an optimized policy function is applied to the state so that the agent can output the tuning value; to this end, two Q networks are established to evaluate the value of the policy network output, and the evaluated value is used for gradient optimization of the actor network.
4. The intelligent control method for a permanent magnet synchronous motor servo system in a semi-closed loop scenario according to claim 1 or 2, characterized in that, in step S1, under FOC control, the motor system takes the d-axis and q-axis voltage command values from the controller as input and outputs a driving torque to the load mechanism, manifested as the rotation angle of the motor rotor; the specific operations are as follows:
S11. Obtaining the rotor position and speed information of the motor for the transformation between the phase variables in the stationary coordinate frame and the rotating variables in the d-q frame; in the d-q frame, the motor equations can be described as:
where: Rs is the stator resistance; vd, vq are the d-axis and q-axis voltages; id, iq are the d-axis and q-axis currents; Ld, Lq are the d-axis and q-axis inductances; λm is the permanent magnet d-axis flux; Te, TL are the output torque and load torque; B is the bearing viscous friction coefficient; J is the total moment of inertia of the motor and load; ωm, ωr are the rotor mechanical angular velocity and the rotor electrical angular velocity; p is the number of pole pairs, satisfying ωr = p × ωm; given the input d-axis and q-axis voltages and the external load torque TL, the model outputs the motor angle θm and angular velocity ωr, and the d-axis and q-axis currents are obtained by the current detector;
S12. According to actual requirements, the motor type is determined to be a surface-mounted PMSM. Considering that a PMSM operating in steady state must remain within certain operating limits, the method requires the rotor speed and stator current to stay within threshold ranges, satisfying:
Lq = Ld, |ωr| ≤ ωlimit
where ωlimit is the maximum rotor speed and ilimit is the maximum stator current.
5. The intelligent control method for a permanent magnet synchronous motor servo system in a semi-closed loop scenario according to claim 1 or 2, characterized in that step S2 specifically comprises:
for a high-order load mechanism with nonlinear factors, the mechanism transmits the motor shaft torque through gears to the lead screw, the lead screw stroke pushes and pulls the triangular linkage mechanism to form a moment arm, and the swing mechanism is driven to deflect in angle; to match the mechanical properties of the actual load, the modeling considers the elastic motion of the lead screw, the equation of motion of the swing mechanism, and the nonlinear Coulomb friction during motion; the dynamic equations of the load can be described as:
where θm is the rotor mechanical angle, gr is the gear reduction coefficient, nr is the lead screw reduction coefficient, L is the lead screw retraction/extension, Ks is the combined rotational stiffness, and F is the force on the lead screw driven by the torque; Me is the lead screw mass, Be is the elastic damping, and ΔL is the lead screw compression stroke; TL is the motor shaft load torque and effi is the transmission efficiency; M is the swing torque, Kp is the combined precession stiffness, and r is the moment arm length; α is the swing angle of the swing mechanism, Jb is the swing inertia, Bb is the swing damping, Kdelta is the position resistance torque, and Mf is the friction torque, which is modeled as Coulomb friction with the following expression:
the above load model considers the elastic deformation of the lead screw. Starting only from the geometry of the triangular linkage mechanism, the lead screw side, the swing mechanism, and the fixed fulcrum form an equivalent triangle; neglecting the elastic deformation of the lead screw, the relationship between the lead screw retraction/extension L and the deflection angle α is approximately:
where a and b are the lengths of the two sides OA and OB adjacent to the swing angle in the triangular linkage mechanism; α0 and L0 are the swing center angle ∠AOB and the length of side AB when the load deflection angle is 0; and, since the servo system and the linkage mechanism lie on the same straight line, the actual rotation angle θloc of the servo system equals the angle α of ∠AOB.
6. The intelligent control method for a permanent magnet synchronous motor servo system in a semi-closed loop scenario according to claim 1 or 2, characterized in that, in step S3, for the semi-closed loop electric servo system under the FOC control framework, three-loop control is used as the basic control of the final position angle to ensure the stability and robustness of system operation, wherein:
S31. The current loop that controls torque is constructed as a decoupled current controller, which decomposes the originally coupled d-axis and q-axis voltage terms into linear and nonlinear terms:
where vd1 and vq1 can be controlled by a linear PID current controller, while the nonlinear terms vd0 and vq0 can be calculated from the rotor speed value of the encoder:
S32. The relationship between the input error e(t) and the output control value u(t) in the PID controller is:
S33. In the ADRC controller, the relationship between the differential controller input θ(t) and the outputs θv1(t), θv2(t) is:
where θ̇v1(t) denotes the derivative of θv1(t) with respect to time t;
S34. In the ADRC controller, the relationship between the extended state observer input θ(t) and the outputs θz1(t), θz2(t), θz3(t) is:
where fal(e(t), α, δ) is the error filter, whose inputs e(t), α, δ relate to its output as:
S35. In the ADRC controller, the inputs of the reference signal generator are the differential controller outputs θv1(t), θv2(t) and the extended state observer outputs θz1(t), θz2(t), θz3(t); the specific relationship is:
where sat(x, xmax) denotes the saturation function: when the input x is greater than xmax the output is xmax, when x is less than −xmax the output is −xmax, and otherwise the output is x;
S36. The difference between the fed-back motor speed ω(t) and the speed command ω*(t) is used as the input of the speed PI controller, and the controller output is used as the reference value of the q-axis current;
S37. On the basis of the speed controller, the position control loop is constructed: based on the triangular linkage structure of the load, the command deflection angle αref is approximately converted into a lead screw precession length and input to the differential controller of the ADRC, the precession length converted from the rotor angle feedback signal θ(t) is input to the extended state observer of the ADRC, and the reference speed output by the ADRC reference signal generator is taken as the output of the position loop.
7. The intelligent control method for a permanent magnet synchronous motor servo system in a semi-closed loop scenario according to claim 1, characterized in that the specific steps for alleviating the insufficient control accuracy caused by position feedback error comprise:
S41. The values observed by the agent are: the differential controller outputs θv1(t), θv2(t) of the ADRC; the outputs e1(t), e2(t) of the reference signal generator; the error eθ(t) between the lead screw precession length converted from the command deflection angle αref and the feedback; the reference speed ω*(t) and the reference current; and the corresponding speed feedback ω(t) and current feedback iq(t); these values are processed, and the state space s of the agent is determined as:
S42. The continuous action a output by the agent is used as the q-axis voltage input of the PMSM, so that the PMSM generates electromagnetic torque under the given voltage and drives the servo system to respond to the given command;
S43. To improve the overall performance of the controller, random step commands, low-frequency sinusoidal commands, and high-frequency sinusoidal commands are mixed in training; each episode selects one task as the command with equal probability and lasts 10 seconds;
S44. To reduce the error between the swing mechanism angle α and the command angle αref, the negative of the square of the difference between the command and the reference value of each loop in the three-loop control is used as the reward at each time step:
S45. Considering the deployment requirements on an embedded chip, the actor is set as an MLP with 2 hidden layers and the critic network as an MLP with 4 hidden layers; during training, the agent sampling time is set to 0.01 s.
CN202410774423.2A 2024-06-17 2024-06-17 Intelligent control method of permanent magnet synchronous motor servo system in semi-closed loop scenario Active CN118801756B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410774423.2A CN118801756B (en) 2024-06-17 2024-06-17 Intelligent control method of permanent magnet synchronous motor servo system in semi-closed loop scenario

Publications (2)

Publication Number Publication Date
CN118801756A true CN118801756A (en) 2024-10-18
CN118801756B CN118801756B (en) 2025-03-28

Family

ID=93028918

Country Status (1)

Country Link
CN (1) CN118801756B (en)

Similar Documents

Publication Publication Date Title
CN112701968B (en) A Robust Performance Improvement Method for Model Predictive Control of Permanent Magnet Synchronous Motors
CN107070341B (en) Torque Ripple Suppression Method for Permanent Magnet Synchronous Motor Based on Robust Iterative Learning Control
CN104242769B (en) Permanent magnet synchronous motor speed composite control method based on continuous terminal slip form technology
CN105827168B (en) Method for controlling permanent magnet synchronous motor and system based on sliding formwork observation
CN106655938B (en) Control system for permanent-magnet synchronous motor and control method based on High-Order Sliding Mode method
CN113364377B (en) A method of automatic disturbance rejection position servo control of permanent magnet synchronous motor
Zhao et al. Back EMF-based dynamic position estimation in the whole speed range for precision sensorless control of PMLSM
CN117335700A (en) Dynamic optimization method of electric servo position feedback based on deep reinforcement learning in semi-closed loop scenario
CN110995102A (en) Direct torque control method and system for permanent magnet synchronous motor
CN114710080A (en) Permanent magnet synchronous motor sliding mode control method based on improved variable gain reaching law
CN113726240B (en) A permanent magnet synchronous motor control method and system based on second-order active disturbance rejection control
CN113067520B (en) Sensorless Response Adaptive Motor Control Method Based on Optimization Residuals
CN118034129A (en) A servo motor control parameter optimization method based on evolutionary reinforcement learning
CN118801756B (en) Intelligent control method of permanent magnet synchronous motor servo system in semi-closed loop scenario
CN113708684B (en) Permanent magnet synchronous motor control method and device based on extended potential observer
CN117614333A (en) A permanent magnet synchronous motor position control method and system based on sliding mode control
CN118199453A (en) Sensorless torque ripple suppression method for stepping motor based on extended Kalman filter
CN117895851A (en) A full-speed control method for surface-mounted permanent magnet synchronous motor
CN112436774A (en) Control method of asynchronous motor driven without a speed sensor
Badini et al. MRAS-based speed and parameter estimation for a vector-controlled PMSM drive
CN119483031B (en) Decoupling method of torque system and suspension system of single-winding magnetic levitation permanent magnet synchronous motor
Gao et al. Sensorless Control of PMSM via ADRC and SMC with Super-Twisting Observer
Zhao et al. Disturbance rejection enhancement of vector controlled PMSM using second-order nonlinear ADRC
CN114564053B (en) Control method of control moment gyro frame system based on induction synchronizer error compensation
Liu et al. A sensorless control method for PMSM Based on FSMO optimized by QIO

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant