CN118801756A - Intelligent control method of permanent magnet synchronous motor servo system in semi-closed loop scenario - Google Patents
- Publication number: CN118801756A (application CN202410774423.2A)
- Authority
- CN
- China
- Prior art keywords
- control
- output
- motor
- controller
- feedback
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H02P21/22 — Current control, e.g. using a current control loop
- G06N3/092 — Reinforcement learning
- H02P21/0003 — Control strategies in general, e.g. linear type, e.g. P, PI, PID, using robust control
- H02P21/05 — Vector control specially adapted for damping motor oscillations, e.g. for reducing hunting
- H02P21/13 — Observer control, e.g. using Luenberger observers or Kalman filters
- H02P21/14 — Estimation or adaptation of machine parameters, e.g. flux, current or voltage
- H02P25/024 — Synchronous motors controlled by supply frequency
- H02P27/085 — Control using variable-frequency supply via DC-AC converters with pulse width modulation, wherein the PWM mode is adapted to the running conditions of the motor, e.g. the switching frequency
- H02P2207/055 — Surface mounted magnet motors
Abstract
The invention discloses a control method for a permanent magnet synchronous motor (PMSM) servo system in a semi-closed-loop scenario, in which only motor-side feedback signals such as the rotor angle can be measured and the actual position of the load mechanism cannot. Under the field-oriented control (FOC) framework of the PMSM, a high-precision signal-tracking method is provided by an edge intelligent control algorithm driven by deep reinforcement learning: a proximal policy optimization (PPO) algorithm trains a tuning policy network that is fused with the conventional three-loop control strategy, observes the three-loop commands and feedback quantities of the PMSM, and outputs the motor control voltage. This improves the control accuracy of conventional three-loop control when facing a high-order nonlinear load model and ensures safety during operation.
Description
Technical Field
The invention belongs to the field of permanent magnet synchronous motor control and computer control system technology, relates in particular to position control algorithms for permanent magnet synchronous motors in the industrial Internet, and specifically to an intelligent control method for a PMSM servo system in a semi-closed-loop scenario.
Background
The permanent magnet synchronous motor has the advantages of compact structure, high efficiency and power density, and good speed-regulation performance, and is widely applied in the industrial Internet, electric transportation, industrial robotics, aerospace and other fields. The common field-oriented control (FOC) strategy in a PMSM position servo system typically controls the rotation angle of the motor with a three-loop PID controller. While the low complexity and stability of this algorithm have earned it wide use in industrial practice, the three-loop PID controller also faces a number of challenges and disadvantages in complex and special drive scenarios:
First, when a servo motor drives certain high-order nonlinear load mechanisms, the parameters of the PID controller are difficult to determine from a model and performance indices, so trial-and-error tuning based on manual experience is required, and good dynamic response is hard to obtain.

Second, the motor system limits the rotational speed, current and inverter output voltage; when the controller receives highly dynamic position, speed and current commands, the limiting of these commands further degrades controller performance.

Most challenging, the special equipment addressed by the invention cannot be fitted with an external sensor, or cannot reliably measure the actual position of the load mechanism because of its special working environment; the only feedback available to the controller is the motor shaft angle detected by a magnetic encoder, i.e., a semi-closed-loop control system. Compared with closed-loop control that monitors the final actuator, the semi-closed-loop scenario causes significant performance degradation for conventional PID controllers that act on feedback error. In the invention, for example, the motor shaft drives a screw through a gear to feed or retract, deflecting the swing mechanism connected to the triangular link structure by a certain angle. The swing angle of the swing mechanism cannot be measured by a sensor, and the rotation angle of the motor and the swing angle of the swing mechanism, both affected by the elastic motion of the screw, are related by high-order nonlinear dynamic equations that are difficult to express as a simple function. The control algorithm therefore cannot obtain actual angle feedback to form a closed loop.
The prior art also includes an electric servo position-feedback dynamic tuning method based on deep reinforcement learning in a semi-closed-loop scenario (publication number CN117335700A). On the basis of three-loop PID control, that technique determines the state space and action settings of a reinforcement learning agent; taking improvement of the tracking accuracy and response speed of the load swing angle toward the commanded target angle under semi-closed-loop control as the objective, it models the system in simulation software and obtains, via deep reinforcement learning, a tuning policy network for motor position feedback that outputs an optimal tuning value. This scheme illustrates a limitation: three-loop control schemes usually add limiting at the output of each loop to ensure control safety in extreme states, but this also caps the agent's effect, since once one of the three loops saturates, the agent can no longer optimize the original three-loop control scheme by tuning.
Disclosure of Invention
The invention aims to: address the defects and problems of the three-loop PID control method when a permanent magnet synchronous motor in the industrial Internet operates semi-closed-loop against a high-order nonlinear load; to this end the invention provides a control method for an industrial Internet servo system driven by deep reinforcement learning in a semi-closed-loop scenario.
The technical scheme is as follows: the intelligent control method of the permanent magnet synchronous motor servo system in the semi-closed-loop scenario comprises constructing a PMSM operation model based on the FOC control framework together with a mathematical model of the load and its transmission mechanism; then determining the state space and action settings of a reinforcement learning agent based on the three-loop structure, with the goal of improving the tracking accuracy and response speed of the load swing angle to the commanded target angle in the semi-closed-loop scenario; modeling the system in simulation software; and obtaining, with a deep reinforcement learning method, a tuning policy network for motor position feedback so as to output the optimal control voltage;
the method further comprises pre-training the model with empirical data acquired by sensors during testing, and then optimizing the policy network with a twin-delayed deep deterministic policy gradient (TD3) algorithm, so that the policy network can take real-time observable feedback values or computable values of the motor as state quantities and calculate the optimal PMSM control voltage, with the permanent magnet synchronous motor driving the transmission to make the servo system respond to commands;
In determining the state space and action settings of the reinforcement learning agent, an active disturbance rejection control (ADRC) algorithm is used at the position loop to improve the disturbance rejection of the control system, PI control is used at the speed loop, and a policy network trained by reinforcement learning is used at the current loop; by observing the feedback position input to the position loop and its output reference speed, the feedback speed input to the speed loop and its output reference current, and the feedback current, the agent decides the voltage applied to the PMSM, thereby improving system performance; the PPO reinforcement learning algorithm is used to optimize the agent;
according to the method, a PPO reinforcement learning algorithm with continuous state and action spaces is used to optimize the agent, so that the agent observes the gap between the current PMSM state and the given reference value, predicts the deviation between its only available input, the approximate screw precession length, and the actual screw precession length θ_loc(t), and outputs the voltage controlling the PMSM, thereby reducing the error between the system and the given command during motor control and alleviating the insufficient control accuracy caused by position feedback errors.
Further, the method comprises the implementation steps of:
S1, constructing a PMSM operation model based on the FOC control framework, wherein the FOC strategy of the PMSM decomposes the phase variables of the motor into a magnetic-field component and a torque component and controls them independently, realizing accurate control of the motor's field and torque;
Under FOC control, the motor system takes the d- and q-axis voltage command values from the controller as inputs and outputs torque to the load mechanism, expressed as the rotation angle of the motor rotor;
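As a concrete illustration of how the phase quantities are decomposed into field and torque components, the following sketch applies the standard amplitude-invariant Clarke transform and the Park rotation to measured three-phase currents; this is textbook FOC machinery, not code from the patent.

```python
import math

def clarke_park(i_a, i_b, i_c, theta_e):
    """Map three-phase stator currents into the rotor-aligned d-q frame.

    theta_e: electrical rotor angle in radians (p * theta_m from the encoder).
    Returns (i_d, i_q): the field-producing and torque-producing components.
    """
    # Amplitude-invariant Clarke transform: a-b-c -> alpha-beta
    i_alpha = (2.0 / 3.0) * (i_a - 0.5 * i_b - 0.5 * i_c)
    i_beta = (1.0 / math.sqrt(3)) * (i_b - i_c)
    # Park rotation: alpha-beta -> d-q
    i_d = i_alpha * math.cos(theta_e) + i_beta * math.sin(theta_e)
    i_q = -i_alpha * math.sin(theta_e) + i_beta * math.cos(theta_e)
    return i_d, i_q
```

For a balanced sinusoidal current set aligned with theta_e, i_d recovers the current amplitude and i_q vanishes, which is what allows the field and torque components to be regulated independently.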
S2, constructing a mathematical model of the load and its transmission mechanism, wherein the motor load model takes into account the dynamics of the transmission mechanism, such as the elastic deformation of the screw and the motion of the swing mechanism, as well as nonlinear factors including Coulomb friction and torque transmission in the triangular link mechanism;
S3, considering the PMSM and the high-order nonlinear load model established above, constructing an edge intelligent control scheme driven by deep reinforcement learning on the FOC control framework;
In this control scheme, the ADRC controller receives the externally input reference value of the controlled object and, through a differential controller, outputs the position-control reference value and its rate of change. Meanwhile, the ADRC controller receives feedback from the measuring element of the controlled object; because the measured feedback contains noise and time-domain oscillation that would degrade the accuracy of the downstream deep reinforcement learning agent's output, the ADRC controller first feeds the feedback signal into a state-expansion observer to estimate the noise it contains and removes the noise with a corresponding feedforward mechanism. Finally, the ADRC controller takes the difference between the outputs of the differential controller and the state-expansion observer and outputs the reference value for the next loop;
the PI controller takes the difference between the ADRC reference value and the feedback value of the measuring element, and linearly superimposes this difference with its integral over time as the control quantity output to the next actuator;
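A minimal discrete sketch of the PI law just described — error between reference and feedback, proportional term plus time integral, clamped at the loop output; the gains and limit below are illustrative placeholders, not values from the patent.

```python
class PIController:
    """Discrete PI controller with output limiting, as used per loop."""

    def __init__(self, kp, ki, dt, limit):
        self.kp, self.ki, self.dt, self.limit = kp, ki, dt, limit
        self.integral = 0.0  # running time-integral of the error

    def step(self, reference, feedback):
        e = reference - feedback
        self.integral += e * self.dt
        u = self.kp * e + self.ki * self.integral
        # Per-loop clipping for safety in extreme states
        return max(-self.limit, min(self.limit, u))
```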
S4, constructing a deep-reinforcement-learning-driven control scheme for the PMSM, training it with PPO, and having the agent output the PMSM control voltage to drive the transmission and move the servo system, so as to minimize the long-term error between the reference command and the servo system;
The observations of the agent are determined as: the outputs θ_v1(t), θ_v2(t) of the ADRC differential controller; the outputs e_1(t), e_2(t) of the reference signal generator; the commanded screw precession length θ*_loc(t) converted from the command deflection angle α*(t), its feedback value and the error e_θ(t); the reference speed ω*(t) and reference current i*_q(t); and the corresponding speed feedback ω(t) and current feedback i_q(t).
Although the ADRC controller can estimate the noise present in the feedback signal and compensate its effect by feedforward, the agent observations above still oscillate to varying degrees in the time domain, mainly because of the fast-varying speed ω and current i_q, which can cause the agent's output to oscillate. Moreover, even when the inputs themselves do not oscillate, the output of the reinforcement learning agent itself oscillates in the time domain, since the agent aims only to maximize the reward function and ignores the safety hazards that may arise when its output is applied as the PMSM control voltage. Therefore, a Kalman filter is added to the feedback quantities ω and i_q, and a low-pass filter is added to the agent output to smooth the control voltage of the PMSM.
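The smoothing stage on the agent output could be as simple as a first-order low-pass filter; the time constant tau below is an assumed tuning parameter, since the patent does not publish one.

```python
class LowPassFilter:
    """First-order IIR low-pass: y[k] = y[k-1] + a * (x[k] - y[k-1]),
    with smoothing factor a = dt / (tau + dt)."""

    def __init__(self, tau, dt, y0=0.0):
        self.alpha = dt / (tau + dt)
        self.y = y0

    def step(self, x):
        self.y += self.alpha * (x - self.y)
        return self.y
```

Applied to the raw voltage command sequence, the filter suppresses step-to-step oscillation at the cost of a small lag, the usual trade-off for this kind of safety smoothing.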
Based on the scheme, the method comprises the following optimization process:
To make the training algorithm easier to optimize, the input values are normalized and suitably limited, a Kalman filter is added to the two feedback quantities ω and i_q, and a low-pass filter is added to the output of the agent;
To comprehensively improve controller performance, the tasks faced by the reinforcement learning controller should be rich enough to contain all possible command patterns, including random step commands, low-frequency sinusoidal commands and high-frequency sinusoidal commands;
To reduce the error between the swing angle α and the command angle α_ref, the square of the difference between the reference value and the feedback value of each of the three loops is taken, its opposite is used as the reward value of each time step, and fixed coefficients are multiplied in to balance the weights among the three loops:

r(t) = −[λ_1 e_θ(t)² + λ_2 (ω*(t) − ω(t))² + λ_3 (i*_q(t) − i_q(t))²]
The optimized policy function maps the state s to the agent's output action; for this purpose, two Q networks are established to evaluate the output of the policy network, and the evaluation values are used to perform gradient optimization on the actor network.
Further, in step S1, under FOC control the motor system takes the d- and q-axis voltage command values from the controller as inputs and outputs torque to the load mechanism, expressed as the rotation angle of the motor rotor; the specific operations are as follows:
S11, acquiring the rotor position and speed information of the motor for the transformation between the phase variables in the stationary coordinate system and the rotating d-q coordinate variables;
In the d-q coordinate system, the motor equations can be described as:

v_d = R_s i_d + L_d (d i_d / dt) − ω_r L_q i_q
v_q = R_s i_q + L_q (d i_q / dt) + ω_r (L_d i_d + λ_m)
T_e = (3/2) p [λ_m i_q + (L_d − L_q) i_d i_q]
J (d ω_m / dt) = T_e − T_L − B ω_m

Wherein: R_s is the stator resistance; v_d, v_q are the d-axis and q-axis voltages; i_d, i_q are the d-axis and q-axis currents; L_d, L_q are the d-axis and q-axis inductances; λ_m is the d-axis flux linkage of the permanent magnet; T_e, T_L are the output torque and load torque; B is the viscous coefficient of the bearing; J is the total rotational inertia of motor and load; ω_m, ω_r are the mechanical and electrical angular velocities of the rotor, and p is the number of permanent-magnet pole pairs, satisfying ω_r = p·ω_m. Given the input d-q axis voltages and the external load torque T_L, the model outputs the motor angle θ_m and angular velocity ω_r; the d-q axis currents are obtained by the current detector;
S12, according to actual requirements, the motor type is determined to be a surface-mounted PMSM; meanwhile, considering that a permanent magnet synchronous motor running in steady state must respect certain operating limits, the method requires that the rotor speed and stator current remain within threshold ranges:

L_q = L_d
|ω_r| ≤ ω_limit
√(i_d² + i_q²) ≤ i_limit

Wherein ω_limit is the maximum rotor speed and i_limit is the maximum stator current.
Further, step S2 specifically includes:
For a high-order load mechanism with nonlinear factors, the torque of the motor shaft is transmitted to the screw through a gear, and the stroke of the screw pushes and pulls the triangular link mechanism, forming a moment arm that deflects the swing mechanism by an angle. To match the mechanical properties of the actual load, the modeling considers the elastic motion of the screw, the equation of motion of the swing mechanism, and the nonlinear Coulomb friction during motion;
The dynamics of this load couple the following quantities: θ_m is the rotor mechanical angle, g_r is the gear reduction ratio, n_r is the screw reduction ratio, L is the screw retraction/extension amount, K_s is the combined rotational stiffness, and F is the force exerted by the screw under the driving torque; m_e is the mass of the screw, B_e is the elastic damping, and ΔL is the compression stroke of the screw; T_L is the motor shaft load torque and effi is the transmission efficiency; M is the swing moment, K_p is the combined precession stiffness, and r is the arm length; α is the swing angle of the swing mechanism, J_b is the swing inertia, B_b is the swing damping, K_δ is the position resistance moment, and M_f is the friction moment, modeled as Coulomb friction:

M_f(t) = f_c · sgn(dα/dt)

where f_c is the Coulomb friction coefficient.
The load model considers the elastic deformation of the screw. Starting only from the geometry of the triangular link mechanism, an equivalent triangle is formed between the screw edge, the swing mechanism and the fixed fulcrum; neglecting the elastic deformation of the screw, the relationship between the retraction/extension amount L of the screw and the deflection angle α is approximately:

L(α) ≈ √(a² + b² − 2ab·cos(α₀ + α)) − L₀

Wherein a and b are the two sides OA and OB adjacent to the swing angle in the triangular link mechanism, and α₀, L₀ are the central angle ∠AOB and the length of edge AB when the load deflection angle is 0.
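Under the rigid-screw approximation, this law-of-cosines relation translates directly into code; the geometry values used in the test are placeholders, since the patent does not publish its dimensions.

```python
import math

def screw_stroke(alpha, a, b, alpha0):
    """Screw extension needed for a swing deflection alpha.

    a, b: lengths of sides OA and OB of the triangular link mechanism;
    alpha0: central angle AOB at zero deflection. The stroke is the change
    in the length of edge AB given by the law of cosines.
    """
    L0 = math.sqrt(a * a + b * b - 2.0 * a * b * math.cos(alpha0))
    AB = math.sqrt(a * a + b * b - 2.0 * a * b * math.cos(alpha0 + alpha))
    return AB - L0
```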
Further, in step S3, for the semi-closed-loop electric servo system under the FOC control framework, three-loop control performs the basic control of the final position angle to ensure the stability and robustness of system operation, wherein:
S31, the current loop controlling torque is constructed as a decoupling current controller, which decomposes the originally cross-coupled d- and q-axis voltage terms into a linear term and a nonlinear term:

v_d = v_d1 + v_d0,  v_q = v_q1 + v_q0

Wherein v_d1 and v_q1 are generated by a linear PID current controller, and the nonlinear decoupling terms v_d0 and v_q0 are calculated from the rotor speed value of the encoder:

v_d0 = −ω_r L_q i_q,  v_q0 = ω_r (L_d i_d + λ_m)
S32, the relation between the input error e(t) and the output control value u(t) of the PID controller is:

u(t) = K_p e(t) + K_i ∫₀ᵗ e(τ) dτ + K_d de(t)/dt
S33, in the ADRC controller, the differential controller (tracking differentiator) takes the input θ(t) and outputs the tracking signal θ_v1(t) and its derivative θ_v2(t), satisfying

dθ_v1(t)/dt = θ_v2(t)

wherein dθ_v1(t)/dt denotes the derivative of θ_v1(t) with respect to time t;
S34, in the ADRC controller, the state-expansion observer takes the input θ(t) and produces the outputs θ_z1(t), θ_z2(t), θ_z3(t); in the standard third-order form,

e(t) = θ_z1(t) − θ(t)
dθ_z1(t)/dt = θ_z2(t) − β_1 e(t)
dθ_z2(t)/dt = θ_z3(t) − β_2 fal(e(t), α_1, δ) + b_0 u(t)
dθ_z3(t)/dt = −β_3 fal(e(t), α_2, δ)

wherein fal(e(t), α, δ) is an error filter whose relation between the inputs e(t), α, δ and the output is:

fal(e, α, δ) = e / δ^(1−α),      |e| ≤ δ
fal(e, α, δ) = |e|^α · sgn(e),   |e| > δ
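The fal error filter is a standard construct from Han's active disturbance rejection control; below is a direct implementation of the piecewise definition, together with the saturation function used by the reference signal generator.

```python
def fal(e, alpha, delta):
    """Nonlinear error gain: linear with high gain inside |e| <= delta,
    fractional-power attenuation outside."""
    if abs(e) <= delta:
        return e / (delta ** (1.0 - alpha))
    return (abs(e) ** alpha) * (1.0 if e > 0 else -1.0)

def sat(x, x_max):
    """Saturation: clamp x to the interval [-x_max, x_max]."""
    return max(-x_max, min(x_max, x))
```

Note that fal is continuous at |e| = delta, since delta / delta^(1−alpha) = delta^alpha; the linear inner segment avoids the high-gain chattering a pure power law would cause near zero error.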
S35, in the ADRC controller, the reference signal generator combines the outputs θ_v1(t), θ_v2(t) of the differential controller with the outputs θ_z1(t), θ_z2(t), θ_z3(t) of the state-expansion observer and saturates the result;

wherein sat(x, x_max) denotes the saturation function: the output is x_max when the input x is greater than x_max, −x_max when x is less than −x_max, and x otherwise;
S36, the difference between the fed-back motor speed ω(t) and the speed command ω*(t) is taken as the input of the speed PI controller, and the output of this controller is taken as the reference value i*_q(t) of the q-axis current;
S37, a position control loop is constructed on top of the speed controller. Specifically, based on the triangular link structure of the load, the command deflection angle α_ref is approximately converted into a commanded screw precession length θ*_loc(t) and input to the ADRC differential controller; the rotor angle feedback signal θ(t) is likewise converted to a precession length θ_loc(t) and input to the ADRC state-expansion observer; the reference speed output by the ADRC reference signal generator serves as the output of the position loop.
Further, the specific steps of the method for alleviating the insufficient control accuracy caused by position feedback errors comprise:

S41, the observations of the agent are determined as the outputs θ_v1(t), θ_v2(t) of the ADRC differential controller, the outputs e_1(t), e_2(t) of the reference signal generator, the commanded screw precession length θ*_loc(t) converted from the command deflection angle α_ref, its feedback value θ_loc(t) and the error e_θ(t), the reference speed ω*(t), the reference current i*_q(t), and the corresponding speed feedback ω(t) and current feedback i_q(t); the state space s of the agent is accordingly determined as:

s = [θ_v1(t), θ_v2(t), e_1(t), e_2(t), θ*_loc(t), θ_loc(t), e_θ(t), ω*(t), ω(t), i*_q(t), i_q(t)]
S42, the continuous action a output by the agent is used as the q-axis voltage input of the PMSM, so that the PMSM generates electromagnetic torque under the given voltage and drives the servo system to respond to the given command;
S43, to improve the comprehensive performance of the controller, training mixes random step commands, low-frequency sinusoidal commands and high-frequency sinusoidal commands; each episode selects one task as the command with equal probability and lasts 10 seconds;
S44, to reduce the error between the swing mechanism angle α and the command angle α_ref, the negative square of the difference between the command and the feedback of each loop in the three-loop control, weighted by fixed coefficients, is used as the reward for each time step:

r(t) = −[λ_1 e_θ(t)² + λ_2 (ω*(t) − ω(t))² + λ_3 (i*_q(t) − i_q(t))²]
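The per-step reward can be sketched as the negative weighted sum of the squared three-loop tracking errors; the weights here are illustrative, since the patent fixes balancing coefficients without publishing their values.

```python
def step_reward(e_pos, e_spd, e_cur, weights=(1.0, 0.01, 0.001)):
    """Reward for one time step: larger tracking errors in the position,
    speed and current loops give a more negative reward."""
    w1, w2, w3 = weights
    return -(w1 * e_pos ** 2 + w2 * e_spd ** 2 + w3 * e_cur ** 2)
```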
S45, considering the deployment requirements on an embedded chip, the actor network is set as an MLP with 2 hidden layers and the critic network as an MLP with 4 hidden layers; during training, the agent sampling time is set to 0.01 s.
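The actor/critic sizing of S45 can be sketched with a plain NumPy multilayer perceptron; the hidden width (64) and the state dimension are assumptions for illustration, matching only the hidden-layer counts stated in the text.

```python
import numpy as np

def mlp(sizes, rng):
    """Build a tanh MLP as a list of (weight, bias) layers."""
    return [(0.1 * rng.standard_normal((m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    """Forward pass: tanh on hidden layers, linear output."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.tanh(x)
    return x

rng = np.random.default_rng(0)
STATE_DIM = 11  # assumed size of the S41 observation vector
actor = mlp([STATE_DIM, 64, 64, 1], rng)            # 2 hidden layers -> voltage
critic = mlp([STATE_DIM, 64, 64, 64, 64, 1], rng)   # 4 hidden layers -> value
```

The small actor (two hidden layers) keeps the inference cost low enough for an embedded chip, while the deeper critic is only needed during training.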
The beneficial effects are that: for the situation in which only motor feedback signals such as the rotation angle of the permanent magnet synchronous motor can be measured while the actual position of the load mechanism cannot, the method of the invention accounts for the high-order nonlinear characteristics of the load model. Taking the conventional three-loop PID controller as a basis, a twin-delayed deep deterministic policy gradient (TD3) algorithm trains a tuning policy network that observes the feedback quantities of the PMSM and outputs the feedback-position tuning value of the position loop, improving the control accuracy and response speed of conventional three-loop PID control when facing a high-order nonlinear load model.
Drawings
FIG. 1 is a diagram of the overall control flow model architecture in the present invention;
FIG. 2 is a schematic modeling diagram of a permanent magnet synchronous motor according to the present invention;
FIG. 3 is a diagram of the equivalent triangle formed between the lead screw edge, the swing mechanism and the fulcrum in the present invention;
FIG. 4 is a schematic representation of the load mechanism of the present invention;
FIG. 5 is a block diagram of a decoupled current controller based on PID control in accordance with the invention;
FIG. 6 is a diagram of a speed loop and position loop architecture based on a PID controller in accordance with the invention;
FIG. 7 is a graph of the q-axis current versus step current command controlled by the current controller without consideration of voltage clipping (FIG. 7 (a)) and with use of voltage clipping (FIG. 7 (b)) in the present invention;
FIG. 8 is a schematic diagram showing the phenomenon that the error between the command angle precession distance and the approximate precession distance of the rotating shaft position in the position ring converges (FIG. 8 (a)) and the actual angle of the swinging mechanism and the command angle differ greatly (FIG. 8 (b));
- FIG. 9 is a schematic diagram of the control logic with the agent acting as the PMSM input-voltage controller in the present invention, where U_q is the voltage command the agent applies to the control system;
FIG. 10 is a schematic diagram showing the phenomenon of the oscillation of the output value of the agent (FIG. 10 (a)) and the oscillation of the state quantity of the motor observed by the agent (FIG. 10 (b)) in the present invention;
FIG. 11 is a comparison of the effects of PID scheme, ADRC scheme and reinforcement learning scheme of the present invention at different orders (FIGS. 11 (a) - (d)).
Detailed Description
For a detailed description of the disclosed embodiments of the present invention, the present invention is further described below with reference to the accompanying drawings and detailed description.
First, the key problem this method solves: in three-loop position control of a permanent magnet synchronous motor, when only motor-side parameters such as the rotor angle can be measured and the actual position of the high-order nonlinear load mechanism cannot, how should the control algorithm be improved to raise the tracking accuracy of the actual load position with respect to the commanded position?
The main design idea of the invention is to use data-driven deep reinforcement learning: empirical data that can be acquired by sensors during testing but cannot be observed during actual operation are used for model pre-training, and a twin-delayed deep deterministic policy gradient algorithm is used to optimize the policy network, so that the policy network can take the motor's real-time observable feedback values or computable values as state quantities and calculate the optimal PMSM control voltage, with which the permanent magnet synchronous motor drives the transmission device to make the servo system respond to commands. The overall control flow provided by this method is shown in fig. 1.
The construction and training process of the intelligent control method of the permanent magnet synchronous motor servo system in the semi-closed loop scene comprises the following steps:
Step 1: constructing mathematical model under FOC frame of permanent magnet synchronous motor
The rotor in a permanent magnet synchronous motor is composed of a rotor core and permanent magnets arranged around the core. Regardless of the rotor arrangement, the magnetic flux density (magnetic induction intensity) distribution generated in the air gap by each pole pair is similar, and in physical modeling it is generally assumed that the flux density distribution generated in the motor air gap by permanent magnet poles mounted on the rotor surface or embedded in the core is sinusoidal. Therefore, the fundamental wave of the flux density curve is regarded as the ideal flux density distribution. Meanwhile, for the sinusoidal flux density signal, its coordinate axis is defined as the magnetic field angle θr, and the number of pole pairs of the permanent magnets arranged on the rotor core is defined as p; the relation between the mechanical angle θm of rotor rotation and the corresponding magnetic field angle is:
θr = p·θm
meanwhile, the axis of one permanent magnet pole (the extreme point of the sine wave) is defined as the d axis. The axis lying between two magnetic poles, whose magnetic field angle differs from that of the d axis by 90 degrees and where the flux density is 0, is the q axis.
The configuration of the rotor divides permanent magnet synchronous motors into two categories: salient pole machines and non-salient pole machines, where the salient pole motor has interior (built-in) magnets and the non-salient pole motor has surface-mounted magnets. The distinction between salient pole and non-salient pole machines matters because the permeability of permanent magnets is almost the same as that of free air, while the permeability of the iron core (ferromagnetic) far exceeds that of air.
According to Ampere's law, the magnetic induction intensity at a point in space is directly proportional to the permeability at that point. Thus, consider the magnetic field of constant strength generated by an energized solenoid: when the rotor of a surface-mounted motor rotates, whatever direction it turns to, the radial length of iron core traversed by the field lines is the same, i.e. the reluctance of the magnetic path is constant; for a salient pole motor, when the rotor turns to the d axis the magnetic path contains the least iron and the reluctance is maximal, while at the q axis the iron is greatest and the reluctance minimal, so the magnetic air gap is uneven. This phenomenon is called magnetic saliency.
The stator windings of the motor are essentially energized solenoids with different positions and directions; the coils are distributed in the stator slots around the periphery of the stator core with 120-degree displacement and are named the A, B, C phase windings. In practical circuits, the tails of the A, B, C three-phase windings are connected to a common point to form a star (Y) connection, in which case:
ia+ib+ic=0
for the three-phase motor windings, three-phase alternating currents with 120-degree phase differences are respectively applied, and a rotating magnetic field is synthesized in space from the three time-varying sinusoids.
For a rotating rotor, if the rotating magnetic field is guaranteed to be consistent with the rotating speed of the rotor and the magnetic phase is constant, the interaction of the magnetic fields can generate constant torque, namely magnetic torque. For salient pole machines, another type of torque, reluctance torque, is also generated that pushes the rotor to rotate with the load.
An a-b-c reference coordinate system is defined by taking the directions of the magnetic fields generated by the a-b-c three-phase windings as the coordinate axis directions. In this coordinate system, the phase variables in the time domain may be denoted fa, fb, fc, where f may represent phase voltage, phase current, or flux linkage. Considering Faraday's law of electromagnetic induction and Ohm's law, the three-phase voltages can be expressed as:
va = Rs·ia + dλa/dt, vb = Rs·ib + dλb/dt, vc = Rs·ic + dλc/dt
According to the above discussion of the rotor, for a surface-mounted rotor motor the self-inductance and mutual-inductance permeances of each stator coil are unchanged, while for an arbitrarily configured rotor the self-inductance and mutual-inductance coefficients vary with the magnetic angle:
wherein, for a surface mount motor, L 2 =0. Therefore, consider the flux linkage value of the three-phase winding of abc as the flux linkage of self-inductance and mutual inductance plus leakage of the permanent magnet into the coil:
wherein lambda m is the maximum flux linkage of the N pole of the permanent magnet to one coil, and the two formulas are substituted to obtain:
the above-described motor model in the stationary reference frame has parameters that vary with time, which complicates the control system design. This control complexity due to rotation can be removed by projecting the phase variables of the model onto a rotating reference frame. The d-q coordinate system has two orthogonal axes fixed on the rotor, namely the d axis along the rotor permanent magnet pole and the q axis orthogonal to the d axis.
Consider transforming the motor model from the three-phase stationary a-b-c reference frame to the two-phase rotating frame: it is first necessary to know the angle θr of the d-q axes relative to the stationary a-b-c coordinate system; the three-phase variables are then projected onto the d axis to obtain fd and onto the q axis to obtain fq. Mathematically, the transformation can be performed with the Park transform:
fd = (2/3)·[fa·cos θr + fb·cos(θr − 2π/3) + fc·cos(θr + 2π/3)]
fq = −(2/3)·[fa·sin θr + fb·sin(θr − 2π/3) + fc·sin(θr + 2π/3)]
wherein the coefficient 2/3 ensures that the transformed amplitude remains equal to the phase amplitude. Again, because for the phase variables:
fa+fb+fc=0
the transformation matrix is made invertible by adding the 0 component as a constraint, and the transformation from the d-q reference frame back to the a-b-c reference frame can then be achieved by:
Where f o is the 0 component. FOC control converts the three-phase rotating magnetic field, through the Park transform, into quantities rotating with the rotor d-q axes. When the three-phase a-b-c variables are unbalanced sinusoidal signals, as at motor starting or under sudden load, the d-q axis variables are generally time-varying; when the motor runs in steady state, the rotating magnetic field created by the phase variables remains relatively static with respect to the rotor, and the d-q axis variables become DC signals. In this case, controlling the d-q axis variables is equivalent to controlling two equivalent solenoids that are always aligned with or perpendicular to the magnetic axis, and the corresponding motor control algorithm becomes comparatively simple.
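As an illustration of the amplitude-invariant transform described above, the following Python sketch implements the Park transform and its inverse (the function names are the author's own; the 2/3 coefficient matches the amplitude-invariant convention used here):

```python
import math

def park(fa, fb, fc, theta):
    """Amplitude-invariant Park transform: a-b-c -> d-q-0 at magnetic angle theta (rad)."""
    k = 2.0 / 3.0
    fd = k * (fa * math.cos(theta)
              + fb * math.cos(theta - 2.0 * math.pi / 3.0)
              + fc * math.cos(theta + 2.0 * math.pi / 3.0))
    fq = -k * (fa * math.sin(theta)
               + fb * math.sin(theta - 2.0 * math.pi / 3.0)
               + fc * math.sin(theta + 2.0 * math.pi / 3.0))
    f0 = (fa + fb + fc) / 3.0
    return fd, fq, f0

def inverse_park(fd, fq, f0, theta):
    """Inverse transform: d-q-0 -> a-b-c."""
    fa = fd * math.cos(theta) - fq * math.sin(theta) + f0
    fb = (fd * math.cos(theta - 2.0 * math.pi / 3.0)
          - fq * math.sin(theta - 2.0 * math.pi / 3.0) + f0)
    fc = (fd * math.cos(theta + 2.0 * math.pi / 3.0)
          - fq * math.sin(theta + 2.0 * math.pi / 3.0) + f0)
    return fa, fb, fc
```

For a balanced three-phase set aligned with the measured angle, `park` returns the DC values (F, 0, 0), and `inverse_park` recovers the original phases — the steady-state behavior described above.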
Applying the Park transform to the above equations yields the voltage equations in the rotor reference frame:
vd = Rs·id + dλd/dt − ωr·λq
vq = Rs·iq + dλq/dt + ωr·λd
Wherein vd and vq are the stator voltages of the d axis and q axis, respectively; id and iq are the stator currents of the d axis and q axis, respectively; λd and λq are the stator flux linkages of the d axis and q axis, respectively, whose values are:
λd = Ld·id + λm, λq = Lq·iq
Wherein Ld and Lq are the d-axis and q-axis inductances, respectively, and λm is the flux linkage contributed along the d axis by the permanent magnet pole pairs:
in the d-q coordinate system, the mutual inductance coefficient of each phase, which is transformed with the rotation, becomes constant. Combining the above two equations gives the motor current-voltage equations:
vd = Rs·id + Ld·did/dt − ωr·Lq·iq
vq = Rs·iq + Lq·diq/dt + ωr·(Ld·id + λm)
Where vd, vq are the system inputs (control quantities), whose values determine the current and the torque. In the rotor coordinate system, the instantaneous input power of the permanent magnet synchronous motor during operation is:
Pin = (3/2)·(vd·id + vq·iq)
Wherein the factor 3/2 compensates for the coefficient multiplied in the amplitude-invariant Park transform. Note that the Rs·i² terms are resistive voltage-drop (loss) terms and do not contribute to the final motor output power; the dλ/dt terms are field-variation terms, whose electrical power is stored in the magnetic field and therefore also does not contribute to the final motor output power. The electrical power actually converted into mechanical power is therefore:
Pem = (3/2)·ωr·(λd·iq − λq·id)
According to the torque theorem:
M=P/ω
The electromagnetic torque is:
Te = Pem/ωm = (3/2)·p·[λm·iq + (Ld − Lq)·id·iq]
the permanent magnet synchronous motor is connected with a mechanical load, and the dynamics of the mechanical part of the motor are described by:
J·dωm/dt = Te − TL − B·ωm, dθm/dt = ωm
Wherein TL is the load torque, B is the motor bearing viscosity coefficient, and J is the total rotational inertia of the motor and the load. Finally, the whole permanent magnet synchronous motor can be built into a Simulink sub-module for convenient calling, as shown in fig. 2. The module takes the d-q axis voltages as input, outputs the angle and angular speed of the motor according to the external load torque TL, and detects the d-q axis currents through a current detector.
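The Simulink sub-module of fig. 2 cannot be reproduced in text; as a rough equivalent, the following Python sketch integrates the d-q current-voltage and mechanical equations above with an explicit Euler step (all parameter values in the usage below are illustrative, not taken from the patent):

```python
def pmsm_step(state, vd, vq, TL, par, dt):
    """One explicit-Euler step of the d-q PMSM model; state = (id, iq, omega_m, theta_m).

    par holds Rs, Ld, Lq, lam (permanent-magnet flux linkage), p (pole pairs), J, B.
    """
    i_d, i_q, w_m, th_m = state
    w_r = par["p"] * w_m  # electrical angular speed
    did = (vd - par["Rs"] * i_d + w_r * par["Lq"] * i_q) / par["Ld"]
    diq = (vq - par["Rs"] * i_q - w_r * (par["Ld"] * i_d + par["lam"])) / par["Lq"]
    Te = 1.5 * par["p"] * (par["lam"] * i_q + (par["Ld"] - par["Lq"]) * i_d * i_q)
    dwm = (Te - TL - par["B"] * w_m) / par["J"]
    return (i_d + did * dt, i_q + diq * dt, w_m + dwm * dt, th_m + w_m * dt)
```

Applying a constant q-axis voltage with zero load from rest makes the simulated speed rise, as expected from the torque equation; with zero input the state stays at rest.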
Based on the analysis, aiming at an actual motor model, the invention constructs a permanent magnet synchronous motor operation mathematical model based on the FOC control frame and is realized in a Simulink.
Under FOC control, the motor system takes as input the voltage command values of the d and q axes from the controller, and inputs the torque to the load mechanism, which is expressed as the motor rotor rotation angle.
The FOC strategy of the permanent magnet synchronous motor decomposes the phase variables of the motor into a magnetic field component and a torque component and controls them independently, thereby realizing accurate control of the motor's magnetic field and torque. It requires acquisition of the rotor position and speed information of the motor for the transformation calculation between the phase variables in the stationary coordinate system and the rotating d-q coordinate variables. In the d-q coordinate system, the motor equations can be described as:
vd = Rs·id + Ld·did/dt − ωr·Lq·iq
vq = Rs·iq + Lq·diq/dt + ωr·(Ld·id + λm)
Te = (3/2)·p·[λm·iq + (Ld − Lq)·id·iq]
J·dωm/dt = Te − TL − B·ωm
Wherein: r s is the stator resistance; v d,vq is d-axis and q-axis voltage, respectively; i d,iq is d-axis and q-axis current, respectively; l d,Lq is d-axis and q-axis inductance respectively; lambda m is the d-axis magnetic flux of the permanent magnet; t e,TL is output torque and load torque respectively; b is the viscosity coefficient of the bearing; j is the total rotational inertia of the motor and the load; omega m,ωr is the mechanical angular velocity of the rotor and the electromagnetic angular velocity of the rotor, and p is the number of pairs of permanent magnets, satisfying omega r=p×ωm. The module responds to the input d-q two-axis voltage, outputs a motor angle theta m and an angular speed omega r according to an external load moment T L, and detects the d-q axis current through a current detector.
The motor type is determined as a surface-mounted PMSM according to actual requirements. Considering that a permanent magnet synchronous motor operating in steady state must be kept within certain operating limits, the invention requires that the rotor speed and stator current stay within threshold values, i.e.:
Lq = Ld
|ωr| ≤ ωlimit
√(id² + iq²) ≤ ilimit
Wherein ωlimit is the maximum value of the rotor speed; ilimit is the stator current maximum.
Step 2: and constructing a mathematical model of the load and the transmission mechanism of the load.
In combination with the actual demand, a high-order load mechanism with nonlinear factors is determined. The mechanism transmits the torque of the motor rotating shaft to the screw rod by using a gear, and the triangular connecting rod mechanism is pushed and pulled through the stroke of the screw rod to form a force arm to push the swinging mechanism to perform angle deflection. In order to fit the mechanical property of the actual load in the modeling process, the elastic movement of the screw rod, the movement equation of the swinging mechanism and the nonlinear coulomb friction in the movement process are considered. The kinetic equation for this load can be described as:
Wherein θ m is the rotor mechanical angle, g r is the gear reduction coefficient, n r is the screw rod reduction coefficient, L is the screw rod retraction/extension amount, K s is the rotation combined stiffness, and F is the acting force of the screw rod under the pushing of torque; m e is the mass of the screw rod, B e is elastic damping, and DeltaL is the compression stroke of the screw rod; t L is motor shaft load moment, effi is transmission efficiency; m is swing moment, K p is precession combined rigidity, and r is arm length; alpha is the swing angle of the swing mechanism, J b is swing inertia, B b is swing damping, K delta is position resistance moment, M f is friction moment, and the model is coulomb friction, and the expression is as follows:
The load model considers the elastic deformation of the screw rod. If only the geometric relationship of the triangular connecting-rod mechanism is considered, the equivalent triangular structure formed among the screw-rod side, the swing mechanism and the fixed pivot point can be represented by fig. 3. The relationship between the retraction/extension amount L of the screw and the deflection angle α is approximately expressed by the law of cosines as
L(α) = √(a² + b² − 2ab·cos(α0 + α)) − L0
Wherein a and b are respectively the two adjacent sides OA and OB of the swing angle in the triangular connecting-rod mechanism; α0 and L0 are the swing center angle AOB and the length of side AB when the load deflection angle is 0.
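Assuming the law-of-cosines geometry above, stroke and swing angle can be converted in both directions numerically. The sketch below (side lengths and neutral angle are invented placeholders, not patent values) inverts the relation by bisection, which works because the stroke is monotonic in the deflection angle over the feasible range:

```python
import math

A_SIDE = 0.12                 # hypothetical side OA (m)
B_SIDE = 0.10                 # hypothetical side OB (m)
ALPHA0 = math.radians(35.0)   # hypothetical neutral angle AOB

def stroke_from_angle(alpha):
    """Screw stroke L(alpha) = |AB| - L0 for deflection alpha (rad), law of cosines."""
    l0 = math.sqrt(A_SIDE**2 + B_SIDE**2 - 2.0 * A_SIDE * B_SIDE * math.cos(ALPHA0))
    l = math.sqrt(A_SIDE**2 + B_SIDE**2
                  - 2.0 * A_SIDE * B_SIDE * math.cos(ALPHA0 + alpha))
    return l - l0

def angle_from_stroke(dl, tol=1e-12):
    """Invert stroke_from_angle by bisection (the stroke is monotonic in alpha)."""
    lo, hi = -ALPHA0 + 1e-9, math.pi - ALPHA0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if stroke_from_angle(mid) < dl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Monotonicity follows from d(L²)/dα = 2ab·sin(α0 + α) > 0 for α0 + α in (0, π), so the numerical inverse is well defined on that interval.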
The overall load is modeled as a high-order nonlinear system, and a complex differential-equation relationship exists between the load swing angle and the observable motor shaft angle, so that a traditional error-based closed-loop controller can hardly obtain good accuracy and response-speed performance in this scenario, leaving an optimization space for the data-based artificial intelligence algorithm.
And constructing an equivalent load module in Matlab/Simulink to realize the equation set, as shown in figure 4. The module takes motor rotor angle θ m as input, and outputs feedback to motor load torque T L and control quantity α.
Step 3: Building basic three-loop controller and scheme
In the invention, the PID controller takes the difference between the external reference input of the controlled object and the feedback value of the measuring element, and outputs the linear superposition of this difference, its integral over time, and its derivative as the control quantity to the next actuator. The relationship between the input error e(t) and the output control value u(t) is:
u(t) = Kp·e(t) + Ki·∫e(τ)dτ + Kd·de(t)/dt
wherein Kp, Ki, Kd are the coefficients of the three terms, which can be adjusted manually or automatically by means of parameter tuning or other optimization algorithms.
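The PID law above, in its discrete form with the optional output clipping used later in the three-loop structure, can be sketched as follows (gains and the first-order plant in the usage note are illustrative only):

```python
class PID:
    """Discrete PID: u = Kp*e + Ki*integral(e) + Kd*de/dt, with optional output clamp."""

    def __init__(self, kp, ki, kd, dt, u_min=None, u_max=None):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_e = 0.0
        self.u_min, self.u_max = u_min, u_max

    def update(self, e):
        self.integral += e * self.dt                 # rectangular integration
        de = (e - self.prev_e) / self.dt             # backward-difference derivative
        self.prev_e = e
        u = self.kp * e + self.ki * self.integral + self.kd * de
        if self.u_max is not None and u > self.u_max:
            u = self.u_max
        if self.u_min is not None and u < self.u_min:
            u = self.u_min
        return u
```

Regulating a first-order plant x' = −x + u to a unit reference drives the steady-state error to zero thanks to the integral term, while the clamp reproduces the clipping behavior discussed for the three-loop scheme.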
In the FOC framework, vector control is a method of controlling torque by controlling the id and iq currents. Thus, vector control constitutes the innermost control loop in the motor drive system, and the subsequent speed and position control are performed on the basis of current control. For the surface-mounted permanent magnet synchronous motor considered by the invention, Ld = Lq, so the torque reduces to:
Te = (3/2)·p·λm·iq
The electromagnetic torque is thus a linear function of iq, id has no influence on the torque, and any nonzero d-axis current wastes input power (mainly dissipated in the resistance and the magnetic field). Therefore, controlling iq controls the torque, and maintaining id = 0 achieves maximum torque-per-ampere (MTPA) control, i.e., maximum torque output for any stator current.
In the rotor reference frame, the motor model is affected by the cross-coupling of the speed voltage terms (i.e., ω rLqiq and ω rLdid+ωrλm). This term may dominate the voltage equation, especially at high speeds. This in practice impairs the performance of the PI controller and therefore requires a decoupling circuit as a current control scheme for vector control. To linearize the control of i d and i q, the d-axis voltage and the q-axis voltage can be provided by a combination of two signals, respectively:
Wherein vd1 and vq1 can be governed by linear PI current control, and the nonlinear terms vd0 and vq0 can be calculated from the rotor speed value of the encoder:
vd0 = −ωr·Lq·iq
vq0 = ωr·(Ld·id + λm)
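A minimal sketch of the decoupling (feedforward) terms above: substituting vd = vd1 + vd0 and vq = vq1 + vq0 into the d-q voltage equations cancels the speed-voltage cross-coupling, leaving first-order linear dynamics for the PI controllers (symbols follow the text; the function itself is the author's illustration):

```python
def decoupling_terms(omega_r, i_d, i_q, Ld, Lq, lam_m):
    """Feedforward voltages v_d0, v_q0 cancelling the speed-voltage cross-coupling."""
    v_d0 = -omega_r * Lq * i_q
    v_q0 = omega_r * (Ld * i_d + lam_m)
    return v_d0, v_q0
```

With these terms added, the current derivatives reduce to did/dt = (vd1 − Rs·id)/Ld and diq/dt = (vq1 − Rs·iq)/Lq, i.e., two decoupled first-order plants.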
On the basis of the decoupled current controller, the difference between the fed-back motor speed ωfb and the speed command ωref is used as the input of the speed PI controller, whose output serves as the reference value of the q-axis current. At the same time, noting the system limitation on speed, the motor speed reference value ωref is limited to the range (−ωlimit, ωlimit). A position control loop is then built on top of the speed controller. Considering the triangular connecting-rod structure of the load, the commanded deflection angle αref of the swing mechanism can be approximately converted into a screw precession length using the law of cosines, the feedback precession length Lfb is calculated from the rotor angle feedback signal θfb, both are input into the position-loop PI controller, and the speed-loop reference value is output.
The Simulink modeling of the decoupled current control in the above three-loop control is shown in fig. 5, and the modeling of the position loop and the speed loop is shown in fig. 6. This PMSM three-loop control model is a linear control of the actual swing-mechanism angle; it can meet certain accuracy and response-speed requirements while guaranteeing system robustness, and it ensures that the motor current and speed values do not exceed the system limits. However, the final control performance of the above three-loop control flow is limited, mainly because of the following problems:
On the one hand, the equivalent voltage output by the SVPWM in an actual system cannot exceed the inverter limit (220 V), so the output value of the decoupled current controller shown in fig. 5 must be clipped, and the clipping function breaks the linearity of the linear terms of the decoupled current controller. Fig. 7 shows the response curves of the q-axis current controlled by this controller to a step current command reference value, before the threshold limit is considered and after it is added. It can be observed that, under the same controller parameter configuration, the q-axis current without the threshold limit tracks the command magnitude at a higher speed, while after the threshold limit is added the q-axis current curve is highly irregular, with large overshoot and long convergence time. This nonlinearity reduces the response speed of the current loop to a certain extent, increases the risk that the actual current exceeds the system limits, and reduces the performance and reliability of the controller.
On the other hand, the system cannot observe the actual angle of the swing mechanism; only the command angle αref and the rotor position feedback θfb can be approximately converted into screw precession lengths. This disregards the complex differential-equation relationship between the motor rotor angle and the swing-mechanism angle, with the result that, when facing low-frequency signals, the position-loop error can converge while the actual swing angle still deviates greatly from the command, as shown in fig. 8.
Step 4: permanent magnet synchronous motor control scheme for constructing deep reinforcement learning drive
PPO is used for training, and the agent directly outputs the PMSM control voltage to drive the transmission device so that the servo system moves to minimize the long-term error between the reference curve and the actual state of the servo system. Using the reinforcement learning agent directly as the PMSM voltage controller exploits the adaptability of the intelligent model in the control process to the greatest extent and avoids the agent's output being limited by the three-loop control: a three-loop control scheme typically adds clipping at the output of each loop to ensure control safety in extreme states, but this also limits the effect of an agent, since once a certain loop reaches its clipping value the agent can no longer optimize the original three-loop control output by tuning. When the agent acts as the voltage controller of the PMSM, the only limitation is that its output should be within the voltage range the PMSM can withstand, which gives greater flexibility and adaptability than using the agent as a feedforward compensator. However, since the reinforcement learning agent itself targets maximization of the reward function, its output is subject to oscillation in the time domain; in addition, since the state quantities of the agent contain rapidly changing speed and current, unfiltered noise and oscillation further aggravate the oscillation of the agent, leading to unsatisfactory control and even control-safety problems. Therefore, when the agent is used directly as the PMSM voltage controller, it is necessary not only to use an ADRC controller with noise-interference resistance in the position loop to suppress noise in the feedback signal, but also to apply a Kalman filter to the rapidly changing speed and current and a low-pass filter to the agent output, so as to maximally suppress the oscillation of the agent output in the time domain.
The observations of the agent are determined as: the outputs θv1(t), θv2(t) of the ADRC differential controller; the outputs e1(t), e2(t) of the reference signal generator; the error eθ(t) between the screw precession length converted from the command deflection angle α*(t) and its feedback; the reference speed ω*(t) and reference current i*q(t); and the corresponding speed feedback ω(t) and current feedback iq(t).
In order to better optimize the training algorithm, the input values are nondimensionalized and clipped to a certain amplitude; meanwhile, a Kalman filter is added to the two feedback quantities ω and iq, and a low-pass filter is added to the output of the agent.
In order to improve the performance of the controller in a comprehensive way, the task faced by the reinforcement learning controller should be as rich as possible to contain all possible instruction modes, including random step instructions, low-frequency sine instructions and high-frequency sine instructions.
To reduce the error between the swing angle α and the command angle αref, the negatives of the squared differences between the reference and feedback values of each of the three loops are used as the reward of each time step, each multiplied by a fixed coefficient to balance the weights among the three loops:
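A plain-Python sketch of such a per-step reward (the weighting coefficients and the clipping bound are illustrative assumptions; the text only states that coefficients are tuned so the reward stays within [−200, 200]):

```python
def step_reward(e_pos, e_speed, e_cur, c_pos=100.0, c_speed=1.0, c_cur=0.1):
    """Negative weighted squared three-loop errors; coefficients are hypothetical."""
    r = -(c_pos * e_pos ** 2 + c_speed * e_speed ** 2 + c_cur * e_cur ** 2)
    return max(r, -200.0)  # keep the reward within a bounded range
```

The reward is dense (nonzero whenever any loop has tracking error), peaks at 0 for perfect tracking, and decreases monotonically as any loop's error grows.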
For the state s, an optimized policy function π should enable the agent to output the optimal action a = π(s). For this purpose, two Q networks need to be established to evaluate the output of the policy network, and the evaluation value is used to perform gradient optimization on the actor network.
Meanwhile, the agent sampling time was set to 0.01s.
In particular, the invention includes the following considerations:
1) The invention installs the agent at the voltage input of the permanent magnet synchronous motor; the reinforcement-learning-driven agent directly controls the input voltage, and since the input voltage is the q-axis voltage, the output clipping of the reinforcement learning agent is 220×2/3 V.
2) The state-space variables for reinforcement learning should be values that can be collected or calculated by the controller. Since the d-axis current id and the q-axis current iq are already managed by the decoupled current controller, id remains close to 0.
3) In order to better optimize the training algorithm, the input values are nondimensionalized so that the input quantities remain within 10^0 to 10^2 in the various working states. At the same time, for abrupt position commands (e.g., steps) the derivative value becomes very large, producing pathological empirical samples; the derivative value therefore needs to be limited to a certain range.
Because of the high nonlinearity of neural networks, at the beginning of training the output of the network tends to take a boundary value and oscillate back and forth between the upper and lower boundaries. In the motor model, oscillation of the speed command causes the motor feedback values to oscillate, which in turn causes the reinforcement learning controller to oscillate, eventually placing the motor in a highly unstable state, as shown in fig. 10. An agent using such pathological data as experience often learns nothing from it; in many cases, even after many rounds of training, the resulting output still oscillates repeatedly, which is not permissible in actual motor control and even risks damaging the motor. To alleviate this, a Kalman filter can be added to the two feedback quantities ω and iq of the motor. After this operation, even if system oscillation occurs at the beginning of training, the Kalman filter filters the input state into a low-frequency signal, the output gradually becomes a low-frequency signal during training, and training stability is ensured to a certain extent.
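The feedback filtering described above can be illustrated with a one-dimensional Kalman filter under a random-walk state model (a simplification: the text does not specify the filter's state model or noise covariances, so q and r below are assumptions):

```python
class ScalarKalman:
    """1-D Kalman filter (random-walk state model) for noisy speed/current feedback."""

    def __init__(self, q, r, x0=0.0, p0=1.0):
        self.q, self.r = q, r      # process / measurement noise variances
        self.x, self.p = x0, p0    # state estimate and its variance

    def update(self, z):
        self.p += self.q                      # predict: variance grows by process noise
        k = self.p / (self.p + self.r)        # Kalman gain
        self.x += k * (z - self.x)            # correct with measurement z
        self.p *= (1.0 - k)
        return self.x
```

With a small process-noise variance the steady-state gain is small, so rapid measurement oscillation is strongly attenuated while the slowly varying component passes through, which is exactly the low-pass behavior relied on during early training.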
4) In order to comprehensively improve the performance of the controller, the tasks faced by the reinforcement learning controller should be rich enough to contain all possible command patterns. According to automatic control principles, conventional performance analysis of a linear system examines the response to a step command in the time domain, or the open-loop amplitude-frequency and logarithmic phase-frequency characteristics in the frequency domain. Combined with the finally adopted performance evaluation scheme, the training tasks are divided into the following three types:
a. Random step command: zero steady state (every derivative equal to 0) is used as the initial state, and a new random step target is generated every 2.5 seconds, with the step range d ∈ [−3.5°, 3.5°].
b. Low-frequency sinusoidal command: with zero steady state as the initial state, a low-frequency sinusoidal command δc = δcm·sin(ωt) is generated, with amplitude δcm ∈ [1.5°, 4°] and circular frequency ω ∈ [2π×0.05, 2π×0.1] (rad/s).
c. High-frequency sinusoidal command: with zero steady state as the initial state, a high-frequency sinusoidal command δc = δcm·sin(ωt) is generated, with amplitude δcm ∈ [0.3°, 1.2°] and circular frequency ω ∈ [1, 20] (rad/s).
As training proceeds, each episode selects one task as the command with equal probability and lasts 10 seconds.
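The three task types can be sketched as a reference-command generator; the plain-Python version below follows the ranges listed above (the function name and structure are the author's own):

```python
import math
import random

def make_command(task, duration=10.0, dt=0.01, rng=None):
    """One episode of reference angles (degrees) for the three training task types."""
    rng = rng or random.Random(0)
    n = round(duration / dt)
    if task == "step":
        cmd, target, hold = [], 0.0, round(2.5 / dt)
        for i in range(n):
            if i % hold == 0:
                target = rng.uniform(-3.5, 3.5)  # new random step every 2.5 s
            cmd.append(target)
        return cmd
    if task == "low_sine":
        amp = rng.uniform(1.5, 4.0)
        w = rng.uniform(2.0 * math.pi * 0.05, 2.0 * math.pi * 0.1)
    elif task == "high_sine":
        amp = rng.uniform(0.3, 1.2)
        w = rng.uniform(1.0, 20.0)
    else:
        raise ValueError(task)
    return [amp * math.sin(w * i * dt) for i in range(n)]
```

Per episode, one of the three task names would be drawn with equal probability and the resulting 10-second sequence used as the command input.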
5) The setting of the reinforcement learning reward function directly affects the optimization of the algorithm. Since the invention aims to optimize the error between the swing-mechanism angle α and the command angle αref, the negative of the squared difference between the reference and feedback values of each loop in the three-loop control is used as the reward value of each time step. Such a reward is not sparse, and through adjustment of the coefficients its range can be stabilized within [−200, 200], which is beneficial to training.
6) During training, for an input agent state value s, it is desired to train an optimal policy function π such that the optimal control value a = π(s) is output for s.
the PPO training algorithm improves on the policy gradient method by limiting the update amplitude of the policy, making the training process more stable and efficient. The specific training process is as follows:
for each time-step index i ∈ [1, 2, 3, …, T/Ts], where Ts is the agent sampling period and T is the episode duration, the agent takes the state si and generates an action with the policy network:
The next state si+1 caused by the action, together with the state si of this step, the action ai and the step reward value ri calculated by the environment, is stored as a training sample [ai, si, ri, si+1] in the experience replay buffer. When the number of samples in the buffer reaches the mini-batch size N, a batch is sampled from the buffer at each step, and the temporal-difference error or generalized advantage estimation is computed to obtain the advantage function.
Next, PPO optimizes the objective function. First, PPO defines the probability ratio of the new policy to the old policy,
rt(θ) = πθ(at|st) / πθold(at|st)
and defines the clipped objective function:
LCLIP(θ) = E[min(rt(θ)·Ât, clip(rt(θ), 1 − ε, 1 + ε)·Ât)]
Where ε is the hyper-parameter of the clipping range.
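The clipped surrogate objective can be illustrated in plain Python (per-sample log-probabilities and advantages are assumed to be supplied by the policy networks and the advantage estimator, which are not shown):

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Mean clipped surrogate objective L^CLIP (to be maximized by gradient ascent)."""
    total = 0.0
    for lp_n, lp_o, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_n - lp_o)                      # r_t(theta)
        unclipped = ratio * adv
        clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * adv
        total += min(unclipped, clipped)                   # pessimistic bound
    return total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the ratio outside [1 − ε, 1 + ε], which is exactly the update-amplitude limitation described above.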
PPO then updates the policy network parameters using stochastic gradient ascent (Adam optimizer) to maximize the clipped objective function LCLIP(θ).
At the same time, PPO updates the value network parameters using stochastic gradient descent (Adam optimizer) to minimize the loss function of the value network:
LV(φ) = E[(Vφ(st) − Rt)²]
finally, the parameters of the target network are updated according to the smoothing proportion τ, and the above steps are repeated until the training termination condition is reached:
θ′ ← τθ + (1 − τ)θ′
7) Considering the deployment requirements on an embedded chip, the actor network should not be set too large; the invention sets it as an MLP with 2 hidden layers, while the critic network is set as an MLP with 4 hidden layers. Meanwhile, since the sinusoidal command frequency input to the controller can reach 20 rad/s and the training speed should not be too slow, the agent sampling time is set to 0.01 s. Other training parameters are set as follows:
to comprehensively evaluate the performance improvement of the reinforcement learning-based position loop tuning scheme over the conventional PID method, the performance is evaluated by the following three indexes.
1) Load position characteristics
Under the maximum-load condition, the angle command sequence αref is used as the command input, and the actual swing-angle sequence αfb of the swing mechanism over the time range t ∈ (11, 31) is acquired, wherein:
The position-loop curve is plotted with αref as the abscissa and αfb as the ordinate. The nominal position curve is the line connecting the midpoints of the position-loop curve along the horizontal axis, the nominal position baseline is a first-order linear fit of the nominal position curve, and the time-domain tracking accuracy is obtained through loop-width and zero-offset analysis algorithms. The maximum swing angles in the positive and negative directions represent the ability to track close to the commanded limit positions; the maximum loop width Δδmax characterizes the maximum value of the tracking error; and the zero bias δ0 measures the symmetry of the control algorithm in the positive and negative directions when facing a symmetric signal such as a low-frequency sinusoid.
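A possible plain-Python reading of the loop-width and zero-offset analysis (the text does not give the exact algorithm, so the binning scheme below is an assumption): the command range is split into bins, the loop width in each bin is the spread of feedback values sharing approximately the same command value, and the zero bias is the mean feedback near zero command:

```python
import math

def loop_metrics(alpha_ref, alpha_fb, n_bins=50):
    """Maximum loop width and zero bias of a command/feedback hysteresis loop."""
    lo, hi = min(alpha_ref), max(alpha_ref)
    bins = [[] for _ in range(n_bins)]
    for r, f in zip(alpha_ref, alpha_fb):
        idx = min(int((r - lo) / (hi - lo) * n_bins), n_bins - 1)
        bins[idx].append(f)
    max_width = max(max(b) - min(b) for b in bins if b)
    near_zero = [f for r, f in zip(alpha_ref, alpha_fb) if abs(r) < (hi - lo) / n_bins]
    zero_bias = sum(near_zero) / len(near_zero) if near_zero else 0.0
    return max_width, zero_bias
```

Perfect tracking collapses the loop to the diagonal (width and bias near zero), while a phase lag between command and feedback opens the loop and increases the measured width.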
Combining fig. 11(a) with the table below, the DRL method is superior to the PID and ADRC methods in maximum swing angle, maximum loop width, and zero offset during control, which shows that through data-driven experience learning the DRL control method achieves better tracking accuracy.
2) Speed characteristic experiment
Under the maximum-load condition, a step command with amplitude 3° is taken as the system input α_ref, the actual load position sequence is collected over the period in which the swing angle lies in the range (0.5°, 1.5°), and the average swing angular speed over that interval is used to analyze the time-domain response speed of each algorithm; the experimental results are shown in the table below. Combining fig. 11(b) with the table, the PID control method has the fastest rise but also the largest overshoot, showing that it is the least stable despite its fast response; the ADRC control method is the opposite, being the most stable despite the slowest response; the DRL method lies between the two, with relatively fast response and relatively stable behavior.
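The average-speed index above can be sketched as the band width divided by the time the step response spends crossing the (0.5°, 1.5°) band. The first-order plant below is an illustrative stand-in for the real servo, and a monotone rise through the band is assumed:

```python
import numpy as np

def avg_speed_in_band(t, pos, lo=0.5, hi=1.5):
    """Average swing speed over the band (lo, hi) degrees of a step response:
    (hi - lo) divided by the band-crossing time."""
    t_lo = t[np.searchsorted(pos, lo)]   # first sample at/above lo
    t_hi = t[np.searchsorted(pos, hi)]   # first sample at/above hi
    return (hi - lo) / (t_hi - t_lo)

# illustrative first-order response toward the 3-degree step target
t = np.linspace(0.0, 2.0, 20001)
pos = 3.0 * (1.0 - np.exp(-5.0 * t))
speed = avg_speed_in_band(t, pos)        # deg/s over the band
```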
3) Frequency characteristic experiment
Under the maximum-load condition, a sinusoidal command α_ref = δ·sin(ωt) is applied, where δ = 0.5°, 0.8°, 1.1° and ω = 2, 4, 8, 16 rad/s; each frequency-amplitude combination is simulated for 6 command cycles. The frequency-domain response of each algorithm is analyzed by measuring the amplitude attenuation L and the phase lag φ of the swing-angle output relative to the input command. Orthogonally decomposing α_ref and α_fb onto sine and cosine bases at the command frequency yields their sine and cosine components a_ref, b_ref and a_fb, b_fb; from these, the relative gain L = 20·lg(√(a_fb² + b_fb²) / √(a_ref² + b_ref²)) (dB) and the phase angle φ = arctan(b_ref/a_ref) − arctan(b_fb/a_fb) are calculated.
The gain L (dB) and the phase lag φ of the actual angle signal relative to the command signal are calculated as above, and the results are recorded below.
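The quadrature decomposition above can be sketched as follows; discrete means over whole command cycles stand in for the integrals, and the attenuated, lagging feedback signal is illustrative:

```python
import numpy as np

def gain_phase(t, ref, fb, omega):
    """Decompose command and feedback onto sin/cos bases at the command
    frequency, then return relative gain (dB) and phase lag (rad)."""
    s, c = np.sin(omega * t), np.cos(omega * t)
    a_ref, b_ref = 2 * np.mean(ref * s), 2 * np.mean(ref * c)
    a_fb, b_fb = 2 * np.mean(fb * s), 2 * np.mean(fb * c)
    L = 20 * np.log10(np.hypot(a_fb, b_fb) / np.hypot(a_ref, b_ref))  # gain, dB
    phi = np.arctan2(b_ref, a_ref) - np.arctan2(b_fb, a_fb)           # phase lag
    return L, phi

omega, delta = 4.0, 0.5
t = np.linspace(0.0, 6 * 2 * np.pi / omega, 6000, endpoint=False)  # 6 cycles
ref = delta * np.sin(omega * t)
fb = 0.45 * np.sin(omega * t - 0.2)   # attenuated, lagging output (toy data)
L, phi = gain_phase(t, ref, fb, omega)
```

With these toy signals the recovered gain is 20·lg(0.45/0.5) ≈ −0.92 dB and the recovered lag is 0.2 rad, matching the signal construction.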
Combining fig. 11(c), fig. 11(d), and the table below, the controller under the DRL method is generally superior to the PID method in terms of the phase-attenuation index.
The PID, ADRC, and DRL methods are quantitatively measured and compared through the above three test methods; the comparison shows that the control performance of the DRL method is superior to the PID and ADRC methods on most indices, and the system achieves better tracking accuracy and response speed.
Claims (7)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410774423.2A CN118801756B (en) | 2024-06-17 | 2024-06-17 | Intelligent control method of permanent magnet synchronous motor servo system in semi-closed loop scenario |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN118801756A true CN118801756A (en) | 2024-10-18 |
| CN118801756B CN118801756B (en) | 2025-03-28 |
Family
ID=93028918
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410774423.2A Active CN118801756B (en) | 2024-06-17 | 2024-06-17 | Intelligent control method of permanent magnet synchronous motor servo system in semi-closed loop scenario |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118801756B (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119362946A (en) * | 2024-12-23 | 2025-01-24 | 浙江嘉宏运动器材有限公司 | A permanent magnet synchronous motor speed stabilization control method based on reinforcement learning |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH0968085A (en) * | 1995-09-04 | 1997-03-11 | Unisia Jecs Corp | Engine idle speed controller |
| US6841969B1 (en) * | 2003-09-24 | 2005-01-11 | General Motors Corporation | Flux observer in a sensorless controller for permanent magnet motors |
| US20170111000A1 (en) * | 2015-10-19 | 2017-04-20 | Fanuc Corporation | Machine learning apparatus and method for learning correction value in motor current control, correction value computation apparatus including machine learning apparatus and motor driving apparatus |
| CN111342720A (en) * | 2020-03-06 | 2020-06-26 | 南京理工大学 | Adaptive Continuous Sliding Mode Control Method for Permanent Magnet Synchronous Motor Based on Load Torque Observation |
| CN115001334A (en) * | 2022-07-19 | 2022-09-02 | 北京理工华创电动车技术有限公司 | Rotation speed control method and system of position-sensor-free ultra-high-speed permanent magnet synchronous motor based on active disturbance rejection |
| CN117335700A (en) * | 2023-09-14 | 2024-01-02 | 南京航空航天大学 | Dynamic optimization method of electric servo position feedback based on deep reinforcement learning in semi-closed loop scenario |
| CN118034129A (en) * | 2024-02-26 | 2024-05-14 | 南京航空航天大学 | A servo motor control parameter optimization method based on evolutionary reinforcement learning |
Non-Patent Citations (1)
| Title |
|---|
| Cheng Guoqing et al., "Integrated robust control of the speed and position loops of an AC servo motor", Computing Technology and Automation, vol. 32, no. 4, 15 December 2013 (2013-12-15), pages 23-27 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN118801756B (en) | 2025-03-28 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN112701968B (en) | A Robust Performance Improvement Method for Model Predictive Control of Permanent Magnet Synchronous Motors | |
| CN107070341B (en) | Torque Ripple Suppression Method for Permanent Magnet Synchronous Motor Based on Robust Iterative Learning Control | |
| CN104242769B (en) | Permanent magnet synchronous motor speed composite control method based on continuous terminal slip form technology | |
| CN105827168B (en) | Method for controlling permanent magnet synchronous motor and system based on sliding formwork observation | |
| CN106655938B (en) | Control system for permanent-magnet synchronous motor and control method based on High-Order Sliding Mode method | |
| CN113364377B (en) | A method of automatic disturbance rejection position servo control of permanent magnet synchronous motor | |
| Zhao et al. | Back EMF-based dynamic position estimation in the whole speed range for precision sensorless control of PMLSM | |
| CN117335700A (en) | Dynamic optimization method of electric servo position feedback based on deep reinforcement learning in semi-closed loop scenario | |
| CN110995102A (en) | Direct torque control method and system for permanent magnet synchronous motor | |
| CN114710080A (en) | Permanent magnet synchronous motor sliding mode control method based on improved variable gain approximation law | |
| CN113726240B (en) | A permanent magnet synchronous motor control method and system based on second-order active disturbance rejection control | |
| CN113067520B (en) | Sensorless Response Adaptive Motor Control Method Based on Optimization Residuals | |
| CN118034129A (en) | A servo motor control parameter optimization method based on evolutionary reinforcement learning | |
| CN118801756B (en) | Intelligent control method of permanent magnet synchronous motor servo system in semi-closed loop scenario | |
| CN113708684B (en) | Permanent magnet synchronous motor control method and device based on extended potential observer | |
| CN117614333A (en) | A permanent magnet synchronous motor position control method and system based on sliding mode control | |
| CN118199453A (en) | Sensorless torque ripple suppression method for stepping motor based on extended Kalman filter | |
| CN117895851A (en) | A full-speed control method for surface-mounted permanent magnet synchronous motor | |
| CN112436774A (en) | Control method of asynchronous motor driven by non-speed sensor | |
| Badini et al. | MRAS-based speed and parameter estimation for a vector-controlled PMSM drive | |
| CN119483031B (en) | Decoupling method of torque system and suspension system of single-winding magnetic levitation permanent magnet synchronous motor | |
| Gao et al. | Sensorless Control of PMSM via ADRC and SMC with Super-Twisting Observer | |
| Zhao et al. | Disturbance rejection enhancement of vector controlled PMSM using second-order nonlinear ADRC | |
| CN114564053B (en) | Control method of control moment gyro frame system based on induction synchronizer error compensation | |
| Liu et al. | A sensorless control method for PMSM Based on FSMO optimized by QIO |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||