CN119575818A

CN119575818A - Machine tool vibration suppression optimization method and system

Info

Publication number: CN119575818A
Application number: CN202411710652.4A
Authority: CN
Inventors: 朱飞
Original assignee: Zhongke Times Shenzhen Computer System Co ltd
Current assignee: Zhongke Times Shenzhen Computer System Co ltd
Priority date: 2024-11-27
Filing date: 2024-11-27
Publication date: 2025-03-07
Anticipated expiration: 2044-11-27
Also published as: CN119575818B

Abstract

The present invention discloses a machine tool vibration suppression optimization method and system, the optimization method includes: generating a control strategy according to processing parameters; collecting vibration data of key parts of a CNC machine tool; inputting the vibration data and current working parameters into a vibration prediction model; judging whether the vibration result predicted by the vibration prediction model meets the processing requirements, and if not, using a reinforcement learning algorithm to adjust the machine tool working parameters in the control strategy. In the present invention, a method combining a vibration prediction model and a reinforcement learning algorithm is used to suppress vibration, and real-time vibration prediction and adaptive control are realized, which can effectively suppress vibration, significantly improve processing accuracy, reduce equipment wear and failure, and extend the service life of the machine tool.

Description

Machine tool vibration suppression optimization method and system

Technical Field

The invention belongs to the field of numerical control machine tools, and particularly relates to a machine tool vibration suppression optimization method and system.

Background

Vibration of a numerical control machine tool is difficult to avoid while cutting a workpiece, and is mainly caused by factors such as the cutting force, the structural rigidity of machine tool parts and the external environment. Machine tool vibration not only degrades workpiece surface quality, but can also cause tool damage and increase machining errors. Vibration suppression is therefore an important means of improving machining precision, prolonging the service life of the machine tool and ensuring machining quality, and is of great significance for modern numerical control machine tools and high-precision machining. However, the traditional vibration suppression methods depend mainly on mechanical design and manual experience and have difficulty coping with complex and varied vibration environments, so a new machine tool vibration suppression method is needed.

Disclosure of Invention

Aiming at the defects of the prior art, the invention aims to provide a machine tool vibration suppression optimization method and system.

In order to solve the technical problems, the invention provides the following technical scheme:

A machine tool vibration suppression optimization method comprises the following steps:

S100, generating a control strategy according to the machining parameters, and generating a real-time control signal according to the control strategy to control the working parameters of the machine tool in each operation period;

S200, collecting vibration data of key parts of the machine tool in the working process of the numerical control machine tool;

S300, inputting vibration data and current working parameters of the machine tool into a vibration prediction model to predict a subsequent vibration result;

S400, judging whether the vibration result predicted by the vibration prediction model meets the machining requirement, if not, executing the step S500, otherwise, returning to execute the step S200;

S500, according to the vibration result predicted by the vibration prediction model, the working parameters of the machine tool in the control strategy are adjusted by adopting a reinforcement learning algorithm.

Furthermore, the key parts of the machine tool comprise the spindle, the tool and the worktable of the numerical control machine tool, and the working parameters comprise cutting speed, cutting acceleration, feed amount, cutting depth, spindle rotation speed, tool type and cutting route.

Further, the method for collecting vibration data of key parts of the machine tool comprises the following steps:

S210, arranging a vibration acquisition device at a key part of the machine tool, and acquiring a vibration signal of the key part of the machine tool;

S220, denoising the vibration signal to eliminate high-frequency and low-frequency noise;

S230, extracting frequency domain features and time domain features of the vibration signals to obtain vibration data.

Further, the method for obtaining the vibration prediction model includes the following steps:

S310, determining a machine learning model or a deep learning model as a basic model of a vibration prediction model;

S320, acquiring historical vibration data of the machine tool, determining normal state data and abnormal state data in the historical vibration data according to the vibration range of the machine tool during normal operation, and marking the vibration data of a reverse point, a starting point and a stopping point and the abnormal state data;

S330, training the basic model by using the marked vibration data to obtain a vibration prediction model.

Furthermore, a long short-term memory (LSTM) network is adopted as the basic model.

Furthermore, the reinforcement learning algorithm adopts a Q-learning algorithm, in which the current working parameter values and vibration data of the machine tool are used as the state s, the adjustment of the working parameters of the machine tool is used as the action a, and the reduction of machine tool vibration and/or the optimization of machine tool machining performance is used as the reward r.

Further, the method for adjusting the working parameters of the machine tool in the control strategy by adopting the reinforcement learning algorithm comprises the following steps:

S510, obtaining the current processing parameters and the vibration result predicted by the vibration prediction model, and representing them as the current state s_t;

S520, selecting an action a as the current action a_t by adopting an ε-greedy strategy according to the current state;

S530, adjusting the working parameters of the machine tool in the control strategy to execute the current action a_t, and, after the current action a_t has been executed for a preset time, acquiring real-time vibration data and processing effects as feedback results;

S540, calculating a reward value r_{t+1} according to the feedback results;

S550, updating the Q value of the current state and action;

S560, judging whether a preset termination condition is met; if so, returning to step S200; otherwise, updating the current state s_t and returning to step S510 to adjust the control strategy again until the requirement is met.

Further, in the step S540, the calculation formula of the reward value r_{t+1} after executing the action a_t is as follows:

$$r_{t+1} = -a_v V(t) + \sum_{i=1}^{N} b_i f_i(t)$$

Wherein a_v is the weighting coefficient of the vibration data value, V(t) is the value of the vibration data fed back by the machine tool after executing action a_t, i is the index of the parameters participating in calculating the reward value, N is the total number of parameters participating in calculating the reward value, b_i is the weighting coefficient of the i-th parameter, and f_i(t) is the feedback value of the i-th parameter after executing action a_t. The weighting coefficient b_i is positive when a larger value of the parameter is better, and negative when a smaller value is better.

Further, in the step S550, the formula for updating the Q value is as follows:

$$Q'(s_t, a_t) = Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'_{t+1}} Q(s_{t+1}, a'_{t+1}) - Q(s_t, a_t) \right]$$

Wherein Q'(s_t, a_t) is the updated Q value of executing action a_t in state s_t, namely the expected cumulative reward of the current action a_t under the strategy; Q(s_t, a_t) is the Q value before updating; α is the learning rate, which represents the weight of new information on the current Q value, with a value range of 0 < α ≤ 1; r_{t+1} is the immediate reward obtained after the action a_t is executed at the current time step t; γ is the discount factor, which represents the degree of influence of future rewards on the current decision, with a value range of 0 ≤ γ ≤ 1; and the max term is the Q value of the action with the largest Q value among all selectable actions a'_{t+1} in the next state s_{t+1}.

A machine tool vibration suppression optimization system comprises:

a control strategy module, used for generating a control strategy according to the machining parameters, and generating a real-time control signal to the machine tool executing mechanism according to the control strategy to control the working parameters of the machine tool executing mechanism in each operation period;

a data acquisition module, used for acquiring vibration data of key parts of the machine tool in the working process of the machine tool;

a vibration prediction module, obtained by training a machine learning model or a deep learning model, and used for predicting the vibration result of the subsequent operation period according to the vibration data and the working parameters of the machine tool in the current operation period; and

a parameter optimization module, used for adjusting the working parameters of the machine tool in the control strategy by adopting a reinforcement learning algorithm when the vibration result predicted by the vibration prediction module does not meet the processing requirement.

In the invention, vibration is suppressed by combining the advance prediction of the vibration prediction model with the control strategy optimization of the reinforcement learning algorithm, so that real-time vibration prediction and adaptive control are realized. Vibration can thus be effectively suppressed, the processing precision is improved, equipment wear and faults are reduced, and the service life of the machine tool is prolonged. In addition, because the reinforcement learning algorithm keeps running during machining, the control strategy is continuously optimized, so that the machine tool finally achieves its best machining performance with vibration minimized.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:

FIG. 1 is a flow chart of an embodiment of a machine tool vibration suppression optimization method according to the present invention.

Fig. 2 is a flowchart for acquiring vibration data of key parts of a machine tool.

Fig. 3 is a flowchart for obtaining a vibration prediction model.

FIG. 4 is a flow chart for adjusting a control strategy using a reinforcement learning algorithm.

FIG. 5 is a block diagram of one embodiment of a vibration suppression optimization system for a machine tool of the present invention.

The specification reference numerals are as follows:

The system comprises a control strategy module-100, a machine tool executing mechanism-200, a data acquisition module-300, a vibration prediction module-400 and a parameter optimization module-500.

Detailed Description

The following description of the embodiments of the invention is given by way of specific examples, the illustrations provided in the following examples merely illustrate the basic idea of the invention, and the following examples and features of the examples can be combined with one another without conflict.

At present, vibration suppression mainly relies on spindle vibration suppression technology, feed shaft vibration suppression technology and tool vibration suppression technology.

The spindle vibration suppression technology mainly comprises three methods, namely variable-speed cutting, spindle bearing preload control and spindle system self-balancing control, but all have obvious defects. The variable-speed cutting technique periodically and continuously changes the cutting speed to avoid the unstable cutting region, thereby suppressing cutting vibration. Its disadvantage is that a large number of cutting tests must be performed on the machining system to establish a stability limit diagram, and if any of the spindle, tool, fixture or workpiece in the machining system is changed, the stability limit diagram changes, so the chatter prediction data must be re-planned. Spindle bearing preload control is a method of increasing the preload at low speed and high torque and reducing the preload at high speed and low torque. Its defect is that the bearing preload can only be fixed for a single application scenario. Spindle system self-balancing control uses a balancing mechanism to self-balance the spindle during high-speed rotation. Its defects are that the balancing mechanism is structurally complex and difficult to popularize and apply, and that the spindle balance is generally calibrated only before machining.

The vibration suppression of the feed shaft adopts a method for improving the structure of a mechanical transmission part to perform vibration suppression, and mainly adopts novel linear motors, electromagnetic shielding devices of an electric cabinet and the like and servo tuning. The method has the defects of consuming a great deal of time and energy and having insignificant economic benefit.

The method for restraining the vibration of the cutter mainly adopts the methods of improving cutter bar materials, optimizing the clamp, optimizing the technological parameters and the like, and has the defects of consuming a great deal of time and energy and having insignificant economic benefit.

Referring to fig. 1, fig. 1 is a flowchart illustrating an embodiment of a machine tool vibration suppression optimization method according to the present invention. The machine tool vibration suppression optimization method of the embodiment comprises the following steps:

S100, the numerical control machine tool generates a control strategy according to the machining parameters, and generates real-time control signals according to the control strategy to control the working parameters of the machine tool in each operation period. The machining parameters are generally parameters input in advance at the time of machining, such as cutting speed, feed amount, cutting depth and machining route.

S200, collecting vibration data of key parts of the machine tool in the working process of the machine tool. Referring to fig. 2, the method for collecting vibration data of key parts of a machine tool may include the following steps:

S210, arranging vibration acquisition devices such as acceleration sensors, displacement sensors and strain gauges at key parts of the machine tool, and acquiring vibration signals of the key parts of the machine tool. The key parts of the machine tool generally comprise the spindle, the tool, the worktable (including the feed shaft) and the like.

S220, denoising the acquired vibration signals by adopting a filter (such as Kalman filtering, band-pass filtering and the like), and eliminating high-frequency and low-frequency noise.

S230, extracting frequency domain features and time domain features of the vibration signals by adopting methods such as Fourier transform (FFT) and wavelet transform to obtain vibration data.
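Steps S220 and S230 can be sketched as follows. This is only a minimal illustration, not the embodiment's implementation: it assumes a band-pass filter realized by zeroing discrete Fourier transform bins, a naive DFT written in pure Python, and three example features (RMS, peak and dominant frequency). The function names, the 20 to 150 Hz pass band and the synthetic test signal are all hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); enough for a short sketch."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each reconstructed sample."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * math.pi * f * k / n)
                for f in range(n)).real / n
            for k in range(n)]


def band_pass(signal, fs, f_lo, f_hi):
    """S220 (assumed form): zero every bin outside [f_lo, f_hi] Hz."""
    n = len(signal)
    spectrum = dft(signal)
    for b in range(n):
        freq = min(b, n - b) * fs / n  # bins b and n-b carry the same frequency
        if not (f_lo <= freq <= f_hi):
            spectrum[b] = 0
    return idft(spectrum)


def extract_features(signal, fs):
    """S230 (assumed form): two time-domain features and one frequency-domain feature."""
    n = len(signal)
    rms = math.sqrt(sum(v * v for v in signal) / n)
    peak = max(abs(v) for v in signal)
    mags = [abs(c) for c in dft(signal)[: n // 2 + 1]]
    dominant_bin = max(range(1, len(mags)), key=mags.__getitem__)
    return {"rms": rms, "peak": peak, "dominant_hz": dominant_bin * fs / n}


# Synthetic signal: 50 Hz vibration + DC drift (low-frequency noise) + 400 Hz noise.
fs, n = 1000.0, 200
raw = [math.sin(2 * math.pi * 50 * k / fs) + 0.5
       + 0.3 * math.sin(2 * math.pi * 400 * k / fs) for k in range(n)]
clean = band_pass(raw, fs, f_lo=20.0, f_hi=150.0)
features = extract_features(clean, fs)
```

In this toy case the filter leaves only the 50 Hz component, so the dominant frequency comes out at 50 Hz. A real system would more likely use an FFT library and a properly designed filter.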

S300, inputting vibration data and current working parameters of the machine tool into a vibration prediction model to predict a subsequent vibration result. The operating parameters of the machine tool generally include cutting speed, cutting acceleration, feed rate, cutting depth, spindle rotational speed, tool type, cutting path, etc. The operation period is only one period which is manually divided for the convenience of control, the duration of the operation period can be selected according to actual needs, and the control strategy can issue a plurality of real-time control signals in one operation period. For example, 1ms may be used as one operation period, and of course, 0.1ms or other time periods may be selected as one operation period as long as the final control effect can be achieved.

Referring to fig. 3, the method for obtaining the vibration prediction model through pre-training includes the following steps:

S310, determining a machine learning model (such as a support vector machine (SVM) or a random forest) or a deep learning model (such as a convolutional neural network (CNN) or a long short-term memory (LSTM) network) as the basic model of the vibration prediction model. In this embodiment, an LSTM network is used as the basic model.

S320, acquiring historical vibration data of the machine tool, determining normal state data and abnormal state data in the historical vibration data according to the vibration range of the machine tool during normal operation, and marking the vibration data of a reverse point, a starting point and a stopping point and the abnormal state data. When a new workpiece is processed, vibration data can be obtained as historical vibration data of the machine tool through trial cutting of the workpiece. In the processing process, the reverse point, the starting point and the stopping point generally generate larger vibration, so that the reverse point, the starting point and the stopping point are marked together with abnormal state data during training, and the rest data are normal state data. The normal state data and the abnormal state data are generally distinguished according to the historical vibration range, for example, the historical data show that the vibration value is between 0 and 1, the data value in the range is the normal state data, and the data value beyond the range is the abnormal state.

S330, training the basic model by using the marked vibration data to obtain a vibration prediction model. Of course, when the workpiece is formally machined, after execution of each cycle is completed, vibration data and relevant working parameters in the machining process can be used as historical vibration data to continuously train the model, so that the prediction accuracy of the vibration prediction model is continuously improved.
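The labeling rule described in step S320 can be illustrated with a short sketch. It assumes each historical sample is reduced to a single scalar vibration value and that the indices of the reverse, starting and stopping points are already known; the function name, the label strings and the sample values are hypothetical.

```python
def label_samples(values, normal_range, event_indices):
    """Label each sample as 'normal' or 'marked'.

    A sample is marked when it lies outside the normal vibration range
    (abnormal state data) or when it falls on a reverse, starting or
    stopping point, which are marked together with the abnormal data.
    """
    low, high = normal_range
    labels = []
    for index, value in enumerate(values):
        out_of_range = not (low <= value <= high)
        if out_of_range or index in event_indices:
            labels.append("marked")
        else:
            labels.append("normal")
    return labels


# Historical vibration values with a normal range of 0 to 1, as in the example above.
history = [0.2, 0.4, 1.3, 0.7, 0.9]
events = {0, 4}  # hypothetical indices of a starting point and a stopping point
labels = label_samples(history, normal_range=(0.0, 1.0), event_indices=events)
```

The labeled pairs would then serve as training data for the basic model in step S330.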

S400, judging whether the vibration result predicted by the vibration prediction model meets the machining requirement, if not, executing the step S500, otherwise, returning to execute the step S200.

Because the processing cost needs to be considered, the number of times of trial cutting of the workpiece is limited, the model can only be trained to preliminarily meet the processing requirement by only data of trial cutting of the workpiece, and the control strategy cannot be in an optimal state, so that the step S500 is generally executed to continuously optimize the control strategy in the initial processing stage of the workpiece. With the increasing number of times of workpiece processing, the model is continuously trained by utilizing vibration data of each processing, so that the control strategy of workpiece processing gradually tends to be stable after continuous optimization. At this time, in normal state, optimization is not needed any more, and the step S200 is directly executed again, and only monitoring is performed, so as to facilitate timely intervention when unexpected processing occurs. Of course, if the machine tool collides, the performance of the machine tool is changed, or a new machined workpiece is replaced, the machine tool needs to be optimized again.
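The overall flow of steps S200 to S500 can be sketched as a simple loop. This is a toy illustration under stated assumptions: the prediction model, the requirement check and the optimizer are passed in as plain functions, and the stand-ins used here (a linear predictor and a fixed speed reduction) are hypothetical placeholders, not the LSTM model or the Q-learning optimizer of the embodiment.

```python
def run_cycles(samples, params, predict, meets_requirement, optimize):
    """Monitor, predict, check and optimize once per collected sample."""
    log = []
    for vibration in samples:                     # S200: collect vibration data
        predicted = predict(vibration, params)    # S300: predict subsequent vibration
        if meets_requirement(predicted):          # S400: requirement met, keep monitoring
            log.append("monitor")
        else:                                     # S500: adjust working parameters
            params = optimize(params, predicted)
            log.append("optimize")
    return params, log


def toy_predict(vibration, params):
    # Hypothetical predictor: vibration scales linearly with cutting speed.
    return vibration * params["speed"] / 100


def toy_optimize(params, predicted):
    # Hypothetical optimizer: lower the cutting speed by a fixed step.
    return {**params, "speed": params["speed"] - 10}


final_params, log = run_cycles(
    samples=[0.5, 1.2, 1.2, 0.9],
    params={"speed": 100},
    predict=toy_predict,
    meets_requirement=lambda v: v <= 1.0,
    optimize=toy_optimize,
)
```

Here the second and third samples trigger optimization, after which the loop falls back to monitoring only, mirroring the behavior described for a stabilized control strategy.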

S500, according to the vibration result predicted by the vibration prediction model, a reinforcement learning algorithm (such as Q-learning algorithm, depth Q network DQN algorithm and the like) is adopted to adjust working parameters of the machine tool in the control strategy.

In this embodiment, the reinforcement learning algorithm adopts a Q-learning algorithm. The current working parameter values and vibration data of the machine tool (including the vibration data predicted by the vibration prediction model and the vibration data fed back after executing an action) are used as the state s of the Q-learning algorithm, and the adjustment of the working parameters of the machine tool is used as the action a. For example, stepping the value of a certain working parameter up or down may be one action a, and adjusting the cutting route may be another. The reduction of machine tool vibration together with the optimization of machine tool machining performance is used as the reward r of the Q-learning algorithm; of course, the reduction of machine tool vibration alone can also be used as the reward r.

Referring to fig. 4, when the reinforcement learning algorithm employs the Q-learning algorithm, this step may include the following sub-steps:

S510, obtaining the current processing parameters and the vibration result predicted by the vibration prediction model, and representing them as the current state s_t, so that possible vibration can be handled in advance according to the predicted vibration result, and subsequent vibration is reduced by adjusting the control strategy, thereby achieving the effect of suppressing vibration.

S520, selecting an action a as the current action a_t by adopting an ε-greedy strategy (i.e. a greedy strategy with a decayable exploration rate) according to the current state, wherein the formula is as follows:

$$a_t = \begin{cases} \arg\max_{a} Q(s_t, a), & \text{with probability } 1-\varepsilon \\ \text{a random action}, & \text{with probability } \varepsilon \end{cases}$$

Wherein \arg\max_{a} Q(s_t, a) is the action with the maximum Q value in the current state. ε is the probability of exploration, namely, a random action is selected with probability ε to explore new possibilities and avoid being trapped in a local optimum, with 0 < ε < 1. 1-ε is the probability of exploitation, namely, the action with the largest current Q value is selected with probability 1-ε to further optimize the control strategy. In the ε-greedy strategy, the value of ε can be gradually reduced as the number of iterations increases.
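The selection rule of step S520 can be sketched as below. This is a minimal illustration in which the Q values of the current state are held in a dictionary keyed by action name; the action names, the exponential decay schedule and its floor value are assumptions, since the text only states that ε may be reduced gradually.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action from q_values (action -> Q(s_t, action)).

    With probability epsilon a random action is explored; otherwise the
    action with the largest Q value is exploited.
    """
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


def decayed_epsilon(eps0, decay, step, eps_min=0.01):
    """Assumed schedule: exponential decay with a lower bound."""
    return max(eps_min, eps0 * decay ** step)


rng = random.Random(0)  # seeded only to make the sketch repeatable
q_s_t = {"reduce_speed": 0.8, "increase_feed": -0.2, "change_route": 0.1}

greedy_choice = epsilon_greedy(q_s_t, epsilon=0.0, rng=rng)    # always exploits
explore_choice = epsilon_greedy(q_s_t, epsilon=1.0, rng=rng)   # always explores
eps_late = decayed_epsilon(eps0=0.5, decay=0.9, step=1000)     # hits the floor
```

Keeping a small floor on ε is one common way to retain some exploration after the strategy has largely stabilized.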

S530, adjusting the working parameters of the machine tool in the control strategy to execute the current action a_t, and, after the current action a_t has been executed for a preset time, acquiring real-time vibration data and machining effects as feedback results. For example, at a position where vibration is large due to the assembly or structure of the machine tool itself, vibration can be reduced by reducing the cutting speed or the like; when adjusting parameter values such as the cutting speed still does not solve the problem, the cutting route can be adjusted so that excessive vibration is avoided by bypassing that position.

S540, calculating a reward value r_{t+1} according to the feedback result. Since the purpose of the present embodiment is to suppress vibration, the feedback result includes vibration data; for example, the calculation formula of the reward value r_{t+1} after the execution of the action a_t may be set as follows:

$$r_{t+1} = -V(t)$$

Wherein V(t) represents the value of the vibration data fed back by the machine tool after executing the action a_t; this value can be a vibration amplitude or a composite value of vibration-related parameters such as vibration acceleration and displacement. Of course, in actual machining, requirements may be placed not only on vibration but also on machining-effect performance such as machining efficiency, in which case that performance can be added to the calculation of the reward value r_{t+1}. The calculation formula of the reward value r_{t+1} after the execution of the action a_t is then as follows:

$$r_{t+1} = -a_v V(t) + \sum_{i=1}^{N} b_i f_i(t)$$

Wherein a_v is the weighting coefficient of the vibration data value, and i is the index of the parameters participating in calculating the reward value, each of which represents a machining-effect performance item such as machining efficiency. N is the total number of parameters participating in calculating the reward value, b_i is the weighting coefficient of the i-th parameter, and f_i(t) is the feedback value of the i-th parameter after performing action a_t.

In this embodiment, the parameters involved in calculating the reward value include the vibration performance parameters of the machine tool, the machining efficiency and the surface roughness of the workpiece; therefore, the reward value r_{t+1} after executing the action a_t is calculated as follows:

$$r_{t+1} = -a_v V(t) + b_1 f_1(t) + b_2 f_2(t)$$

Where b_1 is the weighting coefficient of the machining efficiency, and f_1(t) is the machining efficiency fed back by the machine tool after performing action a_t; b_2 is the weighting coefficient of the workpiece surface roughness, and f_2(t) is the workpiece surface roughness fed back after performing action a_t.
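The weighted reward of step S540 can be written as a short function. The sketch below follows the formula of this embodiment, with a vibration term and N further feedback terms; the numeric weights and feedback values are made-up illustrations, with a positive weight for machining efficiency (larger is better) and a negative weight for surface roughness (smaller is better).

```python
def reward(vibration, a_v, feedback, weights):
    """r_{t+1} = -a_v * V(t) + sum_i b_i * f_i(t)."""
    if len(feedback) != len(weights):
        raise ValueError("each feedback value needs exactly one weight")
    return -a_v * vibration + sum(b * f for b, f in zip(weights, feedback))


# Hypothetical normalized feedback after executing action a_t.
r_next = reward(
    vibration=0.4,          # V(t)
    a_v=1.0,                # weight of the vibration term
    feedback=[0.8, 0.2],    # f_1: machining efficiency, f_2: surface roughness
    weights=[0.5, -0.3],    # b_1 > 0 (larger is better), b_2 < 0 (smaller is better)
)

# With no extra terms the formula reduces to -a_v * V(t), i.e. -V(t) when a_v = 1.
r_vibration_only = reward(vibration=0.4, a_v=1.0, feedback=[], weights=[])
```

In this example the efficiency gain exactly offsets the vibration penalty and the roughness term leaves a slightly negative reward. In practice the feedback values would need a common scale for the weights to be meaningful.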

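S550, updating the Q value of the current state and action. When the Q-learning algorithm is executed for the first time, the Q value Q(s, a) of each state and action is initialized to an initial value, which may generally be set to a random value of 0 or less. The Q-learning algorithm is then executed repeatedly to update the Q values of the states and actions until they tend to be stable. The updated Q values are also used as the basis for selecting the current action a_t in step S520. The formula for updating the Q value is as follows:

$$Q'(s_t, a_t) = Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'_{t+1}} Q(s_{t+1}, a'_{t+1}) - Q(s_t, a_t) \right]$$

Wherein Q'(s_t, a_t) is the updated Q value, Q(s_t, a_t) is the Q value before updating, α is the learning rate (0 < α ≤ 1), r_{t+1} is the immediate reward obtained after executing the action a_t, γ is the discount factor (0 ≤ γ ≤ 1), and the max term is the largest Q value among all selectable actions a'_{t+1} in the next state s_{t+1}.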
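The Q-value update of step S550 can be sketched with a tabular Q function. This is the standard Q-learning update, shown under the assumptions that states and actions are discretized into hashable labels and that unseen entries start at 0; the state and action names and the values of α and γ are illustrative only.

```python
from collections import defaultdict


def q_update(q, state, action, r_next, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = r_next + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


actions = ["speed_down", "speed_up"]
q = defaultdict(float)  # every Q(s, a) starts at 0

# First update: all next-state Q values are still 0.
first = q_update(q, "s0", "speed_down", r_next=-1.0, next_state="s1", actions=actions)

# Second update: the next state now has a promising action.
q[("s1", "speed_up")] = 2.0
second = q_update(q, "s0", "speed_down", r_next=-1.0, next_state="s1", actions=actions)
```

The first update moves the entry halfway toward the immediate reward, while the second also pulls it toward the discounted value of the best next action.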

S560, judging whether a preset termination condition is met. The termination condition is that the system has converged to the optimal processing parameters (for example, the system can be judged to have converged when the change of the Q value is smaller than a certain threshold, or when the reward value has stabilized at a high level over a number of steps), that vibration is minimized, or that other preset conditions (for example, on processing efficiency or surface roughness) are met. If the termination condition is met, the control strategy is considered essentially optimal and no further optimization is needed, and step S200 is executed to continue monitoring the processing. If the termination condition is not met, the control strategy is considered to still have room for optimization; after the current state s_t is updated, step S510 is executed again to adjust the control strategy until the requirement is met.
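One of the convergence criteria mentioned for step S560, namely the change of the Q value staying below a threshold, can be sketched as follows. The window length and threshold are assumed tuning values, and the function name is hypothetical; the text does not specify how many steps must be observed.

```python
def has_converged(q_changes, threshold, window):
    """Return True when the last `window` Q-value changes are all below threshold."""
    if len(q_changes) < window:
        return False  # not enough history to judge convergence yet
    return all(abs(change) < threshold for change in q_changes[-window:])


# Hypothetical absolute changes of Q(s_t, a_t) over successive updates.
settled = has_converged([0.5, 0.2, 0.004, 0.003, 0.001], threshold=0.01, window=3)
too_short = has_converged([0.5, 0.004], threshold=0.01, window=3)
still_moving = has_converged([0.004, 0.003, 0.02], threshold=0.01, window=3)
```

A check on the reward value stabilizing at a high level, or on vibration and efficiency targets, could be combined with this test in the same way.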

In the embodiment, vibration is restrained by adopting a method of combining the advanced prediction of the vibration prediction model and the optimization control strategy of the reinforcement learning algorithm, and the vibration prediction model can improve the prediction of the vibration condition of the machine tool, so that the control strategy can be optimized in advance through the reinforcement learning algorithm before larger vibration is likely to occur, and the influence on the machining precision caused by the larger vibration in the machining process is avoided. In addition, the control strategy is optimized by continuously executing the reinforcement learning algorithm in the machining process, so that the control strategy can be continuously optimized, and finally, the optimal machining performance of the machine tool is achieved under the condition that vibration is minimized.

Referring to fig. 5, fig. 5 is a block diagram illustrating an embodiment of a vibration suppression optimization system for a machine tool according to the present invention. The machine tool vibration suppression optimization system of the present embodiment includes a control strategy module 100, a data acquisition module 300, a vibration prediction module 400, and a parameter optimization module 500. The control strategy module 100 is configured to generate a control strategy according to the machining parameters, and generate a real-time control signal to the machine tool executing mechanism 200 according to the control strategy, so as to control the working parameters of the machine tool executing mechanism 200 in each operation cycle.

The data acquisition module 300 is used for acquiring vibration data of key parts of the machine tool in the machine tool executing mechanism 200 during the working process of the machine tool. The data acquisition module 300 may include a vibration acquisition unit, a denoising processing unit, and a feature extraction unit. The vibration acquisition unit is used for acquiring vibration signals of key parts of the machine tool such as a main shaft, a cutter and a workbench in the machine tool executing mechanism 200, and can comprise a plurality of vibration acquisition devices such as acceleration sensors, displacement sensors and strain gauges which are arranged at the key parts of the machine tool, and data acquired by the vibration acquisition devices can be transmitted to the denoising processing unit through the industrial internet of things and the like. The denoising processing unit is used for denoising the acquired vibration signals by adopting a filter (such as Kalman filtering, band-pass filtering and the like) to eliminate high-frequency and low-frequency noise. The feature extraction unit is used for extracting frequency domain features and time domain features of the vibration signals by adopting methods such as Fourier transformation and wavelet transformation to obtain vibration data.

The vibration prediction module 400 is configured to predict a vibration result of a subsequent operation cycle according to the vibration data and an operation parameter of the machine tool in a current operation cycle. The vibration prediction module 400 may be trained by a machine learning model or a deep learning model, for example, may be obtained by executing steps S310 to S330 in the embodiment of the machine tool vibration suppression optimization method.

The parameter optimization module 500 is configured to adjust working parameters of the machine tool in the control strategy by using a reinforcement learning algorithm when the vibration result predicted by the vibration prediction module does not meet the machining requirement. For example, the steps S510 to S560 in the embodiment of the machine tool vibration suppression optimization method can be executed to adjust the working parameters of the machine tool in the control strategy, so as to realize the adaptive control of the numerical control machine tool.

The machine tool vibration suppression optimization system can effectively suppress vibration through real-time vibration prediction and self-adaptive control, remarkably improve machining precision, reduce equipment abrasion and faults, and prolong the service life of a machine tool. In addition, through predictive maintenance, sudden faults and downtime can be reduced, and maintenance cost is reduced.

The foregoing examples only represent preferred embodiments of the present invention, which are described in more detail and are not to be construed as limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims

1. The machine tool vibration suppression optimization method is characterized by comprising the following steps of:

2. The method for suppressing and optimizing vibration of a machine tool according to claim 1, wherein the key parts of the machine tool comprise a spindle, a cutter and a workbench of the numerical control machine tool, and the working parameters comprise cutting speed, cutting acceleration, feeding amount, cutting depth, spindle rotation speed, cutter type and cutting route.

3. The method for optimizing vibration suppression of a machine tool according to claim 1, wherein the method for acquiring vibration data of a critical part of the machine tool comprises the steps of:

4. The method for optimizing vibration suppression of a machine tool according to claim 1, wherein the method for obtaining the vibration prediction model comprises the steps of:

5. The method for vibration suppression and optimization of a machine tool according to claim 1, wherein a long-short-term memory network LSTM is used as a basic model.

6. The method for optimizing vibration suppression of a machine tool according to any one of claims 1 to 5, wherein the reinforcement learning algorithm adopts a Q-learning algorithm, the working parameter value and vibration data of the current machine tool are used as a state S of the Q-learning algorithm, the adjustment of the working parameter of the machine tool is used as an action a of the Q-learning algorithm, and the reduction of vibration and/or the machining performance of the machine tool are optimized as a reward R of the Q-learning algorithm.

7. The method for optimizing vibration suppression of a machine tool of claim 6, wherein the method for adjusting machine tool operating parameters in the control strategy using a reinforcement learning algorithm comprises the steps of:

S540, calculating a reward value r_{t+1} according to a feedback result;

S550, updating the current state and the Q value of the action.

8. The method for optimizing vibration suppression of a machine tool according to claim 7, wherein in said step S540, the calculation formula of the reward value r_{t+1} after executing the action a_t is as follows:

9. The method for optimizing vibration suppression of a machine tool according to claim 8, wherein in said step S550, the formula for updating the Q value is as follows:

10. A machine tool vibration suppression optimization system is characterized by comprising