Disclosure of Invention
The invention provides a method and a system for automatically evaluating the safety of a strength training action gesture, and aims to overcome at least one of the defects of the prior art.
The invention relates to a method for automatically evaluating the safety of a strength training action gesture, which comprises the following steps:
receiving a video segment transmitted by a user terminal, extracting frame image features, and outputting an action category sequence number;
outputting a calling model vector according to the frame image features and the action category sequence number;
calling a corresponding body part granularity level model according to the calling model vector;
outputting human body posture key point coordinates according to the body part granularity level model;
calculating the movement speed and the joint angle according to the human body posture key point coordinates and the action category sequence number;
judging whether the movement speed or the joint angle is within a safety threshold range;
if the movement speed or the joint angle is not within the safety threshold range, giving a warning prompt.
Further, the step of receiving a video segment transmitted by the user terminal, extracting frame image features and outputting an action category sequence number comprises the following steps:
extracting a series of successive frame images from the video at regular time intervals;
preprocessing the extracted frame images to ensure consistent and valid input to the model;
processing a fixed frame sequence with a 3D convolution layer, where the 3D convolution kernel slides along the three dimensions of time, width, and height so as to extract temporal and spatial features simultaneously; flattening the four-dimensional features produced by the 3D convolution module into a three-dimensional feature representation; and constructing a backbone network based on a dual-stream Transformer architecture, in which each layer uses two parallel Transformer encoder modules to extract temporal and spatial features respectively and fuses the two kinds of features by addition to obtain a more comprehensive feature representation, wherein the four-dimensional features comprise frame number, width, height, and feature dimension, and the three-dimensional features comprise time sequence, space, and feature dimension;
flattening the fused three-dimensional features into a one-dimensional feature vector and inputting it into a fully connected layer, which maps the vector into the action category space with the number of output nodes equal to the number of action categories; converting the output into a probability distribution through a softmax activation function; and, according to the output probability distribution, selecting the category with the highest probability as the prediction result and outputting its action category sequence number.
Further, the step of outputting the calling model vector according to the frame image features and the action category sequence number comprises the following steps:
passing the frame image features and the action category sequence number output in the previous step as inputs to a decision model;
the decision model determines, based on the video quality, the frame image features, and the action category sequence number output in the previous step, which body part models at which granularity levels to call, and outputs the final model call result expressed as a vector.
Further, before the step of receiving a video segment transmitted by the user terminal, extracting frame image features, and outputting an action category sequence number, a data set is prepared: the corresponding body part models are determined and called according to the action category sequence number, the extracted video feature data are passed into the models at each granularity level of those body parts in the model pool, and each model outputs the total number N of key point estimates and the average error ME, where ME is calculated by the following formula:

$$\mathrm{ME} = \frac{1}{N}\sum_{i=1}^{N}\sqrt{\left(x_i - x_i'\right)^2 + \left(y_i - y_i'\right)^2}$$

where ME represents the average error, the key point positions output by the model are $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, the correctly annotated key point positions in the original picture are $(x_1', y_1'), (x_2', y_2'), \ldots, (x_N', y_N')$, and N represents the total number of key point estimates.
Further, in the step in which the decision model determines, based on the video quality, the frame image features, and the action category sequence number output in the previous step, which body part models at which granularity levels to call, and outputs the final model call result expressed as a vector: if the foot key point estimation model at granularity level 3 and the torso key point estimation model at granularity level 3 are called, the final output vector is [3, 3, 0, 0].
Further, the step of outputting coordinates of key points of the human body posture according to the body part granularity level model includes:
After the final output vector [3, 3, 0, 0] is obtained, it is passed into the pose estimation model pool, indicating that the foot key point estimation model at granularity level 3 and the torso key point estimation model at granularity level 3 need to be called, and the two-dimensional coordinates of the human body posture key points in each frame of the video are obtained.
Further, the joint angles include an elbow joint angle, and in the step of calculating the movement speed and the joint angle according to the human body posture key point coordinates and the action category sequence number, the elbow joint angle is calculated by the following formula:

$$\angle ABC = \arccos\frac{\vec{BA} \cdot \vec{BC}}{\left|\vec{BA}\right|\left|\vec{BC}\right|}$$

where $\angle ABC$ represents the elbow joint angle, $\vec{BA}$ represents the vector from the elbow to the shoulder, and $\vec{BC}$ represents the vector from the elbow to the wrist:

$$\vec{BA} = (x_1 - x_2,\ y_1 - y_2), \qquad \vec{BC} = (x_3 - x_2,\ y_3 - y_2)$$

where the coordinates of the three points of $\angle ABC$ are $A(x_1, y_1)$, $B(x_2, y_2)$, and $C(x_3, y_3)$ respectively.
Further, in the step of calculating the movement speed and the joint angle according to the human body posture key point coordinates and the action category sequence number, the movement speed is calculated by the following formula:

$$v = \frac{s}{t}, \qquad t = \frac{1}{n}, \qquad s = \sqrt{\left(x_1' - x_1\right)^2 + \left(y_1' - y_1\right)^2}$$

where v represents the movement speed, s represents the distance from the previous-frame key point coordinates $A(x_1, y_1)$ to the current-frame key point coordinates $A'(x_1', y_1')$, t represents the interval time between consecutive frames, and n represents the number of frames per second.
Further, the step of determining whether the movement speed or the joint angle is within the safety threshold range further includes:
if both the movement speed and the joint angle are recognized to be within the safety threshold range, the process ends.
The invention also provides a system for automatically evaluating the safety of a strength training action gesture, which applies the above method for automatically evaluating the safety of a strength training action gesture and comprises:
the first output module is used for receiving a video segment transmitted by the user terminal, extracting frame image features, and outputting an action category sequence number;
the second output module is used for outputting a calling model vector according to the frame image features and the action category sequence number;
the calling module is used for calling the corresponding body part granularity level model according to the calling model vector;
the third output module is used for outputting human body posture key point coordinates according to the body part granularity level model;
the calculation module is used for calculating the movement speed and the joint angle according to the human body posture key point coordinates and the action category sequence number;
the judging module is used for judging whether the movement speed or the joint angle is within a safety threshold range;
and the identification module is used for giving a warning prompt if the movement speed or the joint angle is not within the safety threshold range.
The beneficial effects obtained by the invention are as follows:
The invention provides a method and a system for automatically evaluating the safety of a strength training action gesture. A video segment transmitted by a user terminal is received, frame image features are extracted, and an action category sequence number is output; a calling model vector is output according to the frame image features and the action category sequence number; the corresponding body part granularity level model is called according to the calling model vector; human body posture key point coordinates are output according to the body part granularity level model; the movement speed and the joint angle are calculated according to the human body posture key point coordinates and the action category sequence number; whether the movement speed or the joint angle is within a safety threshold range is judged; and a warning prompt is given if the movement speed or the joint angle is not within the safety threshold range. The method and the system for automatically evaluating the safety of a strength training action gesture provided by the invention have the following beneficial effects:
1. Automatic assessment of action gesture accuracy without human observation
The technology for automatically evaluating the safety of strength training actions reduces the subjectivity and inaccuracy of human observation, thereby improving the safety and effectiveness of training. By capturing actions through a camera and combining them with a pose estimation model, movement information such as joint angles and movement speed can be detected automatically, and small but critical posture deviations, such as the position of the knees in a squat, can be captured accurately. This not only reduces observation errors caused by fatigue or inattention, but also provides instant feedback for exercisers, helping them adjust their actions in time and avoid sports injuries.
2. Support multitasking, improve model selection efficiency
The decision model can invoke recognition tasks for a plurality of body parts, which significantly improves the efficiency of model selection. By integrating pose estimation models for different body parts and weighing factors such as accuracy, computational cost, and granularity, a flexible task selection mechanism is designed: the decision model can rapidly select and call pose estimation tasks for several different body parts, or a single task, from the pose estimation model pool, so the user can obtain pose recognition for various body parts without manually selecting a specific model. This task selection mechanism not only improves model selection efficiency but also meets diversified application requirements.
3. Autonomous selection of estimated granularity based on video quality
The decision model can autonomously select an appropriate estimation granularity based on the quality of the input video. For high-quality video, the model can adopt a finer granularity for detailed pose estimation, ensuring accurate and detail-rich results. For lower-quality video, the model selects a coarser granularity to balance computational resource consumption against the credibility of the estimation result. This ability to adjust granularity dynamically allows the model to perform well across video qualities and provide the best available estimation results.
Detailed Description
In order to better understand the above technical solutions, the following detailed description will be given with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1 and 2, a first embodiment of the present invention proposes a method for automatically evaluating the safety of a strength training action gesture, comprising the following steps:
step S100, a video segment transmitted by a user terminal is received, frame image characteristics are extracted, and action category serial numbers are output.
The video is input into an action category model, which extracts video features and outputs an action category sequence number; for example, an output of "1" indicates that the action category is push-up, an output of "2" indicates that the action category is plank, and so on.
Step S200, outputting a calling model vector according to the frame image features and the action category sequence number.
The frame image features and the action category sequence number output in the previous step are passed as inputs to the decision model. There are four body part key point estimation models: a hand key point estimation model, a foot key point estimation model, a face key point estimation model, and a torso key point estimation model.
Video quality affects the number of key points that each body part model can identify accurately. By consulting the relevant literature and data, the recognition granularity of each body part model is divided into several levels; the lower the level, the coarser the granularity and the fewer key points that can be identified accurately (level 0 indicates that the body part is not present in the video), as shown in Table 1.
TABLE 1
As shown in Table 1, each body part model has data sets at different granularity levels: 3 foot key point estimation models, 4 torso key point estimation models, 5 hand key point estimation models, and 6 face key point estimation models.
A plurality of image data sets with human body posture annotations are collected; the data sets comprise human body images at different postures and angles, and the annotated positions differ across the data sets corresponding to the different body part models.
Taking the torso key point estimation model as an example, its 4 granularity-level data sets are annotated with 4, 8, 12, and 17 torso key points respectively. The model pool is built by using HRNet (High-Resolution Network), a state-of-the-art CNN-based top-down human pose estimation algorithm, as the backbone network, and constructing each key point estimation model by swapping in a different prediction head. Taking the torso key point estimation model at granularity level 4 as an example, the features extracted by the HRNet backbone are input into the corresponding prediction head, which outputs the corresponding 17 key point coordinates.
The models are trained with the collected annotated data, with each data set divided into a training set, a validation set, and a test set in an 8:1:1 ratio. During training, the model parameters are continuously optimized so that the positions of human body key points can be predicted accurately. The model is evaluated on the validation set to tune parameters and guard against over-fitting and under-fitting. After training, the model is finally evaluated on the test set to ensure that it performs well on unseen data.
The other body part models are handled similarly, so several granularity level models are finally obtained for each body part, 18 models in total (3 foot key point estimation models, 4 torso key point estimation models, 5 hand key point estimation models, and 6 face key point estimation models). All the models are assembled together to form the pose estimation model pool.
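As a rough illustration of this shared-backbone, swappable-head construction, the following PyTorch sketch assembles an 18-entry model pool. It is only a minimal sketch: the backbone is a stand-in for HRNet, and all key point counts except the torso's 4/8/12/17 are hypothetical placeholders.

```python
import torch.nn as nn

class PoseModel(nn.Module):
    """One model-pool entry: backbone features + a granularity-specific prediction head."""
    def __init__(self, backbone: nn.Module, num_keypoints: int, feat_channels: int = 32):
        super().__init__()
        self.backbone = backbone
        # The head predicts one heatmap per key point from the backbone features.
        self.head = nn.Conv2d(feat_channels, num_keypoints, kernel_size=1)

    def forward(self, img):
        return self.head(self.backbone(img))   # (B, num_keypoints, H, W)

def make_backbone(feat_channels: int = 32) -> nn.Module:
    # Stand-in feature extractor; a real system would plug HRNet features in here.
    return nn.Sequential(nn.Conv2d(3, feat_channels, 3, padding=1), nn.ReLU())

# Key point counts per granularity level; only the torso counts come from the text.
GRANULARITY_KEYPOINTS = {
    "foot":  [5, 10, 15],                      # levels 1-3 (hypothetical counts)
    "torso": [4, 8, 12, 17],                   # levels 1-4 (from the data sets above)
    "hand":  [4, 8, 12, 16, 21],               # levels 1-5 (hypothetical counts)
    "face":  [5, 11, 21, 35, 51, 68],          # levels 1-6 (hypothetical counts)
}

# The pose estimation model pool: 3 + 4 + 5 + 6 = 18 models.
model_pool = {
    (part, level): PoseModel(make_backbone(), k)
    for part, counts in GRANULARITY_KEYPOINTS.items()
    for level, k in enumerate(counts, start=1)
}
assert len(model_pool) == 18
```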
The models are vectorized using one-hot-style coding: each call is represented by a vector whose length equals the number of body part models, each position in the vector corresponds to one body part model (the first position represents the foot key point estimation model, the second the torso key point estimation model, the third the hand key point estimation model, and the fourth the face key point estimation model), and the number at each position represents the granularity level of the corresponding model, for example:
(1) the foot key point estimation model at granularity level 1 is represented in vector form as [1, 0, 0, 0];
(2) the torso key point estimation model at granularity level 2 is represented in vector form as [0, 2, 0, 0];
(3) the hand key point estimation model at granularity level 3 is represented in vector form as [0, 0, 3, 0];
(4) the face key point estimation model at granularity level 4 is represented in vector form as [0, 0, 0, 4];
(5) calling the hand key point estimation model at granularity level 5 and the face key point estimation model at granularity level 6 simultaneously is represented in vector form as [0, 0, 5, 6];
and so on according to the rules above.
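A minimal sketch of this encoding, assuming the fixed position order foot, torso, hand, face described above:

```python
PARTS = ["foot", "torso", "hand", "face"]   # fixed position order of the call vector

def encode_call_vector(calls: dict) -> list:
    """Map {body part: granularity level} to the 4-element call vector (0 = not called)."""
    return [calls.get(part, 0) for part in PARTS]

def decode_call_vector(vec: list) -> dict:
    """Recover which models to call, and at which granularity level, from a call vector."""
    return {part: level for part, level in zip(PARTS, vec) if level > 0}

print(encode_call_vector({"hand": 5, "face": 6}))   # [0, 0, 5, 6]
print(decode_call_vector([3, 3, 0, 0]))             # {'foot': 3, 'torso': 3}
```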
It can be seen that each sequence number has corresponding body part key point estimation models to be called; for example, sequence number "2" (plank) calls the foot key point estimation model and the torso key point estimation model. After the decision model obtains this information from the action category sequence number, the video quality determines which granularity level of each body part model is called, and the final model call result is output as a vector; for example, if the foot key point estimation model at granularity level 3 and the torso key point estimation model at granularity level 3 are called, the final output vector is [3, 3, 0, 0].
Step S300, calling a corresponding body part granularity level model according to the calling model vector.
After the final output vector [3, 3, 0, 0] is obtained, the vector is passed into the pose estimation model pool, indicating that the foot key point estimation model at granularity level 3 and the torso key point estimation model at granularity level 3 need to be called.
Step S400, outputting human body posture key point coordinates according to the body part granularity level model.
According to the body part granularity level model, the two-dimensional coordinates of the human body posture key points of each frame in the video can be obtained.
Step S500, calculating the movement speed and the joint angle according to the human body posture key point coordinates and the action category sequence number.
The action category sequence number indicates which action is being performed, and each action has different parts of interest. To judge whether the action is performed accurately, the joint angles and movement speeds of the parts of interest must be obtained, and these can be calculated from the posture key point coordinates.
Step S600, judging whether the movement speed or the joint angle is within a safety threshold range.
By consulting the relevant exercise literature and data, a safety threshold range is set for each joint angle and movement speed to be detected for each action. When a joint angle or movement speed is not within its safety threshold range, the action is considered to carry an injury risk at that moment, and a warning reminder is triggered.
And step S700, if the movement speed or the joint angle is not in the safety threshold range, a prompt warning is given.
When the calculated movement speed or joint angle is not within the preset safety threshold range, a warning prompt is given.
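A minimal sketch of the threshold check; the per-action safety ranges here are placeholders, since the real values come from the exercise literature referenced above:

```python
# Hypothetical safety thresholds, keyed by action category sequence number.
SAFETY_THRESHOLDS = {
    1: {"elbow_angle_deg": (70.0, 170.0), "speed": (0.0, 1.5)},   # e.g. push-up
}

def check_safety(action_id: int, metrics: dict) -> list:
    """Return a warning message for every metric outside its safety threshold range."""
    warnings = []
    for name, value in metrics.items():
        lo, hi = SAFETY_THRESHOLDS[action_id][name]
        if not lo <= value <= hi:
            warnings.append(f"{name}={value:.1f} is outside the safe range [{lo}, {hi}]")
    return warnings

alerts = check_safety(1, {"elbow_angle_deg": 45.0, "speed": 0.8})
print(alerts or "all metrics within the safety threshold range")   # warns on the elbow angle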
Further, please refer to fig. 1 and fig. 2; in the method for automatically evaluating the safety of the strength training action gesture according to the present embodiment, step S100 includes:
step S110, extracting a series of continuous frame images from the video at fixed time intervals.
Step S120, preprocessing the extracted frame images to ensure consistent and valid input to the model.
Preprocessing the extracted frame images includes resizing, normalization, cropping, and the like, so as to meet the input requirements of the model.
Step S130, processing a fixed frame sequence with a 3D convolution layer, where the 3D convolution kernel slides along the three dimensions of time, width, and height so as to extract temporal and spatial features simultaneously; flattening the four-dimensional features produced by the 3D convolution module into a three-dimensional feature representation; and constructing a backbone network based on a dual-stream Transformer architecture, in which each layer uses two parallel Transformer encoder modules to extract temporal and spatial features respectively and fuses the two kinds of features by addition to obtain a more comprehensive feature representation. The four-dimensional features comprise frame number, width, height, and feature dimension, and the three-dimensional features comprise time sequence, space, and feature dimension.
The Transformer architecture, proposed by Google in the 2017 paper "Attention Is All You Need", uses a Self-Attention structure in place of the RNN (Recurrent Neural Network) structure commonly used in NLP tasks. Its greatest advantage over an RNN structure is that it can be computed in parallel.
Step S140, flattening the fused three-dimensional features into a one-dimensional feature vector and inputting it into a fully connected layer, which maps the vector into the action category space with the number of output nodes equal to the number of action categories; converting the output into a probability distribution through a softmax activation function; and, according to the output probability distribution, selecting the category with the highest probability as the prediction result and outputting its action category sequence number.
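The following PyTorch sketch illustrates steps S130 and S140 under stated assumptions: a single 3D convolution stands in for the 3D-CNN module, two stacked dual-stream layers stand in for the full backbone, and all dimensions are illustrative. It is a minimal sketch of the described architecture, not the embodiment's exact network.

```python
import torch
import torch.nn as nn

class DualStreamBlock(nn.Module):
    """One backbone layer: parallel temporal and spatial Transformer encoders, fused by addition."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.temporal = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.spatial = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, x):                                   # x: (B, T, S, C)
        b, t, s, c = x.shape
        xt = self.temporal(x.permute(0, 2, 1, 3).reshape(b * s, t, c))  # attend over time
        xt = xt.reshape(b, s, t, c).permute(0, 2, 1, 3)
        xs = self.spatial(x.reshape(b * t, s, c))                       # attend over space
        xs = xs.reshape(b, t, s, c)
        return xt + xs                                      # additive fusion of the two streams

class ActionClassifier(nn.Module):
    def __init__(self, num_classes: int, dim: int = 64, layers: int = 2):
        super().__init__()
        self.conv3d = nn.Conv3d(3, dim, kernel_size=3, stride=2, padding=1)  # joint spatio-temporal features
        self.blocks = nn.ModuleList(DualStreamBlock(dim) for _ in range(layers))
        self.head = nn.LazyLinear(num_classes)              # fully connected layer to class space

    def forward(self, video):                               # video: (B, 3, T, H, W)
        f = self.conv3d(video)                              # 4-D features: (B, C, T', H', W')
        b, c, t, h, w = f.shape
        f = f.permute(0, 2, 3, 4, 1).reshape(b, t, h * w, c)  # flatten to 3-D: (B, time, space, C)
        for blk in self.blocks:
            f = blk(f)
        return self.head(f.flatten(1)).softmax(dim=-1)      # flatten to 1-D, FC, softmax

probs = ActionClassifier(num_classes=10)(torch.randn(2, 3, 8, 64, 64))
print(probs.argmax(dim=-1))                                 # predicted action category sequence numbers
```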
Preferably, referring to fig. 1 and 2, in the method for automatically evaluating the safety of the strength training action gesture according to the present embodiment, step S200 includes:
Step S210, passing the frame image features and the action category sequence number output in the previous step as inputs to the decision model, where each action category sequence number has corresponding body part models to be called.
The frame image features and the action category sequence number output in the previous step are passed as inputs to the decision model. Each sequence number has corresponding body part models to be called; for example, sequence number "2" (plank) calls the foot key point estimation model and the torso key point estimation model.
Step S220, the decision model determines and calls the body part models at the corresponding granularity levels according to the video quality, the frame image features, and the action category sequence number output in the previous step, and outputs the final model call result expressed as a vector.
After the decision model obtains this information from the action category sequence number, the video quality determines which granularity level of each body part model is called, and the final model call result is output as a vector; for example, if the foot key point estimation model at granularity level 3 and the torso key point estimation model at granularity level 3 are called, the final output vector is [3, 3, 0, 0].
Further, please refer to fig. 1 and 2; in the method for automatically evaluating the safety of the strength training action gesture according to the present embodiment, before step S100 the method further includes:
Step S100A, data set preparation
The input data of the decision model are a video and its action category sequence number, and the output is a model pool selection result expressed in vector form; for example, if the hand model at granularity level 3 is called, the output is [0, 0, 3, 0], and if the hand model at granularity level 5 and the face model at granularity level 6 are called, the output is [0, 0, 5, 6]. The input data and the corresponding output vectors serve as the data set for subsequently training and validating the decision model, and the data set is prepared as follows.
The corresponding body part models are determined and called according to the action category sequence number. The extracted video feature data are passed into the models at each granularity level of those body parts in the model pool, and each model outputs the total number N of key point estimates and the average error ME, where ME is calculated by the following formula:

$$\mathrm{ME} = \frac{1}{N}\sum_{i=1}^{N}\sqrt{\left(x_i - x_i'\right)^2 + \left(y_i - y_i'\right)^2} \tag{1}$$

In formula (1), ME represents the average error, the key point positions output by the model are $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, and the correctly annotated key point positions in the original picture are $(x_1', y_1'), (x_2', y_2'), \ldots, (x_N', y_N')$; N represents the total number of key point estimates. The average error ME is measured in mm.
The total number N of key points reflects the granularity of a pose estimation model: within the same body part model, a larger N corresponds to a higher granularity level, i.e., a finer granularity, but also to greater computational resource consumption. The average error ME reflects the accuracy of the model's key point estimates: the lower the ME, the more accurate the position estimate.
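A minimal sketch of formula (1), assuming ME is the mean Euclidean distance between predicted and annotated key points (consistent with the mm unit above):

```python
import numpy as np

def mean_error(pred: np.ndarray, gt: np.ndarray):
    """Formula (1): returns (N, ME) for predicted and ground-truth key points of shape (N, 2)."""
    n = len(pred)
    me = float(np.mean(np.linalg.norm(pred - gt, axis=1)))  # mean Euclidean distance
    return n, me

pred = np.array([[10.0, 20.0], [30.0, 42.0]])   # model outputs (x_i, y_i)
gt = np.array([[13.0, 24.0], [30.0, 40.0]])     # annotations (x_i', y_i')
print(mean_error(pred, gt))                     # (2, 3.5): distances 5.0 and 2.0 average to 3.5
```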
According to a survey of current human pose estimation models, the allowable range of average error differs between models of different granularity levels. The allowable average error ranges for the different granularity levels of each body part model were obtained through repeated literature review, calculation, and verification, as shown in Table 2.
TABLE 2
Table 2 is described below:
(1) For each body part model, when several granularity level models fall within the corresponding allowable average error range, the model with the highest granularity level is selected by default;
(2) For each body part model, a granularity level model that exceeds the set average error range is not selected; if no granularity level qualifies, the granularity level is defined as 0.
The collection of the data set (i.e., the input data and their corresponding output vectors) according to the rules of Table 2 proceeds as follows.
According to its action category sequence number, each video is input into the body part models to be selected from the model pool, and each granularity level of each body part model outputs a pair of N and ME values. By looking these up in Table 2, the corresponding vector form can be output, as shown in Table 3.
For example:
TABLE 3
The final result of Table 3 is [3, 1, 4, 0], so the output vector corresponding to this video is [3, 1, 4, 0]; the selection rule is sketched in code below.
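The selection rule of Table 2 can be sketched as follows; the allowable ME ranges here are hypothetical stand-ins for Table 2's actual values:

```python
# Hypothetical allowable ME ranges (mm) per granularity level; index 0 is level 1.
ALLOWED_ME = {
    "foot":  [(0, 12), (0, 9), (0, 6)],
    "torso": [(0, 14), (0, 11), (0, 8), (0, 5)],
    "hand":  [(0, 10), (0, 8), (0, 6), (0, 4), (0, 3)],
    "face":  [(0, 8), (0, 6), (0, 5), (0, 4), (0, 3), (0, 2)],
}

def select_level(part: str, me_per_level: list) -> int:
    """Highest granularity level whose ME is within its allowable range, else 0."""
    best = 0
    for level, (me, (lo, hi)) in enumerate(zip(me_per_level, ALLOWED_ME[part]), start=1):
        if lo <= me <= hi:
            best = level            # keep the highest qualifying level (rule (1))
    return best                     # 0 if no level qualifies (rule (2))

# Example per-level ME outputs for one video, chosen to reproduce [3, 1, 4, 0].
me_outputs = {
    "foot":  [5.0, 4.2, 3.1],
    "torso": [9.0, 12.0, 13.0, 14.0],
    "hand":  [6.0, 5.0, 4.0, 3.5, 9.9],
    "face":  [20.0] * 6,
}
print([select_level(p, me_outputs[p]) for p in ("foot", "torso", "hand", "face")])  # [3, 1, 4, 0]
```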
Through the above data preparation steps, multiple video segments, their action category sequence numbers, and the corresponding output vectors are finally obtained and used as the data set for the model. The data set is divided into a training set, a validation set, and a test set in an 8:1:1 ratio.
Step S100B, constructing a decision model
1. Model network structure
The model's inputs are an action category sequence number and a video frame sequence, and its output is a model selection vector. Image features are first extracted from the video sequence, reusing the network structure of the action category model: a 3D-CNN followed by a dual-stream Transformer. After the frame sequence features are obtained through the 3D-CNN, the action category sequence number is spliced into the feature vector, the spliced feature vector is processed by the dual-stream Transformer, and finally a fully connected layer maps the features into a 1×4 vector that is output as the model's result.
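A minimal PyTorch sketch of this decision network, with a single Transformer encoder layer standing in for the dual-stream stack and all dimensions illustrative:

```python
import torch
import torch.nn as nn

class DecisionModel(nn.Module):
    """Sketch: 3D-CNN frame features, spliced with the action category number, mapped to a 1x4 vector."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.conv3d = nn.Conv3d(3, feat_dim, kernel_size=3, stride=2, padding=1)
        # One encoder layer stands in for the dual-stream Transformer described above.
        self.encoder = nn.TransformerEncoderLayer(feat_dim + 1, nhead=5, batch_first=True)
        self.fc = nn.LazyLinear(4)                         # maps features to the 1x4 selection vector

    def forward(self, video, class_id):                    # video (B,3,T,H,W), class_id (B,)
        f = self.conv3d(video).flatten(2).transpose(1, 2)  # (B, tokens, C)
        cid = class_id.float().view(-1, 1, 1).expand(-1, f.size(1), 1)
        f = self.encoder(torch.cat([f, cid], dim=-1))      # splice the class number into the features
        return self.fc(f.flatten(1))                       # (B, 4)

vec = DecisionModel()(torch.randn(2, 3, 8, 32, 32), torch.tensor([2, 5]))
print(vec.shape)                                           # torch.Size([2, 4])
```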
2. Model training
The loss function and optimizer are defined before starting to train the model. For the regression task, the Mean Square Error (MSE) may be selected as the loss function and an Adam optimizer may be used to accelerate the training process. The training set data is input into the model for training, and model parameters are optimized through multiple iterations, so that the performance of the model is gradually improved.
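A minimal training-loop sketch continuing the DecisionModel sketch above, with a synthetic batch standing in for the prepared data set:

```python
import torch
import torch.nn as nn

# Synthetic stand-in batch; in practice this comes from the 8:1:1 training split.
video = torch.randn(4, 3, 8, 32, 32)
class_id = torch.randint(1, 10, (4,))
target_vec = torch.tensor([[3, 1, 4, 0]] * 4, dtype=torch.float32)

model = DecisionModel()                       # class from the sketch above
model(video, class_id)                        # dummy pass to materialize the lazy layer
loss_fn = nn.MSELoss()                        # MSE as the regression loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):                       # iterate to optimize the model parameters
    optimizer.zero_grad()
    loss = loss_fn(model(video, class_id), target_vec)
    loss.backward()
    optimizer.step()
```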
3. Model evaluation and tuning
During training, the validation set is used to evaluate model performance, and hyper-parameters such as the learning rate and batch size are tuned according to validation feedback to ensure that the model learns effectively without over-fitting. After training is complete, the final model is further evaluated on the test set to confirm that it performs well on unseen data and avoids both over-fitting and under-fitting.
4. Model deployment
When model training and evaluation are completed, the model's weights and structure are saved for subsequent use. The trained model is deployed into a production environment, where it can run inference on new data and efficiently and accurately call the body part models in the pose estimation model pool.
Further, please refer to fig. 1 and 2; in the method for automatically evaluating the safety of the strength training action gesture according to the present embodiment, the joint angles include an elbow joint angle, and in step S500 the elbow joint angle is calculated by the following formula:

$$\angle ABC = \arccos\frac{\vec{BA} \cdot \vec{BC}}{\left|\vec{BA}\right|\left|\vec{BC}\right|} \tag{2}$$

In formula (2), $\angle ABC$ represents the elbow joint angle, $\vec{BA}$ represents the vector from the elbow to the shoulder, and $\vec{BC}$ represents the vector from the elbow to the wrist:

$$\vec{BA} = (x_1 - x_2,\ y_1 - y_2) \tag{3}$$

$$\vec{BC} = (x_3 - x_2,\ y_3 - y_2) \tag{4}$$

In formulas (3) and (4), the coordinates of the three points of $\angle ABC$ are $A(x_1, y_1)$, $B(x_2, y_2)$, and $C(x_3, y_3)$ respectively.
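A minimal sketch of formulas (2)-(4), with A the shoulder, B the elbow, and C the wrist:

```python
import math

def elbow_angle(A, B, C):
    """Angle ABC in degrees at the elbow B, given shoulder A and wrist C as (x, y) pairs."""
    ba = (A[0] - B[0], A[1] - B[1])            # formula (3): vector from elbow to shoulder
    bc = (C[0] - B[0], C[1] - B[1])            # formula (4): vector from elbow to wrist
    cos = (ba[0] * bc[0] + ba[1] * bc[1]) / (math.hypot(*ba) * math.hypot(*bc))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # formula (2), clamped for float safety

# Shoulder directly above the elbow, wrist straight in front of it: a 90-degree bend.
print(elbow_angle(A=(0.0, 1.0), B=(0.0, 0.0), C=(1.0, 0.0)))   # 90.0
```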
Preferably, please refer to fig. 1 and 2; in the method for automatically evaluating the safety of the strength training action gesture in the present embodiment, in step S500, the parts of interest differ from action to action, so the parts for which the movement speed must be calculated also differ. As shown in fig. 2, taking push-ups as an example, the focus is on the movement speed of the key points A, B, and C, such as the point A on the shoulder.
Taking the movement speed of point A as an example, if the video has n frames per second, the interval time between consecutive frames is $t = \frac{1}{n}$, and the distance s from the previous-frame key point coordinates $A(x_1, y_1)$ to the current-frame key point coordinates $A'(x_1', y_1')$ is $s = \sqrt{(x_1' - x_1)^2 + (y_1' - y_1)^2}$. The movement speed is calculated by the following formula:

$$v = \frac{s}{t} = n \cdot s \tag{5}$$

In formula (5), v represents the movement speed, s represents the distance from the previous-frame key point coordinates $A(x_1, y_1)$ to the current-frame key point coordinates $A'(x_1', y_1')$, t represents the interval time between consecutive frames, and n represents the number of frames per second.
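A minimal sketch of formula (5); the coordinates are assumed to be in pixels, so the speed comes out in pixels per second:

```python
import math

def keypoint_speed(prev_xy, curr_xy, n: int) -> float:
    """Formula (5): v = s / t with t = 1/n (n frames per second)."""
    s = math.hypot(curr_xy[0] - prev_xy[0], curr_xy[1] - prev_xy[1])  # displacement s
    t = 1.0 / n                                # interval between consecutive frames
    return s / t

# Key point A moves 3 px right and 4 px up between frames of a 30 fps video.
print(keypoint_speed((100.0, 200.0), (103.0, 204.0), n=30))   # 150.0
```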
Further, please refer to fig. 1 and fig. 2; in the method for automatically evaluating the safety of the strength training action gesture according to the present embodiment, step S600 further includes:
Step S800, if the movement speed and the joint angle are both recognized to be within the safety threshold range, the process ends.
When the calculated movement speed and joint angle are both recognized to be within the preset safety threshold range, the whole flow ends.
The embodiment also provides a system for automatically evaluating the safety of a strength training action gesture, which applies the above method for automatically evaluating the safety of a strength training action gesture and comprises a first output module, a second output module, a calling module, a third output module, a calculation module, a judging module, and an identification module. The first output module is used for receiving a video segment transmitted by the user terminal, extracting frame image features, and outputting an action category sequence number; the second output module is used for outputting a calling model vector according to the frame image features and the action category sequence number; the calling module is used for calling the corresponding body part granularity level model according to the calling model vector; the third output module is used for outputting human body posture key point coordinates according to the body part granularity level model; the calculation module is used for calculating the movement speed and the joint angle according to the human body posture key point coordinates and the action category sequence number; the judging module is used for judging whether the movement speed or the joint angle is within the safety threshold range; and the identification module is used for giving a warning prompt if the movement speed or the joint angle is not within the safety threshold range.
The embodiment provides a method and a system for automatically evaluating the safety of a strength training action gesture. A video segment transmitted by a user terminal is received, frame image features are extracted, and an action category sequence number is output; a calling model vector is output according to the frame image features and the action category sequence number; the corresponding body part granularity level model is called according to the calling model vector; human body posture key point coordinates are output according to the body part granularity level model; the movement speed and the joint angle are calculated according to the human body posture key point coordinates and the action category sequence number; whether the movement speed or the joint angle is within a safety threshold range is judged; and a warning prompt is given if the movement speed or the joint angle is not within the safety threshold range. The method and system for automatically evaluating the safety of a strength training action gesture provided by this embodiment have the following beneficial effects:
1. Automatic assessment of action gesture accuracy without human observation
The technology for automatically evaluating the safety of strength training actions reduces the subjectivity and inaccuracy of human observation, thereby improving the safety and effectiveness of training. By capturing actions through a camera and combining them with a pose estimation model, movement information such as joint angles and movement speed can be detected automatically, and small but critical posture deviations, such as the position of the knees in a squat, can be captured accurately. This not only reduces observation errors caused by fatigue or inattention, but also provides instant feedback for exercisers, helping them adjust their actions in time and avoid sports injuries.
2. Support multitasking, improve model selection efficiency
The decision model can invoke recognition tasks for a plurality of body parts, which significantly improves the efficiency of model selection. By integrating pose estimation models for different body parts and weighing factors such as accuracy, computational cost, and granularity, a flexible task selection mechanism is designed: the decision model can rapidly select and call pose estimation tasks for several different body parts, or a single task, from the pose estimation model pool, so the user can obtain pose recognition for various body parts without manually selecting a specific model. This task selection mechanism not only improves model selection efficiency but also meets diversified application requirements.
3. Autonomous selection of estimated granularity based on video quality
The decision model can autonomously select an appropriate estimation granularity based on the quality of the input video. For high-quality video, the model can adopt a finer granularity for detailed pose estimation, ensuring accurate and detail-rich results. For lower-quality video, the model selects a coarser granularity to balance computational resource consumption against the credibility of the estimation result. This ability to adjust granularity dynamically allows the model to perform well across video qualities and provide the best available estimation results.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.