CN114863329B

CN114863329B - Video processing method, device, equipment and storage medium

Info

Publication number: CN114863329B
Application number: CN202210446486.6A
Authority: CN
Inventors: 廖天驹; 刘晨晓; 成皖梅; 胡越; 万延鑫; 池洁; 何晓灵; 肖飞
Original assignee: Shanghai Haijisi Health Management Consulting Co ltd
Current assignee: Shanghai Haijisi Health Management Consulting Co ltd
Priority date: 2022-04-26
Filing date: 2022-04-26
Publication date: 2025-04-08
Anticipated expiration: 2042-04-26
Also published as: CN114863329A

Abstract

The invention discloses a video processing method, a video processing device, video processing equipment and a storage medium. The method comprises the steps of obtaining a detection frame image containing a target object in a running video and running gesture data associated with the detection frame image, wherein the running gesture data comprise a gesture point regression position, a gesture point heat map and a leg azimuth relation of the target object, correcting the leg azimuth relation in the running gesture data according to the detection frame image and the running gesture data, and determining key moments of the running process of the target object according to the corrected running gesture data. According to the scheme provided by the invention, the accuracy of the finally determined running gesture data can be ensured by correcting the acquired running gesture data, so that the subsequent running gesture state analysis based on the running gesture data is facilitated.

Description

Video processing method, device, equipment and storage medium

Technical Field

Embodiments of the present invention relate to computer technologies, and in particular, to a video processing method, apparatus, device, and storage medium.

Background

Along with the continuous development of video processing technology, related target detection algorithms and gesture recognition algorithms are layered, and particularly, the recognition of the running gesture of a target object in running video is widely applied, but the recognition errors, such as left and right recognition errors, travel direction errors, position recognition errors and the like, are easy to occur when the existing gesture recognition algorithm obtains running gesture data, so that how to more effectively ensure the accuracy of the running gesture data is a problem to be solved at present.

Disclosure of Invention

The invention provides a video processing method, a video processing device, video processing equipment and a storage medium. By correcting the acquired running gesture data, the accuracy of the finally determined running gesture data can be ensured, and the subsequent running gesture state analysis based on the running gesture data is facilitated.

In a first aspect, an embodiment of the present invention provides a video processing method, where the method includes:

Acquiring a detection frame image containing a target object in a running video and running gesture data associated with the detection frame image, wherein the running gesture data comprises a gesture point regression position, a gesture point heat map and a leg azimuth relation of the target object;

correcting the leg azimuth relation in the running gesture data according to the detection frame image and the running gesture data;

And determining the key moment of the running process of the target object according to the corrected running gesture data.

In a second aspect, an embodiment of the present invention further provides a video processing apparatus, including:

The acquisition module is used for acquiring a detection frame image containing a target object in the running video and running gesture data associated with the detection frame image; the running gesture data comprise a gesture point regression position, a gesture point heat map and a leg azimuth relation of a target object;

the correction module is used for correcting the leg azimuth relation in the running gesture data according to the detection frame image and the running gesture data;

and the determining module is used for determining the key moment of the running process of the target object according to the corrected running gesture data.

In a third aspect, an embodiment of the present invention further provides an electronic device, including:

one or more processors;

A memory for storing one or more programs;

The one or more programs, when executed by the one or more processors, cause the one or more processors to implement the video processing method as provided by any embodiment of the present invention.

In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium having a computer program stored thereon. Wherein the program, when executed by a processor, implements a video processing method as provided by any of the embodiments of the present invention.

After acquiring a detection frame image containing a target object in a running video and running gesture data such as a gesture point regression position, a gesture point heat map, a leg azimuth relation and the like of the target object related to the detection frame image, correcting the leg azimuth relation in the running gesture data according to the detection frame image and the running gesture data, and finally determining the key moment of the running process of the target object according to the corrected running gesture data. By correcting the acquired running gesture data, the accuracy of the finally determined running gesture data can be ensured, and the subsequent running gesture state analysis based on the running gesture data is facilitated.

Drawings

Fig. 1A is a flowchart of a video processing method according to a first embodiment of the present invention;

FIG. 1B is a schematic diagram of a running stage according to a first embodiment of the present invention;

fig. 2 is a flowchart of a video processing method according to a second embodiment of the present invention;

Fig. 3 is a block diagram of a video processing apparatus according to a third embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.

Example 1

Fig. 1A is a flowchart of a video processing method according to an embodiment of the present invention, and fig. 1B is a schematic diagram of a running stage according to an embodiment of the present invention, where the embodiment is suitable for a situation of acquiring running gesture data of a running video and performing correction processing, the method may be performed by a video processing apparatus, and the apparatus may be implemented in a software and/or hardware manner and may be integrated into an electronic device having a video processing function. As shown in fig. 1A, the video processing method provided in this embodiment specifically includes:

S101, acquiring a detection frame image containing a target object in a running video and running gesture data associated with the detection frame image.

The running video refers to a running video of a runner, and may be, for example, a slow-motion shot running video of a runner. The target object refers to a target running object in the running video. The detection frame image refers to a detection frame image of an area where a target object is located, which is determined from each frame in the running video. The running gesture data is data for representing the running gesture of the target object, and specifically comprises a regression position of a gesture point of the target object, a heat map of the gesture point and a leg azimuth relation. The regression position of the posture point of the target object refers to the regression position of each joint posture point of the target object in the detection frame image. The posture points at least can comprise joint posture points such as left and right hips, left and right knees, left and right ankles, left and right heels, left and right toes, and the like. The pose point heat map refers to a heat map that characterizes the probability that each joint pose point is at its corresponding regression location, i.e., each pose point has a corresponding heat map. Leg azimuth refers to leg azimuth information characterizing the front and back and up and down of the left and right legs of the target object, for example, the leg azimuth may include information about which leg of the target object is in front, which leg is behind, which leg is above, and which leg is below in each frame in the running video.

Optionally, a common target detection algorithm, for example, a Yolox target detection algorithm, may be used to perform target detection on the running video, obtain all the target frame images including the target object in the running video detected by the target detection algorithm, further integrate the obtained target frame images according to the sequence of time sequences, and determine the detection frame image including the target object in the running video, that is, obtain the detection frame image including the target object in the running video.

It should be noted that, through the target detection algorithm, each frame of image including the target object in the running video can detect a target frame, at the next moment, the position of the target frame in the video image will change, and the same target frame will change continuously in different video image frames, so as to form a group of time-sequence motion track frames, namely, the detection frame image including the target object.

Optionally, the acquired target frame images are integrated according to a time sequence order, and one implementation way of determining the detection frame image containing the target object is to screen the acquired group of target frame images to ensure that the target runner is completely in the image, specifically, edge judgment can be performed on each target frame image, the distance between the position and the edge of the image is judged according to the position of each target frame in the image determined by a target detection algorithm, if the distance is smaller than a certain threshold value, the target runner is considered to be not completely in the image, the frame target frame image can be discarded, the screened group of target frame images are integrated according to the time sequence order, and the detection frame image containing the target object can be determined.

Optionally, after the detection frame images of the target object included in the running video are obtained, a common gesture recognition algorithm, such as pose (gesture) algorithm provided in MEDIAPIPE open source projects, may be used for performing human gesture estimation with respect to each detection frame image, so as to obtain a gesture point regression position, a gesture point heat map and a leg azimuth relationship of the target object, that is, obtain running gesture data associated with the detection frame images.

S102, correcting the leg azimuth relation in the running gesture data according to the detection frame image and the running gesture data.

Optionally, after the detection frame image and the running gesture data are determined, the detection frame image and the running gesture data can be input into a pre-trained correction model, the running gesture data after correction of each frame of detection frame image is output, the corrected running gesture data are further adopted to replace the original running gesture data, namely, the leg azimuth relation in the running gesture data is corrected, or whether the leg azimuth relation in the running gesture data associated with each detection frame image is wrong or not can be judged according to a preset rule, if yes, the regression positions of gesture points contained by two legs and the gesture point heat map are exchanged, namely, the leg azimuth relation in the running gesture data is corrected.

S103, determining the key moment of the running process of the target object according to the corrected running gesture data.

The key moment refers to the key ground contact, buffering, stepping off or vacation moment in the running process of the target object. The target subject running process may include at least one running phase.

Optionally, the key time may be when the running gesture of the target object in the running video is touchdown, buffering, kicking off or vacating, the corresponding video frame is a frame sequence of all video frames relative to the whole running video, for example, 1000 frames of the running video are taken as the touchdown key time is a 25 th frame of video frame, or may be when the running gesture of the target object in the running video is touchdown, buffering, kicking off or vacating, the corresponding video frame is a time in the whole running video, for example, the running video is 5 minutes long, and the touchdown key time is a3 rd 10 second time.

Referring to fig. 1B, the touch key time may be a time of a corresponding detection frame image when a y-axis value (a position coordinate value on a y-axis) of the landing leg is maximum, and the y-axis value of the landing leg may refer to a toe posture point, a heel posture point, or a y-axis value of a midpoint of a connection line of the heel and the toe of the landing leg. The flight key time is the time of the detection frame image corresponding to the time when the y-axis value of the midpoint of the connecting line of the left hip joint and the right hip joint (or the two-point interval of the hip) is minimum between every two touchdown time. The key buffering moment is the moment of detecting the frame image when the middle points of the heel and the toe of the landing foot of the target object are identical to the middle points of the left hip joint and the right hip joint and the x-axis values of the two middle points. Off-foot refers to a foot that has a tendency to move upward on the toe of the foot that is already on the ground. The key moment of pedaling off is the moment when the whole ground leaves just.

Optionally, the key time of the kick-off can be determined by a plurality of ways, for example, when the y-axis speed of the toe gesture point of the off-foot (the off-foot of the foot at the last moment is the off-foot of the next moment) is not the maximum, the corresponding moment of the detection frame image is taken as the key time of the kick-off, and the key time of the kick-off can be determined according to the speed of the toe gesture point, namely, 1. The toe speed of each moment (between the buffer key time and the next vacation moment, the sampling may or may not be performed at this time) is calculated, 2. Each frame of detection frame image is traversed, whether the speed of the toe corresponding to the frame image exceeds a preset threshold value is judged, if yes, the moment of the frame of detection frame image is the key time of the kick-off.

It should be noted that, referring to fig. 1B, the two legs of the target object are separated when viewed from the side. The two legs of the target object at the ground contact key moment and the kick-off key moment are in a non-overlapping state that one leg is behind the other leg. The two legs of the target object at the key moment of buffering have a certain probability of being coincident or not.

Optionally, a detection frame image corresponding to the maximum y-axis value of the y-axis positions of all the attitude points included in the landing leg can be determined, the time corresponding to the frame detection frame image is taken as the touchdown time, the touchdown time can be reversely pushed according to the principle that the touchdown time is between the buffering time and the vacation time, specifically, the pixel speed can be calculated through discrete differentiation according to the pixel positions of the ankle, the heel and the toe of the landing leg, and the time point when the speed of the support time point (at the moment, the speeds of the 3 attitude points all tend to 0) is reversely deduced to be not 0 is taken as the touchdown time. Since we find the time corresponding to the moment when the landing leg just touches the ground, we need to determine the time corresponding to the last attitude point with a speed other than 0 when pushing back from the buffering time.

Alternatively, if the running video is slow motion video (120 frames-240 frames/second), the frame-to-frame variation range may not be large at this time, so that the speed interval may be enlarged by adopting an interval sampling manner, and at this time, although the speed interval is larger, the speed will not be changed greatly when approaching 0, so that a more accurate speed variation situation can be obtained.

Optionally, after determining the touchdown time, the accuracy of the touchdown time determined in S101-S103 may be verified by using a deep learning method, specifically, the step of verification may be 1. Determining the touchdown time through S101-S103. 2. And determining the ground touching time and the buffering time and the y-axis value of the attitude point corresponding to the ground falling leg by adopting an automatic image digging mode of deep learning. 3. And determining a difference value of the ground contact time and the buffer time ground falling margin y axis value, if the difference value is smaller than a preset threshold value, determining the ground contact time to be reasonable, otherwise, determining the ground contact time to be unreasonable, and searching in a sampling mode.

It should be noted that, the key time of the running process of the target object determined by the implementation can facilitate the subsequent calculation of a series of angles formed between important posture points. And the running state of the target object can be conveniently researched by combining with kinematic analysis.

For example, referring to fig. 1B, a running phase of a target object running process may include a touchdown moment, a buffer moment, a kick-off moment, and a vacation moment.

Optionally, according to the corrected running gesture data, the azimuth relation of the gesture point in each frame of detection frame image can be determined directly based on the regression position of the gesture point and the heat map of the gesture point in the corrected running gesture data, the azimuth relation of the gesture point in each frame of detection frame image is compared with the azimuth relation of the gesture point of the key moment defined in the embodiment, such that the detection frame image corresponding to each key moment in the detection frame image is determined, and finally the key moment of the running process of the target object is determined.

It should be noted that, in this embodiment, the whole running process corresponding to the running video is not considered to be divided into multiple running stages, so for each key time, at least one detection frame image may be determined, and further, each key time is determined to be a time corresponding to at least one detection frame image, that is, it may be determined that there are multiple key times of the running process of the target object.

Optionally, correcting the leg azimuth relation in the running gesture data according to the detection frame image and the running gesture data comprises determining the movement direction of a target object according to the detection frame image and/or the running gesture data, grouping the detection frame image according to the running gesture data, and correcting the leg azimuth relation in the running gesture data according to the movement direction, a grouping result and a left-right alternation principle.

The motion direction of the target object refers to the running direction of the target object in the running video, and specifically may include running from the left to the right of the video image and running from the right to the left of the video image. The left-right alternating principle refers to the principle that the left and right legs alternately run forwards in the running process of a target object.

Optionally, according to the detection frame image and/or the running gesture data, one implementation manner of determining the movement direction of the target object is to determine the direction of the nose gesture point of the target object according to the running gesture data, further determine the movement direction of the target object according to the determined direction of the nose gesture point, specifically, determine the movement direction of the target object according to the azimuth relation between the nose position and the trunk position (i.e. the connection line between the hip joint midpoint and the shoulder midpoint), i.e. according to whether the nose position is located on the left side or the right side of the trunk connection line, if the nose is located on the left side of the trunk connection line, then the movement direction of the target object is indicated to run from the right side to the left side of the video image, otherwise, the movement direction of the target object is indicated to run from the left side to the right side of the video image, or determine the azimuth relation between the nose position and the trunk position according to a cross-multiplication calculation mode, for example, two vectors formed by determining the nose position to the hip joint midpoint and the shoulder midpoint respectively can also be determined according to a preset calculation rule.

Optionally, according to the detection frame image and/or the running gesture data, another implementation manner of determining the movement direction of the target object is to determine, according to the detection frame image, a position interval of a gesture point related to the first frame detection frame image (such as a position interval of a gesture point included in a left leg) and a position interval of a gesture point related to a subsequent frame detection frame image (such as a second frame) in the time sequence in the detection frame image, further compare the change condition of the position intervals of the same related gesture point in the two frame detection frame images in the time sequence, determine the movement direction of the target object, for example, if the position interval of the gesture point related to the first frame detection frame image is greater than the position interval value of the second frame, determine that the movement direction of the target object runs from the right side to the left side of the video image.

Optionally, according to the detection frame image and/or the running gesture data, another implementation manner of determining the movement direction of the target object is that, according to the detection frame image and the running gesture data, based on the two modes of determining the movement direction of the target object according to the detection frame image or the running gesture data respectively, corresponding movement directions are determined respectively, if the movement directions determined by the two modes are the same, the movement direction of the final target object is determined to be the corresponding determined movement direction, if the movement directions determined by the two modes are different, the movement directions can be determined again by the two modes respectively until the same movement direction is determined, and the movement direction is taken as the movement direction of the final target object.

Optionally, grouping the detection frame images according to the running gesture data comprises determining the corresponding double-leg overlapping state of the detection frame images according to the running gesture data, and taking the detection frame images with the same double-leg overlapping state as a group of detection frame sequences.

The double-leg overlapping state refers to a state in which the left and right legs of the target object overlap. The two leg overlapping condition includes overlapping and non-overlapping.

Specifically, for each frame of detection frame image, the position interval of the gesture point contained in each leg can be determined according to the gesture point regression position of the target object in the associated running gesture data, and further, the double-leg overlapping state corresponding to the detection frame image is determined according to the relation between the position intervals of the two legs. Specifically, for the left leg and the right leg, the positions of the knee, the toe, the heel and the ankle joint gesture points included in the leg can be combined to respectively determine the position intervals of the left leg and the right leg, if the position intervals of the left leg and the right leg are not overlapped, the two leg overlapping states of the detection frame images are not overlapped, otherwise, the two leg overlapping states of the detection frame images can be considered to be overlapped, after the two leg overlapping states of each frame of detection frame image are determined, the detection frame images with the two leg overlapping states being not overlapped can be used as a group of detection frame sequences, the detection frame images with the two leg overlapping states being overlapped are used as a group of detection frame sequences, and the detection frame images with the same two leg overlapping states are used as a group of detection frame sequences.

Optionally, correcting the leg azimuth relation in the running gesture data according to the movement direction, the grouping result and the left-right alternation principle comprises performing first correction processing on the leg azimuth relation in the running gesture data associated with the first group of detection frame sequences according to the movement direction and the left-right alternation principle, and performing second correction processing on the leg azimuth relation in the running gesture data associated with the second group of detection frame sequences according to the movement direction, the left-right alternation principle and the corrected running gesture data associated with the first group of detection frame sequences.

The two-leg overlapping state corresponding to the first group of detection frame sequences is non-overlapping, and the two-leg overlapping state corresponding to the second group of detection frame sequences is overlapping. The first correction process is a correction process performed on the left-right relationship of the two legs in the leg direction relationship. The second correction process may be the same as the first correction process, or may be a correction process for correcting the upper and lower relationship of the two legs in the leg direction relationship, in addition to the left-right relationship correction process. The left-right alternate principle refers to a running principle that a target subject follows during running. Specifically, the running process of the target object may be divided into a plurality of running stages, and each running stage may include a state, a B state, an a state and a B state in time sequence, where the a state is a state in which the left foot is overlapped at the top, the B state is a state in which the left foot is not overlapped at the front (in front of the finger), the a state is a state in which the right foot is overlapped at the top, and the B state is a state in which the right foot is not overlapped at the front.

Specifically, according to the principle of motion direction and left-right alternation, one implementation manner of carrying out first correction processing on leg azimuth relations in running gesture data associated with a first group of detection frame sequences is that for a first group of detection frame images and a second group of detection frame images, analysis is firstly carried out on running gesture data associated with the detection frame images according to the principle of motion direction and left-right alternation, at least one running stage of time sequence is determined, each running stage comprises at least two first groups of detection frame images and at least two second groups of detection frame images, and for each running stage, one second group of detection frame images in an a state, one first group of detection frame images in a state, one second group of detection frame images in a state, one first group of detection frame images in a state. That is, the condition that the left and right legs of two adjacent first group detection frame images are in front should be interchanged, if the condition that the left and right legs of two adjacent first group detection frame images are in front is detected not to accord with the rule of each running stage, the leg azimuth relation of the second first group detection frame image is considered to be wrongly determined (the identification of the left and right legs is opposite), and the position of the regression position of the attitude point and the attitude point heat map of the left and right legs in the second first group detection frame image are further exchanged, namely, the leg azimuth relation in running attitude data associated with the first group detection frame sequence is subjected to first correction processing.

The correction of the leg azimuth may include correction of the left and right azimuth of the knees, toes, heels, and ankles of the left and right legs, and after determining the exact left and right azimuth of the knees, toes, heels, and ankles of the left and right legs, the right and left azimuth of the hip joint azimuth may be further determined according to the azimuth angle between the posture points of the human body.

Optionally, after performing the first correction processing on the leg azimuth relation in the running gesture data associated with the first group of detection frame sequences, the second correction processing may be further performed on the leg azimuth relation in the running gesture data associated with the second group of detection frame sequences according to the motion direction, the left-right alternation principle and the corrected running gesture data associated with the first group of detection frame sequences. Specifically, only the upper and lower relationships (numerical value magnitude relationships in the y-axis direction) of the position intervals of the ankle, heel and toe joint attitude points in running attitude data associated with the second group of detection frame sequences can be analyzed, whether the corrected first group of detection frame sequences and the second group of detection frame sequences meet the running stage time sequence determined according to the principle of motion direction and left-right alternation, for example, the sequence of an a state, a B state, an A state and a B state is judged, if the running stage time sequence meets the running stage time sequence, the second correction processing is not carried out, and if the running stage time sequence meets the running stage time sequence, the wrong attitude point regression positions and attitude point heat maps of the left leg and the right leg in the second group of detection frame images are exchanged, namely, the leg azimuth relationships in running attitude data associated with the second group of detection frame sequences are subjected to the second correction processing.

Optionally, the left-right leg azimuth relationship of the running gesture data associated with the second group of detection frame sequences may be corrected, specifically, the left-right leg azimuth relationship of each second group of detection frames is determined according to the left-right leg azimuth relationship of two adjacent first groups of detection frames.

For example, if the first, second and third images of the time sequence are found to be the first group of detection frames with the front left leg, the second group of detection frames with the overlapped left and right legs and the first group of detection frames with the front right leg, the state of the target object is indicated to be the running state with the left leg falling to the ground and the right leg extending forward.

Example two

Fig. 2 is a flowchart of a video processing method according to a second embodiment of the present invention, where, based on the foregoing embodiment, a detailed explanation is further provided on how to correct a position of a regression of a posture point of a target object in running posture data before determining a key time of the running process of the target object according to corrected running posture data, and as shown in fig. 2, the video processing method provided in this embodiment specifically includes:

S201, acquiring a detection frame image containing a target object in the running video and running gesture data associated with the detection frame image.

S202, correcting the leg azimuth relation in the running gesture data according to the detection frame image and the running gesture data.

S203, correcting the regression position of the attitude point of the target object in the running gesture data according to the regression position of the attitude point of the target object in the running gesture data and the thermal map of the attitude point.

Optionally, the posture points of the left hip, the left knee, the left ankle, the left heel, the right toe joint may be used as important posture points, the regression positions of the rest posture points may be corrected based on the regression positions of the posture points in the running posture data associated with the acquired detection frame image, or the regression positions of all the posture points corresponding to the acquired running posture data may be corrected, which is not limited in this embodiment.

Optionally, the regression positions of the important posture points in each frame of the detection frame image may be iteratively updated, that is, the regression positions of the posture points of the target object in the running posture data may be corrected, by directly taking the regression positions of the important posture points in the running posture data as references, and the regression positions of the important posture points in the second frame of the detection frame image may be determined based on the regression positions of the corresponding posture points in the first frame image and the heat map of the posture points in the second frame image, for example, according to the relationship between the regression positions of the corresponding posture points finally determined in the first frame image and the candidate positions with different probabilities determined by the heat map of the posture points in the second frame image, the candidate regression position closest to the final regression position determined by the first frame image in the candidate positions may be used as the final regression position of the corresponding posture points of the second frame image.

It should be noted that, the regression position finally determined by each gesture point in each frame of image may be the regression position (the regression position with the largest thermodynamic diagram probability) given by the running gesture data, or may be the regression position selected from at least one candidate regression position of the thermodynamic diagram according to the above iterative updating manner.

It should be noted that, for each important posture point of each detection frame image of each frame, a final regression position is determined according to the above procedure, that is, correction of the regression position of the posture point of the target object in the running posture data is completed.

It should be noted that, the probability that the left knee and the right knee are at the regression position determined by the running gesture data is very high, so that the regression position determined by the gesture recognition algorithm only may be inaccurate.

S204, determining the key moment of the running process of the target object according to the corrected running gesture data.

Alternatively, the running process may comprise at least one step, each step being a running phase comprising four key moments of touchdown, cushioning, kick-off and vacation.

Optionally, determining the key time of the running process of the target object according to the corrected running gesture data comprises determining the key data from the running gesture data according to the regression position of the gesture point and the azimuth relation of the leg in the corrected running gesture data, and determining the key time of the running process of the target object according to the corresponding time stamp of the detection frame image associated with the key data in the running video.

The key data refer to data of four key moments of each step of a target object in the running process. The time stamps refer to the respective moments in the video time period.

Optionally, the corrected regression position of the gesture point and the leg azimuth relationship can be input into a pre-trained determination model, key data determined from the running gesture data is output, the running gesture data can be sorted according to a preset rule, and the key data in the running gesture data can be determined. The method comprises the steps of grouping corrected running gesture data according to a left-right alternation principle to obtain running gesture data corresponding to each step in the running process of a target object, and determining key moments of each step in the running process of the target object according to the running gesture data corresponding to each step.

Optionally, the corrected running gesture data may be divided into two groups of detection frame sequences according to the double-leg overlapping state of each frame of detection frame, and the running process is further divided into a plurality of running stages based on a left-right alternation principle, that is, divided into a plurality of steps, and the running gesture data of each running stage is obtained, so as to obtain the running gesture data corresponding to each step in the running process of the target object.

Optionally, after the running gesture data corresponding to each step in the running process of the target object is obtained, the azimuth relation of the gesture points in each frame of detection frame image contained in each step can be further determined based on the running gesture data corresponding to each step, the azimuth relation of the gesture points in each frame of detection frame image is compared with the azimuth relation of the gesture points of preset ground contact, buffering, stepping off and vacation key moments, and the detection frame image corresponding to each key moment in each step is determined, so that four key moments of each step are determined.

According to the embodiment of the invention, after the leg azimuth relation in the running gesture data is corrected according to the detection frame image and the running gesture data, the regression position of the gesture point of the target object in the running gesture data is further corrected, and finally, the key moment of the running process of the target object is determined according to the corrected running gesture data. By correcting the acquired running gesture data, the accuracy of the finally determined running gesture data can be ensured, and the subsequent running gesture state analysis based on the running gesture data is facilitated.

Example III

Fig. 3 is a block diagram of a video processing apparatus according to a third embodiment of the present invention, where the video processing apparatus according to the third embodiment of the present invention may execute the video processing method according to any of the embodiments of the present invention, and has functional modules and beneficial effects corresponding to the execution method.

The video processing apparatus may include an acquisition module 301, a correction module 302, and a determination module 303.

The acquiring module 301 is configured to acquire a detection frame image including a target object in a running video, and running gesture data associated with the detection frame image; the running gesture data comprise a gesture point regression position, a gesture point heat map and a leg azimuth relation of a target object;

the correction module 302 is configured to correct a leg azimuth relationship in the running gesture data according to the detection frame image and the running gesture data;

and the determining module 303 is configured to determine a key moment of the running process of the target object according to the corrected running gesture data.

Further, the correction module 302 may include:

The determining unit is used for determining the movement direction of the target object according to the detection frame image and/or the running gesture data;

the grouping unit is used for grouping the detection frame images according to the running gesture data;

And the correction unit is used for correcting the leg azimuth relation in the running gesture data according to the movement direction, the grouping result and the left-right alternation principle.

Further, the grouping unit is specifically configured to:

Determining a double-leg overlapping state corresponding to the detection frame image according to the running gesture data, wherein the double-leg overlapping state comprises overlapping and non-overlapping;

And taking the detection frame images with the same double-leg overlapping state as a group of detection frame sequences.

Further, the correction unit is specifically configured to:

According to the principle of the movement direction and left-right alternation, carrying out first correction processing on the leg azimuth relation in running gesture data associated with a first group of detection frame sequences;

Performing second correction processing on leg azimuth relations in running gesture data associated with a second group of detection frame sequences according to the motion direction, the left-right alternation principle and the corrected running gesture data associated with the first group of detection frame sequences;

The two-leg overlapping states corresponding to the first group of detection frame sequences are non-overlapping, and the two-leg overlapping states corresponding to the second group of detection frame sequences are overlapping.

Further, the device is also used for:

And before the key moment of the running process of the target object is determined according to the corrected running gesture data, correcting the regression position of the gesture point of the target object in the running gesture data according to the regression position of the gesture point of the target object in the running gesture data and the gesture point heat map.

Further, the determining module 303 may include:

the data determining unit is used for determining key data from the running gesture data according to the relation between the regression position of the gesture point in the corrected running gesture data and the azimuth of the leg;

And the moment determining unit is used for determining the key moment of the running process of the target object according to the corresponding time stamp of the detection frame image associated with the key data in the running video.

Further, the data determining unit is specifically configured to:

grouping the corrected running gesture data according to a left-right alternation principle to obtain running gesture data corresponding to each step in the running process of a target object;

and determining the key moment of each step in the running process of the target object according to the running gesture data corresponding to each step.

Example IV

Fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. Fig. 4 shows a block diagram of an exemplary device suitable for use in implementing the embodiments of the invention. The device shown in fig. 4 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the invention.

As shown in fig. 4, the electronic device 12 is in the form of a general purpose computing device. The components of the electronic device 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.

Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, micro Channel Architecture (MCA) bus, enhanced ISA bus, video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Electronic device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory (cache 32). The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard disk drive"). Although not shown in fig. 4, the present embodiment may provide a magnetic disk drive for reading from and writing to a removable nonvolatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from and writing to a removable nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media). In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. The system memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.

A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.

The electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the electronic device 12, and/or any devices (e.g., network card, modem, etc.) that enable the electronic device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through a network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 over the bus 18. It should be appreciated that although not shown in FIG. 4, other hardware and/or software modules may be used in connection with electronic device 12, including, but not limited to, microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing the video processing method provided by the embodiment of the present invention.

Example five

The fifth embodiment of the present invention also provides a computer-readable storage medium having stored thereon a computer program (or referred to as computer-executable instructions) for performing the video processing method provided by the embodiment of the present invention when the program is executed by a processor.

The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The computer readable signal medium may include a digital signal in baseband or transmitted by a carrier wave signal, with computer readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for embodiments of the present invention may be written in one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the embodiments of the present invention have been described in connection with the above embodiments, the embodiments of the present invention are not limited to the above embodiments, but may include many other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A video processing method, characterized in that the method comprises:

Acquire a detection frame image containing a target object in a running video, and running posture data associated with the detection frame image; wherein the running posture data includes: a posture point regression position of the target object, a posture point heat map, and a leg orientation relationship;

Correcting the leg orientation relationship in the running posture data according to the detection frame image and the running posture data;

Determining the key moments of the target object's running process according to the corrected running posture data;

Wherein, the step of correcting the leg orientation relationship in the running posture data according to the detection frame image and the running posture data includes:

Determine a moving direction of the target object according to the detection frame image and/or the running posture data;

grouping the detection frame images according to the running posture data;

According to the movement direction, the grouping result and the left-right alternating principle, the leg orientation relationship in the running posture data is corrected, including: according to the movement direction and the left-right alternating principle, a first correction processing is performed on the leg orientation relationship in the running posture data associated with the first group of detection frame sequences; according to the movement direction, the left-right alternating principle and the corrected running posture data associated with the first group of detection frame sequences, a second correction processing is performed on the leg orientation relationship in the running posture data associated with the second group of detection frame sequences; wherein the overlapping state of both legs corresponding to the first group of detection frame sequences is non-overlapping; and the overlapping state of both legs corresponding to the second group of detection frame sequences is overlapping.

2. The method according to claim 1, characterized in that the detection frame images are grouped according to the running posture data, comprising:

Determine, according to the running posture data, a double-leg overlap state corresponding to the detection frame image; wherein the double-leg overlap state includes overlap and non-overlap;

The detection frame images with the same overlapping state of the two legs are taken as a set of detection frame sequences.

3. The method according to claim 1, characterized in that before determining the key moment of the target object's running process based on the corrected running posture data, it also includes:

According to the regression position of the posture points of the target object in the running posture data and the posture point heat map, the regression position of the posture points of the target object in the running posture data is corrected.

4. The method according to claim 1 or 3, characterized in that determining the key moments of the target object's running process according to the corrected running posture data comprises:

Determining key data from the running posture data according to the relationship between the regression positions of the posture points in the corrected running posture data and the leg orientations;

The key moment of the target object's running process is determined according to the timestamp corresponding to the detection frame image associated with the key data in the running video.

5. The method according to claim 4, characterized in that determining key data from the running posture data according to the relationship between the posture point regression position and the leg orientation in the corrected running posture data comprises:

According to the left-right alternation principle, the corrected running posture data are grouped to obtain the running posture data corresponding to each step of the target object during the running process;

According to the running posture data corresponding to each step, the key moment of each step in the running process of the target object is determined.

6. A video processing device, characterized in that the device comprises:

An acquisition module is used to acquire a detection frame image containing a target object in a running video, and running posture data associated with the detection frame image; wherein the running posture data includes: a posture point regression position of the target object, a posture point heat map, and a leg orientation relationship;

A correction module, used for correcting the leg orientation relationship in the running posture data according to the detection frame image and the running posture data;

A determination module, used to determine the key moments of the target object's running process according to the corrected running posture data;

The correction module includes:

a determination unit, configured to determine a moving direction of a target object according to the detection frame image and/or the running posture data;

A grouping unit, used for grouping the detection frame images according to the running posture data;

A correction unit, used for correcting the leg orientation relationship in the running posture data according to the movement direction, the grouping result and the left-right alternation principle;

Among them, the correction unit is specifically used to: perform a first correction processing on the leg orientation relationship in the running posture data associated with the first group of detection frame sequences according to the movement direction and the left-right alternation principle; perform a second correction processing on the leg orientation relationship in the running posture data associated with the second group of detection frame sequences according to the movement direction, the left-right alternation principle and the corrected running posture data associated with the first group of detection frame sequences; wherein the overlapping state of both legs corresponding to the first group of detection frame sequences is non-overlapping; and the overlapping state of both legs corresponding to the second group of detection frame sequences is overlapping.

7. An electronic device, comprising:

one or more processors;

A memory for storing one or more programs;

When the one or more programs are executed by the one or more processors, the one or more processors implement the video processing method according to any one of claims 1 to 5.

8. A computer-readable storage medium having a computer program stored thereon, wherein when the program is executed by a processor, the video processing method according to any one of claims 1 to 5 is implemented.