CN112651996B - Target detection tracking method, device, electronic equipment and storage medium - Google Patents
- Publication number
- CN112651996B CN112651996B CN202011534538.2A CN202011534538A CN112651996B CN 112651996 B CN112651996 B CN 112651996B CN 202011534538 A CN202011534538 A CN 202011534538A CN 112651996 B CN112651996 B CN 112651996B
- Authority
- CN
- China
- Prior art keywords
- target
- type
- image frame
- template
- tracking
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20092—Interactive image processing based on input by user
- G06T2207/20101—Interactive definition of point of interest, landmark or seed
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The application provides a target detection tracking method, a target detection tracking device, an electronic device, and a storage medium, and relates to the technical field of video image processing. The target detection tracking method includes: acquiring a target video; detecting first-type image frames in the target video to determine the targets contained in each first-type image frame; updating a target template set according to the target information of the targets contained in each first-type image frame; extracting features from the images in the target template set to obtain an image feature set; and training the current target tracking model according to the second-type image frames in the target video and the image feature set to obtain an updated target tracking model. By processing image frames at different positions in the target video separately and updating the target template set and the target tracking model in real time during detection and tracking, the method can accurately detect and track multiple targets in the target video in real time.
Description
Technical Field
The present application relates to the field of video image processing technologies, and in particular, to a target detection tracking method, apparatus, electronic device, and storage medium.
Background
With the continuous development of the economy and society, video image processing is increasingly applied to detecting and tracking targets in videos, for example vehicle targets in traffic information videos or pedestrian targets in surveillance videos of various occasions. Driven by the rapid development of new artificial intelligence technologies such as neural networks and machine learning in recent years, the field is moving toward intelligent and information-based solutions. Because traditional techniques over-rely on manual work and dedicated hardware, detection and tracking based on artificial intelligence has become a focus of current research.
Compared with traditional video image processing technology, artificial-intelligence-based target detection and tracking can simulate human vision to detect and track a target, and has the advantages of low cost, ease of installation, and good maintainability. Current methods mainly include detection and tracking based on image feature points, based on optical flow, and based on neural networks; each of these has limitations and cannot achieve both accuracy and real-time performance.
Disclosure of Invention
In view of the above, an objective of the embodiments of the present application is to provide a target detection tracking method, apparatus, electronic device, and storage medium, so as to solve the problem in the prior art that real-time performance and accuracy cannot be achieved at the same time.
In order to solve the above problems, in a first aspect, an embodiment of the present application provides a target detection tracking method, including:
acquiring a target video;
detecting first-type image frames in the target video to determine targets contained in each first-type image frame;
updating a target template set according to target information of the targets contained in each first-type image frame;
extracting features from the images in the target template set to obtain an image feature set;
and training the current target tracking model according to the second-type image frames in the target video and the image feature set to obtain an updated target tracking model.
In the implementation process, after the target video is acquired, its content is read and each image frame is either detected or tracked. The first-type image frames are detected; the target template set is updated according to the target information of the targets found in each first-type image frame; image features are extracted from the updated target template set to obtain the image feature set; and the current target tracking model is trained with the image feature set and the second-type image frames to obtain the updated target tracking model. The first-type and second-type image frames occupy different frame positions in the target video. By processing frames at different positions separately, multiple targets in the target video can be detected and tracked, and because the target template set and the target tracking model are updated in real time during detection and tracking, the accuracy and real-time performance of detection and tracking are improved.
Optionally, the detecting the first type of image frames in the target video to determine the target contained in each first type of image frame includes:
Identifying each first-type image frame to obtain a first target set corresponding to each first-type image frame, wherein the first target set corresponding to any first-type image frame comprises one or more targets in that image frame; a first-type image frame is an nth frame in the target video for which n modulo m equals 0, m being a preset integer value, and the second-type image frames are the image frames in the target video other than the first-type image frames;
Comparing a first target in the first target set with each target template in the target template set for each first target set to determine a first target template, wherein the first target is any target in the first target set, and the target template set comprises one or more target templates;
And calculating the similarity between the first target and the first target template.
In the implementation process, a first-type image frame is an nth frame in the target video for which the remainder of n divided by m is 0, where m is an integer value preset by the user according to their needs and the actual situation; the second-type image frames are the other image frames in the target video. When detecting each first-type image frame to determine the targets it contains, each first-type image frame is identified to obtain its corresponding first target set, which contains one or more targets in that frame. For each first target set, a target is selected as the first target and compared in turn with each target template in the target template set (which contains one or more target templates); the first target template is determined as the template with the highest image intersection-over-union with the first target, and the similarity between the first target and the first target template is then calculated from the information they contain. By comparing the targets in the first target set of a first-type image frame with the target templates in the target template set, the target information of the targets contained in that image frame is determined.
Optionally, the calculating the similarity between the first target and the first target template includes:
Calculating a first similarity between the first target and the first target template according to the motion vectors of the first target and the first target template;
calculating a second similarity between the first target and the first target template according to the histograms of the first target and the first target template;
And calculating the similarity of the first target and the first target template according to the first similarity and the second similarity.
In the implementation process, a first similarity between the first target and the first target template is calculated from their motion vectors; the proportions of the first target and the first target template in the image are counted to obtain their histograms, the histograms are normalized to the same size, and a second similarity is obtained from the normalized histograms; the overall similarity is then calculated from the first similarity and the second similarity. This comprehensive similarity, computed from both the motion vectors and the histograms of the first target and the first target template, provides better accuracy and real-time performance.
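The exact formulas for the two similarities and their combination are not fixed in the text above, so the following Python sketch makes illustrative assumptions: an inverse-distance similarity for the motion vectors, histogram intersection after normalization for the second similarity, and an equal-weight blend.

```python
import math

def motion_similarity(v_target, v_template):
    # Assumed form: similarity decays with the Euclidean distance
    # between the two motion vectors (no formula is specified above).
    dist = math.hypot(v_target[0] - v_template[0], v_target[1] - v_template[1])
    return 1.0 / (1.0 + dist)

def histogram_similarity(h_target, h_template):
    # Normalize both histograms to unit sum, then use histogram
    # intersection as the second similarity (an assumed choice).
    s1, s2 = sum(h_target), sum(h_template)
    a = [v / s1 for v in h_target]
    b = [v / s2 for v in h_template]
    return sum(min(x, y) for x, y in zip(a, b))

def combined_similarity(v_target, v_template, h_target, h_template, w=0.5):
    # Weighted blend of the two similarities; the weight w is an assumption.
    return (w * motion_similarity(v_target, v_template)
            + (1 - w) * histogram_similarity(h_target, h_template))
```

Identical motion vectors and identical histograms yield a combined similarity of 1.0; the weight can be tuned toward whichever cue is more reliable in a given video.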
Optionally, the identifying the respective first type of image frames to obtain a first target set corresponding to the respective first type of image frames includes:
Identifying each first-type image frame according to a target detection algorithm to determine an identification result of each first-type image frame, wherein the identification result comprises at least one target contained in the first-type image frame and a confidence corresponding to the at least one target;
And constructing a first target set corresponding to each first type of image frame according to the identification result of each first type of image frame.
In the implementation process, the first target set corresponding to each first-type image frame includes multiple targets and the confidence data corresponding to each target. Each first-type image frame is identified with a target detection algorithm to obtain at least one target and the confidence corresponding to each target, which together form the identification result of that image frame. The targets and their confidences are then combined according to the identification result of each first-type image frame to construct the first target set. Constructing the first target set collects the targets in a first-type image frame together with their confidence information for subsequent calculation.
Updating the target template set according to the target information of the targets contained in each first-type image frame includes the following steps:
when the similarity between the first target and the first target template is in a first numerical value interval and the confidence of the first target is in a second numerical value interval, judging that the first target and the first target template are the same target, and updating the target information of the first target into the information of the first target template;
When the similarity between the first target and the first target template is in a third numerical value interval and the confidence of the first target is in a fourth numerical value interval, judging that the first target is a new target, and adding target information of the first target to the target template set to serve as the new target;
And when the similarity between the first target and the first target template is in a fifth numerical value interval and the confidence of the first target is in a sixth numerical value interval, not processing target information in the first target.
In the implementation process, the relationship between the first target and the first target template is judged according to the similarity and the confidence of the first target, and one of three judgment results is produced depending on the numerical intervals in which the similarity and the confidence fall: when the first target and the first target template are judged to be the same target, the information of the first target template is updated with the target information of the first target, thereby updating the target template set containing the first target template; when the first target is judged to be a new target, the target information of the first target is added to the target template set as a new target template; in the third case, the target information of the first target is not processed. Obtaining the relationship between the first target and the first target template from the similarity and the confidence allows the target template set to be updated accurately and in real time.
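The numerical intervals are left unspecified above, so a minimal Python sketch of the three-way update rule, with assumed threshold values and a dictionary keyed by target id, might look like:

```python
def update_template_set(templates, target, similarity, confidence,
                        same_sim=0.6, same_conf=0.5,
                        new_sim=0.3, new_conf=0.7):
    # Threshold values are illustrative assumptions; the description only
    # states that similarity and confidence fall in numerical intervals.
    if similarity >= same_sim and confidence >= same_conf:
        # Same target: overwrite the matched template with the new info.
        templates[target["id"]] = target
        return "updated"
    if similarity < new_sim and confidence >= new_conf:
        # Confident detection matching no template: add as a new template.
        templates[target["id"]] = target
        return "added"
    # Otherwise: ambiguous detection, leave the template set unchanged.
    return "ignored"
```

The three return values correspond to the three judgment results in the paragraph above; low-similarity, low-confidence detections are simply discarded rather than corrupting the template set.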
Optionally, the training the current target tracking model according to the second type image frame and the image feature set in the target video to obtain an updated target tracking model includes:
And calculating each second type image frame and the image feature set in the target video according to a target tracking algorithm so as to update and train the target tracking model and obtain an updated current target tracking model.
In the implementation process, each second-type image frame is used as a parameter of the target tracking algorithm, and the target tracking model is updated and trained in combination with the image feature set to obtain the updated current target tracking model. The target tracking algorithm converts each second-type image frame into a sequence matrix used as an input parameter, so that the target tracking model can be updated and trained accurately and in real time.
Optionally, the method further comprises:
Inputting each second type image frame in the target video into the target tracking model for calculation to obtain a tracking result of the target in each second type image frame;
If the tracking result of any second type of image frame represents that the second target in the second type of image frame is successfully tracked, updating the tracking information of the second target, wherein the tracking information comprises the motion data of the second target;
And if the tracking result of any second type image frame represents that the tracking of a third target in the second type image frame fails, deleting the third target from the target template set.
In the implementation process, after the target tracking model has been updated, the target information of the targets in each second-type image frame is input into the target tracking model for calculation to obtain the tracking result for each second-type image frame. When the tracking result of a second-type image frame indicates that a second target in that frame was tracked successfully, the tracking information of the second target, which includes data such as its motion trajectory, is fed back and updated. When the tracking result indicates that tracking of a third target in that frame failed, the third target is deleted from the target template set, since the error between a target whose tracking failed and the corresponding target in the second-type image frame is large and its accuracy low. By judging the tracking results of the targets in the second-type image frames and processing them accordingly, the tracking information of the targets can be updated in real time and the accuracy of the target template set maintained.
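The feedback step above can be sketched as follows; the dictionary-based data structures are assumptions made for illustration only.

```python
def apply_tracking_results(templates, tracks, results):
    # templates: dict id -> template info; tracks: dict id -> list of positions
    # results: dict id -> (success_flag, position) for one second-type frame
    for tid, (ok, pos) in results.items():
        if ok:
            # Tracking succeeded: append the new position to the motion data.
            tracks.setdefault(tid, []).append(pos)
        else:
            # Tracking failed: drop the target from the template set.
            templates.pop(tid, None)
```

Deleting failed targets keeps stale templates from being matched against future first-type frames.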
In a second aspect, the present application also provides an object detection tracking device, the device comprising: the device comprises a reading module, a detecting module and a tracking module;
the reading module is used for acquiring a target video;
the detection module is used for detecting first-class image frames in the target video to determine targets contained in the first-class image frames, and updating a target template set according to target information of the targets contained in the first-class image frames;
The tracking module is used for extracting the characteristics of the images in the target template set to obtain an image characteristic set, and training the current target tracking model according to the second-class image frames in the target video and the image characteristic set to obtain an updated target tracking model.
In the implementation process, the reading module acquires the content of the target video; the detection module detects each first-type image frame of the target video acquired by the reading module and updates the target template set according to the target information of the targets contained in each first-type image frame; and the tracking module extracts image features from the updated target template set to obtain the image feature set, then trains the current target tracking model with the image feature set and each second-type image frame to obtain the updated target tracking model. By processing image frames at different positions in the target video separately, multiple targets in the target video can be detected and tracked, and updating the target template set and the target tracking model in real time during detection and tracking improves the accuracy and real-time performance of detection and tracking.
In a third aspect, an embodiment of the present application further provides an electronic device, where the electronic device includes a memory and a processor, where the memory stores program instructions, and when the processor reads and executes the program instructions, the processor executes the steps in any implementation manner of the first aspect.
In a fourth aspect, embodiments of the present application also provide a readable storage medium having stored therein computer program instructions which, when read and executed by a processor, perform the steps of any of the implementations of the first aspect.
In summary, the present application provides a target detection tracking method, apparatus, electronic device, and storage medium, which can detect and track multiple targets in a target video by processing the image frames at different positions in the target video separately, and which update the target template set and the target tracking model in real time during detection and tracking, thereby improving the accuracy and real-time performance of detection and tracking.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a target detection tracking method according to an embodiment of the present application;
Fig. 2 is a flowchart of step S2 of the target detection tracking method according to the embodiment of the present application;
Fig. 3 is a flowchart of step S21 of the target detection tracking method according to the embodiment of the present application;
fig. 4 is a flowchart of step S23 of the target detection tracking method according to the embodiment of the present application;
FIG. 5 is a schematic diagram of a portion of a target detection tracking method according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a target detecting and tracking device according to an embodiment of the present application.
Icon: 10-a target detection tracking device; 11-a reading module; 12-a detection module; 13-tracking module.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on embodiments of the present application without making any inventive effort, are intended to fall within the scope of the embodiments of the present application.
The embodiments of the present application provide a target detection tracking method, an apparatus, an electronic device, and a storage medium, relating to the technical field of video image processing. They can monitor and track various kinds of video: traffic information video, surveillance video from occasions such as schools and enterprises, or any other video containing multiple targets; the targets may be moving vehicles, moving pedestrians, or other kinds of target. The method is described below.
Referring to fig. 1, fig. 1 is a flowchart of a target detection tracking method according to an embodiment of the present application, where the method may include the following steps:
step S1, obtaining a target video.
And reading the content in the target video after the target video is acquired, and classifying each frame in the target video.
And step S2, detecting first type image frames in the target video to determine targets contained in the first type image frames.
After each frame of the target video has been classified, the first-type image frames are selected and detected with an algorithm to obtain the targets contained in each first-type image frame. For example, the YOLO (You Only Look Once) algorithm may be used: it is simple and fast, convolves over the whole picture, and therefore has a larger field of view when detecting targets, is less likely to mistake the background for a target, and improves detection accuracy.
And step S3, updating the target template set according to the target information of the targets contained in each first-type image frame.
According to the target information of the targets in the first-type image frames detected in step S2, the data of the target templates contained in the target template set can be updated accurately and in real time.
For example, when detecting and tracking multiple vehicle targets in a traffic video, a first-type image frame of the video contains multiple moving vehicle targets, each with its own target information, which may include the license plate of the vehicle, its label, its motion trajectory, its time information, and its position in the picture. Similarly, when detecting and tracking multiple moving pedestrian targets in the surveillance video of a public occasion, a first-type image frame contains multiple moving pedestrian targets, each with target information that may include the pedestrian's identification, label, motion trajectory, time information, and position in the picture.
And S4, extracting the characteristics of the images in the target template set to obtain an image characteristic set.
Features are extracted from the images in the target template set by an algorithm, and the extracted image features are combined into an image feature set. For example, HOG (Histogram of Oriented Gradient) features may be extracted: these are formed by computing and counting histograms of gradient orientations over local regions of the image, so the appearance and shape of a local target are well described by the density distribution of gradients or edge directions. HOG features are invariant to geometric and photometric deformations of the image and tolerate small movements of the target without affecting the detection result.
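Real HOG operates on cells and blocks with block normalization; purely as a minimal illustration of the underlying idea, a single whole-image orientation histogram of gradient magnitudes can be computed as:

```python
import math

def gradient_orientation_histogram(image, bins=9):
    # image: 2-D list of grayscale values. Computes one orientation
    # histogram of gradient magnitudes over the whole image; real HOG
    # uses per-cell histograms with block normalization, so this only
    # sketches the core computation.
    h, w = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]   # horizontal gradient
            gy = image[y + 1][x] - image[y - 1][x]   # vertical gradient
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(ang / 180.0 * bins) % bins] += mag
    return hist
```

An image with a single vertical edge puts all of its gradient energy into the 0-degree bin, which is exactly the invariance-to-appearance, sensitivity-to-shape behavior that makes HOG useful for target description.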
And step S5, training a current target tracking model according to the second-class image frames and the image feature set in the target video to obtain an updated target tracking model.
The image feature set and the second-type image frames are processed by an algorithm to train and update the current target tracking model. For example, the KCF (Kernel Correlation Filter) tracking algorithm may be used to process the second-type image frames and the image feature set: it collects positive and negative samples using the circulant matrix of the region surrounding the target and tracks the detected target with ridge regression. Training in this way yields and updates the current target tracking model while reducing the amount of computation, which improves the operation speed and real-time performance.
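The circulant-matrix and ridge-regression idea behind KCF can be illustrated with a minimal 1-D linear-kernel correlation filter in NumPy; this is a sketch of the general technique, not the implementation described here:

```python
import numpy as np

def train_filter(x, y, lam=1e-2):
    # Dual ridge regression in the Fourier domain with a linear kernel:
    # all cyclic shifts of x serve implicitly as training samples,
    # which is the circulant-matrix trick underlying KCF.
    xf = np.fft.fft(x)
    kxx = np.real(np.fft.ifft(xf * np.conj(xf))) / len(x)  # linear kernel autocorrelation
    alphaf = np.fft.fft(y) / (np.fft.fft(kxx) + lam)       # dual coefficients
    return alphaf, xf

def detect(alphaf, xf, z):
    # Correlation response over all cyclic shifts of z; the peak
    # gives the estimated translation of the target.
    zf = np.fft.fft(z)
    kxz = np.real(np.fft.ifft(zf * np.conj(xf))) / len(z)
    return np.real(np.fft.ifft(np.fft.fft(kxz) * alphaf))
```

Training and detection both cost only a few FFTs, which is why correlation-filter trackers achieve the low computational load and high frame rate mentioned above.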
The embodiment shown in fig. 1 provides a target detection tracking method that can detect and track multiple targets in a target video by processing the image frames at different positions of the target video separately, and that updates the target template set and the target tracking model in real time during detection and tracking, thereby improving the accuracy and real-time performance of detection and tracking.
Referring to fig. 2, fig. 2 is a flowchart of step S2 of the target detection tracking method according to an embodiment of the present application, where step S2 of the method may include the following steps:
And step S21, identifying each first type image frame to obtain a first target set corresponding to each first type image frame.
Each first-type image frame in the target video is identified to obtain its corresponding first target set. A first-type image frame is an nth frame of the target video for which the remainder of n divided by m is 0, where m is an integer value preset by the user according to their needs and the actual situation; the second-type image frames are the remaining image frames. For example, if m is 3, the frames whose index is a multiple of 3, such as the third, sixth, and ninth frames, are first-type image frames, while non-multiples of 3, such as the first, second, fourth, and fifth frames, are second-type image frames. If m is 4, the fourth, eighth, and twelfth frames are first-type image frames and the first, second, third, fifth, sixth, and seventh frames are second-type image frames. To ensure the real-time performance of target detection and tracking, an excessively large value of m should be avoided.
It should be noted that the user may also select the frames in other ways according to the actual situation of the target video: for example, randomly selecting a portion of the frames as first-type image frames and the remaining frames as second-type image frames, or selecting frames alternately, for example taking the odd frames as first-type image frames and the even frames as second-type image frames. The first-type and second-type image frames can thus be chosen through multiple selection modes.
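The default modulo-based frame split described above can be sketched as:

```python
def split_frames(num_frames, m=3):
    # Frames whose 1-based index n satisfies n % m == 0 are first-type
    # (detection) frames; all others are second-type (tracking) frames.
    first, second = [], []
    for n in range(1, num_frames + 1):
        (first if n % m == 0 else second).append(n)
    return first, second
```

With m = 3 and nine frames, the third, sixth, and ninth frames are routed to detection and the rest to tracking, matching the example in the text.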
For example, fig. 3 provides a flowchart of step S21 of the target detection tracking method according to an embodiment of the present application, where step S21 of the method may further include the following steps:
Step S211, identifying each first type image frame according to the target detection algorithm, so as to determine an identification result of each first type image frame.
A target detection algorithm is used to identify each first-type image frame. The identification result includes at least one target contained in the first-type image frame and the confidence corresponding to each target, where the confidence is the degree to which the model believes a target exists in the detection box. For each detection box, a confidence threshold is first used for filtering: when the confidence of the detection box is greater than the threshold, the box is considered to contain a target (a positive sample); otherwise it is considered to contain no target (a negative sample). The YOLO (You Only Look Once) algorithm may again be selected here.
Step S212, according to the identification result of each first type image frame, constructing a first target set corresponding to each first type image frame.
The identification results of each identified first type image frame are combined into the first target set corresponding to that frame; the first target set contains the targets in the frame and the confidence corresponding to each target.
Step S22, comparing the first targets in the first target set with each target template in the target template set for each first target set to determine a first target template.
For each first target set, one target in the set is selected as the first target and compared in turn against each target template in the target template set to determine the first target template. For example, the comparison may use the IOU (Intersection over Union), a metric of the degree of overlap between two rectangular boxes: the target template in the target template set with the largest image overlap ratio with the first target is taken as the first target template. The calculation is as follows:

I_imax = max IoU(D_i, T_i)

where I_imax is the maximum of the IOU, D_i is the first target in the first target set, and T_i is the first target template in the target template set.
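The IoU comparison and the maximum-overlap template search can be sketched as follows (box layout and function names are illustrative assumptions):

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x, y, w, h)."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))   # intersection width
    ih = max(0, min(ay2, by2) - max(ay1, by1))   # intersection height
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def best_template(target_box, templates):
    """Return (index, IoU) of the template overlapping the target most."""
    scores = [iou(target_box, t) for t in templates]
    i = max(range(len(scores)), key=lambda j: scores[j])
    return i, scores[i]
```

The template achieving the maximum IoU is what the text calls the first target template.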
Step S23, calculating the similarity between the first target and the first target template.
The similarity between the first target and the first target template is then calculated, as described below.
Fig. 4 is a schematic flow chart of step S23 of the target detection tracking method according to the embodiment of the present application, where the method includes the following steps:
step S231, calculating a first similarity according to the motion vector of the first target and the first target template.
First, the motion vector of the first target template T_i is calculated; then the motion vector between the first target D_i and the first target template T_i is calculated, and the first similarity S_i^(1) is obtained from these two motion vectors.
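The patent's exact motion-vector formula is given only as an image and is not reproduced in this text; one plausible stand-in, sketched below under that assumption, is the cosine similarity between the template's motion vector and the target-template displacement vector:

```python
import math

def motion_similarity(v_template, v_pair):
    """Assumed first-similarity: cosine of the angle between the
    template motion vector and the target-template motion vector.
    (Stand-in for the patent's unreproduced formula.)"""
    dot = sum(a * b for a, b in zip(v_template, v_pair))
    na = math.hypot(*v_template)
    nb = math.hypot(*v_pair)
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)
```

Vectors moving in the same direction score near 1, orthogonal motion scores 0, which matches the role S_i^(1) plays in the weighted sum later.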
Step S232, calculating a second similarity according to the histograms of the first target and the first target template.
The proportions of the first target and the first target template in the image may be calculated to obtain their histograms, which are normalized to the same size; the second similarity between the first target and the first target template is then obtained from the normalized histograms. The histogram of the first target is denoted H_i^D, the histogram of the first target template is denoted H_i^T, and the second similarity is denoted S_i^(2). Here k is a parameter preset by the user, and N is the number of segmented sub-regions (called bins) into which the histograms H_i^D and H_i^T are divided.
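The patent's k-parameterized histogram formula is likewise given only as an image; one plausible stand-in, sketched under that assumption, is histogram intersection over the N normalized bins:

```python
def histogram_similarity(h_a, h_b):
    """Assumed second-similarity: histogram intersection over N bins
    after normalizing both histograms to sum to 1.
    (Stand-in for the patent's unreproduced formula.)"""
    sa, sb = sum(h_a), sum(h_b)
    na = [v / sa for v in h_a]
    nb = [v / sb for v in h_b]
    return sum(min(a, b) for a, b in zip(na, nb))
```

Identical appearance yields 1.0 and disjoint color distributions yield 0.0, so the value slots directly into the weighted combination below.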
Step S233, calculating the similarity according to the first similarity and the second similarity.
From the calculated first similarity S_i^(1) and second similarity S_i^(2), combined with the maximum IOU value I_imax, the comprehensive similarity S_i is obtained as:

S_i = alpha * I_imax + beta * S_i^(1) + gamma * S_i^(2)

where alpha, beta and gamma are the relevant weighting coefficients, with reference values alpha = 0.6, beta = 0.2 and gamma = 0.2.
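The weighted combination with the reference coefficients can be sketched directly (the exact assignment of alpha to the IoU term is an assumption consistent with the surrounding text):

```python
def combined_similarity(iou_max, s1, s2, alpha=0.6, beta=0.2, gamma=0.2):
    """Comprehensive similarity S_i as a weighted sum of the maximum
    IoU, the motion similarity and the histogram similarity,
    using the reference weights from the text."""
    return alpha * iou_max + beta * s1 + gamma * s2

s = combined_similarity(0.8, 0.9, 0.7)  # about 0.80
```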
It should be noted that when the similarity between the first target and the first target template is within the first numerical interval, i.e. S_i >= 0.6, and the confidence of the first target is within the second numerical interval, i.e. P_i >= 0.8:

for example, when S_i = 0.6 and P_i = 0.8, or when S_i = 0.7 and P_i = 0.9, the similarity between the first target and the first target template is high; the two are judged to be the same target, and the target information of the first target is updated into the information of the first target template;

when the similarity between the first target and the first target template is within the third numerical interval, i.e. S_i < 0.2, and the confidence of the first target is within the fourth numerical interval, i.e. P_i >= 0.8:

for example, when S_i = 0.1 and P_i = 0.8, or when S_i = 0.15 and P_i = 0.9, the similarity between the first target and the first target template is low; the first target is judged to be a new target, its target information is added to the target template set, and the target template set is updated;

when the similarity between the first target and the first target template is within the fifth numerical interval, i.e. 0.2 < S_i < 0.6, and the confidence of the first target is within the sixth numerical interval, i.e. P_i < 0.8:

for example, when S_i = 0.4 and P_i = 0.7, or when S_i = 0.5 and P_i = 0.5, the target information of the first target is temporarily left unprocessed. In this way, the relationship between the first target and the first target template is obtained from their similarity and the confidence of the first target, and the target template set is updated accurately and in real time.
In the above embodiment, the target template with the largest intersection-over-union with the first target is found as the first target template, and the similarity between the first target and the first target template is calculated; the relationship between the two is determined from this similarity, so the target template set can be updated accurately and in real time.
Referring to fig. 5, fig. 5 is a partial flow chart of a target detection tracking method according to an embodiment of the present application, where the method includes:
and S6, inputting each second type image frame in the target video into the target tracking model for calculation to obtain a tracking result of the target in each second type image frame.
The image of each second type image frame can be used as an input parameter. A tracking algorithm such as KCF (Kernelized Correlation Filter) computes, over the image feature set of each second type image frame and the target template set of the target video, an updated target tracking model; the target information of the targets in each second type image frame is then input into the target tracking model to obtain a tracking result for the targets in each second type image frame.
The updated target tracking model can also be used as reference data in the next tracking process, so that the problems of large errors and the like caused by resetting the target tracking model and the like are reduced, and the tracking accuracy and the real-time performance are maintained.
And S7, if the tracking result of any second type image frame represents that the second target in the second type image frame is successfully tracked, updating the tracking information of the second target.
When the tracking result of the second target meets the relevant threshold, the second target is a successfully tracked target, and its tracking information is updated; the tracking information includes data such as the motion track of the second target. The relevant threshold is, for example, a threshold set by the user in the algorithm to indicate whether tracking succeeded, and the tracking result is obtained by the user in the actual calculation.
And S8, if the tracking result of any second type image frame represents that the tracking of a third target in the second type image frame fails, deleting the third target from the target template set.
When the tracking result of the third target does not meet the relevant threshold, the third target is a target whose tracking failed, and its target information is deleted from the target template set: the error between the failed third target and the corresponding tracked target in the second type image frame is large, and its accuracy is low.
In this embodiment, data of successfully tracked targets is updated and data of targets whose tracking failed is deleted, which maintains tracking accuracy and real-time performance: data with large errors is removed, while accurate data is updated in real time.
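Steps S6 through S8 can be sketched as one loop over the second type frames (the `track` callable, its (score, motion) return shape, and the success threshold are all assumptions standing in for the KCF tracker and user-set threshold):

```python
def process_second_type_frames(frames, templates, track, success_threshold=0.5):
    """Sketch of steps S6-S8: run the tracker on each second type
    frame; on success (S7) append the motion data to the target's
    tracking info, on failure (S8) delete the target template.

    `track(frame, target_id)` is an assumed callable returning a
    (score, motion_data) pair.
    """
    for frame in frames:
        for target_id in list(templates):        # copy: we may delete
            score, motion = track(frame, target_id)
            if score >= success_threshold:        # S7: tracking succeeded
                templates[target_id]["track"].append(motion)
            else:                                 # S8: tracking failed
                del templates[target_id]
    return templates
```

Deleting failed targets keeps large-error entries out of the template set, which is the point made in the paragraph above.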
Referring to fig. 6, fig. 6 is a schematic structural diagram of an object detection tracking device according to an embodiment of the present application, and the object detection tracking device 10 includes: the device comprises a reading module 11, a detecting module 12 and a tracking module 13, wherein the reading module 11, the detecting module 12 and the tracking module 13 are mutually connected.
A reading module 11, configured to acquire a target video;
The detection module 12 is configured to detect first type image frames in the target video, so as to determine targets contained in each first type image frame, and update a target template set according to target information of the targets contained in each first type image frame;
And the tracking module 13 is used for extracting the characteristics of the images in the target template set to obtain an image characteristic set, and training the current target tracking model according to the second-class image frames in the target video and the image characteristic set to obtain an updated target tracking model.
Optionally, the detection module 12 further includes an identification module, a comparison module, a calculation module, and an analysis module.
The identifying module is configured to identify each first type of image frame to obtain a first target set corresponding to each first type of image frame, where the first type of image frame is an nth frame in the target video for which the remainder of n modulo m is 0, m is a preset integer value, and the second type of image frame is an image frame in the target video other than the first type of image frame;
The identification module is further configured to identify each first type of image frame according to an object detection algorithm, so as to determine an identification result of each first type of image frame, where the identification result includes at least one object included in the first type of image frame and a confidence level corresponding to the at least one object, and construct a first object set corresponding to each first type of image frame according to the identification result of each first type of image frame.
The comparison module is used for comparing the first target in the first target set with each target template in the target template set to determine a first target template, wherein the first target is any one target in the first target set, and the target template set comprises one or more target templates.
The calculation module is configured to calculate a similarity between the first target and the first target template, where the similarity is calculated by a first similarity and a second similarity, calculate the first similarity between the first target and the first target template according to a motion vector between the first target and the first target template, calculate the second similarity between the first target and the first target template according to a histogram between the first target and the first target template, and calculate the similarity between the first target and the first target template according to the first similarity and the second similarity.
The analysis module is configured to analyze and process a relationship between the first target and the first target template according to a similarity between the first target and the first target template and a numerical interval where a confidence of the first target is located:
when the similarity between the first target and the first target template is in a first numerical value interval and the confidence of the first target is in a second numerical value interval, judging that the first target and the first target template are the same target, and updating the target information of the first target into the information of the first target template;
When the similarity between the first target and the first target template is in a third numerical value interval and the confidence of the first target is in a fourth numerical value interval, judging that the first target is a new target, and adding target information of the first target to the target template set to serve as the new target;
And when the similarity between the first target and the first target template is in a fifth numerical value interval and the confidence of the first target is in a sixth numerical value interval, not processing target information in the first target.
Optionally, the tracking module 13 further comprises an updating module.
And the updating module is used for calculating each second type image frame and the image feature set in the target video according to a target tracking algorithm so as to update and train the target tracking model and obtain an updated current target tracking model.
The updating module is further used for inputting each second type of image frame in the target video into the target tracking model for calculation so as to obtain a tracking result of the target in each second type of image frame;
If the tracking result of any second type of image frame represents that the second target in the second type of image frame is successfully tracked, updating the tracking information of the second target, wherein the tracking information comprises the motion data of the second target;
And if the tracking result of any second type image frame represents that the tracking of a third target in the second type image frame fails, deleting the third target from the target template set.
In the implementation process, frames at different positions in the target video are processed separately, so that multiple targets in the target video can be detected and tracked, and the target template set and the target tracking model are updated in real time during detection and tracking, improving the accuracy and real-time performance of detection and tracking.
The embodiment of the application also provides an electronic device, which comprises a memory and a processor, wherein the memory stores program instructions, and when the processor reads and runs the program instructions, the processor executes the steps of any one of the target detection tracking methods provided above.
It should be understood that the electronic device may be an electronic device having a logic computing function, such as a personal computer (personal computer, PC), a tablet computer, a smart phone, a Personal Digital Assistant (PDA), or the like.
The embodiment of the application also provides a readable storage medium, wherein the readable storage medium stores computer program instructions which, when read and executed by a processor, perform the steps of a target detection tracking method.
In summary, the target detection and tracking method, device, electronic device and storage medium provided by the embodiments of the present application employ several algorithms: by processing frames at different positions in the target video separately, multiple targets across different frames can be detected and tracked, and the target template set and target tracking model are updated in real time during detection and tracking, thereby improving the accuracy and real-time performance of detection and tracking.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The apparatus embodiments described above are merely illustrative, for example, block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of devices according to various embodiments of the present application. In this regard, each block in the block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams, and combinations of blocks in the block diagrams, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. The present embodiment therefore also provides a readable storage medium having stored therein computer program instructions which, when read and executed by a processor, perform the steps of any one of the methods described above. Based on this understanding, the technical solution of the present application may be embodied essentially, or in a part contributing to the prior art, or in part, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Claims (8)
1. A method of target detection tracking, the method comprising:
acquiring a target video;
Detecting first-type image frames in the target video to determine targets contained in the first-type image frames;
Updating the target template set according to target information of targets contained in each first type image frame;
extracting features of the images in the target template set to obtain an image feature set;
training a current target tracking model according to the second-class image frames in the target video and the image feature set to obtain an updated target tracking model;
The detecting the first type image frames in the target video to determine the targets contained in the first type image frames includes: identifying each first type of image frame to obtain a first target set corresponding to each first type of image frame, wherein the first target set corresponding to any first type of image frame comprises one or more targets in the first type of image frame, the first type of image frame is an nth frame in the target video for which the remainder of n modulo m is 0, m is a preset integer value, and the second type of image frame is an image frame in the target video other than the first type of image frame; comparing a first target in the first target set with each target template in the target template set for each first target set to determine a first target template, wherein the first target is any target in the first target set, and the target template set comprises one or more target templates; calculating the similarity between the first target and the first target template;
The calculating the similarity between the first target and the first target template comprises: calculating a first similarity between the first target and the first target template according to the motion vectors of the first target and the first target template; calculating a second similarity between the first target and the first target template according to the histograms of the first target and the first target template; according to the first similarity and the second similarity, combining a maximum value of an image cross-over ratio and a related weighting coefficient, calculating the similarity between the first target and the first target template, including:
S_i = alpha * I_imax + beta * S_i^(1) + gamma * S_i^(2), wherein S_i^(1) is the first similarity, S_i^(2) is the second similarity, alpha, beta and gamma are the correlation weighting coefficients, S_i is the similarity, and I_imax is the maximum value of the image intersection ratio.
2. The method of claim 1, wherein the identifying the respective first type of image frames to obtain the first target set corresponding to the respective first type of image frames comprises:
Identifying each first type of image frame according to a target detection algorithm to determine an identification result of each first type of image frame, wherein the identification result comprises at least one target contained in the first type of image frame and a confidence corresponding to the at least one target;
And constructing a first target set corresponding to each first type of image frame according to the identification result of each first type of image frame.
3. The method of claim 2, wherein updating the set of target templates based on target information for targets contained in the respective first type of image frames comprises:
when the similarity between the first target and the first target template is in a first numerical value interval and the confidence of the first target is in a second numerical value interval, judging that the first target and the first target template are the same target, and updating the target information of the first target into the information of the first target template;
When the similarity between the first target and the first target template is in a third numerical value interval and the confidence of the first target is in a fourth numerical value interval, judging that the first target is a new target, and adding target information of the first target to the target template set to serve as the new target;
And when the similarity between the first target and the first target template is in a fifth numerical value interval and the confidence of the first target is in a sixth numerical value interval, not processing target information in the first target.
4. The method of claim 1, wherein training the current target tracking model based on the second type of image frames and the set of image features in the target video to obtain an updated target tracking model comprises:
And calculating each second type image frame and the image feature set in the target video according to a target tracking algorithm so as to update and train the target tracking model and obtain an updated current target tracking model.
5. The method according to claim 1, wherein the method further comprises:
Inputting each second type image frame in the target video into the target tracking model for calculation to obtain a tracking result of the target in each second type image frame;
If the tracking result of any second type of image frame represents that the second target in the second type of image frame is successfully tracked, updating the tracking information of the second target, wherein the tracking information comprises the motion data of the second target;
And if the tracking result of any second type image frame represents that the tracking of a third target in the second type image frame fails, deleting the third target from the target template set.
6. An object detection tracking device, the device comprising: the device comprises a reading module, a detecting module and a tracking module;
the reading module is used for acquiring a target video;
the detection module is used for detecting first-class image frames in the target video to determine targets contained in the first-class image frames, and updating a target template set according to target information of the targets contained in the first-class image frames;
The tracking module is used for extracting the characteristics of the images in the target template set to obtain an image characteristic set, and training a current target tracking model according to the second-class image frames in the target video and the image characteristic set to obtain an updated target tracking model;
The detection module comprises an identification module, a comparison module and a calculation module; the identification module is used for identifying each first type of image frame to obtain a first target set corresponding to each first type of image frame, wherein the first target set corresponding to any first type of image frame comprises one or more targets in the first type of image frame, the first type of image frame is an nth frame in the target video for which the remainder of n modulo m is 0, m is a preset integer value, and the second type of image frame is an image frame in the target video other than the first type of image frame; the comparison module is used for comparing a first target in the first target set with each target template in the target template set for each first target set to determine a first target template, wherein the first target is any one target in the first target set, and the target template set comprises one or more target templates; the computing module is used for computing the similarity between the first target and the first target template;
The computing module is specifically configured to: calculating a first similarity between the first target and the first target template according to the motion vectors of the first target and the first target template; calculating a second similarity between the first target and the first target template according to the histograms of the first target and the first target template; according to the first similarity and the second similarity, combining a maximum value of an image cross-over ratio and a related weighting coefficient, calculating the similarity between the first target and the first target template, including:
S_i = alpha * I_imax + beta * S_i^(1) + gamma * S_i^(2), wherein S_i^(1) is the first similarity, S_i^(2) is the second similarity, alpha, beta and gamma are the correlation weighting coefficients, S_i is the similarity, and I_imax is the maximum value of the image intersection ratio.
7. An electronic device comprising a memory and a processor, the memory having stored therein program instructions which, when executed by the processor, perform the steps of the method of any of claims 1-5.
8. A readable storage medium, characterized in that the readable storage medium has stored therein computer program instructions which, when executed by a processor, perform the steps of the method according to any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011534538.2A CN112651996B (en) | 2020-12-22 | 2020-12-22 | Target detection tracking method, device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112651996A CN112651996A (en) | 2021-04-13 |
CN112651996B true CN112651996B (en) | 2024-06-14 |
Family
ID=75359348
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011534538.2A Active CN112651996B (en) | 2020-12-22 | 2020-12-22 | Target detection tracking method, device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112651996B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113313738B (en) * | 2021-07-15 | 2021-10-01 | 武汉卓目科技有限公司 | Unmanned aerial vehicle target tracking method and device based on ECO and servo linkage |
CN113887523B (en) * | 2021-11-03 | 2025-03-04 | 南京市城市与交通规划设计研究院股份有限公司 | Data set updating method, device, electronic device and storage medium |
CN115170618B (en) * | 2022-06-28 | 2024-10-11 | 中国电信股份有限公司 | Object tracking method and device, electronic equipment and storage medium |
CN117011338A (en) * | 2023-08-07 | 2023-11-07 | 中国工商银行股份有限公司 | Target object tracking method and device in image, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107564034A (en) * | 2017-07-27 | 2018-01-09 | 华南理工大学 | The pedestrian detection and tracking of multiple target in a kind of monitor video |
CN110634153A (en) * | 2019-09-19 | 2019-12-31 | 上海眼控科技股份有限公司 | Target tracking template updating method and device, computer equipment and storage medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106599836B (en) * | 2016-12-13 | 2020-04-21 | 智慧眼科技股份有限公司 | Multi-face tracking method and tracking system |
WO2019170024A1 (en) * | 2018-03-06 | 2019-09-12 | 北京市商汤科技开发有限公司 | Target tracking method and apparatus, and electronic device and storage medium |
CN108985162B (en) * | 2018-06-11 | 2023-04-18 | 平安科技(深圳)有限公司 | Target real-time tracking method and device, computer equipment and storage medium |
CN111199179B (en) * | 2018-11-20 | 2023-12-29 | 深圳市优必选科技有限公司 | Target object tracking method, terminal equipment and medium |
CN110287778B (en) * | 2019-05-15 | 2021-09-10 | 北京旷视科技有限公司 | Image processing method and device, terminal and storage medium |
CN110766725B (en) * | 2019-10-31 | 2022-10-04 | 北京市商汤科技开发有限公司 | Template image updating method and device, target tracking method and device, electronic equipment and medium |
CN111754545B (en) * | 2020-06-16 | 2024-05-03 | 江南大学 | A dual-filter video multi-target tracking method based on IOU matching |
- 2020-12-22: CN application CN202011534538.2A granted as patent CN112651996B (Active)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107564034A (en) * | 2017-07-27 | 2018-01-09 | 华南理工大学 | Pedestrian detection and tracking of multiple targets in a surveillance video |
CN110634153A (en) * | 2019-09-19 | 2019-12-31 | 上海眼控科技股份有限公司 | Target tracking template updating method and device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112651996A (en) | 2021-04-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112651996B (en) | Target detection tracking method, device, electronic equipment and storage medium | |
US11062123B2 (en) | Method, terminal, and storage medium for tracking facial critical area | |
CN111274926B (en) | Image data screening method, device, computer equipment and storage medium | |
US20140169639A1 (en) | Image Detection Method and Device | |
Molina-Moreno et al. | Efficient scale-adaptive license plate detection system | |
CN112613341B (en) | Training method and device, fingerprint identification method and device and electronic equipment | |
CN113158777B (en) | Quality scoring method, training method of quality scoring model and related device | |
CN112733666A (en) | Method, equipment and storage medium for collecting difficult images and training models | |
CN111898491A (en) | Method and device for identifying reverse driving of vehicle and electronic equipment | |
CN108647703B (en) | A Type Judgment Method of Saliency-Based Classified Image Library | |
Ratre et al. | Tucker visual search-based hybrid tracking model and Fractional Kohonen Self-Organizing Map for anomaly localization and detection in surveillance videos | |
CN111461905A (en) | Vehicle insurance fraud and claim evasion method and device, computer equipment and storage medium | |
CN111444816A (en) | Multi-scale dense pedestrian detection method based on fast RCNN | |
CN113536946A (en) | A self-supervised pedestrian re-identification method based on camera relationship | |
CN112784691B (en) | Target detection model training method, target detection method and device | |
CN118279758A (en) | Remote sensing image ship detection method based on sea-land segmentation | |
CN114998585B (en) | Open-world semantic segmentation method and device based on region-aware metric learning | |
CN113887523B (en) | Data set updating method, device, electronic device and storage medium | |
Li et al. | Counting pedestrian with mixed features and extreme learning machine | |
Iqbal et al. | Classifier comparison for MSER-based text classification in scene images | |
CN114170625B (en) | A context-aware, noise-robust pedestrian search method | |
CN114429577A (en) | A flag detection method, system and device based on high-confidence labeling strategy | |
Pang et al. | Salient object detection via effective background prior and novel graph | |
CN117593890B (en) | A method, device, electronic equipment and storage medium for detecting objects scattered on roads | |
Arslan et al. | Enhanced license plate recognition using deep learning and block-based approach |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||