Disclosure of Invention
Embodiments of the present application provide an anti-interference tracking method and system based on target infrared imaging features, which are used to solve the problem of poor tracking precision in the prior art.
In a first aspect, an embodiment of the present application provides an anti-interference tracking method based on a target infrared imaging feature, including:
Acquiring continuous infrared image sequences of a target object at different time points;
determining potential motion trajectories of target objects in the infrared image sequence by utilizing space-time correlation analysis;
Generating a dynamic weight map based on the potential motion trail, wherein the dynamic weight map adaptively adjusts the importance of each pixel according to the background noise and the change of the target characteristics;
combining the dynamic weight map with the infrared image sequence to strengthen a target area and inhibit a non-target area, so as to obtain an enhanced infrared image sequence;
and adopting a multi-hypothesis tracking strategy to locate the target object in real time in the enhanced infrared image sequence, and updating the dynamic weight map to keep continuous tracking of the target object.
Optionally, the determining the potential motion trail of the target object in the infrared image sequence by using space-time correlation analysis includes:
Performing inter-frame difference calculation on the continuous infrared image sequence to obtain change information between adjacent frames;
constructing an optical flow field based on the change information, wherein the optical flow field represents the movement condition of a pixel point in the infrared image sequence from a previous frame to a next frame;
removing noise and abnormal values in the optical flow field through a filtering technology to obtain optical flow estimation;
according to the optical flow estimation, a dynamic clustering algorithm is applied to identify a motion area related to a target object;
and establishing a corresponding relation between the motion area and the continuous frames to form a potential motion trail of the target object.
Optionally, the generating a dynamic weight map based on the potential motion trail includes:
extracting a region associated with the target object from the potential motion trail as an interest region;
performing feature analysis on the region of interest, and identifying key feature points representing a target object;
calculating the importance coefficient of each pixel point to the target object according to the position of the key feature point and the change condition of the key feature point along with time;
Allocating a noise suppression factor for reflecting the influence degree of the background noise on each pixel point by combining the background noise level in the current frame;
the importance coefficients are combined with the noise suppression factors to generate a dynamic weight map.
Optionally, the combining the dynamic weight map with the infrared image sequence to strengthen the target area and inhibit the non-target area, so as to obtain an enhanced infrared image sequence, including:
For each pixel point in each frame of infrared image sequence, applying a weight value in a dynamic weight graph corresponding to the position of the pixel point, and enhancing or weakening the intensity value of the pixel point to form an enhanced infrared image sequence;
The process of enhancing or weakening the intensity value of the pixel point includes: enhancing the intensity value of pixel points in the target area by using a first preset weight value to improve the visibility of the target area, and weakening the intensity value of pixel points in the non-target area by using a second preset weight value to reduce the visibility of the non-target area.
Optionally, the adopting a multi-hypothesis tracking strategy to locate the target object in real time in the enhanced infrared image sequence and updating the dynamic weight map to keep continuous tracking of the target object includes:
constructing a plurality of hypotheses about the position and state of the target object for each frame of the enhanced infrared image sequence;
predicting, by using a prediction model, the position of the target object in the next frame for each hypothesis;
calculating, in the next frame of the enhanced infrared image sequence, a likelihood score for each hypothesis according to the predicted position;
Selecting the hypothesis with the highest likelihood score as the current optimal estimate, and retaining a preset number of suboptimal hypotheses to cope with the uncertainty;
and updating the weight distribution in the dynamic weight map according to the result of the current optimal estimation, so that the weight values better highlight the target object and suppress background noise, thereby maintaining continuous tracking of the target object.
Optionally, the calculating the importance coefficient of each pixel point for the target object includes:
calculating the importance coefficient of each pixel point to the target object through the following calculation formula:
I(x, y, t) = Σ_{i=1}^{N} ω_i(t) · exp(−((x − x_i(t))² + (y − y_i(t))²) / (2σ_i(t)²)) · F(∇f_i(t), θ_i, ψ_i(t), A_t)
where I(x, y, t) represents the importance coefficient of the pixel at position (x, y) at time t, N is the number of key feature points, ω_i(t) is the adaptive weight factor of the i-th key feature point at time t, x_i(t), y_i(t) are the abscissa and ordinate of the i-th key feature point at time t, σ_i(t) is the spatial scale parameter associated with the i-th key feature point, the exponential term exp(−((x − x_i(t))² + (y − y_i(t))²) / (2σ_i(t)²)) is a two-dimensional Gaussian function measuring the influence of the distance between the current position (x, y) and the key feature point (x_i(t), y_i(t)) on the importance coefficient, ∇f_i(t) is the gradient information of key feature point i at time t, and F(·) is a nonlinear function that adjusts the importance coefficient based on the gradient information ∇f_i(t), the direction parameter θ_i, the local texture property ψ_i(t), and the attention weight A_t.
Optionally, the updating the weight distribution in the dynamic weight map according to the result of the current optimal estimation includes:
calculating the weight distribution in the updated dynamic weight map through the following calculation formula:
W′(x, y, t+1) = W(x, y, t) + α(t)·(β(t)·I(x, y, t) − W(x, y, t)) + γ(t)·(N(x, y, t) − μ_N(t)) + λ·ΔW(x, y, t) + η·Φ(W(x, y, t), C_t, H_t)
where W′(x, y, t+1) is the weight value of the updated dynamic weight map at position (x, y) at time t+1, W(x, y, t) is the weight value at the current time t, α(t) is the time-varying learning rate, β(t) is the time-varying target importance gain coefficient, I(x, y, t) is the importance coefficient of that position as defined above, γ(t) is the time-varying noise suppression gain coefficient, N(x, y, t) is a function reflecting the background noise level at position (x, y) at time t, μ_N(t) is the mean background noise level of the whole image at time t, λ is the weight coefficient of the smoothing term, ΔW(x, y, t) is the result of applying the Laplace operator at position (x, y), η is the weight coefficient of the additional adjustment term, and Φ(·) is a function based on a convolutional neural network (CNN) and an attention mechanism that receives the current weight map W(x, y, t), the context information C_t, and the hidden state H_t, and outputs a correction term for further optimization of the weight map.
In a second aspect, an embodiment of the present application provides an anti-interference tracking system based on a target infrared imaging feature, including:
the acquisition module is used for acquiring continuous infrared image sequences of the target object at different time points;
the analysis and determination module is used for determining potential motion trail of the target object in the infrared image sequence by utilizing space-time correlation analysis;
The generation adjustment module is used for generating a dynamic weight map based on the potential motion trail, wherein the dynamic weight map adaptively adjusts the importance of each pixel according to the background noise and the change of the target characteristics;
The combining module is used for combining the dynamic weight graph with the infrared image sequence to strengthen a target area and inhibit a non-target area so as to obtain an enhanced infrared image sequence;
and the real-time positioning tracking module is used for locating the target object in real time in the enhanced infrared image sequence by adopting a multi-hypothesis tracking strategy, and maintaining continuous tracking of the target object by updating the dynamic weight map.
In a third aspect, an embodiment of the present application provides a computing device, including a processing component and a storage component, where the storage component stores one or more computer instructions, and the one or more computer instructions are invoked and executed by the processing component to implement the anti-interference tracking method based on target infrared imaging features according to any one of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer storage medium storing a computer program, where the computer program, when executed by a computer, implements the anti-interference tracking method based on target infrared imaging features according to any one of the first aspect.
According to the method, a continuous infrared image sequence of a target object at different time points is acquired; a potential motion trajectory of the target object in the infrared image sequence is determined by using space-time correlation analysis; a dynamic weight map is generated based on the potential motion trajectory, the dynamic weight map adaptively adjusting the importance of each pixel according to background noise and changes of target features; the dynamic weight map is combined with the infrared image sequence to strengthen the target area and suppress non-target areas, so as to obtain an enhanced infrared image sequence; and a multi-hypothesis tracking strategy is adopted to locate the target object in real time in the enhanced infrared image sequence, with continuous tracking of the target object maintained by updating the dynamic weight map. The technical solution provided by the application can improve tracking precision.
These and other aspects of the application will be more readily apparent from the following description of the embodiments.
Detailed Description
In order to enable those skilled in the art to better understand the present application, the following description will make clear and complete descriptions of the technical solutions according to the embodiments of the present application with reference to the accompanying drawings.
In some of the flows described in the specification, claims, and drawings of the present application, a plurality of operations appearing in a particular order are included. It should be understood, however, that these operations may be performed out of the order in which they appear herein or in parallel; sequence numbers such as 101 and 102 are merely used to distinguish different operations and do not represent any order of execution. In addition, these flows may include more or fewer operations, which may be performed sequentially or in parallel. It should be noted that the terms "first" and "second" herein are used to distinguish different messages, devices, modules, and the like; they do not represent a sequence, nor do they require that the "first" and the "second" be of different types.
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
Fig. 1 is a flowchart of an anti-interference tracking method based on target infrared imaging features according to an embodiment of the present application. As shown in Fig. 1, the method includes:
101. Acquiring continuous infrared image sequences of a target object at different time points;
In this step, a series of successive infrared images is collected; these images are shots of the target area taken at different points in time. In this way, how the target changes over time can be captured, providing basic data for subsequent analysis.
In the embodiment of the application, a high-resolution infrared camera is used to continuously shoot the monitored area, and the frame rate is set so that fast-moving objects can be clearly recorded. For example, for a night-time vehicle tracking task, a rate of 30 frames per second may be set to capture infrared images of the vehicle from the moment it enters the field of view until it leaves.
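By way of illustration only, the Python sketch below shows one way such a continuous sequence could be captured with OpenCV; the camera index, frame count, and 30 fps setting are assumed example values rather than requirements of the embodiment.

```python
import cv2

def acquire_sequence(camera_index=0, num_frames=300, fps=30):
    """Collect a continuous sequence of grayscale infrared frames."""
    cap = cv2.VideoCapture(camera_index)
    cap.set(cv2.CAP_PROP_FPS, fps)        # request 30 frames per second
    frames = []
    while len(frames) < num_frames:
        ok, frame = cap.read()
        if not ok:
            break
        # treat each infrared frame as a single-channel intensity image
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    cap.release()
    return frames
```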
102. Determining potential motion trajectories of target objects in the infrared image sequence by utilizing space-time correlation analysis;
In this step, a position path that the target may pass through is identified by performing correlation analysis in the spatio-temporal domain on the acquired series of infrared images. This approach takes into account the relationship between neighboring frames and helps to improve prediction accuracy.
In the embodiment of the application, an optical flow method is adopted to calculate the displacement of the pixel points between two frames, so as to further estimate the moving direction and the moving speed of the object in the whole sequence. Assuming that a pedestrian is being tracked, the walking path of the pedestrian can be estimated by comparing the position changes of the pedestrian in the previous and subsequent frames, and a preliminary trajectory model can be constructed accordingly.
The present application considers that, in existing infrared image sequence analysis technology, determination of the motion trajectory of a target object often suffers from the following problems: first, environmental interference (such as temperature changes and background clutter) leads to inaccurate target detection; second, a truly moving target is difficult to distinguish when the contrast between the target and the background is insufficient; and third, camera shake or rapid target movement blurs the image. These problems lead to deviations in tracking and identifying the target object, which in turn affect subsequent applications such as security monitoring and obstacle detection for unmanned vehicles. In order to solve these technical problems, the present application provides a method for more accurately determining the potential motion trajectory of a target object in an infrared image sequence based on space-time correlation analysis. By comprehensively considering continuity in the time dimension and changes in spatial position, the method can effectively filter out noise interference and improve the ability to identify a truly moving target.
The alternative scheme is specifically as follows:
Optionally, the determining, in step 102, of the potential motion trajectory of the target object in the infrared image sequence by using space-time correlation analysis includes: performing inter-frame difference calculation on the continuous infrared image sequence to obtain change information between adjacent frames; constructing an optical flow field based on the change information, wherein the optical flow field represents how pixel points in the infrared image sequence move from a previous frame to a next frame; removing noise and abnormal values from the optical flow field through a filtering technique to obtain an optical flow estimate; identifying, according to the optical flow estimate, a motion area related to the target object by applying a dynamic clustering algorithm; and establishing a correspondence between the motion area and the continuous frames to form the potential motion trajectory of the target object.
Inter-frame difference calculation refers to comparing the difference of pixel values between two consecutive frames to find the change of image content.
The optical flow field is a data structure used for describing the displacement condition of all points in an image sequence from one frame to the next frame.
The filtering technique is used to reduce noise components in a signal so that effective information becomes clearer and easier to discern.
A dynamic clustering algorithm is a method of automatically forming categories based on data characteristics; here it means grouping together pixel regions that have similar motion characteristics.
The potential motion trajectory is a path that the target may pass through, predicted on the basis of the above processing results.
The inter-frame difference calculation refers to obtaining a difference image by subtracting adjacent frames, and this step can highlight a changed part in a scene.
The optical flow field is constructed by estimating the velocity vector of each pixel using an optical flow algorithm (e.g., the Lucas-Kanade algorithm), i.e., how the pixels move from the current frame to the next frame.
Filtering and denoising refers to removing mismatching points generated in the optical flow estimation process by adopting median filtering or other appropriate filters.
The dynamic clustering is to use a clustering algorithm such as K-means to aggregate pixels with similar velocity vectors to form a plurality of clusters representing different object motion modes.
Establishing correspondences and forming a trajectory means tracking how the clusters evolve over time and then plotting the complete movement route of each target object.
In the embodiment of the application, a street video clip shot at night is assumed to be provided, wherein the street video clip comprises pictures of walking pedestrians. Firstly, a frame difference method is utilized to obtain a difference image between each pair of adjacent frames, and then an optical flow algorithm is applied to estimate the moving direction and distance of each pixel point. Then eliminating some isolated outliers by median filtering. The K-means algorithm is then used to classify the remaining effective optical-flow vectors to find those clusters that belong to pedestrians. Finally, the position coordinates of the central points of the clusters are linked according to the time sequence, so that the exact path of the pedestrians crossing the street is obtained.
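A minimal Python sketch of this processing chain is given below, assuming OpenCV and scikit-learn are available; it uses Farneback dense optical flow, and the thresholds, the number of clusters k, and the choice of the largest cluster as the target are illustrative assumptions rather than the embodiment's exact implementation.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def potential_trajectory(frames, k=3, diff_thresh=10, mag_thresh=1.0):
    """frames: list of 8-bit grayscale infrared images."""
    centres = []
    for prev, curr in zip(frames[:-1], frames[1:]):
        diff = cv2.absdiff(curr, prev)                       # inter-frame difference
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        # median-filter each flow channel to remove isolated outliers
        fx = cv2.medianBlur(np.ascontiguousarray(flow[..., 0]), 5)
        fy = cv2.medianBlur(np.ascontiguousarray(flow[..., 1]), 5)
        mag = np.sqrt(fx ** 2 + fy ** 2)
        ys, xs = np.where((diff > diff_thresh) & (mag > mag_thresh))
        if len(xs) < k:
            continue
        feats = np.column_stack([xs, ys, fx[ys, xs], fy[ys, xs]]).astype(np.float32)
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(feats)
        biggest = np.bincount(labels).argmax()               # assume target = largest cluster
        centres.append(feats[labels == biggest, :2].mean(axis=0))
    return np.asarray(centres)                               # candidate motion trajectory
```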
By implementing the scheme, the problems of high false detection rate, weak anti-interference capability and the like in the traditional method can be solved, and the accuracy and stability of target tracking can be remarkably improved. Particularly for complex and changeable practical application scenes, the technical means based on space-time correlation analysis show stronger adaptability and robustness.
103. Generating a dynamic weight map based on the potential motion trail, wherein the dynamic weight map adaptively adjusts the importance of each pixel according to the background noise and the change of the target characteristics;
In this step, a map reflecting the importance of each region is created based on the known or predicted target motion path. This map automatically adjusts the weight value of each pixel point according to the background noise level and the characteristics of the target itself, so as to highlight the key parts.
In the embodiment of the application, a model is trained by using a machine learning algorithm, so that the model learns to distinguish which are real targets and which are only interference factors. When a new frame is detected, the model will evaluate the probability that each pixel belongs to the target and assign a corresponding weight accordingly. For example, in looking for wild animals in a forest environment, the system may give higher weight to those areas that exhibit thermal signature characteristics of the animal.
The present application considers that, in the prior art, target tracking based on a potential motion trajectory generally faces the following problems: first, interference from background noise may cause the target object to be misjudged or lost; and second, changes in target features (such as partial occlusion and posture changes) make it difficult to identify the target continuously and stably. These problems limit the performance and reliability of a target tracking system. In order to solve them, the present application provides a method for generating a dynamic weight map, which can adaptively adjust the importance of each pixel in an image according to background noise and changes of target features, thereby improving the accuracy and robustness of target detection and tracking.
The alternative scheme is specifically as follows:
Optionally, the generating, in step 103, of a dynamic weight map based on the potential motion trajectory includes: extracting a region associated with the target object from the potential motion trajectory as a region of interest; performing feature analysis on the region of interest to identify key feature points representing the target object; calculating an importance coefficient of each pixel point for the target object according to the positions of the key feature points and how they change over time; allocating, in combination with the background noise level in the current frame, a noise suppression factor reflecting the degree to which background noise affects each pixel point; and combining the importance coefficients with the noise suppression factors to generate the dynamic weight map.
Region of interest refers to the portion of an image that is considered to contain important information, such as moving objects.
Key feature points refer to specific locations or structures that may be used to uniquely identify a target object.
Importance coefficient is a value that measures the degree of contribution of a pixel to determining the target location.
A noise suppression factor is a parameter that represents the extent to which a pixel is affected by background noise.
A dynamic weight map is an image in which each pixel value represents the importance of that pixel to the current task (e.g., target tracking).
Extracting the region of interest refers to selecting the regions most likely to belong to the target object from the potential motion tracks obtained in the previous step as the key points of further processing.
Feature analysis and keypoint identification use feature extraction algorithms such as SIFT or SURF to find stable and distinguishing feature points within the region of interest.
Calculating the importance coefficient refers to giving each pixel a score according to the position of the key feature point and the change condition of the key feature point along with time, wherein the score reflects the importance degree of the pixel to the tracking target.
The noise suppression factors are assigned by evaluating the noise level of each region in the current frame, and each pixel is assigned a suppression factor to mitigate the effect of noise on that pixel.
Generating a dynamic weight map creates a new image for combining the importance coefficient and noise suppression factor of each pixel, where different gray levels or colors represent the importance of different pixels.
In the embodiment of the application, it is assumed that pedestrians in a video stream are being monitored. First, the potential motion trajectory of the pedestrian is obtained using the aforementioned technique, and the specific area where the pedestrian is located is determined from it as the region of interest. Next, salient feature points on the pedestrian, such as the head and shoulders, are identified by applying the SIFT algorithm. Then, according to the positions of these feature points and the manner in which they move with the pedestrian, the importance coefficient of each pixel is calculated; that is, pixels close to the feature points are weighted more highly. Meanwhile, considering the higher noise level that may exist in night-time shooting, the noise distribution in the entire scene is also estimated, and a noise suppression factor is assigned to each pixel accordingly. Finally, the importance coefficients are multiplied by the noise suppression factors, and the result forms the dynamic weight map. In this weight map, the pixel values of the pedestrian's body are higher, while those of the background and other irrelevant regions are lower.
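The Python sketch below illustrates, under stated assumptions, how such a dynamic weight map could be assembled: SIFT key points stand in for the key feature points, a two-dimensional Gaussian around each key point supplies the importance coefficients, and a simple local-deviation estimate supplies the noise suppression factors. The spread sigma, the noise window, and the use of the key-point response as the adaptive weight are assumptions made for illustration.

```python
import cv2
import numpy as np

def dynamic_weight_map(frame, roi_mask, sigma=15.0, noise_win=11):
    """frame: 8-bit grayscale infrared image; roi_mask: uint8 0/255 mask of the region of interest."""
    sift = cv2.SIFT_create()
    keypoints = sift.detect(frame, roi_mask)          # key feature points inside the ROI

    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    importance = np.zeros((h, w), dtype=np.float32)
    for kp in keypoints:
        kx, ky = kp.pt
        # two-dimensional Gaussian: pixels near a key point get a higher score;
        # the key-point response is used here as its adaptive weight
        importance += kp.response * np.exp(-((xs - kx) ** 2 + (ys - ky) ** 2)
                                           / (2.0 * sigma ** 2))

    # rough noise estimate: deviation of each pixel from a locally smoothed background
    blurred = cv2.blur(frame.astype(np.float32), (noise_win, noise_win))
    noise = np.abs(frame.astype(np.float32) - blurred)
    suppression = 1.0 / (1.0 + noise / (noise.mean() + 1e-6))   # noise suppression factor

    weight_map = importance * suppression
    return weight_map / (weight_map.max() + 1e-6)               # normalised dynamic weight map
```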
By adopting the method, the system can more accurately distinguish the target area from the non-target area under a complex background, and particularly can keep good tracking effect under the condition of a large amount of noise. In addition, through the dynamic attention to the target feature points, the attention degree to the target feature points can be effectively maintained even when the target is shielded or the gesture is changed, and the continuity and the accuracy in the tracking process are greatly improved. The improvement not only enhances the anti-interference capability of the system, but also improves the whole application range and practicability.
104. Combining the dynamic weight map with the infrared image sequence to strengthen a target area and inhibit a non-target area, so as to obtain an enhanced infrared image sequence;
In this step, the dynamic weight map obtained in the previous step is applied to the original infrared image sequence; through weighting, the area where the target is located becomes more prominent while the influence of non-target areas is reduced.
In the embodiment of the application, an image fusion technique is developed that can intelligently enhance the target contour lines and reduce the background brightness according to the information in the weight map. This is particularly useful in military reconnaissance scenarios, as it can help operators more easily find enemy units hidden in complex terrain.
The present application considers that existing infrared imaging technology, especially at night or under low illumination, often encounters the following problems: the target is not easy to identify clearly because of insufficient contrast between the background and the foreground (i.e., the target), and thermal noise and interference sources in the environment may degrade image quality, further affecting target detection. In order to solve these problems, the present application provides a method of combining a dynamic weight map with the infrared image sequence, which improves the visibility and recognition accuracy of the target by enhancing the target area and suppressing non-target areas.
The alternative scheme is specifically as follows:
Optionally, the combining, in step 104, of the dynamic weight map with the infrared image sequence to strengthen the target area and suppress the non-target area so as to obtain the enhanced infrared image sequence includes: for each pixel point in each frame of the infrared image sequence, applying the weight value in the dynamic weight map corresponding to the position of the pixel point, and enhancing or weakening the intensity value of the pixel point to form the enhanced infrared image sequence, wherein the process of enhancing or weakening the intensity value of a pixel point includes enhancing the intensity value of pixel points in the target area by using a first preset weight value to improve the visibility of the target area, and weakening the intensity value of pixel points in the non-target area by using a second preset weight value to reduce the visibility of the non-target area.
An infrared image sequence is a series of continuously taken infrared photographs used to capture the heat distribution in a scene.
A dynamic weight map is a mapping in which each pixel value represents the importance of that location to the current task (e.g., highlighting the target).
The first preset weight value is a numerical value larger than 1 and is used for amplifying the intensity of the pixel points in the target area.
And a second preset weight value, namely a value smaller than 1 but larger than 0, is used for reducing the intensity of the pixel points in the non-target area.
Combining the dynamic weight map with the infrared image means that for each pixel in each frame of the infrared image, a corresponding weight value is found according to the corresponding position of the pixel on the dynamic weight map.
Adjusting the pixel intensities means multiplying the original intensity values of pixels belonging to the target area by the higher first preset weight value, thereby increasing their brightness.
For pixels within the non-target area, a lower second preset weight value is applied to reduce their brightness.
The enhanced infrared image sequence is generated, and after the processing, the whole image sequence becomes more focused on the target object, and the background information becomes less obvious.
In the embodiment of the application, it is assumed that a person needs to be accurately tracked from a series of infrared images shot at night in a security monitoring scene. First, a dynamic weight map for this person has been obtained according to the aforementioned technique. Next, consider a specific frame of the infrared image and define a first preset weight value W1 = 1.5 and a second preset weight value W2 = 0.7. For any pixel P in this frame, if it lies within the previously determined region of interest (i.e., is considered part of the target), the new intensity value is P′ = P × W1; otherwise, if P is in a non-interest region, P′ = P × W2. After this processing, the resulting image not only makes the target more obvious but also effectively weakens the influence of the surrounding environment.
Let the original pixel intensity be I (x, y) and the corresponding dynamic weight map be W (x, y). If a pixel belongs to a target area, there are:
I'(x,y)=I(x,y)×1.5
Otherwise, if the pixel does not belong to the target region, then:
I'(x,y)=I(x,y)×0.7
where I' (x, y) represents the processed pixel intensity value.
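A minimal sketch of this weighting step is shown below, using the example values W1 = 1.5 and W2 = 0.7 from above; the target mask is assumed to come from the region of interest determined earlier.

```python
import numpy as np

def enhance_frame(frame, target_mask, w1=1.5, w2=0.7):
    """frame: 8-bit grayscale IR image; target_mask: boolean mask of the target area."""
    frame = frame.astype(np.float32)
    enhanced = np.where(target_mask, frame * w1, frame * w2)   # strengthen target, suppress rest
    return np.clip(enhanced, 0, 255).astype(np.uint8)          # keep a valid 8-bit range
```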
By adopting the scheme, the visibility of the target object under a complex background can be remarkably improved, and particularly under the condition of strong background interference, the object of interest can be more effectively highlighted. In addition, through adjusting the relative intensity of pixels in different areas, the overall visual effect of the image can be optimized, and the subsequent manual analysis or automatic processing process is facilitated. The method not only enhances the anti-noise capability of the system, but also improves the performance of the infrared imaging system in the actual application scene.
105. Adopting a multi-hypothesis tracking strategy to locate the target object in real time in the enhanced infrared image sequence, and updating the dynamic weight map to maintain continuous tracking of the target object.
In the last stage, a multi-hypothesis tracking method is adopted to locate the target in real time in the enhanced image sequence, and the dynamic weight map is continuously updated, so that effective tracking can be maintained even if the target behavior is suddenly changed.
In an embodiment of the application, a particle-filter-based tracking framework is designed, comprising a plurality of hypothesis branches that run in parallel, each branch representing a possible target state. As new information arrives, the system re-evaluates the likelihood of each hypothesis and selects the most likely one as the current best estimate. If the tracked object suddenly changes direction or speed, this change is accommodated by adding a new hypothesis branch, ensuring that the tracking process is not interrupted. For example, when monitoring abnormal packages on a baggage conveyor during airport security screening, the system needs to respond quickly to any unexpected change in behavior pattern.
The present application considers that existing target tracking systems, especially when handling targets in dynamic environments, face the following problems: first, the target may be partially occluded or move rapidly, causing tracking interruption; and second, background noise and interference may affect accurate prediction of the target position. In order to solve these problems, the present application provides a method adopting a multi-hypothesis tracking strategy, which improves real-time positioning accuracy by constructing a plurality of hypotheses about the position and state of the target and combining them with the enhanced infrared image sequence. In addition, by continuously updating the dynamic weight map, the method can better adapt to changes of the target, so that continuous tracking of the target object is maintained.
The alternative scheme is specifically as follows:
Optionally, the adopting, in step 105, of a multi-hypothesis tracking strategy to locate the target object in real time in the enhanced infrared image sequence and keeping the target object continuously tracked by updating the dynamic weight map includes: constructing a plurality of hypotheses about the position and state of the target object for each frame of the enhanced infrared image sequence; predicting, with a prediction model, the position of the target object in the next frame for each hypothesis; calculating, in the next frame of the enhanced infrared image sequence, a likelihood score for each hypothesis according to its predicted position; selecting the hypothesis with the highest likelihood score as the current optimal estimate while retaining a preset number of suboptimal hypotheses to cope with uncertainty; and updating the weight distribution in the dynamic weight map according to the result of the current optimal estimate, so that the weight values better highlight the target object and suppress background noise, thereby maintaining continuous tracking of the target object.
Multi-hypothesis tracking strategy: a tracking method based on multiple prediction models that can simultaneously consider several possible target trajectories to cope with uncertainty.
Prediction model: a mathematical model used to estimate the position and state of the target at some future point in time.
Likelihood score: a numerical value that measures how well a hypothesis matches the actually observed data.
Optimal estimate: among all hypotheses, the one with the highest likelihood score is regarded as the estimate currently closest to the true situation.
Suboptimal hypotheses: hypotheses other than the optimal estimate that still have a relatively high probability of being correct; they are retained to handle uncertainty and sudden changes.
Constructing multiple hypotheses refers to generating several different hypotheses about the current location and state of the target for each frame of the enhanced infrared image.
Predicting the next frame location refers to predicting the likely location of the target in the next frame for each hypothesis using an appropriate prediction model (e.g., a kalman filter).
Calculating likelihood scores means that in the next frame of image, a likelihood score is assigned to each hypothesis based on the predicted position compared with the actual observed data.
Selecting the optimal and suboptimal hypotheses means taking the hypothesis with the highest score as the current best estimate and retaining a certain number of suboptimal hypotheses to cope with possible occlusion or other uncertainty factors.
Updating the dynamic weight map is to adjust the dynamic weight map according to the latest best estimation result, so that the weight value is more beneficial to highlighting the target and suppressing the background noise.
In the embodiment of the application, it is assumed that a video stream in which a car is traveling is being monitored. After the previous steps, a series of enhanced infrared images has been obtained. These images are now input into the multi-hypothesis tracking system. First, five different hypotheses H1 through H5 are constructed for the current frame, each representing a possible position and speed of the car. Next, a Kalman filter is used to predict, for each hypothesis, the position of the car in the next frame. When the next frame arrives, a likelihood score for each hypothesis is calculated by comparing the predicted position with the actually detected position of the car. Suppose H1 achieves the highest score of 0.9, while the other hypotheses score 0.7, 0.5, 0.3, and 0.1, respectively. H1 is therefore selected as the current best estimate, and H2 and H3 are retained as backup hypotheses. Finally, according to the result of H1, the dynamic weight map is updated: the pixel intensities of the area where the car is located are further enhanced, and the influence of the surrounding environment is reduced.
Let H_i denote the i-th hypothesis and L(H_i) denote the likelihood score of that hypothesis. For each hypothesis, its likelihood score may be calculated by the following formula:
L(H_i) = exp(−(1/2)·(x_obs − x_pred,i)^T · P^(−1) · (x_obs − x_pred,i))
Here, x_pred,i is the position predicted under hypothesis H_i, x_obs is the actually observed target position, T denotes the transpose of a matrix or vector, and P is the prediction error covariance matrix. This formula shows that the smaller the deviation between the predicted value and the observed value, the higher the likelihood score.
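The sketch below illustrates this hypothesis scoring, with a constant-velocity prediction standing in for a full Kalman filter and a diagonal prediction error covariance P; these simplifications, along with the blending used as a correction step, are assumptions made for the example rather than the embodiment's exact procedure.

```python
import numpy as np

def predict(state):
    # state = [x, y, vx, vy]: constant-velocity prediction of the next state
    x, y, vx, vy = state
    return np.array([x + vx, y + vy, vx, vy])

def likelihood(pred_pos, obs_pos, P):
    # L(Hi) = exp(-0.5 * (x_obs - x_pred)^T P^-1 (x_obs - x_pred))
    d = obs_pos - pred_pos
    return float(np.exp(-0.5 * d @ np.linalg.inv(P) @ d))

def step_hypotheses(hypotheses, obs_pos, P, keep=3):
    scored = []
    for state in hypotheses:
        pred = predict(state)
        score = likelihood(pred[:2], obs_pos, P)
        corrected = pred.copy()
        corrected[:2] = 0.5 * pred[:2] + 0.5 * obs_pos   # crude correction toward the observation
        scored.append((score, corrected))
    scored.sort(key=lambda s: s[0], reverse=True)
    best_score, best_state = scored[0]
    kept = [s for _, s in scored[:keep]]                 # optimal + retained sub-optimal hypotheses
    return best_score, best_state, kept

# usage: five hypotheses H1..H5 around an initial detection, scored against one observation
P = np.diag([4.0, 4.0])
hypotheses = [np.array([100.0 + i, 50.0, 2.0, 0.0]) for i in range(5)]
best_score, best_state, hypotheses = step_hypotheses(hypotheses, np.array([104.0, 50.0]), P)
```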
By introducing a multi-hypothesis tracking strategy, complex situations such as target shielding, rapid movement and the like can be effectively processed, and the influence caused by environmental change can be reduced to a certain extent. In addition, along with the continuous updating of the dynamic weight graph, the system can concentrate on the target area more, and the interference of background noise is reduced, so that the stability and the accuracy of target tracking are obviously improved. The method not only enhances the robustness of the system, but also improves the performance of the system in various practical application scenes.
The formula of the application is used for calculating the importance coefficient of each pixel point to the target object, which is an important step in many computer vision tasks (such as target detection, segmentation, tracking and the like). By evaluating the importance of each pixel, the system can pay more attention to the key area of the target object and ignore the background noise, thereby improving the accuracy of the task.
The alternative scheme is specifically as follows:
Optionally, the calculating the importance coefficient of each pixel point for the target object includes:
calculating the importance coefficient of each pixel point to the target object through the following calculation formula:
I(x, y, t) = Σ_{i=1}^{N} ω_i(t) · exp(−((x − x_i(t))² + (y − y_i(t))²) / (2σ_i(t)²)) · F(∇f_i(t), θ_i, ψ_i(t), A_t)
where I(x, y, t) represents the importance coefficient of the pixel at position (x, y) at time t, N is the number of key feature points, ω_i(t) is the adaptive weight factor of the i-th key feature point at time t, x_i(t), y_i(t) are the abscissa and ordinate of the i-th key feature point at time t, σ_i(t) is the spatial scale parameter associated with the i-th key feature point, the exponential term exp(−((x − x_i(t))² + (y − y_i(t))²) / (2σ_i(t)²)) is a two-dimensional Gaussian function measuring the influence of the distance between the current position (x, y) and the key feature point (x_i(t), y_i(t)) on the importance coefficient, ∇f_i(t) is the gradient information of key feature point i at time t, and F(·) is a nonlinear function that adjusts the importance coefficient based on the gradient information ∇f_i(t), the direction parameter θ_i, the local texture property ψ_i(t), and the attention weight A_t.
The formula comprehensively considers a plurality of factors to evaluate the importance of the pixel points, including the distance between the pixel points and the key feature points, the self-adaptive weight of the key feature points, the gradient information of the key feature points, the direction parameters, the local texture characteristics and the attention weight. The design enables the importance coefficient to comprehensively reflect the characteristics of the target object, and helps the system to locate and identify the target more accurately.
Two-dimensional Gaussian function exp(−((x − x_i(t))² + (y − y_i(t))²) / (2σ_i(t)²)):
Its effect is to measure the influence of the distance between the current position (x, y) and the key feature point (x_i(t), y_i(t)) on the importance coefficient.
The principle is that the closer a pixel is to a key feature point, the higher its importance coefficient, and the farther away it is, the lower the coefficient.
Adaptive weight factor ω_i(t):
The contribution of each key feature point to the importance coefficient is adjusted.
The principle is that the importance of different key feature points on a target object is different, and the influence of each key feature point can be flexibly adjusted through the self-adaptive weight factors.
Gradient information ∇f_i(t):
The method has the effect that the calculation of the importance coefficient is enhanced by utilizing gradient information of the key feature points.
The principle is that gradient information reflects image edges and contours, which are often more critical for the identification of target objects.
Nonlinear function F(·):
The method has the effect of comprehensively considering gradient information, direction parameters, local texture characteristics and attention weights and adjusting importance coefficients.
The principle is that more complex characteristic relation can be captured through a nonlinear function, and the accuracy of the importance coefficient is improved.
The parameters were obtained as follows:
n is the number of key feature points, typically obtained by feature detection algorithms (e.g., SIFT, harris corner detection, etc.).
Omega i (t) is an adaptive weight factor, which can be set by an optimization algorithm (e.g. gradient descent) or empirically.
X i(t),yi (t) is the abscissa of the ith key feature point at time t, and is obtained by a feature detection algorithm.
Σ i (t) is a spatial scale parameter, typically set according to the scale information of the key feature points.
The gradient information of the key feature point i at time t is obtained through image gradient calculation.
Θ i is a direction parameter, typically obtained by a feature detection algorithm.
Psi i (t) is a local texture property, which can be obtained by local texture analysis (e.g. LBP).
A t is the attention weight, which can be obtained by an attention mechanism model (e.g., self-attention mechanism).
In the embodiment of the application, it is assumed that in one target detection task, an importance coefficient of a certain pixel point (x, y) at time t needs to be calculated. The specific parameters are as follows:
N = 3: there are 3 key feature points. ω_1(t) = 0.6, ω_2(t) = 0.3, ω_3(t) = 0.1: adaptive weight factors.
x_1(t) = 10, y_1(t) = 20; x_2(t) = 30, y_2(t) = 40; x_3(t) = 50, y_3(t) = 60: coordinates of the key feature points.
σ_1(t) = 5, σ_2(t) = 10, σ_3(t) = 15: spatial scale parameters.
∇f_1(t), ∇f_2(t), ∇f_3(t): gradient information of the key feature points.
θ_1 = π/4, θ_2 = π/2, θ_3 = 3π/4: direction parameters.
ψ_1(t) = 0.7, ψ_2(t) = 0.6, ψ_3(t) = 0.5: local texture properties.
A_t = 0.9: attention weight.
(x, y) = (25, 35): coordinates of the pixel currently being evaluated.
Let the nonlinear function F be defined as:
Substituting into the formula to calculate I(x, y, t):
Each term is calculated separately:
For i = 1:
ω_1(t) · 0.000123 · 0.36 ≈ 0.000027
For i = 2:
ω_2(t) · 0.7788 · 0 = 0
For i = 3:
ω_3(t) · 0.0625 · (−0.0945) ≈ −0.00059
Finally, summing all terms:
I(25, 35, t) = 0.000027 + 0 + (−0.00059) ≈ −0.00056
The calculation shows that the importance coefficient of the pixel at location (25, 35) is about −0.00056. A negative value may indicate that, in this case, the position is of low importance to the target object, possibly because it is far from the key feature points, or because the gradient information, direction parameters, local texture characteristics, and other factors of the key feature points contribute little. This helps the system focus on other, more important areas in subsequent processing, improving the efficiency and accuracy of target detection.
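For reference, a short numerical sketch of the formula using the example parameters above; the outputs of the nonlinear function F (0.36, 0, and −0.0945, as shown in the worked terms) are taken as given, since its exact definition is not reproduced here.

```python
import numpy as np

def importance(x, y, w, pts, sigma, F):
    pts = np.asarray(pts, dtype=float)
    d2 = (x - pts[:, 0]) ** 2 + (y - pts[:, 1]) ** 2             # squared distance to each key point
    gauss = np.exp(-d2 / (2.0 * np.asarray(sigma, dtype=float) ** 2))
    return float(np.sum(np.asarray(w) * gauss * np.asarray(F)))  # sum of w_i * Gaussian_i * F_i

# example parameters from the embodiment; F holds the assumed outputs of the nonlinear function
w = [0.6, 0.3, 0.1]
pts = [(10, 20), (30, 40), (50, 60)]
sigma = [5, 10, 15]
F = [0.36, 0.0, -0.0945]
print(importance(25, 35, w, pts, sigma, F))   # ≈ -0.00056
```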
The formula of the application is derived from an adaptive weight updating mechanism in dynamic systems and is generally applied to scenarios where weights need to be adjusted in real time to optimize performance, such as target detection, tracking, and image processing. This mechanism allows the system to continuously adjust its own weight distribution according to changes in the environment and newly acquired data, thereby improving the accuracy and robustness of the system.
The alternative scheme is specifically as follows:
optionally, the updating the weight distribution in the dynamic weight map according to the result of the current optimal estimation includes:
calculating the weight distribution in the updated dynamic weight map through the following calculation formula:
W′(x, y, t+1) = W(x, y, t) + α(t)·(β(t)·I(x, y, t) − W(x, y, t)) + γ(t)·(N(x, y, t) − μ_N(t)) + λ·ΔW(x, y, t) + η·Φ(W(x, y, t), C_t, H_t)
where W′(x, y, t+1) is the weight value of the updated dynamic weight map at position (x, y) at time t+1, W(x, y, t) is the weight value at the current time t, α(t) is the time-varying learning rate, β(t) is the time-varying target importance gain coefficient, I(x, y, t) is the importance coefficient of that position as defined above, γ(t) is the time-varying noise suppression gain coefficient, N(x, y, t) is a function reflecting the background noise level at position (x, y) at time t, μ_N(t) is the mean background noise level of the whole image at time t, λ is the weight coefficient of the smoothing term, ΔW(x, y, t) is the result of applying the Laplace operator at position (x, y), η is the weight coefficient of the additional adjustment term, and Φ(·) is a function based on a convolutional neural network (CNN) and an attention mechanism that receives the current weight map W(x, y, t), the context information C_t, and the hidden state H_t, and outputs a correction term for further optimization of the weight map.
The formula aims to construct a dynamic model capable of adaptively adjusting weight distribution according to environmental changes. By combining multiple factors (such as target importance, background noise, smoothness, etc.), the model can more intelligently allocate resources, ensure that the critical areas get enough attention, reduce noise interference, and maintain the smoothness of weight distribution.
Learning rate (α(t)): a parameter that controls how quickly the weight values are updated; it typically varies over time to accommodate the needs of different stages.
Smoothing term (λ·ΔW(x, y, t)): a weight-map smoothing term calculated with the Laplace operator, used to maintain the spatial consistency of the weight map.
Additional adjustment term (η·Φ(W(x, y, t), C_t, H_t)): the output of the CNN-and-attention-based function, used to further optimize the weight map.
Target importance gain (β(t)·I(x, y, t)): emphasizes the importance of target regions, ensuring that these regions receive higher weights in the weight map.
Noise suppression gain (γ(t)·(N(x, y, t) − μ_N(t))): reduces the influence of background noise and improves the signal-to-noise ratio.
The parameters were obtained as follows:
W(x, y, t) and W′(x, y, t+1) are obtained from model initialization and the calculation of the previous time step.
The parameters α(t), β(t), γ(t), λ, and η are typically determined experimentally or adjusted automatically by an optimization algorithm (e.g., gradient descent).
I(x, y, t) is obtained by manual annotation or automatic identification by an algorithm according to the features and position of the target.
N(x, y, t) and μ_N(t) are obtained by statistical analysis of the background area.
ΔW(x, y, t) is calculated by applying the Laplace operator to the current weight map.
Φ(·) is computed by a pre-trained convolutional neural network and an attention mechanism model.
In the present embodiment, it is assumed that an object detection system is being developed that requires tracking of a moving object in a video stream. To simplify the calculation, all parameters are set to fixed values, but in practical applications, these parameters should be dynamically adjusted.
Setting initial conditions:
W(x, y, t) = 0.5, I(x, y, t) = 1, N(x, y, t) = 0.2, μ_N(t) = 0.1, ΔW(x, y, t) = 0.05, C_t = [some context information], H_t = [some hidden states].
Fixed parameters: α(t) = 0.1, β(t) = 0.8, γ(t) = 0.5, λ = 0.01, η = 0.1.
Assume that Φ(W(x, y, t), C_t, H_t) = 0.05 (this value would normally be computed by the CNN and attention mechanism).
Substituting into the formula to calculate W′(x, y, t+1):
W′(x, y, t+1) = 0.5 + 0.1·(0.8·1 − 0.5) + 0.5·(0.2 − 0.1) + 0.01·0.05 + 0.1·0.05
W′(x, y, t+1) = 0.5 + 0.03 + 0.05 + 0.0005 + 0.005
W′(x, y, t+1) ≈ 0.5855
The calculation shows that the weight at location (x, y) is updated from 0.5 to about 0.5855. This means that, based on the current observations and context information, the system considers this location slightly more important to the target detection task. This may be because the target appears near the location or because the background noise there is low, leading the system to trust the area more. In this way, the system can better adapt to environmental changes and improve the accuracy and reliability of target detection.
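The update rule can be checked numerically with a short sketch: the function below applies the rule to whole arrays (with the Laplace term computed by OpenCV), and the final lines reproduce the single-pixel example, treating ΔW and the CNN/attention correction term as the scalar values assumed above.

```python
import cv2
import numpy as np

def update_weight_map(W, I, N, phi, alpha=0.1, beta=0.8, gamma=0.5, lam=0.01, eta=0.1):
    """W, I, N, phi: float32 arrays of the same shape."""
    lap = cv2.Laplacian(W.astype(np.float32), cv2.CV_32F)   # smoothing term ΔW(x, y, t)
    mu_N = float(N.mean())                                  # mean background noise level
    W_next = W + alpha * (beta * I - W) + gamma * (N - mu_N) + lam * lap + eta * phi
    return np.clip(W_next, 0.0, 1.0)

# single-pixel check with the example values (ΔW and the correction term taken as 0.05)
w_next = 0.5 + 0.1 * (0.8 * 1.0 - 0.5) + 0.5 * (0.2 - 0.1) + 0.01 * 0.05 + 0.1 * 0.05
print(round(w_next, 4))   # 0.5855
```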
Fig. 2 is a schematic structural diagram of an anti-interference tracking system based on target infrared imaging features according to an embodiment of the present application. As shown in Fig. 2, the system includes:
an acquisition module 21 for acquiring a sequence of consecutive infrared images of the target object at different points in time;
An analysis determination module 22 for determining potential motion trajectories of the target object in the sequence of infrared images using spatiotemporal correlation analysis;
A generation adjustment module 23, configured to generate a dynamic weight map based on the potential motion trail, where the dynamic weight map adaptively adjusts importance of each pixel according to changes of background noise and target features;
A combining module 24, configured to combine the dynamic weight map with the infrared image sequence to strengthen a target area and inhibit a non-target area, so as to obtain an enhanced infrared image sequence;
The real-time positioning tracking module 25 is configured to use a multi-hypothesis tracking strategy to position a target object in real time in the enhanced infrared image sequence, and update the dynamic weight map to keep tracking the target object continuously.
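Purely as an illustration of this module structure (not code belonging to the embodiment), the five modules could be wired together as follows; the method names acquire, trajectory, weight_map, enhance, and track are hypothetical placeholders.

```python
class AntiInterferenceTracker:
    """Schematic wiring of modules 21-25; method names are hypothetical placeholders."""

    def __init__(self, acquisition, analysis, generation, combination, tracker):
        self.acquisition = acquisition    # module 21: acquire the infrared image sequence
        self.analysis = analysis          # module 22: space-time correlation analysis
        self.generation = generation      # module 23: dynamic weight map generation
        self.combination = combination    # module 24: strengthen target / suppress background
        self.tracker = tracker            # module 25: multi-hypothesis tracking

    def run(self):
        frames = self.acquisition.acquire()
        trajectory = self.analysis.trajectory(frames)
        weight_map = self.generation.weight_map(frames, trajectory)
        enhanced = self.combination.enhance(frames, weight_map)
        return self.tracker.track(enhanced, weight_map)
```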
The anti-interference tracking system based on target infrared imaging features shown in Fig. 2 may perform the anti-interference tracking method based on target infrared imaging features of the embodiment shown in Fig. 1; its implementation principle and technical effects are similar and are not repeated here. The specific manner in which the respective modules and units perform operations in the anti-interference tracking system based on target infrared imaging features in the above embodiment has been described in detail in the embodiment relating to the method and will not be described again here.
In one possible design, an anti-interference tracking system based on target infrared imaging features of the embodiment of FIG. 2 may be implemented as a computing device, as shown in FIG. 3, which may include a storage component 31 and a processing component 32;
the storage component 31 stores one or more computer instructions for execution by the processing component 32.
The processing component 32 is configured to: acquire a continuous infrared image sequence of a target object at different time points; determine a potential motion trajectory of the target object in the infrared image sequence by using space-time correlation analysis; generate a dynamic weight map based on the potential motion trajectory, wherein the dynamic weight map adaptively adjusts the importance of each pixel according to background noise and changes of target features; combine the dynamic weight map with the infrared image sequence to strengthen the target area and suppress non-target areas, so as to obtain an enhanced infrared image sequence; and locate the target object in real time in the enhanced infrared image sequence by adopting a multi-hypothesis tracking strategy, maintaining continuous tracking of the target object by updating the dynamic weight map.
The processing component 32 may include one or more processors to execute computer instructions to perform all or part of the steps of the method described above. Of course, the processing component may also be implemented as one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the method described above.
The storage component 31 is configured to store various types of data to support operations at the terminal. The memory component may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
Of course, the computing device may necessarily include other components as well, such as input/output interfaces, display components, communication components, and the like.
The input/output interface provides an interface between the processing component and a peripheral interface module, which may be an output device, an input device, etc.
The communication component is configured to facilitate wired or wireless communication between the computing device and other devices, and the like.
The computing device may be a physical device or an elastic computing host provided by the cloud computing platform, and at this time, the computing device may be a cloud server, and the processing component, the storage component, and the like may be a base server resource rented or purchased from the cloud computing platform.
The embodiment of the application further provides a computer storage medium storing a computer program; when the computer program is executed by a computer, the anti-interference tracking method based on target infrared imaging features of the embodiment shown in Fig. 1 can be implemented.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present application, and not for limiting the same, and although the present application has been described in detail with reference to the above-mentioned embodiments, it should be understood by those skilled in the art that the technical solution described in the above-mentioned embodiments may be modified or some technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the spirit and scope of the technical solution of the embodiments of the present application.