CN115619815A - Object tracking method and object tracking device - Google Patents
Object tracking method and object tracking device
- Publication number
- CN115619815A CN202110797357.7A
- Authority
- CN
- China
- Prior art keywords
- object tracking
- image frames
- region
- interest
- detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
Embodiments of the present invention provide an object tracking method and an object tracking device suitable for low-latency applications. In the method, object detection is performed on one of a number of consecutive image frames to identify a target. The consecutive image frames are buffered, and object tracking is then performed on the buffered consecutive image frames according to the object detection result. Object tracking associates the target in one of the consecutive image frames with the target in another. In this way, tracking accuracy can be improved while meeting low-latency requirements.
Description
Technical Field
The present invention relates to an image processing technology, and in particular, to an object tracking method and an object tracking device.
Background
Object detection and object tracking are important research topics in computer vision and have been widely applied in fields such as video calling, medicine, driving assistance, and security.
The main function of object detection is to identify the type of object in a region of interest (ROI). There are many object detection algorithms. For example, YOLO (You Only Look Once) is a neural-network algorithm characterized by a lightweight model and high efficiency. Notably, in the YOLO version 3 (V3) architecture, the upsampling layers can learn finer features, which helps detect smaller objects. As another example, RetinaFace targets face detection: it provides single-stage dense face localization in natural scenes, uses a Feature Pyramid Network (FPN) to handle faces of different sizes (for example, smaller faces), and adopts a multi-task loss, thereby achieving high face-detection accuracy. As yet another example, Adaptive Boosting (AdaBoost) trains the next classifier on the samples misclassified by the previous classifier and combines weak classifiers to improve the classification result, which makes it highly sensitive to abnormal or noisy data.
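For illustration only, the boosting loop described above can be sketched in plain Python. This is a minimal sketch with one-dimensional threshold stumps, not the detector of any embodiment; the function names and the toy data are assumptions.

```python
# Minimal AdaBoost sketch with 1-D threshold stumps (illustrative only).
import math

def stump_predict(x, thresh, sign):
    return sign if x >= thresh else -sign

def train_adaboost(xs, ys, rounds=3):
    n = len(xs)
    w = [1.0 / n] * n                      # uniform sample weights
    ensemble = []                          # list of (alpha, thresh, sign)
    for _ in range(rounds):
        best = None
        for thresh in xs:                  # candidate thresholds at sample points
            for sign in (1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if stump_predict(x, thresh, sign) != y)
                if best is None or err < best[0]:
                    best = (err, thresh, sign)
        err, thresh, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)   # weight of this weak classifier
        ensemble.append((alpha, thresh, sign))
        # Reweight: misclassified samples gain weight for the next round.
        w = [wi * math.exp(-alpha * y * stump_predict(x, thresh, sign))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(x, t, s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [-1, -1, -1, 1, 1, 1]
model = train_adaboost(xs, ys)
print([predict(model, x) for x in xs])
```

The reweighting line is the step the text describes: samples the previous stump got wrong receive larger weights, so the next stump focuses on them.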
On the other hand, the main function of object tracking is to track the same object framed in preceding and subsequent image frames. There are also many object tracking algorithms. For example, the optical-flow method infers an object's moving speed and direction by detecting how the intensity of image pixels changes over time; however, it is easily misled by lighting changes and interference from other objects. As another example, the Minimum Output Sum of Squared Error (MOSSE) filter uses the correlation between a candidate region and the tracking target to identify the candidate region as the target. Notably, the MOSSE filter can update its filter parameters for an occluded target so that the target can be re-tracked when it reappears. As yet another example, the Scale-Invariant Feature Transform (SIFT) algorithm determines the position, scale, and rotation invariants of feature points, generates corresponding feature vectors, and determines the position and orientation of the target by matching these feature vectors.
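As a hedged illustration of frame-to-frame association (one simple alternative to the named algorithms, not the method of any embodiment), bounding boxes in adjacent frames can be matched greedily by intersection-over-union (IoU):

```python
# Sketch: associate detections across two adjacent frames by IoU (illustrative).
def iou(a, b):
    """a, b: boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def associate(prev_boxes, curr_boxes, min_iou=0.3):
    """Greedy matching: each previous box pairs with its best current box."""
    matches = {}
    used = set()
    for i, pb in enumerate(prev_boxes):
        best_j, best_v = None, min_iou
        for j, cb in enumerate(curr_boxes):
            if j in used:
                continue
            v = iou(pb, cb)
            if v > best_v:
                best_j, best_v = j, v
        if best_j is not None:
            matches[i] = best_j
            used.add(best_j)
    return matches

prev = [(0, 0, 10, 10), (50, 50, 60, 60)]
curr = [(52, 51, 62, 61), (1, 1, 11, 11)]
print(associate(prev, curr))   # maps each previous box to its best current box
```

Production trackers such as SORT combine a similar IoU cost with motion prediction and optimal assignment rather than this greedy pass.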
Generally speaking, object detection is more time-consuming than object tracking, but object tracking results may be inaccurate. In some application scenarios, both techniques may affect the user experience. For example, real-time video conferencing requires low latency. If detection takes too long, a moving object cannot be framed accurately. For instance, if the bounding result for the first frame is obtained only after four frames have elapsed, the target's position may have changed over those four frames, so the bounding result shown in the fourth, currently displayed frame is inaccurate, or the wrong target is tracked. Hence, the existing technology still needs improvement to meet the requirements of low latency and high accuracy.
Summary
Embodiments of the present invention are directed to an object tracking method and an object tracking device that perform continuous tracking based on object detection results, thereby meeting low-latency requirements while providing high accuracy.
According to an embodiment of the present invention, the object tracking method is suitable for low-latency applications and includes (but is not limited to) the following steps: performing object detection on one of one or more consecutive image frames, where object detection is used to identify a target; buffering the consecutive image frames; and performing object tracking on the buffered consecutive image frames according to the object detection result, where object tracking is used to associate the target in one of the consecutive image frames with the target in another.
According to an embodiment of the present invention, the object tracking device is suitable for low-latency applications and includes (but is not limited to) a memory and a processor. The memory stores program code. The processor is coupled to the memory and is configured to load and execute the program code to perform the following steps: performing object detection on one of one or more consecutive image frames, buffering the consecutive image frames, and performing object tracking on the buffered consecutive image frames according to the object detection result. Object detection is used to identify a target. Object tracking is used to associate the target in one of the consecutive image frames with the target in another.
Based on the above, the object tracking method and the object tracking device according to the embodiments of the present invention buffer the consecutive image frames received during object detection and, once the object detection result is obtained, track the target in those buffered frames based on that result. In this way, the high accuracy of object detection can be combined with the high efficiency of object tracking, meeting the requirements of low-latency applications.
Brief Description of the Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
FIG. 1 is a component block diagram of an object tracking device according to an embodiment of the present invention;
FIG. 2 is a flowchart of an object tracking method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the tracking of consecutive image frames according to an embodiment of the present invention;
FIG. 4 is a flowchart of a target update mechanism according to an embodiment of the present invention;
FIG. 5 is a timing diagram of object detection and tracking according to an embodiment of the present invention;
FIG. 6 is a timing diagram of object detection and tracking according to another embodiment of the present invention;
FIG. 7 is a timing diagram of a target update mechanism according to an embodiment of the present invention;
FIG. 8 is a timing diagram of a target update mechanism according to another embodiment of the present invention.
Description of Reference Numerals
100: object tracking device;
110: memory;
111: buffer;
130: processor;
131: detection tracker;
132: detector;
133: main tracker;
135: secondary tracker;
S210~S250, S410~S460, S510~S530: steps;
F1~F5: consecutive image frames;
ROI~ROI6: regions of interest;
QF: queued frames;
501: object detection;
503: object tracking;
D1: period;
D2, D3, D4: cycles;
C1~C4: confidence levels;
t1: time point.
Detailed Description
Reference will now be made in detail to the exemplary embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numerals are used in the drawings and the description to refer to the same or similar parts.
FIG. 1 is a component block diagram of an object tracking device 100 according to an embodiment of the present invention. Referring to FIG. 1, the object tracking device 100 includes (but is not limited to) a memory 110 and a processor 130. The object tracking device 100 may be a desktop computer, a notebook computer, a smartphone, a tablet computer, a server, a surveillance device, a medical testing instrument, an optical inspection instrument, or another computing device.
The memory 110 may be any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk drive (HDD), solid-state drive (SSD), or similar component. In an embodiment, the memory 110 records program code, software modules, configurations, data (for example, image frames, detection/tracking results, confidence levels, etc.), or other files, and its use is detailed in the embodiments below.
In an embodiment, the memory 110 includes a buffer 111. The buffer 111 may be one of one or more memories 110, or may represent one or more memory blocks in the memory 110. The buffer 111 is used to temporarily store image frames, and its function is detailed in subsequent embodiments. One or more image frames may be provided by an image capture device (for example, a camera, video camera, or monitor), a server (for example, an image streaming server or a cloud server), or a storage medium (for example, a flash drive, hard disk, or database server) connected by wire or wirelessly.
The processor 130 is coupled to the memory 110 and may be a central processing unit (CPU), a graphics processing unit (GPU), or another programmable general-purpose or special-purpose microprocessor, digital signal processor (DSP), programmable controller, field-programmable gate array (FPGA), application-specific integrated circuit (ASIC), neural network accelerator, or similar component, or a combination of the above components. In an embodiment, the processor 130 executes all or part of the operations of the object tracking device 100 and can load and execute the program code, software modules, files, and data recorded in the memory 110. In some embodiments, the functions of the processor 130 may be implemented in software.
The processor 130 includes a detection tracker 131 and a secondary tracker 135. Either or both of the detection tracker 131 and the secondary tracker 135 may be implemented by an independent digital circuit, chip, neural network accelerator, or other processor, or their functions may be implemented in software.
In an embodiment, the detection tracker 131 includes a detector 132 and a main tracker 133. The detector 132 performs object detection. Object detection, for example, determines the region of interest (ROI) (or bounding box, bounding rectangle) corresponding to a target (for example, a person, an animal, a non-living object, or a part thereof) in an image frame, and then identifies the type of the target (for example, male or female, dog or cat, table or chair, car or traffic light, etc.). The detector 132 may implement object detection with, for example, a neural-network-based algorithm (for example, YOLO, Region-Based Convolutional Neural Networks (R-CNN), or Fast R-CNN) or a feature-matching-based algorithm (for example, feature comparison using Histogram of Oriented Gradients (HOG), Haar, or Speeded Up Robust Features (SURF)). It should be noted that the embodiments of the present invention do not limit the algorithm used by the detector 132.
In an embodiment, the main tracker 133 and the secondary tracker 135 perform object tracking. Object tracking associates the target in one of the consecutive image frames with the target in another. The consecutive image frames are the successive image frames of a video or video stream. Object tracking, for example, determines the correlation in position, movement, direction, and other motion of the same target (whose corresponding position may be framed by a region of interest) across adjacent image frames, thereby localizing the moving target. The main tracker 133 and the secondary tracker 135 may implement object tracking with, for example, the optical-flow method, Simple Online and Realtime Tracking (SORT), Deep SORT, the Joint Detection and Embedding (JDE) model, or other tracking algorithms. It should be noted that the embodiments of the present invention do not limit the algorithms used by the main tracker 133 and the secondary tracker 135, and the two trackers may use the same or different algorithms.
In some embodiments, the object tracking device 100 may further include a display (not shown). The display is coupled to the processor 130. The display may be a liquid-crystal display (LCD), a light-emitting diode (LED) display, an organic light-emitting diode (OLED) display, a quantum dot display, or another type of display. In an embodiment, the display shows image frames or image frames processed by object detection/tracking.
Hereinafter, the methods described in the embodiments of the present invention will be explained in conjunction with the devices, components, and/or modules of the object tracking device 100. Each step of the method may be adjusted according to the implementation situation and is not limited thereto.
FIG. 2 is a flowchart of an object tracking method according to an embodiment of the present invention. Referring to FIG. 2, the detector 132 of the detection tracker 131 performs object detection on one of one or more consecutive image frames (step S210). Specifically, in some application scenarios, such as video calling, image streaming, video surveillance, or gaming, the processor 130 obtains one or more consecutive image frames (referred to herein as consecutive image frames). Consecutive image frames are a set of adjacent image frames captured or recorded at a given frame rate (measured, for example, in frames per second (FPS) or as a frequency). For example, at a frame rate of 60 FPS, the 60 image frames within one second may be called consecutive image frames. However, consecutive image frames are not limited to the image frames within one second; they may also be the image frames within, for example, one and a half seconds, two seconds, or two and one-third seconds.
In response to the input of consecutive image frames (for example, originating from an image capture device, a server, or a storage medium, and possibly stored in the memory 110), the detector 132 accesses one input consecutive image frame from the memory 110. In an embodiment, to achieve real-time processing, the detector 132 may perform object detection on the first consecutive image frame currently input. In another embodiment, the detector 132 may perform object detection on another input consecutive image frame, that is, skipping the first consecutive image frame or skipping several consecutive image frames. It should be noted that the first frame here refers to the first frame input at a given time point, or the first frame accessed from the memory 110 at that time point, and is not limited to the initial frame of an image or video stream.
On the other hand, for the description of object detection, refer to the above description of the detector 132; it is not repeated here.
For example, FIG. 3 is a schematic diagram illustrating the tracking of consecutive image frames by the detection tracker 131 according to an embodiment of the present invention. Referring to FIG. 3, the detector 132 determines, in the first consecutive image frame F1 of the consecutive image frames F1~F4, the region of interest ROI corresponding to the target's position, and identifies the target in this region of interest ROI accordingly. It should be noted that the second to fourth consecutive image frames F2~F4 shown in FIG. 3 represent the image frames subsequent to the first consecutive image frame F1.
The processor 130 may buffer one or more consecutive image frames into the buffer 111 (step S230). Specifically, some low-latency applications require real-time processing of input, accessed, or captured images. Low-latency applications are video applications in which the delay between the input time point of a consecutive image frame and the output time point of the same frame must stay within a specific tolerance, for example, video calls/conferences or live streaming. Depending on requirements, these video applications may additionally provide face detection, brightness adjustment, special effects, or other image processing. However, if the image processing period is too long, the application experience suffers. For example, in real-time video conferencing, if face detection takes too long, head movement may cause the detected face position to deviate from the face position in the currently output image, so that the displayed image cannot frame the face accurately. Therefore, in the embodiments of the present invention, the consecutive image frames received during object detection are retained, so that the object detection result can update the tracking target in the retained image frames, and the output time point of such an image frame may be later than the end time point of its object detection.
In an embodiment, during all or part of the object detection in step S210, the processor 130 may buffer, in the buffer 111, one or more consecutive image frames input to the system (for example, the object tracking device 100) during this period. Taking FIG. 3 as an example, the detector 132 performs object detection on the consecutive image frame F1. Between the time the detector 132 receives the frame F1 and the time it obtains the region of interest ROI in the frame F1, the memory 110 sequentially stores the consecutive image frames F1~F4. The processor 130 may store these consecutive image frames F1~F4 in the buffer 111 as queued frames QF.
In another embodiment, the processor 130 may further buffer other consecutive image frames accessed outside the object detection period, for example, the last consecutive image frame before the object detection period or the next one after it.
In yet another embodiment, the processor 130 may buffer one or more consecutive image frames input to the system during all or part of the period before the object tracking completes.
It should be noted that in the example shown in FIG. 3, all consecutive image frames within the object detection period are buffered into the buffer 111, but the invention is not limited thereto.
In an embodiment, the processor 130 may compare the number of buffered consecutive image frames with an upper limit. This upper limit relates to the capacity of the buffer 111, the detection speed of the detector 132, or processing-efficiency requirements, for example, 8, 10, or 20 frames. The processor 130 may delete at least one of the buffered consecutive image frames according to the comparison result. In response to the number of buffered consecutive image frames reaching or exceeding the upper limit, the processor 130 may delete some of the frames in the buffer 111, for example, deleting the frames at even-numbered or odd-numbered positions, or randomly deleting a specific number of frames. On the other hand, in response to the number of buffered consecutive image frames not reaching the upper limit, the processor 130 may retain all or some of the frames in the buffer 111.
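The buffering and decimation policy above might be sketched as follows; the capacity value and the drop-every-other-frame rule are just the example values from the text, and the class name is an assumption:

```python
# Sketch: buffer frames during detection, dropping every other frame when full.
class FrameBuffer:
    def __init__(self, max_frames=8):
        self.max_frames = max_frames
        self.frames = []          # the queued frames (QF in the text)

    def push(self, frame):
        self.frames.append(frame)
        if len(self.frames) >= self.max_frames:
            # Drop frames at even (0-indexed) positions, one possible
            # decimation rule mentioned in the text.
            self.frames = self.frames[1::2]

    def drain(self):
        """Hand all buffered frames to the tracker once detection finishes."""
        frames, self.frames = self.frames, []
        return frames

buf = FrameBuffer(max_frames=4)
for i in range(5):                # frames F1..F5 arriving during detection
    buf.push(f"F{i + 1}")
print(buf.drain())                # → ['F2', 'F4', 'F5']
```

A `collections.deque` with `maxlen` would cap the count too, but it drops the oldest frames instead of thinning the whole span; the decimation above keeps frames spread across the detection period.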
In another embodiment, if the capacity of the buffer 111 accommodates all consecutive image frames received during object detection, the processor 130 may retain all of them.
It should be noted that the upper limit may be fixed, or may vary in response to factors such as the real-time processing speed of the detector 132, the computational complexity of the system, or subsequent application requirements.
The main tracker 133 may perform object tracking on the buffered one or more consecutive image frames according to the object detection result (step S250). In an embodiment, the object detection result includes the target's region of interest. As shown in FIG. 3, the region of interest ROI corresponds to the position of the target in the consecutive image frame subjected to object detection. It should be noted that the region of interest ROI may completely or partially frame the target, and the embodiments of the present invention do not limit this. In some embodiments, the object detection result also includes the type of the target.
On the other hand, for the description of object tracking, refer to the above description of the main tracker 133; it is not repeated here.
In addition, the main tracker 133 performs object tracking on the one or more consecutive image frames in the buffer 111 only in response to the completion of object detection on one of the consecutive image frames (that is, the object detection result is obtained, for example, the region of interest ROI of the consecutive image frame F1 is detected as shown in FIG. 3). In other words, before the object detection on the first consecutive image frame completes, the main tracker 133 is disabled from (or does not perform) tracking the first consecutive image frame or other subsequently input consecutive image frames.
In an embodiment, the main tracker 133 may determine the correlation of the region of interest in the object detection result across the buffered consecutive image frames, and determine another region of interest based on this correlation. The correlation relates to the position, orientation, and/or velocity of one or more targets in one or more regions of interest between adjacent consecutive image frames.
Taking FIG. 3 as an example, the main tracker 133 continuously tracks, across the consecutive image frames F1~F4, the target in the region of interest ROI obtained by the detector 132, and updates the region to the region of interest ROI2 as the target moves.
In an embodiment, suppose the object detection result includes a detection region of interest corresponding to the target (that is, corresponding to the target's position in the consecutive image frame subjected to object detection). Further, suppose the tracking region of interest refers to the region previously tracked by object tracking; in other words, the tracking region of interest is the region of interest that object tracking uses as a tracking basis in one or more consecutive image frames at, or just before, the current time point. The main tracker 133 may update the tracking region of interest targeted by object tracking to the detection region of interest obtained by object detection. In other words, the tracking region of interest is directly replaced by the detection region of interest.
FIG. 4 is a flowchart of a target update mechanism according to an embodiment of the present invention. Referring to FIG. 4, the processor 130 accesses an input consecutive image frame from the memory 110 (step S410) and detects the target in that frame through the detection tracker 131. At this point, the secondary tracker 135 may have finished tracking the previous consecutive image frame, and further determines whether the detection tracker 131 is busy (step S420). Regardless of whether the detection tracker 131 is busy, the secondary tracker 135 continues to track the target using the region of interest obtained from the previous consecutive image frame (step S430). On the other hand, if the detection tracker 131 is not busy, a detection-tracking region of interest has been obtained (that is, the detector 132 has finished detection, and the main tracker 133 has finished tracking all buffered consecutive image frames) (step S440). The main tracker 133 may then update the currently tracked region of interest with the new region of interest output by the detector 132 (that is, update the tracking target, step S450). After continuously tracking all buffered consecutive image frames, it obtains a detection-tracking region of interest, which is compared with or computed against the tracking region of interest obtained by the secondary tracker 135; one of the two is selected, or they are blended, to obtain a final region of interest, which is used to update the region of interest currently tracked by the secondary tracker 135 (step S460).
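Assuming the flow just described, the busy check and region-of-interest fusion of steps S420~S460 might be sketched as below; the function names, the weighted-average fusion, and the weight value are illustrative assumptions, not the claimed implementation:

```python
# Sketch of the FIG. 4 update flow: keep tracking with the previous ROI while
# detection is busy (S430); when detection tracking is done, fuse its ROI with
# the secondary tracker's ROI into a final ROI (S440-S460).
def fuse(det_roi, trk_roi, w_det=0.7):
    """Blend two ROIs (x1, y1, x2, y2) by weighted average; weight assumed."""
    return tuple(w_det * d + (1 - w_det) * t for d, t in zip(det_roi, trk_roi))

def update_step(detection_busy, det_roi, trk_roi):
    if detection_busy or det_roi is None:
        return trk_roi                     # S430: keep the previous tracking ROI
    return fuse(det_roi, trk_roi)          # S440-S460: blend into a final ROI

print(update_step(True, (0, 0, 10, 10), (2, 2, 12, 12)))   # busy → (2, 2, 12, 12)
print(update_step(False, (0, 0, 10, 10), (2, 2, 12, 12)))  # fused final ROI
```

Selecting one of the two ROIs outright, instead of blending, corresponds to the "select one" branch of step S460.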
In an embodiment, the processor 130 may disable object tracking on the previous tracking region of interest according to the time at which the detection-tracking result is produced. Suppose the detection tracker 131 produces a detection-tracking result while the secondary tracker 135 has started, but not yet completed, a round of tracking; the secondary tracker 135 may be disabled from (or refrain from) object tracking before the next round of detection tracking starts. In the next object tracking cycle, the secondary tracker 135 then starts tracking directly based on that detection-tracking result.
For example, FIG. 5 is a timing diagram of object detection and tracking according to an embodiment of the present invention, used to further explain the decision mechanism of step S460 in FIG. 4. Referring to FIG. 5, during the period D1 in which the detection tracker 131 performs detection tracking 501, the object tracking 503 of the secondary tracker 135 completes the tracking of two consecutive image frames. While the secondary tracker 135 performs object tracking 503 on the third consecutive image frame, the detection tracker 131 has completed or almost completed detection tracking 501. That is, during the cycle D2 of the third object tracking 503, the detection tracker 131 executes detection tracking 501 and obtains a new region of interest accordingly (step S510). Within a certain period around the time the detection tracker 131 next starts detection tracking 501, object tracking 503 may be performed based on the new region of interest obtained by detection tracking 501 (step S530). In another embodiment, the restarted secondary tracker 135 may perform object tracking 503 based on both the detection-tracking region of interest most recently obtained by detection tracking 501 and the tracking region of interest obtained by the previous object tracking 503. For example, the secondary tracker 135 may use a weighted average of the two regions of interest; the weights used for the weighted average may be changed according to the user's needs, and the embodiments of the present invention do not limit them. Alternatively, the secondary tracker 135 may choose one of the detection-tracking region of interest and the tracking region of interest.
In an embodiment, the processor 130 may determine the time difference between the completion time point of the most recent detection tracking 501 and that of the most recent object tracking 503. This time difference indicates whether the time point at which the secondary tracker 135 most recently produced a result is close to the time point at which the detection tracker 131 most recently produced a result. The secondary tracker 135 and the detection tracker 131 may decide, based on this time difference, whether to use both the detection-tracking region of interest and the tracking region of interest for object tracking and object detection.
For example, FIG. 6 is a timing diagram of object detection and tracking according to another embodiment of the present invention. Referring to FIG. 6, object tracking 503 executes continuously, regardless of whether a result has been obtained. However, the secondary tracker 135 may determine the time difference between the end of the period D1 and the result of the cycle D4, and compare this time difference with a difference threshold. If the time difference is less than the difference threshold, object tracking 503 may use a weighted average of the region of interest obtained in the cycle D4 and the region of interest obtained by detection tracking 501 in the period D1. On the other hand, if the time difference is not less than the difference threshold, object tracking 503 and detection tracking 501 use only the region of interest obtained by detection tracking 501 in the period D1.
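One possible reading of the time-difference rule, with the threshold, units, and weight chosen purely for illustration:

```python
# Sketch: choose between a fused ROI and the detection-only ROI based on how
# close in time the latest tracking and detection results are.
def select_roi(det_roi, trk_roi, det_done_ms, trk_done_ms,
               diff_threshold_ms=50, w_det=0.5):
    delta = abs(det_done_ms - trk_done_ms)
    if delta < diff_threshold_ms:
        # Results are close in time: blend the two ROIs.
        return tuple(w_det * d + (1 - w_det) * t
                     for d, t in zip(det_roi, trk_roi))
    # Otherwise trust the detection-tracking ROI alone.
    return det_roi

print(select_roi((0, 0, 8, 8), (4, 4, 12, 12), det_done_ms=100, trk_done_ms=120))
print(select_roi((0, 0, 8, 8), (4, 4, 12, 12), det_done_ms=100, trk_done_ms=400))
```

The first call blends (the results are 20 ms apart, under the assumed 50 ms threshold); the second falls back to the detection ROI.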
In an embodiment, suppose the object detection period is not recorded; the secondary tracker 135 may decide to update the tracking region of interest (that is, the region previously tracked by the secondary tracker 135) to the detection-tracking region of interest (that is, the detection-tracking result) according to the confidence of the tracking region of interest in object tracking. In some application scenarios, the tracked target may be suddenly occluded, so the object tracking result may have low confidence (for example, below a confidence threshold). In that case, when the object tracking of the secondary tracker 135 completes, the secondary tracker 135 may update to the detection-tracking result, or use a weighted average of the detection-tracking region of interest and the tracking region of interest, as the final region of interest.
For example, FIG. 7 is a timing diagram of a target update mechanism according to an embodiment of the present invention. Referring to FIG. 7, suppose that, among the confidence levels C1~C4 of the results of the secondary tracker 135 for the consecutive image frames F1~F4, the confidence level C4 of the region of interest ROI3 is below the confidence threshold. The secondary tracker 135 may then update the region of interest ROI3 to the region of interest ROI4 obtained by the detection tracker 131. As another example, if the number of confidence levels among C1~C4 that are below the confidence threshold exceeds a count threshold, the secondary tracker 135 may likewise update the region of interest ROI3 to the region of interest ROI4 obtained by the detection tracker 131. As yet another example, the secondary tracker 135 may use a weighted average of the regions of interest ROI3 and ROI4, where the weight of the region of interest ROI3 may be lower.
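The confidence rules in this example (a low-confidence latest frame, or too many low-confidence frames) might be sketched as follows; the threshold values are assumptions:

```python
# Sketch: decide whether the tracker's ROI should be replaced by the
# detector's ROI based on per-frame tracking confidences.
def should_update(confidences, conf_threshold=0.5, count_threshold=1):
    low = sum(1 for c in confidences if c < conf_threshold)
    # Update if the latest frame is low-confidence, or too many frames are.
    return confidences[-1] < conf_threshold or low > count_threshold

print(should_update([0.9, 0.8, 0.7, 0.3]))   # latest (C4) below threshold → True
print(should_update([0.9, 0.8, 0.7, 0.9]))   # all confident → False
```

When `should_update` returns true, the caller would replace the tracking ROI with the detector's ROI, or down-weight it in a weighted average as the text suggests.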
In an embodiment, the secondary tracker 135 may decide to update the tracking region of interest (that is, the region previously tracked by the secondary tracker 135) to the detection-tracking region of interest (that is, the detection-tracking result) according to the detection result of a scene change. A scene change relates to two adjacent consecutive image frames having different scenes. The processor 130 may judge the degree of change in the background's color, contrast, or specific patterns, and derive the scene-change detection result accordingly (for example, scene different/changed, or same/unchanged). For example, if the degree of change exceeds a change threshold, the detection result is that the scene has changed, and the secondary tracker 135 may update the region of interest. As another example, if the degree of change does not exceed the change threshold, the detection result is that the scene has not changed, and the secondary tracker 135 may maintain the tracking region of interest or use both the detection-tracking region of interest and the tracking region of interest.
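A scene-change check of the kind described might compare coarse grayscale histograms; this is only one possible "degree of change" measure, and all values here are assumptions:

```python
# Sketch: detect a scene change between two frames by comparing coarse
# grayscale histograms (pure Python; frames are 2-D lists of pixels 0..255).
def histogram(frame, bins=8):
    counts = [0] * bins
    total = 0
    for row in frame:
        for px in row:
            counts[min(px * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]

def scene_changed(frame_a, frame_b, change_threshold=0.5):
    ha, hb = histogram(frame_a), histogram(frame_b)
    # L1 distance between normalized histograms, in [0, 2].
    change = sum(abs(a - b) for a, b in zip(ha, hb))
    return change > change_threshold

day = [[200] * 4 for _ in range(4)]      # bright frame (e.g., daytime F2)
night = [[20] * 4 for _ in range(4)]     # dark frame (e.g., nighttime F3)
print(scene_changed(day, night))         # True
print(scene_changed(day, day))           # False
```

Real implementations would typically operate on downsampled frames and may also compare contrast or edge patterns, as the text mentions.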
For example, FIG. 8 is a timing diagram of a target update mechanism according to another embodiment of the present invention. Referring to FIG. 8, suppose the processor 130 detects at time point t1 that the scene has changed; for example, the content of the consecutive image frame F2 is daytime, but the content of the consecutive image frame F3 is nighttime. Then, for the consecutive image frame F3, the secondary tracker 135 may update the region of interest ROI5 obtained from the consecutive image frame F2 to the region of interest ROI6 most recently output by the detection tracker 131.
In an embodiment, in response to the completion of object tracking on one of the one or more consecutive image frames, the processor 130 may request the display of the object tracking result. For example, the processor 130 may, through the display, show the consecutive image frame and the region of interest framed by object tracking.
Taking FIG. 3 as an example, Table (1) is a timing relationship table:

Table (1)
During the period in which the detector 132 detects the consecutive image frame F1, the processor 130 inputs the consecutive image frames F1~F4 into the buffer 111. At this point, the consecutive image frames F1~F3 shown on the display do not yet carry object detection or object tracking results. When the display shows the consecutive image frame F4, the main tracker 133 may use the region of interest output by the detector 132 to track the target in the buffered consecutive image frames F1~F4, and the object tracking result may be displayed accordingly (such as the region of interest ROI2 in the consecutive image frame F4 shown in FIG. 3). In other embodiments, the region of interest ROI2 is compared with or computed against the tracking region of interest obtained by the secondary tracker 135; one of the two is selected, or they are blended, to obtain the final region of interest, which is displayed simultaneously when the display shows the consecutive image frame F4.
In an embodiment, the detector 132 may perform object detection on the image frames after the consecutive image frames buffered in the buffer 111, and may be disabled from (or not perform) object detection on the remaining originally buffered frames. That is, the detector 132 does not perform object detection on every input consecutive image frame. The detection period of the detector 132 for a single frame may be far longer than the tracking period of the main tracker 133 for a single frame, and the detection period may even fail to meet the low-latency requirements of the application scenario. By the time the detector 132 outputs one result, other consecutive image frames within the detection period may already have been requested for output or other processing multiple times. As shown in Table (1), the display outputs the consecutive image frames F1~F3 while the detector 132 is still performing object detection on the consecutive image frame F1. In response to the output of the object detection result, the detector 132 may directly perform object detection on a newly input consecutive image frame, without continuing object detection on the other previously buffered consecutive image frames. Taking FIG. 3 as an example, the detector 132 detects the image frames input after the consecutive image frame F4.
In another embodiment, the detection tracker 131 starts object detection on a newly input consecutive image frame according to a fixed time interval, a fixed frame-count interval, or a scene-change detection result, and each object detection is an independent event, regardless of whether an earlier object detection is still unfinished. Whenever the result of any detection tracking is output, it is used to update the previously output detection-tracking result; since the time each detection tracking takes varies, the "previous" detection tracking here is determined by the time point at which results are output. In yet another embodiment, the detection tracker 131 selects which of the consecutive image frames to subject to object detection according to a fixed time interval, a fixed frame-count interval, or a scene-change detection result. The start time point of the detection tracker 131 may be slightly earlier or later than in the preceding embodiment that starts according to such an interval or detection result; after starting, however, it selects a specific one of the consecutive image frames for object detection according to the fixed time interval, fixed frame-count interval, or scene-change detection result, and selectively stops the previous object detection or object tracking, thereby increasing the flexibility of the start time point of the detection tracker 131.
To sum up, in the object tracking method and the object tracking device of the embodiments of the present invention, the target in previously buffered consecutive image frames can be tracked based on the object detection result. In this way, the accuracy of object tracking can be improved regardless of the type of the target (for example, human, animal, or non-living). In addition, given the high processing efficiency of the tracker, the embodiments of the present invention can meet the requirements of real-time video or other low-latency applications.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced with equivalents, without departing in essence from the scope of the technical solutions of the embodiments of the present invention.
Claims (22)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110797357.7A CN115619815A (en) | 2021-07-14 | 2021-07-14 | Object tracking method and object tracking device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN115619815A true CN115619815A (en) | 2023-01-17 |
Family
ID=84854465
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110797357.7A Pending CN115619815A (en) | 2021-07-14 | 2021-07-14 | Object tracking method and object tracking device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115619815A (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190114804A1 (en) * | 2017-10-13 | 2019-04-18 | Qualcomm Incorporated | Object tracking for neural network systems |
| CN110555862A (en) * | 2019-08-23 | 2019-12-10 | 北京数码视讯技术有限公司 | Target tracking method, device, electronic equipment and computer-readable storage medium |
| CN111209837A (en) * | 2019-12-31 | 2020-05-29 | 武汉光庭信息技术股份有限公司 | Target tracking method and device |
Non-Patent Citations (1)
| Title |
|---|
| Sui Yunfeng; Li Xingbo; Zhao Shi; Huang Zhongtao; Cheng Zhi: "Tracking method for aircraft takeoff and landing based on weak image detectors", Journal of Computer Applications, no. 1, 30 June 2018 (2018-06-30) * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| TWI783572B (en) | Object tracking method and object tracking apparatus | |
| CN110602527B (en) | Video processing method, device and storage medium | |
| JP2022023887A (en) | Appearance search system and method | |
| WO2021238586A1 (en) | Training method and apparatus, device, and computer readable storage medium | |
| WO2020125623A1 (en) | Method and device for live body detection, storage medium, and electronic device | |
| WO2021051545A1 (en) | Behavior identification model-based fall-down action determining method and apparatus, computer device, and storage medium | |
| CN107274433A (en) | Method for tracking target, device and storage medium based on deep learning | |
| WO2018188453A1 (en) | Method for determining human face area, storage medium, and computer device | |
| KR20180054709A (en) | Managing cloud sourced photos on a wireless network | |
| WO2020103462A1 (en) | Video search method and apparatus, computer device, and storage medium | |
| CN117237867A (en) | Adaptive scene surveillance video target detection method and system based on feature fusion | |
| JP2020109644A (en) | Fall detection method, fall detection apparatus, and electronic device | |
| CN111915713A (en) | A method for creating a three-dimensional dynamic scene, a computer device, and a storage medium | |
| Alotibi et al. | CNN-based crowd counting through IoT: Application for Saudi public places | |
| CN103947192A (en) | Video Analytics Coding | |
| CN112580435B (en) | Face positioning method, face model training and detecting method and device | |
| CN116468753A (en) | Target tracking method, device, equipment, storage medium and program product | |
| CN103187083A (en) | Storage method and system based on time domain video fusion | |
| CN112085002A (en) | Portrait segmentation method, portrait segmentation device, storage medium and electronic equipment | |
| CN110799984A (en) | Tracking control method, device and computer readable storage medium | |
| Wang et al. | Multi-scale aggregation network for temporal action proposals | |
| CN115619815A (en) | Object tracking method and object tracking device | |
| CN114219938A (en) | Region-of-interest acquisition method | |
| JP2010002960A (en) | Image processor, image processing method, and image processing program | |
| Choudhary et al. | Real time video summarization on mobile platform |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||