CN111801703A

CN111801703A - Hardware and systems for bounding box generation in image processing pipelines

Info

Publication number: CN111801703A
Application number: CN201980016252.4A
Authority: CN
Inventors: A·F·加里多; J·科鲁兹-阿尔布雷克特; T·J·德罗西耶; S·劳
Original assignee: HRL Laboratories LLC
Current assignee: HRL Laboratories LLC
Priority date: 2018-04-17
Filing date: 2019-02-14
Publication date: 2020-10-20
Also published as: EP3782114A1; EP3782114A4; WO2019203920A1

Abstract

A system for bounding box generation is described. The system operates on images composed of multiple pixels each having a one-bit value. The bounding box is generated around the connected components in the image, which have pixel coordinates and pixel count information. Based on pixel coordinates and pixel count information, a ranking score is generated for each bounding box. Filter bounding boxes based on pixel coordinates and pixel count information to remove bounding boxes that exceed a predetermined size and a predetermined pixel count. The bounding boxes are also filtered to remove bounding boxes below a predetermined ranking score, resulting in the remaining bounding boxes. Finally, the device may be controlled or otherwise operated based on the remaining bounding box.

Description

Hardware and systems for bounding box generation in image processing pipelines

政府权益government interest

本发明是在编号为HR0011-13-C-0052的题为“Revolutionary AnalogProbabilistic Inference Devices for Unconventional Processing of Signals forData Exploitation(用于数据开发的非常规信号处理的革命性模拟概率推断设备)”(RAPID-UPSIDE)的美国政府合同的政府支持下完成的。政府拥有本发明的特定权利。The present invention is made in a paper entitled "Revolutionary AnalogProbabilistic Inference Devices for Unconventional Processing of Signals for Data Exploitation" (RAPID- UPSIDE) with government support under a U.S. government contract. The government has certain rights in this invention.

相关申请的交叉引用CROSS-REFERENCE TO RELATED APPLICATIONS

本申请是2016年9月21日提交的美国申请No.15/272,247的部分继续申请，该美国申请No.15/272,247是2015年9月21日提交的美国临时申请No.62/221,550的非临时申请，上述申请的全部内容通过引用并入于此。This application is a continuation-in-part of US Application No. 15/272,247, filed on September 21, 2016, which is a non-individual continuation of US Provisional Application No. 62/221,550, filed on September 21, 2015 Provisional application, the entire contents of which are incorporated herein by reference.

美国申请No.15/272,247是2016年3月24日提交的美国申请No.15/079,899的部分继续申请，该美国申请No.15/079,899是2015年3月24日提交的美国临时申请No.62/137,665的非临时申请，上述申请的全部内容通过引用并入于此。美国申请No.15/079,899也是2015年4月30日提交的美国临时申请No.62/155,355的非临时申请，上述申请的全部内容通过引用并入于此。US Application No. 15/272,247 is a continuation-in-part of US Application No. 15/079,899, filed March 24, 2016, which is US Provisional Application No. 15/079,899, filed March 24, 2015. Non-provisional application 62/137,665, the entire contents of which are incorporated herein by reference. US Application No. 15/079,899 is also a non-provisional application of US Provisional Application No. 62/155,355, filed April 30, 2015, the entire contents of which are incorporated herein by reference.

美国申请No.15/272,247也是2016年2月12日提交的美国申请No.15/043,478的部分继续申请，上述申请的全部内容通过引用并入于此。US Application No. 15/272,247 is also a continuation-in-part of US Application No. 15/043,478, filed February 12, 2016, the entire contents of which are incorporated herein by reference.

美国申请No.15/272,247也是2016年7月6日提交的美国申请No.15/203,596的部分继续申请，该美国申请No.15/203,596是2015年9月21日提交的美国临时申请No.62/221,550的非临时申请。US Application No. 15/272,247 is also a continuation-in-part of US Application No. 15/203,596, filed July 6, 2016, which is US Provisional Application No. 15/203,596, filed September 21, 2015. 62/221,550 for non-provisional applications.

本申请还要求2018年4月17日提交的非临时专利申请US.62/659,129的权益，上述申请的全部内容通过引用并入于此。This application also claims the benefit of non-provisional patent application US. 62/659,129, filed April 17, 2018, the entire contents of which are incorporated herein by reference.

发明背景Background of the Invention

(1)技术领域(1) Technical field

本发明涉及一种图像处理系统，并且更具体地，涉及一种用于在图像中生成边界框以进行图像处理的系统。The present invention relates to an image processing system, and more particularly, to a system for generating bounding boxes in an image for image processing.

(2)相关技术的描述(2) Description of related technologies

图像处理用于多种实现(包括跟踪和监测应用)。在跟踪或监测中，边界框用于识别对象，并且在理想情况下，用于跨图像帧和场景跟踪该对象。可以通过为连通分量设置框来形成边界框。例如，Walczyk等人的工作结果描述了执行二进制图像的连通分量标记(参见Robert Walczyk、Alistair Armitage和TD Binnie的“Comparative Study onConnected Components Labeling Algorithms for Embedded Video ProcessingSystems(用于嵌入式视频处理系统的连通分量标记算法的比较研究)”，2010年图像处理、计算机视觉和模式识别(IPCV)国际大会的会议记录(CSREA,2010)，其全部内容通过引用并入于此)。尽管Walczyk等人公开了执行连通分量标记，但是该公开仅针对图像的标记，而未能进一步有效地处理框或图像。Image processing is used in a variety of implementations (including tracking and monitoring applications). In tracking or monitoring, bounding boxes are used to identify an object and, ideally, to track that object across image frames and scenes. A bounding box can be formed by setting boxes for connected components. For example, the work of Walczyk et al. describes performing connected component labeling of binary images (see "Comparative Study on Connected Components Labeling Algorithms for Embedded Video Processing Systems" by Robert Walczyk, Alistair Armitage, and TD Binnie A Comparative Study of Labeling Algorithms)", Proceedings of the 2010 International Congress on Image Processing, Computer Vision and Pattern Recognition (IPCV) (CSREA, 2010, the entire contents of which are hereby incorporated by reference). Although Walczyk et al. disclose performing connected component labeling, this disclosure is only for labeling of images, and fails to further efficiently process boxes or images.

因此，仍然需要生成边界框，同时有效地计算边界框坐标和边界框对象像素计数，以便于随后对对象框进行排序和过滤，以进行图像处理。Therefore, there is still a need to generate bounding boxes while efficiently computing bounding box coordinates and bounding box object pixel counts for subsequent sorting and filtering of object boxes for image processing.

发明内容SUMMARY OF THE INVENTION

本公开提供了一种用于边界框生成的系统。在各个方面中，所述系统包括存储器以及一个或更多个处理器。所述存储器具有可执行指令，以使在执行所述指令时，所述一个或更多个处理器执行多个操作，诸如接收图像，所述图像由每像素具有一位值的多个像素构成；围绕所述图像中的连通分量生成边界框，所述连通分量具有像素坐标和像素计数信息；基于所述像素坐标和所述像素计数信息，针对各个边界框生成排序分数；基于所述像素坐标和所述像素计数信息，过滤所述边界框，以移除超过预定大小和预定像素计数的边界框；以及过滤所述边界框，以移除低于预定排序分数的边界框，从而得到剩余边界框；并且基于所述剩余边界框控制设备。The present disclosure provides a system for bounding box generation. In various aspects, the system includes a memory and one or more processors. The memory has executable instructions such that when the instructions are executed, the one or more processors perform operations, such as receiving an image consisting of a plurality of pixels having a value of one bit per pixel generating a bounding box around the connected components in the image, the connected components having pixel coordinates and pixel count information; based on the pixel coordinates and the pixel count information, generate a ranking score for each bounding box; based on the pixel coordinates and the pixel count information, filtering the bounding boxes to remove bounding boxes exceeding a predetermined size and a predetermined pixel count; and filtering the bounding boxes to remove bounding boxes below a predetermined ranking score, resulting in the remaining bounds box; and control the device based on the remaining bounding box.

在另一方面中，所述处理器是现场可编程门阵列(FPGA)。In another aspect, the processor is a field programmable gate array (FPGA).

在又一方面中，生成所述边界框还包括以下操作：对所述图像中的连续像素进行分组；以及将连通像素合并为连通分量，其中所述边界框由包围所述连通分量的框形成。In yet another aspect, generating the bounding box further comprises the operations of: grouping consecutive pixels in the image; and merging connected pixels into connected components, wherein the bounding box is formed by boxes surrounding the connected components .

另外，控制所述设备包括使视频平台移动，以将所述剩余边界框中的至少一个剩余边界框保持在所述视频平台的视场内。Additionally, controlling the apparatus includes moving a video stage to maintain at least one remaining bounding box of the remaining bounding boxes within a field of view of the video stage.

最后，本发明还包括计算机程序产品和计算机实现的方法。所述计算机程序产品包括在非暂时性计算机可读介质上存储的计算机可读指令，所述计算机可读指令能够由具有一个或更多个处理器的计算机执行，使得在执行所述指令时，所述一个或更多个处理器执行本文列出的操作。另选地，计算机实现的方法包括使计算机执行这种指令并且执行所得到的操作的动作。Finally, the present invention also includes computer program products and computer-implemented methods. The computer program product includes computer readable instructions stored on a non-transitory computer readable medium, the computer readable instructions being executable by a computer having one or more processors such that when executed, the instructions, The one or more processors perform the operations listed herein. Alternatively, a computer-implemented method includes acts of causing a computer to execute such instructions and perform the resulting operations.

附图说明Description of drawings

结合参考以下附图，本发明的目的、特征以及优点将从本发明的各个方面的以下详细描述变得显而易见，其中：Objects, features, and advantages of the present invention will become apparent from the following detailed description of various aspects of the invention, taken in conjunction with the accompanying drawings, wherein:

图1是示出了根据本发明的各个实施方式的系统的组件的框图；1 is a block diagram illustrating components of a system according to various embodiments of the present invention;

图2是具体实现本发明的一个方面的计算机程序产品的示图；2 is a diagram of a computer program product embodying an aspect of the present invention;

图3是例示了根据本发明的各个实施方式的在准备期间的变量与数组之间的关系的流程图；3 is a flowchart illustrating the relationship between variables and arrays during preparation according to various embodiments of the present invention;

图4是根据本发明的各个实施方式的用于找到像素标记值的搜索块的示图；4 is a diagram of a search block for finding pixel label values, according to various embodiments of the present invention;

图5是例示了根据本发明的各个实施方式的搜索/标记处理的流程图；FIG. 5 is a flowchart illustrating a search/marking process according to various embodiments of the present invention;

图6A是示出了根据本发明的各个实施方式的部分图像和对应标记的示图；Figure 6A is a diagram showing a portion of an image and corresponding indicia according to various embodiments of the present invention;

图6B是示出了根据本发明的各个实施方式的部分图像和对应标记的示图；Figure 6B is a diagram showing a partial image and corresponding indicia according to various embodiments of the present invention;

图6C是根据本发明的各个实施方式的如在图6A和图6B中部分示出的完整图像和对应标记的示图；Figure 6C is a diagram of the complete image and corresponding indicia as partially shown in Figures 6A and 6B, according to various embodiments of the present invention;

图7是例示了根据本发明的各个实施方式的合并区域的流程图；7 is a flowchart illustrating a merged region according to various embodiments of the present invention;

图8是例示了根据本发明的各个实施方式的状态转变的流程图；8 is a flowchart illustrating state transitions according to various embodiments of the present invention;

图9A是例示了根据本发明的各个实施方式的状态1的流程图；9A is a flow diagram illustrating State 1 according to various embodiments of the present invention;

图9B是根据本发明的各个实施方式的状态2代码的示例；9B is an example of a state 2 code according to various embodiments of the present invention;

图10是例示了根据本发明的各个实施方式的增量器的流程图；10 is a flowchart illustrating an incrementer according to various embodiments of the present invention;

图11是例示了根据本发明的各个实施方式的状态2的流程图；FIG. 11 is a flowchart illustrating state 2 according to various embodiments of the present invention;

图12是根据本发明的各个实施方式的当前标记模块的示图；12 is a diagram of a current tagging module according to various embodiments of the present invention;

图13是例示了根据本发明的各个实施方式的状态3的流程图；Figure 13 is a flowchart illustrating state 3 according to various embodiments of the present invention;

图14是例示了根据本发明的各个实施方式的状态4、状态5和状态6的流程图；Figure 14 is a flowchart illustrating State 4, State 5, and State 6 according to various embodiments of the present invention;

图15是例示了根据本发明的各个实施方式的状态7和回调操作的流程图；Figure 15 is a flowchart illustrating state 7 and callback operations according to various embodiments of the present invention;

图16是例示了根据本发明的各个实施方式的状态7和排序操作的流程图；Figure 16 is a flow diagram illustrating state 7 and sorting operations in accordance with various embodiments of the present invention;

图17是例示了根据本发明的各个实施方式的状态7和排序操作以及排序模块的流程图；Figure 17 is a flow diagram illustrating state 7 and sorting operations and sorting modules in accordance with various embodiments of the present invention;

图18是示出了根据本发明的各个实施方式的示例输入图像的示图，其中各个像素位置具有一位值；18 is a diagram illustrating an example input image with each pixel location having a one-bit value in accordance with various embodiments of the present invention;

图19是示出了具有经过根据本发明的各个实施方式的边界框处理和过滤之后所得到的边界框的图像的示图；以及FIG. 19 is a diagram showing an image with a resulting bounding box after bounding box processing and filtering in accordance with various embodiments of the present invention; and

图20是示出了根据各个实施方式的设备的控制的框图。FIG. 20 is a block diagram illustrating control of a device according to various embodiments.

具体实施方式Detailed ways

本发明涉及一种图像处理系统，并且更具体地，涉及一种用于在图像中生成边界框以进行图像处理的系统。呈现以下描述以使本领域普通技术人员能够作出和使用本发明并且将其结合到特定应用的上下文中。多种修改以及不同应用中的多种用途对于本领域技术人员来说将是显而易见的，并且本文限定的总体原理可以应用于广泛多个方面。因此，本发明不旨在限于所呈现的各个方面，而是涵盖与本文所公开的原理和新颖特征相一致的最广范围。The present invention relates to an image processing system, and more particularly, to a system for generating bounding boxes in an image for image processing. The following description is presented to enable any person of ordinary skill in the art to make and use the invention and to incorporate it in the context of a particular application. Numerous modifications and various uses in different applications will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied in a wide variety of aspects. Thus, the present invention is not intended to be limited to the various aspects presented but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

在下面的详细说明中，阐述了许多具体细节，以使得能够更加彻底地理解本发明。然而，本领域技术人员将明白，本发明可以在不限于这些具体细节的情况下实施。在其它情况下，公知结构和设备按框图形式示出而不被详细示出，以免模糊本发明。In the following detailed description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced without being limited to these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order not to obscure the present invention.

读者应留意与本说明书同时提交的所有文件和文档，这些文件和文档与本说明书一起公开以供公众查阅，所有这些文件和文档的内容通过引用并入于此。本说明书(包括任何所附权利要求、摘要以及附图)中公开的所有特征可以由用于相同、等同或相似目的的替代特征来代替，除非另有明确说明。因此，除非另有明确说明，否则所公开的各个特征仅是典型系列的等同或相似特征的一个示例。The reader is reminded to take note of all files and documents filed concurrently with this specification, which are made available for public inspection with this specification, and the contents of all such files and documents are incorporated herein by reference. All features disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is only one example of a typical series of equivalent or similar features.

此外，权利要求中的未明确陈述用于执行指定功能的“装置”或用于执行特定功能的“步骤”的任何要素不被解释为在35U.S.C.第112节第6款中指定的“装置”或“步骤”条款。具体地，在本文的权利要求中使用“…的步骤”或“…的动作”不旨在援引35U.S.C.第112节第6款的规定。Furthermore, no element of a claim that does not expressly recite "means" for performing the specified function or "step" for performing the specified function shall be construed as "means" specified in 35 U.S.C. § 112, subsection 6 ” or “steps” clause. Specifically, the use of "the step of" or "the act of" in the claims herein is not intended to invoke the provisions of 35 U.S.C. § 112, paragraph 6.

在详细描述本发明之前，首先提供了对本发明的各个主要方面的说明。随后，介绍部分为读者提供了本发明的总体理解。最后，提供本发明的各个实施方式的具体细节，以给出具体方面的理解。Before describing the invention in detail, a description of various major aspects of the invention is first provided. The introductory section then provides the reader with a general understanding of the invention. Finally, specific details of various embodiments of the invention are provided in order to provide an understanding of specific aspects.

(1)主要方面(1) Main aspects

本发明的各个实施方式包括三个“主要”方面。第一方面是一种用于图像处理的系统。该系统通常采用计算机系统操作软件的形式或采用“硬编码”指令集的形式或作为现场可编程门阵列(FPGA)。该系统可以结合到提供不同功能的广泛多种设备中。第二主要方面是使用数据处理系统(计算机)运行的通常采用软件形式的方法。第三主要方面是计算机程序产品。所述计算机程序产品通常表示存储在诸如光学存储设备(例如，光盘(CD)或数字通用盘(DVD))或磁存储设备(诸如，软盘或磁带)的非暂时性计算机可读介质上的计算机可读指令。计算机可读介质的其它非限制性示例包括硬盘、只读存储器(ROM)以及闪存型存储器。这些方面将在下文进行更详细说明。Various embodiments of the invention include three "main" aspects. A first aspect is a system for image processing. The system typically takes the form of computer system operating software or in the form of a "hard-coded" instruction set or as a Field Programmable Gate Array (FPGA). The system can be incorporated into a wide variety of devices providing different functions. The second major aspect is a method, usually in the form of software, run using a data processing system (computer). The third major aspect is the computer program product. The computer program product generally represents a computer stored on a non-transitory computer-readable medium such as an optical storage device (eg, a compact disc (CD) or digital versatile disc (DVD)) or a magnetic storage device (eg, a floppy disk or magnetic tape) Readable instructions. Other non-limiting examples of computer readable media include hard disks, read only memory (ROM), and flash-type memory. These aspects will be explained in more detail below.

图1提供了示出本发明的系统(即，计算机系统100)的示例的框图。计算机系统100被配置成执行与程序或算法相关联的计算、处理、操作和/或功能。在一个方面中，本文讨论的某些处理和步骤被实现为存在于计算机可读存储器单元内并由计算机系统100的一个或更多个处理器执行的一系列指令(例如，软件程序)。在执行时，这些指令使计算机系统100执行特定动作并呈现特定行为，诸如本文所描述的。1 provides a block diagram illustrating an example of a system of the present invention (ie, computer system 100). Computer system 100 is configured to perform calculations, processes, operations and/or functions associated with programs or algorithms. In one aspect, certain processes and steps discussed herein are implemented as a series of instructions (eg, software programs) residing within a computer-readable memory unit and executed by one or more processors of computer system 100 . When executed, these instructions cause computer system 100 to perform particular actions and exhibit particular behaviors, such as described herein.

计算机系统100可以包括被配置成传送信息的地址/数据总线102。另外，一个或更多个数据处理单元(诸如处理器104(或多个处理器))与地址/数据总线102联接。处理器104被配置成处理信息和指令。在一个方面中，处理器104是微处理器。另选地，处理器104可以是不同类型的处理器，诸如并行处理器、专用集成电路(ASIC)、可编程逻辑阵列(PLA)、复杂可编程逻辑器件(CPLD)或现场可编程门阵列(FPGA)，其被配置成执行本文描述的操作。Computer system 100 may include an address/data bus 102 configured to communicate information. Additionally, one or more data processing units, such as processor 104 (or processors), are coupled with address/data bus 102 . The processor 104 is configured to process information and instructions. In one aspect, the processor 104 is a microprocessor. Alternatively, the processor 104 may be a different type of processor, such as a parallel processor, an application specific integrated circuit (ASIC), a programmable logic array (PLA), a complex programmable logic device (CPLD), or a field programmable gate array ( FPGA) configured to perform the operations described herein.

计算机系统100被配置成利用一个或更多个数据存储单元。计算机系统100可以包括与地址/数据总线102联接的易失性存储器单元106(例如，随机存取存储器(“RAM”)、静态RAM、动态RAM等)，其中，易失性存储器单元106被配置成存储用于处理器104的信息和指令。计算机系统100还可以包括与地址/数据总线102联接的非易失性存储器单元108(例如，只读存储器(“ROM”)、可编程ROM(“PROM”)、可擦除可编程ROM(“EPROM”)、电可擦除可编程ROM(“EEPROM”)、闪存存储器等)，其中，非易失性存储器单元108被配置成存储用于处理器104的静态信息和指令。另选地，计算机系统100可以执行诸如在“云”计算中从在线数据存储单元取回的指令。在一个方面中，计算机系统100还可以包括与地址/数据总线102联接的一个或更多个接口(诸如，接口110)。所述一个或更多个接口被配置成使得计算机系统100能够与其它电子设备和计算机系统对接。由所述一个或更多个接口实现的通信接口可以包括有线通信技术(例如，串行电缆、调制解调器、网络适配器等)和/或无线通信技术(例如，无线调制解调器、无线网络适配器等)。Computer system 100 is configured to utilize one or more data storage units. Computer system 100 may include a volatile memory unit 106 (eg, random access memory ("RAM"), static RAM, dynamic RAM, etc.) coupled to address/data bus 102, wherein volatile memory unit 106 is configured The components store information and instructions for the processor 104 . Computer system 100 may also include a non-volatile memory unit 108 (eg, read only memory ("ROM"), programmable ROM ("PROM"), erasable programmable ROM ("ROM"), coupled to address/data bus 102. EPROM”), electrically erasable programmable ROM (“EEPROM”), flash memory, etc.), wherein the non-volatile memory unit 108 is configured to store static information and instructions for the processor 104. Alternatively, computer system 100 may execute instructions such as fetched from an online data storage unit in "cloud" computing. In one aspect, computer system 100 may also include one or more interfaces (such as interface 110 ) coupled to address/data bus 102 . The one or more interfaces are configured to enable the computer system 100 to interface with other electronic devices and computer systems. Communication interfaces implemented by the one or more interfaces may include wired communication technologies (eg, serial cables, modems, network adapters, etc.) and/or wireless communication technologies (eg, wireless modems, wireless network adapters, etc.).

在一个方面中，计算机系统100可以包括与地址/数据总线102联接的输入设备112，其中，输入设备112被配置成将信息和命令选择传送至处理器100。根据一个方面，输入设备112是可以包括字母数字键和/或功能键的字母数字输入设备(诸如键盘)。另选地，输入设备112可以是除字母数字输入设备之外的输入设备。在一个方面中，计算机系统100可以包括与地址/数据总线102联接的光标控制设备114，其中，光标控制设备114被配置成将用户输入信息和/或命令选择传送至处理器100。在一个方面中，光标控制设备114使用诸如鼠标、轨迹球、触控板、光学跟踪设备或触摸屏的设备来实现。尽管如此，但在一个方面中，诸如响应于使用与输入设备112相关联的特殊键和键序列命令，光标控制设备114经由来自输入设备112的输入被引导和/或激活。在另选方面中，光标控制设备114被配置成由语音命令来引导或指导。In one aspect, computer system 100 may include input device 112 coupled to address/data bus 102 , wherein input device 112 is configured to communicate information and command selections to processor 100 . According to one aspect, input device 112 is an alphanumeric input device (such as a keyboard) that may include alphanumeric keys and/or function keys. Alternatively, input device 112 may be an input device other than an alphanumeric input device. In one aspect, computer system 100 may include cursor control device 114 coupled to address/data bus 102 , wherein cursor control device 114 is configured to communicate user input information and/or command selections to processor 100 . In one aspect, cursor control device 114 is implemented using a device such as a mouse, trackball, trackpad, optical tracking device, or touchscreen. Nonetheless, in one aspect, cursor control device 114 is directed and/or activated via input from input device 112 , such as in response to commands using special keys and key sequences associated with input device 112 . In an alternative aspect, the cursor control device 114 is configured to be guided or guided by voice commands.

在一个方面中，计算机系统100还可以包括与地址/数据总线102联接的一个或更多个可选计算机可用数据存储设备(诸如存储设备116)。存储设备116被配置成存储信息和/或计算机可执行指令。在一个方面中，存储设备116是诸如磁或光盘驱动器(例如，硬盘驱动器(“HDD”)、软盘、光盘只读存储器(“CD-ROM”)、数字通用盘(“DVD”))的存储设备。依据一个方面，显示设备118与地址/数据总线102联接，其中，显示设备118被配置成显示视频和/或图形。在一个方面中，显示设备118可以包括阴极射线管(“CRT”)、液晶显示器(“LCD”)、场发射显示器(“FED”)、等离子体显示器或适于显示视频和/或图形图像以及用户可识别的字母数字字符的任何其它显示设备。In one aspect, computer system 100 may also include one or more optional computer-usable data storage devices (such as storage device 116 ) coupled with address/data bus 102 . Storage device 116 is configured to store information and/or computer-executable instructions. In one aspect, storage device 116 is a storage device such as a magnetic or optical disk drive (eg, hard disk drive ("HDD"), floppy disk, compact disk read only memory ("CD-ROM"), digital versatile disk ("DVD")) equipment. According to one aspect, a display device 118 is coupled to the address/data bus 102, wherein the display device 118 is configured to display video and/or graphics. In one aspect, display device 118 may include a cathode ray tube ("CRT"), a liquid crystal display ("LCD"), a field emission display ("FED"), a plasma display, or suitable for displaying video and/or graphic images and Any other display device with alphanumeric characters recognizable to the user.

本文所提出的计算机系统100是根据一个方面的示例计算环境。然而，计算机系统100的非限制性示例并不严格限于是计算机系统。例如，一个方面规定了计算机系统100表示可以根据本文所述各个方面使用的一种数据处理分析。此外，还可以实现其它计算系统。实际上，本技术的精神和范围不限于任何单个数据处理环境。因此，在一个方面中，使用通过计算机执行的计算机可执行指令(诸如，程序模块)来控制或实现本技术的各个方面的一个或更多个操作。在一个实现中，这样的程序模块包括被配置成执行特定任务或实现特定抽象数据类型的例程、程序、对象、组件和/或数据结构。另外，一个方面规定了通过利用一个或更多个分布式计算环境来实现本技术的一个或更多个方面，诸如，在分布式计算环境中，由通过通信网络链接的远程处理设备执行任务，或者诸如，在分布式计算环境中，各种程序模块位于包括存储器-存储设备的本地和远程计算机存储介质中。The computer system 100 presented herein is an example computing environment according to one aspect. However, the non-limiting example of computer system 100 is not strictly limited to being a computer system. For example, one aspect specifies that computer system 100 represents a data processing analysis that may be used in accordance with various aspects described herein. In addition, other computing systems may also be implemented. Indeed, the spirit and scope of the present technology is not limited to any single data processing environment. Thus, in one aspect, computer-executable instructions, such as program modules, executed by a computer are used to control or implement one or more operations of various aspects of the present technology. In one implementation, such program modules include routines, programs, objects, components, and/or data structures configured to perform particular tasks or implement particular abstract data types. Additionally, one aspect provides for implementing one or more aspects of the technology by utilizing one or more distributed computing environments, such as where tasks are performed by remote processing devices that are linked through a communications network, Alternatively, such as in a distributed computing environment, various program modules are located in both local and remote computer storage media including memory-storage devices.

图2示出了具体实现本发明的计算机程序产品(即，存储设备)的示图。计算机程序产品被示出为软盘200或诸如CD或DVD的光盘202。然而，如先前提到的，计算机程序产品通常表示存储在任何兼容的非暂时性计算机可读介质上的计算机可读指令。关于本发明所使用的术语“指令”通常指示要在计算机上执行的一组操作，并且可以表示整个程序的片段或者单个可分离的软件模块。“指令”的非限制性示例包括计算机程序代码(源代码或目标代码)和“硬编码”电子器件(即，编码到计算机芯片中的计算机操作)。“指令”被存储在任何非暂时性计算机可读介质上，诸如存储在计算机的存储器中或软盘、CD-ROM以及闪存驱动器上。在任一种情况下，这些指令被编码在非暂时性计算机可读介质上。Figure 2 shows a diagram of a computer program product (ie, a storage device) embodying the present invention. The computer program product is shown as a floppy disk 200 or an optical disk 202 such as a CD or DVD. However, as previously mentioned, a computer program product generally represents computer readable instructions stored on any compatible non-transitory computer readable medium. The term "instructions" as used in connection with the present invention generally refers to a set of operations to be performed on a computer, and may represent a fragment of an entire program or a single separable software module. Non-limiting examples of "instructions" include computer program code (source code or object code) and "hard-coded" electronics (ie, computer operations coded into a computer chip). "Instructions" are stored on any non-transitory computer-readable medium, such as in a computer's memory or on floppy disks, CD-ROMs, and flash drives. In either case, the instructions are encoded on a non-transitory computer-readable medium.

(2)介绍(2) Introduction

本公开提供了用于图像处理管线的边界框生成的系统和对应硬件实现。在各个方面中，该系统在现场可编程门阵列(FPGA)上实现，该FPGA接收二进制图像，从该二进制图像检测对象像素并且围绕连通分量生成边界框。该系统实现了一种连通分量标记方法，该连通分量标记方法用于对贯穿图像找到的连续像素进行分组，然后合并连通像素以创建唯一加框位置(boxed location)。这些唯一框被保存为单独单元，该单独单元包含边界框坐标和所包含的对象像素的计数。该系统还基于边界框的高度和宽度以及所包含的对象像素的数量计算排序分数，该排序分数用于基于大小和纵横比进行对象的后续过滤。该处理旨在提供该边界框信息，同时最小化FPGA资源并实现足够的吞吐量，以跟上所需的输入图像帧速率(例如，每秒30帧)。The present disclosure provides a system and corresponding hardware implementation for bounding box generation of an image processing pipeline. In various aspects, the system is implemented on a field programmable gate array (FPGA) that receives a binary image, detects object pixels from the binary image, and generates bounding boxes around connected components. The system implements a connected component labeling method for grouping consecutive pixels found throughout an image, and then combining connected pixels to create a unique boxed location. These unique boxes are saved as individual cells that contain bounding box coordinates and counts of contained object pixels. The system also calculates a ranking score based on the height and width of the bounding box and the number of contained object pixels, which is used for subsequent filtering of objects based on size and aspect ratio. The processing is designed to provide this bounding box information while minimizing FPGA resources and achieving sufficient throughput to keep up with the desired input image frame rate (eg, 30 frames per second).

在执行二进制图像的连通分量标记时，系统还同时记录边界框坐标和针对各个边界框检测到的对象像素的数量。该附加信息用于基于大小和纵横比对对象进行后续排序和过滤，并且无需大量额外计算时间和硬件资源即可收集该附加信息。另外，本发明的设计被优化，以同时最小化FPGA利用和计算时间。所述优点允许本发明将被用作以高图像帧速率(例如，>每秒30帧)运行的小尺寸、轻重量和低功率(SWAP)图像处理管线(诸如在美国申请No.15/272,247中所描述的)的一部分。通过减少整个图像上的对检测到的特定对象的必要计算，本公开的使用处理将更好地简化图像处理管线。When performing connected component labeling of binary images, the system also simultaneously records bounding box coordinates and the number of object pixels detected for each bounding box. This additional information is used for subsequent sorting and filtering of objects based on size and aspect ratio, and can be collected without significant additional computational time and hardware resources. Additionally, the design of the present invention is optimized to minimize both FPGA utilization and computation time. The advantages allow the present invention to be used as a small size, light weight and low power (SWAP) image processing pipeline (such as in US Application No. 15/272,247) operating at high image frame rates (eg, >30 frames per second). as described in ). The use of the process of the present disclosure will better simplify the image processing pipeline by reducing the necessary computations on the detected specific object over the entire image.

可以将本文所述的系统和处理实现为低SWAP图像处理管线的关键组成部分。此外，所述系统和处理可以应用于多种实现(包括具有严重受限的SWAP的无人驾驶自主交通工具和平台)。通过对传感器附近的硬件上的任务相关目标和障碍物进行快速检测，本发明能够改善任务响应性并减少必须在受约束的通信带宽上传输的原始传感器数据量。此外，所述系统和处理可以用于主动安全性和自主驾驶应用两者。通过在相机附近的低功耗、低成本硬件中执行目标检测，汽车可以更快速且更鲁棒地检测道路上的障碍物，从而向驾驶员提供更及时的警告或在自主交通工具中提供对障碍物的更迅速的自动响应。下面提供了另外的详细信息。The systems and processes described herein can be implemented as key components of a low SWAP image processing pipeline. Furthermore, the systems and processes can be applied to a variety of implementations including unmanned autonomous vehicles and platforms with severely limited SWAP. By rapidly detecting mission-related targets and obstacles on hardware near the sensor, the present invention can improve mission responsiveness and reduce the amount of raw sensor data that must be transmitted over constrained communication bandwidth. Furthermore, the systems and processes can be used for both active safety and autonomous driving applications. By performing object detection in low-power, low-cost hardware close to the camera, cars can more quickly and robustly detect obstacles on the road, providing more timely warnings to drivers or in autonomous vehicles Faster automatic response to obstacles. Additional details are provided below.

(3)各个实施方式的具体细节(3) Specific details of each embodiment

(3.1)边界框介绍(3.1) Introduction to bounding boxes

边界框是如下方法：系统通过该方法接受单个位数据的矩阵作为输入图像，系统将该矩阵用作创建框的数组的基础。各个“框”将包含两个x位置、两个y位置的坐标以及有效像素计数。作为一个示例，来自边界框数组的一组框数据将包含x位置(Xmin＝100，Xmax＝150)和y位置(Ymin＝80，Ymax＝90)，其中像素计数＝70。这将是包含70个有效像素的10个像素高且50个像素宽的“框”。将其它像素集放入框中，并指派其各自的x位置、y位置和有效像素计数。将各个框分开的是有效像素的邻近和分离。下面分别关于软件实现和硬件实现进一步描述该区别。A bounding box is a method by which the system accepts, as an input image, a matrix of single-bit data, which the system uses as the basis for creating an array of boxes. Each "box" will contain the coordinates of the two x positions, the two y positions, and the effective pixel count. As an example, a set of box data from a bounding box array would contain x-position (Xmin=100, Xmax=150) and y-position (Ymin=80, Ymax=90), where pixel count=70. This would be a "box" 10 pixels high and 50 pixels wide containing 70 effective pixels. Put other sets of pixels into boxes and assign their respective x-position, y-position, and valid pixel count. What separates the boxes is the proximity and separation of valid pixels. This difference is further described below with respect to software implementation and hardware implementation, respectively.

(3.2)Matlab中的软件边界框设计(3.2) Software bounding box design in Matlab

可以使用任何合适的软件产品来实现边界框处理。作为一个示例，边界框是在Matlab中实现的。基于软件的边界框设计可以被限定为3个不同部分：准备、搜索/标记和合并区域。各个部分稍后将被转换成在硬件设计上实现。Bounding box processing can be implemented using any suitable software product. As an example, the bounding box is implemented in Matlab. Software-based bounding box design can be defined into 3 different parts: preparation, search/marking, and merge area. The various parts will later be converted to be implemented on the hardware design.

(3.2.1)准备(3.2.1) Preparation

准备处理是变量和数组到其默认值的简单实例化和初始化。初始化的变量将是区域计数、先前y位置和当前y位置。初始化的数组将是“Image(图像)”、“Labeled Image(标记图像)”、“Merge to Region(合并到区域)”和“Bounding Box Data(边界框数据)”。区域计数将用作找到的标记像素的“票据”值。先前/当前Y位置用作减小标记图像矩阵大小的参考。减小的标记图像总大小允许数字电路在这样的电路中实现边界框方法时使用较少的硬件资源。由于标记算法搜索模式使用先前位置，所以必须通过在图像周围设置额外空白像素来容纳较大的图像大小，从而将宽度和高度的大小增加2。标记图像的高度为2，宽度设置为图像宽度。合并到区域是为最大区域计数的长度的单维数组集合，该单维数组集合是限于最大期望分布多少标记的集合。该限制是为了反映在数字实现期间使用的有限资源。边界框数据(Bounding Box Data)是二维的，其宽度为5份：两个x位置、两个y位置以及有效像素计数。边界框数据高度被设置为最大区域计数。Preparation processing is the simple instantiation and initialization of variables and arrays to their default values. The variables initialized will be the region count, previous y position and current y position. The initialized arrays will be "Image", "Labeled Image", "Merge to Region" and "Bounding Box Data". The area count will be used as the "ticket" value for the found marked pixels. The previous/current Y position is used as a reference to reduce the size of the marker image matrix. The reduced overall marker image size allows digital circuits to use fewer hardware resources when implementing the bounding box method in such circuits. Since the tagging algorithm search mode uses the previous position, the size of the width and height must be increased by 2 by placing extra blank pixels around the image to accommodate the larger image size. The height of the marker image is 2 and the width is set to the image width. Merge into regions is a collection of single-dimensional arrays of length counted for the maximum region, which is a collection limited to how many tokens are distributed at the maximum expected. This limit is to reflect the limited resources used during digital implementation. The Bounding Box Data is two-dimensional and has a width of 5 copies: two x positions, two y positions, and a count of valid pixels. The bounding box data height is set to the maximum area count.

例如，图3示出了在各个矩阵和变量的准备处理的实例化期间在各个矩阵和变量之间的关系。既然已经创建了变量，则必须将变量初始化为正确的起始值。例如，系统启动多维度图像(即，大小)加上像素边界(即，填充多个维度的空白像素)的数组316，以形成图像318。合并到区域300和标记图像302两者的所有值均已被初始化至最大区域计数304。标记图像302继续改变312值，以使Y_先前(Y Previous)被设置为0并且Y_当前(Y Current)被设置为1。边界框数据306的最小x值和最小y值314分别被发送至图像宽度和图像高度的最大大小308。Y_先前被设置为1，并且Y_当前被设置为2。因此，上述矩阵中的值被初始化为非0值。初始化的最后一件事是被设置为0的区域计数(Region Count)310。这将跟踪生成了多少标记，将防止处理超过最大数组大小。For example, FIG. 3 shows the relationship between various matrices and variables during instantiation of their preparation process. Now that the variable has been created, it must be initialized to the correct starting value. For example, the system initiates an array 316 of a multi-dimensional image (ie, size) plus pixel boundaries (ie, blank pixels that fill multiple dimensions) to form image 318 . All values merged into both the region 300 and the marker image 302 have been initialized to the maximum region count 304 . The marker image 302 continues to change 312 the value so that Y _Previous is set to 0 and Y _Current is set to 1. The minimum x and minimum y values 314 of the bounding box data 306 are sent to the maximum size 308 for image width and image height, respectively. Y was _previously set to 1, and Y _{is currently} set to 2. Therefore, the values in the above matrix are initialized to non-zero values. The last thing initialized is the Region Count 310 which is set to 0. This will keep track of how many tokens are generated and will prevent processing exceeding the maximum array size.

(3.2.2)搜索/标记(3.2.2) Search/Mark

在准备处理之后，系统继续进行搜索/标记处理，如图5所示。如标题所示，系统将搜索图像并且标记找到的像素(从图像的顶部到底部，或任何其它预定顺序)。像素是0或1范围内的二进制数，从而“找到的”像素是具有一值的像素。图4示出了如何进行搜索的示例。将图像中的当前像素位置用作(X,Y)，系统首先进行检查以查看(X,Y)是否具有有效像素。如果是，则系统继续检查(X-1,Y-1)、(X,Y-1)、(X+1,Y-1)和/或(X-1,Y)是否已被标记。如图5所示，该处理继续，直到通过500搜索完图像(例如，搜索到其底部或顶部等)为止，此时完成502搜索/标记。另选地，假定这是被找到的第一像素；因此，当前没有位置具有标记(例如，图像尚未被读取到其边缘504，并且在(x,y)处存在有效像素506，并且没有邻近像素已被标记508)。如果是这种情况，则该像素应被标记508有为1的区域计数，并且区域计数510增加。使用区域计数，系统将存储像素计数的边界框数据数组编索引至为1的值，并且使用当前x位置、当前y位置存储其最小值/最大值。由于找到的框的大小仅为一个像素，因此x位置和y位置的最大值和最小值将相等。现在，假定下一个像素也是有效的。这便创建了(X-1,Y-1)、(X,Y-1)、(X+1,Y-1)和/或(X-1,Y)已被标记的条件。然后，系统比较各个邻居的标记，以找到最低标记位置并将其称为“当前标记(Current Label)”512。为此，该处理使用先前Y和当前Y来对标签图像编索引；然而，使用(X-1)、(X)、(X+1)沿另一维度(即，沿x维度而不是y维度)移动。使得想到，在该示例中，通过使(X-1,Y_当前)最低为值“1”，标记图像数组被加载有针对该区域的最大值。然后，将值“1”指派给该像素的标记。在完成对一个像素位置的评估之后，系统实现框526，增加“X”索引，以移动更接近图像的边缘。此外，在如框504所示到达图像的边缘之后，系统开始实现框524，框524将增加“Y”维度并且交换Y_当前值和Y_先前值。请记住，Y_当前和Y_先前用于保持小标记图像数组，因此，Y_当前和Y_先前交换将保留下一次评估所需的数据，同时打开将被重写的新集合。After the preparation process, the system continues with the search/mark process, as shown in Figure 5. As the title says, the system will search the image and mark the pixels found (from top to bottom of the image, or any other predetermined order). A pixel is a binary number in the range of 0 or 1, so a "found" pixel is a pixel with a value. Figure 4 shows an example of how the search is performed. Using the current pixel position in the image as (X,Y), the system first checks to see if (X,Y) has a valid pixel. If so, the system proceeds to check whether (X-1,Y-1), (X,Y-1), (X+1,Y-1) and/or (X-1,Y) have been marked. As shown in Figure 5, the process continues until the image has been searched through 500 (eg, to its bottom or top, etc.), at which point the search/marking is complete 502. Alternatively, assume that this is the first pixel found; therefore, there are currently no locations with markers (eg, the image has not been read to its edge 504 and there is a valid pixel 506 at (x,y) and no adjacent Pixels have been marked 508). If this is the case, the pixel should be marked 508 with a region count of 1 and the region count 510 incremented. Using the region count, the system indexes the bounding box data array storing the pixel count to a value of 1, and uses the current x position, current y position to store its min/max values. Since the size of the found box is only one pixel, the maximum and minimum values for the x and y positions will be equal. Now, assume that the next pixel is also valid. This creates conditions where (X-1,Y-1), (X,Y-1), (X+1,Y-1) and/or (X-1,Y) have been flagged. The system then compares the labels of the various neighbors to find the lowest label position and calls it the "Current Label" 512 . To this end, the process uses the previous Y and the current Y to index the label image; however, (X-1), (X), (X+1) are used along another dimension (ie, along the x dimension instead of the y dimension) )move. It comes to mind that, in this example, by making (X-1, Y _current ) a minimum value of "1", the marker image array is loaded with the maximum value for that region. Then, assign the value "1" to the flag for that pixel. After completing the evaluation of a pixel location, the system implements block 526 and increments the "X" index to move closer to the edge of the image. Additionally, after reaching the edge of the image as shown in block 504, the system begins to implement block 524, which will increase the "Y" dimension and swap the Y _current value and the Y _previous value. Remember that Y _current and Y _previous are used to hold small arrays of marker images, so swapping Y _current and Y _previous will hold the data needed for the next evaluation, while opening a new collection that will be overwritten.

然后，继续进行该处理，以通过将新数据“(X,Y)”与所存储的最大/最小X和Y值进行比较，确定是否已更新边界框数据。如果所存储的最大/最小X和Y值小于/大于新数据，则系统将使用新的X位置和Y位置更新边界框数据。使用以上示例，系统必须通过增加Xmax来更新514边界框数据中的一些值，因为连通像素沿x方向将框大小增加1，并且像素计数将增加1。The process then continues to determine if the bounding box data has been updated by comparing the new data "(X,Y)" with the stored maximum/minimum X and Y values. If the stored max/min X and Y values are smaller/greater than the new data, the system will update the bounding box data with the new X and Y positions. Using the above example, the system must update 514 some values in the bounding box data by increasing Xmax, because connected pixels will increase the box size by 1 in the x direction, and the pixel count will increase by 1.

在更新边界框数据之后，注意稍后合并不可见像素，因为处理可能没有机会在当前扫描中重新标记这些像素。如下所述，在更复杂的示例的情况下，该概念将变得更有意义。为了记录合并，首先考虑识别516哪些邻居具有有效像素。具有有效像素的任何邻居都将与当前标记相比较来检查其合并到区域位置518，以找到最小标记。然后，将两者中的最小者存储回其同一合并到区域位置520。在该示例中，通过用先前像素位置的标记对合并到区域编索引并且然后存储当前标记的值来结束搜索。最后，如果当前像素是无效522的，则标记图像数组通过y_当前被编索引，并且x被设置为针对区域计数+1的最大值。After updating the bounding box data, take care to merge invisible pixels later, as processing may not have a chance to relabel these pixels in the current scan. This concept becomes more meaningful in the context of more complex examples, as described below. To record the merge, first consider identifying 516 which neighbors have valid pixels. Any neighbors with valid pixels will be checked against the current marker to merge into the region position 518 to find the smallest marker. Then, the smallest of the two is stored back to its same merged-region location 520 . In this example, the search ends by indexing the merged region with the marker pair of the previous pixel position and then storing the value of the current marker. Finally, if the current pixel is invalid 522, the marker image array is _currently indexed by y, and x is set to the maximum value for region count + 1.

(3.2.3)合并区域(3.2.3) Merge area

为了理解合并区域处理，查看和理解更复杂的“搜索/标记”示例很有帮助。例如，假定系统正在处理如图6C所示的图像。假设系统正在扫描图6C所示的像素图像，系统将从图6A至图6B至图6C逐渐“看到”图像。注意，在图6A和图6B中，图像在扫描期间包含分离的分量，使得人们将不知道两个分离的分量是连通的。在这种情况下，作为一个示例，假定左侧具有标记1，而右侧具有标记2。只有当像素在两侧之间桥接时，才知道这两侧是同一图像的一部分，并且应该被相应地标记；然而，系统/处理无法改变在标记2中找到的信息，因为此时不知道分量是否连通，也不知道多少个分量被连通。为了解决该问题，合并到区域数组利用为1的值设置位置2。稍后，该改变将被用于将为2的所有标记位置转换成为1的标记位置。To understand merge region processing, it is helpful to see and understand the more complex "search/mark" example. For example, assume that the system is processing an image as shown in Figure 6C. Assuming the system is scanning the pixel image shown in Figure 6C, the system will gradually "see" the image from Figures 6A to 6B to 6C. Note that in Figures 6A and 6B, the images contain separate components during scanning, so that one would not know that the two separate components are connected. In this case, assume, as an example, that the left side has marker 1 and the right side has marker 2. Only when a pixel is bridged between two sides is it known that the two sides are part of the same image and should be labeled accordingly; however, the system/processing cannot change the information found in label 2 because the components are not known at this point Whether it is connected or not, it is not known how many components are connected. To solve this problem, merge into the region array by setting position 2 with a value of 1. Later, this change will be used to convert all marker positions that will be 2 to marker positions of 1.

在完整图像中并且如图6C所示，线600可以用于切割或以其它方式分割图像，从而示出了在该示例中，A和C将包含被标记为1的所有那些分量。B将包含被标记为2并且在该示例中仍被标记为2的所有那些分量。D将包含本该被标记为2但是现在反而被标记为1的所有那些分量。In the full image and as shown in Figure 6C, line 600 may be used to cut or otherwise divide the image, showing that in this example, A and C would contain all those components marked as 1s. B will contain all those components that are labeled 2 and are still labeled 2 in this example. D will contain all those components that should have been marked 2 but are now marked 1 instead.

最后一步是合并区域处理，如图7所示。在该步骤中，将在合并到区域数组中找到的更新提醒设置在适当的边界框中。在该部分开始之前，创建700新数组，该新数组将跟踪有效边界框数据。有效数据数组将以为“真”的项达到区域计数并且为“假”的上述任何项达到最大区域计数(具有与边界框数据的长度相同的长度)开始。所存储的边界框数据值可以合并到中央位置，从而使一组或更多组数据无效。由于已知标记是从小数到大数分配的，因此系统通过从当前区域计数(先前用于计数分配了多少标记)中追溯边界框数据，在for循环中向下计数至1来开始。for循环是循环重复直到完成的处理。存在不同类型的这些循环，但通常for循环执行一些动作，直到达到结束条件为止。通常在循环中，要么向下计数，要么向上计数，以达到结束处理并退出循环的条件。The final step is the merged region processing, as shown in Figure 7. In this step, update reminders found in the Merge into Regions array are set in the appropriate bounding box. Before the section starts, create a new array of 700 that will keep track of valid bounding box data. The valid data array will start with items that are "true" up to the region count and any of the above items that are "false" reaching the maximum region count (having the same length as the bounding box data). The stored bounding box data values can be merged into a central location, invalidating one or more sets of data. Since markers are known to be assigned from decimal to large, the system starts by counting down to 1 in a for loop by tracing back the bounding box data from the current region count (previously used to count how many markers were assigned). A for loop is a process that repeats in a loop until completion. Different types of these loops exist, but usually a for loop performs some action until an end condition is reached. Usually in a loop, either count down or count up to reach a condition that ends processing and exits the loop.

通过查看for循环的当前索引并将其与由同一值索引的合并到区域702进行比较，来确定更新的必要性。如果在搜索/标记阶段期间的某个时间找到至不同的较小标记的连通，则合并到区域702数组应该被更新704为比索引更低的值。如果这种情况从未发生过，那么合并到区域自然将具有来自原始准备阶段的更大数量。因此，如果for循环的索引大于存储在具有该同一索引的合并到区域中的信息，则系统需要更新706边界框数据以包括最新找到的信息。边界框数据将包含Xmin、Ymin、Xmax、Ymax和像素计数信息；因此，为了更新706，该处理继续查看两个索引位置中的边界框数据(即，(1)当前索引的边界框数据和(2)存储在具有该同一索引的合并到区域中的值的边界框数据)。使用这两个位置；比较最小值以查看哪一者具有较小值，比较最大值以查看哪一者具有较大值，并组合像素计数值。然后将信息存储回由合并到区域值索引的边界框数据中，该合并到区域值由将包含较低标记的for循环编索引。一旦数组到达该较低索引708，边界框数据便在该位置利用与最小计数值、最大计数值和像素计数值有关的大多数当前信息被完全更新。重复该处理，直到区域计数为一710，这是最低标记，并且不可能有另一位置合并到其中。此时，合并区域处理终止712。现在，输出将是与验证数组一起存储为边界框数据的值，该验证数组识别边界框数据的哪些部分包含具有未被合并或具有最新合并信息的区域的框。The necessity of an update is determined by looking at the current index of the for loop and comparing it to the merged into area 702 indexed by the same value. If a connection to a different smaller marker is found sometime during the search/mark phase, the merged into region 702 array should be updated 704 to a lower value than the index. If this never happened, then the merge into the region would naturally have a larger number from the original preparation stage. Therefore, if the index of the for loop is greater than the information stored in the merged region with that same index, the system needs to update 706 the bounding box data to include the newly found information. The bounding box data will contain Xmin, Ymin, Xmax, Ymax, and pixel count information; therefore, to update 706, the process continues to look at the bounding box data in both index locations (ie, (1) the bounding box data for the current index and ( 2) Store the bounding box data of the value merged into the region with that same index). Use both positions; compare the min to see which has a smaller value, compare the max to see which has a larger value, and combine the pixel count values. The information is then stored back into the bounding box data indexed by the merge-to-region value indexed by the for loop that will contain the lower marker. Once the array reaches this lower index 708, the bounding box data is fully updated at that location with most of the current information about the minimum count, maximum count, and pixel count. This process is repeated until the region count is one 710, which is the lowest mark, and it is impossible for another location to merge into it. At this point, the merge area process terminates 712 . The output will now be a value stored as bounding box data along with a validation array that identifies which parts of the bounding box data contain boxes with regions that are not merged or have the latest merged information.

(3.3)硬件边界框实现(3.3) Hardware bounding box implementation

如上所述，本公开还提供了一种用于生成边界框的数字硬件实现。创建数字硬件实现需要将边界框减小到某个已知值范围。出于例示的目的，关于宽为512个像素、长为256个像素的图像描述实现，其中各个像素都包含一位值。该设计将从外部模块进行控制，使得边界框模块将接收启动信号，并且需要允许根据请求对必要图像位置编索引。为了满足图像处理设计的其它规范，还实现了附加过滤，并将所提供的结果减少到排序前15位的框。找到所有边界框并将其存储在存储器中；然而，该模块将具体提供15个边界框，所述15个边界框按照下文进一步详细讨论的方式进行排序。As mentioned above, the present disclosure also provides a digital hardware implementation for generating bounding boxes. Creating a digital hardware implementation requires reducing the bounding box to some known range of values. For illustration purposes, the implementation is described with respect to an image that is 512 pixels wide and 256 pixels long, where each pixel contains a one-bit value. The design will be controlled from an external module such that the bounding box module will receive the enable signal and needs to allow the necessary image positions to be indexed upon request. To meet other specifications for image processing design, additional filtering is implemented and the provided results are reduced to the top 15 sorted boxes. All bounding boxes are found and stored in memory; however, the module will specifically provide 15 bounding boxes, ordered in a manner discussed in further detail below.

就像在软件设计中一样，硬件可以概括为三个阶段，即，准备、搜索/标记和合并区域。然而，在该硬件实现中，还有一个称为回调和排序的附加阶段。硬件中的转换还要求功能按特定时钟周期完成。在这种情况下，算法已分解为不同状态。另外，为了减少使用许多触发器的硬件负担，将大边界框数组存储在块随机存取存储器(BRAM)中。触发器是存储位的一种类型的寄存器。该触发器可以在FPGA的构造中找到，并且为了减少存储创建数据所需的量，可以将该触发器放入BRAM(这是FPGA中的另一组件)。这进一步要求将算法分解为多个状态，其中一些状态用于隐藏索引并且从BRAM接收信息。Just like in software design, hardware can be generalized into three phases, namely, prepare, search/mark, and merge regions. However, in this hardware implementation, there is an additional stage called callback and ordering. The translation in hardware also requires the function to complete on a specific clock cycle. In this case, the algorithm has been decomposed into different states. Additionally, to reduce the hardware burden of using many flip-flops, a large bounding box array is stored in block random access memory (BRAM). A flip-flop is a type of register that stores bits. This flip-flop can be found in the construction of the FPGA, and to reduce the amount of data needed to store the creation, the flip-flop can be placed in BRAM (another component in the FPGA). This further requires decomposing the algorithm into multiple states, some of which are used to hide indexes and receive information from BRAM.

(3.3.1)准备(3.3.1) Preparation

硬件准备部分翻译为算法中所需的变量的实例化和一些初始化。如之前所讨论的并且如图8所示，硬件的限制要求标记变量的大小。该处理在图8中例示，其中，一些框表示数组800，而其余框均表示值。最大限制是可以提供的标记数量。在一个示例中，标记数量减少到256，这反过来将设置许多数组的范围。就像前文所描述的，针对输入图像801包括标记图像数组802，在该示例中，输入图像801的高度将是255个维度并且宽度是511(图像的宽度)(维度)。包含的数据将具有最大为255的最大区域计数804，这意味着8位的大小可以表示这些范围。合并到区域806的宽度将为256，其中值为255那样大(这意味着8位的大小)。根据经验，关于宽度的任何项将为511那样大(需要9位)，而高度将为255那样大(需要8位)。被呈现以供使用的像素的宽度为1位。由于BRAM 808的大小与最大标记(其为256)一样长，因此写入和读取地址的宽度将为8位。BRAM将包含来自边界框BRAM数组808的所有信息810。如在上述软件情况中一样，所包含的信息810将是Xmax、Ymax、Xmin、Ymin和像素计数的信息。然而，除此之外，硬件实现还包括Xsize和Ysize，其是简单的(Xmax–Xmin或Ymax–Ymin)计算。The hardware preparation part translates to the instantiation and some initialization of the variables needed in the algorithm. As discussed earlier and shown in Figure 8, hardware limitations dictate the size of the tag variable. This process is illustrated in Figure 8, where some boxes represent arrays 800 and the remaining boxes represent values. The maximum limit is the number of tokens that can be provided. In one example, the number of tokens is reduced to 256, which in turn will set the range of many arrays. As previously described, for input image 801 to include marker image array 802, in this example, input image 801 would have a height of 255 dimensions and a width of 511 (the width of the image) (dimensions). The included data will have a maximum region count of 804 of 255, which means that 8 bits in size can represent these ranges. The width merged into region 806 will be 256, where the value is as large as 255 (which means 8 bits in size). As a rule of thumb, any item for width will be as large as 511 (requires 9 bits) and height will be as large as 255 (requires 8 bits). Pixels rendered for use have a width of 1 bit. Since the size of BRAM 808 is as long as the maximum mark (which is 256), the width of the write and read addresses will be 8 bits. The BRAM will contain all the information 810 from the bounding box BRAM array 808 . As in the software case described above, the included information 810 would be that of Xmax, Ymax, Xmin, Ymin and pixel count. However, in addition to this, the hardware implementation also includes Xsize and Ysize, which are simple (Xmax-Xmin or Ymax-Ymin) calculations.

就像在软件中一样，所有变量都必须被初始化。可以在重置和状态0中初始化变量。在重置中，标记图像和合并到区域806中的所有值均被设置为255，这将是最大值标记，将所有其它值设置为0。由于不知道BRAM中包含的内容，因此边界框BRAM808将不被设置为任何值。相反，期望跟踪写指针的位置，以获知BRAM的哪个部分有效。在状态0 812中，当给出启动命令时，标记图像和合并到区域值被设置为255，状态被设置为1，Y_先前被设置为0，Y_当前被设置为1，所有其它值被设置为0。作为提醒，数字实现从0开始而不是从1开始对第一行编索引，这先前在软件实现中完成。Just like in software, all variables must be initialized. Variables can be initialized in reset and state 0. In reset, the marker image and all values incorporated into area 806 are set to 255, which will be the maximum value marker, and all other values are set to 0. Since it is not known what is contained in the BRAM, the bounding box BRAM 808 will not be set to any value. Instead, it is desirable to keep track of the position of the write pointer to know which part of the BRAM is valid. In state 0 812, when the start command is given, the Mark Image and Merge to Region values are set to 255, the state is set to 1, Y was _previously set to 0, Y _{is currently} set to 1, and all other values are set is 0. As a reminder, the numeric implementation indexes the first row from 0 instead of 1, which was previously done in the software implementation.

(3.3.2)搜索/标记(3.3.2) Search/Mark

与软件一起，后续是搜索/标记部分。由于边界框位于BRAM中，因此搜索/标记功能被拆分，以允许读取BRAM。考虑到该约束，可以将搜索/标记软件设计进一步分成不同状态部分，以利用时钟延迟。因此，搜索/标记通过特殊增量器阶段被分为三个状态。Along with the software, the search/tagging part follows. Since the bounding box is in the BRAM, the search/mark function is split to allow the BRAM to be read. Given this constraint, the search/mark software design can be further divided into different state parts to take advantage of clock delays. Therefore, the search/mark is divided into three states by a special incrementer stage.

如图9A和图10所示，状态1将包含用于在当前搜索位置找到新像素或未找到像素的条件。如果不存在硬性停止条件900，并且如果找到902当前有效像素，则期望确定像素的邻居(如图4所示)中的任一个是否具有有效像素904。如果没有一个邻居具有有效像素，则标记图像数和边界框BRAM被更新906，并且增量器908被激活918。对边界框BRAM的写命令将花费一个时钟周期，但是由于该处理(经由增量器908)使写入地址递增，以决不重写BRAM中的区域，并且状态1不读取BRAM，所以该处理可以返回到状态1 910而没有任何问题，从而不需要不必要的等待状态。为了进一步理解，FPGA/硬件由每个时钟周期并行运行的处理组成，并且将在每个时钟周期结束时确认动作。还应该注意，BRAM在某些时钟周期延迟下运行。BRAM是系统将要写入和读取的位置。考虑到FPGA在时钟周期结束时确认动作并且BRAM中存在延迟，因此期望在写入操作完成之前不会不正确地访问BRAM。换句话说，系统不尝试读取当前正在写入的位置，而是仅应在写入延迟结束后才读取该位置。还应注意，状态1不重写位置或不需要读取，它仅在处理返回到状态1 910的情况下才进行写入。这将隐藏写入时钟周期，使得BRAM准备就绪并且无需将等待时钟添加到该状态。As shown in Figures 9A and 10, state 1 will contain the conditions for finding a new pixel or no pixel at the current search position. If no hard stop condition 900 exists, and if a currently valid pixel is found 902 , it is desirable to determine whether any of the pixel's neighbors (as shown in FIG. 4 ) have a valid pixel 904 . If none of the neighbors has a valid pixel, the labeled image number and bounding box BRAM are updated 906, and the incrementer 908 is activated 918. A write command to the bounding box BRAM will take one clock cycle, but since the process increments the write address (via incrementer 908) to never rewrite the region in the BRAM, and state 1 does not read the BRAM, the Processing can return to state 1 910 without any problems, thus eliminating the need for unnecessary wait states. For further understanding, the FPGA/hardware consists of processing running in parallel each clock cycle and will acknowledge actions at the end of each clock cycle. It should also be noted that BRAM operates with some clock cycle delay. BRAM is where the system will write and read. Considering that the FPGA acknowledges the action at the end of the clock cycle and there is a delay in the BRAM, it is expected that the BRAM will not be accessed incorrectly until the write operation is complete. In other words, the system does not attempt to read the location that is currently being written to, but should only read that location after the write delay has elapsed. It should also be noted that state 1 does not rewrite the location or does not require a read, it only writes if processing returns to state 1 910. This hides the write clock cycles so that the BRAM is ready and no wait clocks need to be added to this state.

如果相邻像素904中存在有效像素，则移动至状态2 912。数字实现的独特之处是硬性停止900和增量器908的功能。增量器908将充当for循环，从而移动当前像素并请求下一组像素值。一旦读取了整个图像，增量器908就将状态机移动至状态4 914，以开始合并区域部分。然而，由于给出过多标记而存在溢出的机会，因此状态1实现了硬性停止900，查看该处理何时比最大标记范围小。如果找到了，则不需要继续搜索图像，而是处理前进至状态4 916以开始合并区域部分。如果系统没有成功实现硬性停止并且没有找到有效像素，则920标记图像必须将存储在该像素位置的数据重置为最大区域计数。这将确保在系统比较标记图像位置(参见图12)时，最低位置是最新的，并且仅包含与该图像的该部分有关的有效数据。If there is a valid pixel in adjacent pixel 904, move to state 2 912. Unique to the digital implementation is the functionality of the hard stop 900 and incrementer 908. Incrementor 908 will act as a for loop, moving the current pixel and requesting the next set of pixel values. Once the entire image has been read, the incrementer 908 moves the state machine to state 4 914 to begin merging the region portions. However, since there is a chance of overflow given too many marks, state 1 implements a hard stop 900 to see when the process is less than the maximum mark range. If found, there is no need to continue searching for the image, but instead the process proceeds to state 4 916 to begin merging the region portions. If the system does not successfully implement a hard stop and does not find a valid pixel, the 920 marker image must reset the data stored at that pixel location to the maximum area count. This will ensure that when the system compares marker image positions (see Figure 12), the lowest position is up-to-date and only contains valid data pertaining to that part of the image.

此外，图10例示了增量器908处理，示出了进行至状态1 910或状态4 914之间的决策。在激活增量器908之后，如果该处理尚未读取到图像的边缘1000，则系统使“X”索引递增，以移动更接近图像的边缘1002并进行至状态1 910。另选地，如果该处理已读取到图像的边缘1000，则将“X”索引重置为1 1004，并且确定处理是否在图像的末尾处1006。如果不是，则使“Y”索引递增，以移动更接近图像的末尾，并且交换“Y_当前”和“Y_先前”的值1008，并且进行至状态1 910。另选地，如果是，则将“Y_当前”和“Y_先前”重置为其初始值，并且将有效区域计数锁定为区域计数的当前值1010，并且进行至状态4 914。In addition, FIG. 10 illustrates incrementer 908 processing, showing the decision to proceed between state 1 910 or state 4 914. After activating the incrementer 908, if the process has not read the edge 1000 of the image, the system increments the "X" index to move closer to the edge 1002 of the image and proceeds to state 1 910. Alternatively, if the process has read to the edge of the image 1000, the "X" index is reset to 1 1004, and it is determined 1006 whether the process is at the end of the image. If not, increment the "Y" index to move closer to the end of the image and swap the values of "Y _current " and "Y _previous " 1008 and proceed to state 1 910. Alternatively, if so, reset "Y _current " and "Y _previous " to their initial values, and lock the active area count to the current value of the area count 1010 and proceed to state 4 914 .

如图11所示，状态2使用被称为当前标记模块1100的另一模块，该当前标记模块1100在图12中进一步详细示出。这里，时钟周期延迟被用于执行当前标记指派1112，以准备用于该阶段(注意上面关于FPGA时钟周期和BRAM读取/写入延迟的讨论)。再次参考图11，由于边界框BRAM 1102将花费一个时钟周期来读取，因此必须与当前标记1104一起发送读取信号以读取包含将被组合的数据的地址。BRAM需要信号以及地址来指示该处理将进行读取。“当前标记”1102将是读取地址。请注意上面关于“搜索/标记”部分的功能的注释，其中该系统在此处组合标记以供之后合并。As shown in FIG. 11 , state 2 uses another module called the current tagging module 1100 , which is shown in further detail in FIG. 12 . Here, the clock cycle delay is used to perform the current tag assignment 1112 in preparation for this stage (note the discussion above regarding FPGA clock cycles and BRAM read/write delays). Referring again to Figure 11, since the bounding box BRAM 1102 will take one clock cycle to read, a read signal must be sent along with the current tag 1104 to read the address containing the data to be combined. The BRAM needs a signal as well as an address to indicate that the process is to read. "Current Tag" 1102 will be the read address. Note the note above about the functionality of the Search/Tag section, where the system combines tags here for later merging.

状态2将仅包含设置合并到区域的部分。存在状态2以利用数据设置“合并到区域”1108数组，该数据稍后将在状态3a 1110和“合并区域”阶段期间使用。State 2 will only contain the part where the settings are merged into the region. State 2 exists to set the "Merge to Region" 1108 array with data that will be used later during State 3a 1110 and the "Merge Region" phase.

就像在软件方面一样，将当前标记1104与存储在合并到区域中的内容进行比较1106，以基于有效邻近像素1114进行更新，针对有效邻近像素1114，系统将“合并到区域”数组中找到的值与最低标记值进行比较。较低“赋值”标记将是存储在“合并到区域”数组中的值。因为标记是按连续顺序提供的，所以较低标记应该是合并区域部分中随后使用的参考。参见例如例示了状态2代码的图9B和例示了当前标记模块1102的图12。As on the software side, the current marker 1104 is compared 1106 to the content stored in the merged into region to make an update based on the effective neighbors 1114 for which the system will "merge into region" found in the array The value is compared to the lowest tag value. The lower "assignment" token will be the value stored in the "merge into area" array. Because the markers are provided in consecutive order, the lower markers should be the reference used later in the merged region section. See, eg, FIG. 9B , which illustrates the state 2 code, and FIG. 12 , which illustrates the current tagging module 1102 .

图13所示的状态3涵盖了搜索/标记阶段的最后部分。在状态3中，边界框BRAM1102被更新。然而，由于在状态2中发送了读取，因此该数据仅在一个时钟周期后才有效。因此，将状态3参考为两个部分。状态3A 1300将等待一个时钟周期以获得来自BRAM 1102的有效数据，然后状态3B 1302将更新边界框BRAM 1102。假定在第二周期内接收到数据，则可以执行对边界框BRAM 1102的更新。更新包括将从计算出的当前标记位置读出的数据与当前信息进行组合。就像在软件中一样，期望比较x位置和y位置的最大值和最小值，然后将较大和较小的那个值设置回到边界框BRAM 1102中。针对找到的新像素，还期望将像素计数增加到更大的值。数字实现还增加了另一部分来计算X和Y方向的大小。这稍后将被用作针对不需要的边界框的过滤器。另一增加是过滤出不满足像素大小的特定范围1308的不需要框。下边界和上边界像素计数被用于验证边界框BRAM 1102存储(其中像素计数位于被指定为有效的有效像素范围1310内，而有效像素范围之外的像素计数被指定为无效1312)。因此，有效区域数组被用于通过将值“1”指派给容纳有效数据集的地址来确定应在稍后阶段测试存储在BRAM中的哪些信息。稍后将循环通过有效区域数组，以查看系统在回调和排序阶段期间是否应从该地址位置读取存储在BRAM中的数据。最终，系统激活增量器908并将处理1304发送至状态1。然而，如果增量器908检测到该处理在最后像素位置，则它反而将处理1306发送回至状态4，如图10所示。State 3 shown in Figure 13 covers the last part of the search/marking phase. In state 3, the bounding box BRAM 1102 is updated. However, since the read was sent in state 2, this data is only valid after one clock cycle. Therefore, state 3 is referred to as two parts. State 3A 1300 will wait one clock cycle for valid data from BRAM 1102, then state 3B 1302 will update bounding box BRAM 1102. Assuming data is received within the second cycle, an update to bounding box BRAM 1102 may be performed. Updating includes combining the data read from the calculated current marker position with the current information. As in software, it is desirable to compare the maximum and minimum values for the x and y positions, and then set the larger and smaller values back into the bounding box BRAM 1102. It is also desirable to increase the pixel count to a larger value for new pixels found. The digital implementation also adds another part to calculate the size in the X and Y directions. This will later be used as a filter against unwanted bounding boxes. Another addition is to filter out unwanted boxes that do not meet a certain range 1308 of pixel sizes. The lower and upper boundary pixel counts are used to validate bounding box BRAM 1102 storage (where pixel counts are within the valid pixel range 1310 designated as valid, and pixel counts outside the valid pixel range are designated as invalid 1312). Therefore, the valid area array is used to determine which information stored in the BRAM should be tested at a later stage by assigning the value "1" to the address containing the valid data set. The array of valid regions will be looped through later to see if the system should read the data stored in BRAM from this address location during the callback and sort phases. Finally, the system activates incrementer 908 and sends process 1304 to state 1 . However, if the incrementer 908 detects that the process is at the last pixel position, it instead sends the process 1306 back to state 4, as shown in FIG. 10 .

如图14所示，状态4、状态5和状态6涵盖了合并区域阶段。在状态4中，系统通过开始于区域计数1400然后向下计数以确定是否需要合并数据，来在合并到区域1402中进行搜索。如果是，则需要来自边界框BRAM 1102的两个地址(区域计数1400和合并到区域[区域计数]1402)，并且系统写回到边界框BRAM 1002位置。如果不是，则系统基于在状态6 14261430期间创建并最终通过1418实现的数据，通过减小区域计数1422和减小有效区域计数1420来继续在合并到区域中进行搜索。As shown in Figure 14, state 4, state 5 and state 6 cover the merged region phase. In state 4, the system searches in the merged area 1402 by starting at the area count 1400 and then counting down to determine if the data needs to be merged. If so, two addresses from the bounding box BRAM 1102 are required (region count 1400 and merge into region [region count] 1402), and the system writes back to the bounding box BRAM 1002 location. If not, the system continues the search in the merged region by decreasing the region count 1422 and decreasing the valid region count 1420 based on the data created during state 6 14261430 and finally implemented through 1418 .

由于BRAM 1102读取，所以该处理必须返回至状态4，以涵盖读取的一个时钟周期延迟并请求不同地址。为了清楚起见，状态4被分为两部分，用于确定是否需要合并的状态4a 1404以及包括时钟周期等待和读取下一地址的状态4b 1406。Since the BRAM 1102 reads, the process must return to state 4 to cover the one clock cycle delay of the read and request a different address. For clarity, state 4 is divided into two parts, state 4a 1404, which is used to determine if a merge is required, and state 4b 1406, which includes clock cycle waiting and reading the next address.

状态5 1408非常简单，因为已知BRAM 1102具有来自状态4a 1404的有效数据。因此，必须保存1410已到达的数据，使得稍后可以使用该数据与在状态4b 1406中读取的地址进行比较。State 5 1408 is very simple because BRAM 1102 is known to have valid data from state 4a 1404. Therefore, the data that has arrived at 1410 must be saved so that it can be used later to compare with the address read in state 4b 1406.

状态6 1412然后将比较1414从两个边界框BRAM 1102读取接收的两组数据，然后将合并信息相应地存储回具有较低地址的边界框BRAM 1102中。由于写入将发生在状态4a1404期间，因此在激活下一地址的读取之前，系统将已正确写入。到达状态6指示当前区域计数将被合并，因此系统将使该区域对于有效区域数组无效，并且减小有效区域计数1424。如上所述，可以添加过滤器以过滤掉不满足像素计数的特定范围1426的区域(不需要的框)，其中像素计数位于被指定为有效的有效像素范围1428内，而将有效像素范围之外的像素计数指定为无效1430。State 6 1412 will then compare 1414 to read the two sets of data received from the two bounding box BRAMs 1102 and then store the merge information back into the bounding box BRAM 1102 with the lower address accordingly. Since the write will occur during state 4a1404, the system will have written correctly before activating the read of the next address. Reaching state 6 indicates that the current region count will be merged, so the system invalidates the region for the valid region array and decrements the valid region count by 1424. As described above, a filter can be added to filter out regions (unwanted boxes) that do not meet a specific range 1426 of pixel counts that are within the valid pixel range 1428 designated as valid, and will be outside the valid pixel range The pixel count specified as invalid 1430.

该实现的独特之处是状态7 1416承载的附加回调和排序阶段。具体地并且如图15和图16所示，状态7分别集中于回调操作和排序操作。如图15的回调操作所示，该阶段将从BRAM 1102回调所有有效信息，直到该处理读出了已经存储的所有有效区域为止。根据先前状态，该处理一直对边界框BRAM 1102中有多少个位置在过滤后仍是有效的进行计数，并利用有效数组跟踪地址。因此，为了回调所有有效位置，有效数组用于检查BRAM 1102中的区域是否具有有效数据，然后使有效数据计数递减。在读取时，BRAM具有延迟的两个时钟周期。通过在地址零1512处的有效区域1510开始从BRAM读取1508，并且通过在返回状态71516之前迫使一个时钟周期等待1514，来处理从状态4a移动至状态7。该思想是不断从边界框BRAM 1102读取地址，使得该处理仅比每次读取晚一个时钟周期1500。状态7 1416将基于从边界框BRAM 1102的读取是否有效1502并且满足附加过滤来进行过滤。在返回至状态01506之前，在1504中包括延迟时钟周期，以解决附加过滤延迟。框1518示出了对有效区域的搜索的结束，因此必须强制系统停止对1520中看到的地址进行持续读取。Unique to this implementation is the additional callback and ordering phase carried by state 7 1416. Specifically and as shown in Figures 15 and 16, State 7 focuses on callback operations and sorting operations, respectively. As shown in the recall operation of Figure 15, this stage will recall all valid information from BRAM 1102 until the process has read out all valid areas that have been stored. From the previous state, the process keeps counting how many locations in the bounding box BRAM 1102 are still valid after filtering, and keeps track of the addresses using the valid array. Therefore, in order to recall all valid locations, the valid array is used to check whether the area in BRAM 1102 has valid data, and then decrements the valid data count. When reading, the BRAM has a delay of two clock cycles. The move from state 4a to state 7 is handled by starting a read 1508 from BRAM at the active region 1510 at address zero 1512, and by forcing a one clock cycle wait 1514 before returning to state 71516. The idea is to keep reading addresses from the bounding box BRAM 1102 so that the process is only one clock cycle 1500 later than each read. State 7 1416 will filter based on whether the read from bounding box BRAM 1102 is valid 1502 and satisfies additional filtering. A delay clock cycle is included in 1504 to account for additional filtering delays before returning to state 0 1506. Block 1518 shows the end of the search for the valid region, so the system must be forced to stop continuous reads from the address seen in 1520.

在状态3和状态6期间，像素计数有效范围被用作验证边界框BRAM 1102数据的初始条件。现在，期望通过比较Xsize(Xmax–Xmin)与Ysize(Ymax–Ymin)来过滤出形状奇怪的“框”。如果Xsize/Ysize或Ysize/Xsize<＝30％(或任何其它预定值)，则这些框无效。除此之外，期望找到填充有像素斑点的框，从而覆盖找到的框的良好区域。因此，系统还通过检查以查看(像素计数)/Xsize×Ysize<＝30％(或任何其它预定值)并将这些位置设置为无效来进行过滤。如果数据已通过所有过滤器，则将其标记为有效排序并传递到“排序”部分。只有那些有效的框才被排序，并且在本地保存以用于其它模块。由于排序中的延迟，状态7可能在最后一次从BRAM 1102读取有效数据之后完成8个时钟周期。因此，例如，该处理在确定边界框已完成并且当前被保存之前等待8个时钟周期。由于排序部分地在另一模块中完成，因此将重置值，重置是从状态0到状态1的转变。During State 3 and State 6, the pixel count valid range is used as an initial condition for validating bounding box BRAM 1102 data. Now, expect to filter out oddly shaped "boxes" by comparing Xsize(Xmax – Xmin) with Ysize(Ymax – Ymin). These boxes are invalid if Xsize/Ysize or Ysize/Xsize <= 30% (or any other predetermined value). In addition to this, it is desirable to find boxes filled with pixel blobs, thus covering a good area of the found boxes. Therefore, the system also filters by checking to see (pixel count)/Xsize x Ysize <= 30% (or any other predetermined value) and setting these positions to invalid. If the data has passed all filters, it is marked as a valid sort and passed to the "sort" section. Only those boxes that are valid are sorted and saved locally for use in other modules. Due to delays in sequencing, state 7 may complete 8 clock cycles after the last valid data read from BRAM 1102. So, for example, the process waits 8 clock cycles before determining that the bounding box is complete and is currently saved. Since the sorting is partially done in another module, the value will be reset, which is a transition from state 0 to state 1.

如图16所示，在状态7和另外的排序模块1600中完成排序。有效bram读取1508和边界框数据1102被过滤1604以识别有效排序。另外，读取的结果和有效排序具有添加的一个时钟延迟1606、1608。如果发现数据是有效排序1602，则由排序模块1600进一步对其进行排序。As shown in FIG. 16 , the sorting is done in state 7 and further sorting module 1600 . Valid bram reads 1508 and bounding box data 1102 are filtered 1604 to identify valid orderings. In addition, the result of the read and the valid ordering have an added one clock delay 1606, 1608. If the data is found to be in a valid ordering 1602, it is further ordered by the ordering module 1600.

为了进一步理解，图17示出了状态7的流程图，其集中于各个单独排序模块中的排序操作。各个排序模块首先通过确定1700与边界框或区域相关联的排序是否有效或是否已经发布重置排序来开始。如果由有效排序值0确定排序是无效排序，则模块将发出传递无效排序命令和相关数据的“0”值。如果排序是重置排序1704，则系统清空存储在排序中的数据1706。存储在排序中的清空数据1706是指本地存储的排序数据。该排序数据稍后将保存从回调BRAM读取中找到的值，其最终本地存储像素计数、xmax、xmin、ymax、ymin和(Xsize＝xmax-xmin与Ysize＝ymax-ymin)之间的最大者。排序模块均通过将像素计数除以Xsize与Ysize之间的最大者来创建排序编号。该排序编号和有效排序在排序模块之间传递1708，使得较高排序编号保持在顶部，而较低排序编号下降以开放排序或完全离开保存区域。通过首先确定1710传入排序是否大于当前存储数据的较高排序来进行该处理。如果是，则将传入排序设置为1712当前存储数据的较高排序，其中先前存储的较高排序数据然后被设置1714为所存储的较低数据，并且从排序模块中移除1716。如果传入排序小于当前存储数据的较高排序，则确定1718传入排序是否大于先前存储数据的较低排序。如果否，则从排序模块中移除1722该传入排序数据。另选地，如果传入排序大于先前存储数据的较低排序，则将传入排序数据设置1720为较低排序存储数据，并且将其从排序模块传递出去1716。For further understanding, Figure 17 shows a flow diagram of State 7, which focuses on the sorting operations in each individual sorting module. Each ordering module begins by first determining 1700 whether the ordering associated with the bounding box or region is valid or whether a reset ordering has been issued. If the sort is determined to be an invalid sort by a valid sort value of 0, the module will issue a "0" value passing the invalid sort command and associated data. If the ordering is a reset ordering 1704, the system empties 1706 the data stored in the ordering. Flush data 1706 stored in the sort refers to locally stored sort data. This sorted data will later hold the value found from the callback BRAM read, which ends up locally storing the maximum of the pixel count, xmax, xmin, ymax, ymin and (Xsize=xmax-xmin and Ysize=ymax-ymin) . Sorting modules all create sorting numbers by dividing the pixel count by the largest between Xsize and Ysize. This sort number and the valid sort are passed 1708 between the sort modules so that the higher sort numbers remain at the top, while the lower sort numbers drop to open the sort or leave the save area entirely. This is done by first determining 1710 whether the incoming ordering is greater than the higher ordering of the currently stored data. If so, the incoming ordering is set 1712 to the higher ordering of the currently stored data, where the previously stored higher ordering data is then set 1714 to the lower stored data, and removed 1716 from the ordering module. If the incoming ordering is less than the higher ordering of the currently stored data, then it is determined 1718 whether the incoming ordering is greater than the lower ordering of the previously stored data. If not, the incoming sorting data is removed 1722 from the sorting module. Alternatively, if the incoming ordering is greater than the lower ordering of the previously stored data, then the incoming ordering data is set 1720 to the lower ordering storage data and passed out 1716 from the ordering module.

(3.3.3)硬件实现结果(3.3.3) Hardware Implementation Results

进行模拟，其中使已知图像经历上述边界框实现。遵守算法需求并且如图18所示，已知图像1800的大小为512×256个像素，具有单个位像素(即，一位值用于各个像素位置)。通过使图像1800通过过滤器，系统识别出220个标记位置，这些标记位置随后合并为182个唯一边界框。通过排序，系统将边界框过滤成只剩前15位“排序”位置(如图19所示)。因此，这表明了本公开的边界框处理在图像中的被识别对象中并且围绕这样的对象生成边界框是有效的。基于此，可以对视频图像中的连续帧实现本文所述的边界框处理，以按照任何期望设置用作高效且有效的移动跟踪器。A simulation is performed in which the known image is subjected to the bounding box implementation described above. Following the algorithm requirements and as shown in Figure 18, the known image 1800 is 512x256 pixels in size, with a single bit pixel (ie, one bit value for each pixel location). By passing the image 1800 through the filter, the system identified 220 marker locations, which were then merged into 182 unique bounding boxes. With sorting, the system filters the bounding box to only the top 15 "sorted" positions (as shown in Figure 19). Thus, this demonstrates that the bounding box processing of the present disclosure is effective in identifying objects in an image and generating bounding boxes around such objects. Based on this, the bounding box processing described herein can be implemented on consecutive frames in a video image to function as an efficient and effective motion tracker in any desired setting.

(3.4)设备的控制。(3.4) Control of equipment.

如图20所示，处理器2000可以用于基于边界框生成来控制设备2002(例如，移动设备显示器、虚拟现实显示器、增强现实显示器、计算机监测器、马达、机器、无人机、相机等)。设备2002的控制可以用于将对象的定位转换成表示对象的静止图像或视频。在其它实施方式中，可以基于鉴别和定位来控制设备2002，以使设备移动或以其它方式发起物理动作。As shown in Figure 20, a processor 2000 can be used to control a device 2002 (eg, mobile device display, virtual reality display, augmented reality display, computer monitor, motor, machine, drone, camera, etc.) based on bounding box generation . The controls of device 2002 can be used to convert the location of the object into a still image or video representing the object. In other embodiments, device 2002 may be controlled based on authentication and positioning to move the device or otherwise initiate physical action.

在一些实施方式中，可以控制无人机或其它自主交通工具移动到基于图像确定对象的定位的区域。在又一些其它实施方式中，可以通过将移动边界框保持在视场内来控制相机，以跟踪所识别的对象。换句话说，致动器或马达被激活以使相机(或传感器)移动，以将边界框保持在视场内，使得操作者或其它系统可以识别并跟踪对象。作为又一示例，该设备可以是自主交通工具(诸如，无人飞行器(UAV))，该自主交通工具包括相机和本文所述的边界框设计。在操作中并且当由在UAV中实现的系统生成边界框时，可以使UAV操纵跟随对象，使得边界框保持在UAV的视场内。例如，UAV的转子和其它组件被致动以使UAV追踪并跟随对象。In some embodiments, a drone or other autonomous vehicle may be controlled to move to an area where the location of the object is determined based on the imagery. In still other embodiments, the camera may be controlled to track the identified object by maintaining the moving bounding box within the field of view. In other words, actuators or motors are activated to move the camera (or sensor) to keep the bounding box within the field of view so that an operator or other system can identify and track objects. As yet another example, the device may be an autonomous vehicle, such as an unmanned aerial vehicle (UAV), that includes a camera and the bounding box design described herein. In operation and when a bounding box is generated by a system implemented in the UAV, the UAV can be made to manipulate the following object so that the bounding box remains within the UAV's field of view. For example, the rotors and other components of the UAV are actuated to cause the UAV to track and follow objects.

最后，虽然已经根据多个实施方式对本发明进行了说明，但本领域普通技术人员应当容易地认识到，本发明可以在其它环境中具有其它应用。应注意，可以有许多实施方式和实现。另外，“用于…的装置”的任何用语旨在引发要素和权利要求的装置加功能的解读，而未特别使用“用于…的装置”用语的任何要素不应被解读为装置加功能要素，即使权利要求以其它方式包括了“装置”一词。此外，虽然已经按特定顺序陈述了特定方法步骤，但这些方法步骤可以按任何期望的顺序进行，并且落入本发明的范围内。Finally, while the invention has been described in terms of various embodiments, those of ordinary skill in the art will readily recognize that the invention may have other applications in other environments. It should be noted that many implementations and implementations are possible. Additionally, any term "means for" is intended to induce a means-plus-function interpretation of the elements and claims, and any element that does not specifically use the term "means for" should not be read as a means-plus-function element , even if the claim otherwise includes the term "means". Furthermore, although specific method steps have been stated in a specific order, these method steps can be performed in any desired order and are within the scope of the present invention.

Claims

1. A system for bounding box generation, the system comprising:

a memory and one or more processors, the memory having executable instructions such that when the instructions are executed, the one or more processors perform the following operations:

receiving an image consisting of pixels having a value of one bit per pixel;

generating bounding boxes around connected components in the image, the connected components having pixel coordinates and pixel count information;

generating a ranking score for each bounding box based on the pixel coordinates and the pixel count information;

filtering the bounding box to remove bounding boxes that exceed a predetermined size and a predetermined pixel count based on the pixel coordinates and the pixel count information; and

filtering the bounding boxes to remove bounding boxes below a predetermined ranking score, resulting in remaining bounding boxes; and

The device is controlled based on the remaining bounding box.

2. The system of claim 1, wherein the processor is a field programmable gate array (FPGA).

3. The system of claim 1, wherein generating the bounding box further comprises the following operations:

grouping consecutive pixels in the image; and

Connected pixels are merged into connected components, and the bounding box is formed by boxes surrounding the connected components.

4. The system of claim 1, wherein controlling the device comprises moving a video stage to maintain at least one remaining bounding box of the remaining bounding boxes within a field of view of the video stage.

5. A computer program product for bounding box generation, the computer program product comprising:

A non-transitory computer-readable medium having executable instructions encoded thereon so that when the instructions are executed by one or more processors, the one or more processors Do the following:

receiving an image consisting of pixels having a value of one bit per pixel;

The device is controlled based on the remaining bounding box.

6. The computer program product of claim 5, wherein the processor is a field programmable gate array (FPGA).

7. The computer program product of claim 5, wherein generating the bounding box further comprises the following operations:

grouping consecutive pixels in the image; and

8. The computer program product of claim 5, wherein controlling the apparatus comprises moving a video platform to maintain at least one remaining bounding box of the remaining bounding boxes within a field of view of the video platform .

9. A computer-implemented method for bounding box generation, the method comprising the actions of:

Cause one or more processors to execute instructions encoded on a non-transitory computer-readable medium, such that when the instructions are executed, the one or more processors:

receiving an image consisting of pixels having a value of one bit per pixel;

The device is controlled based on the remaining bounding box.

10. The method of claim 9, wherein the processor is a field programmable gate array (FPGA).

11. The method of claim 9, wherein generating the bounding box further comprises the following operations:

grouping consecutive pixels in the image; and

12. The method of claim 9, wherein controlling the device comprises moving a video platform to maintain at least one remaining bounding box of the remaining bounding boxes within a field of view of the video platform.