CN117911954B - Weak supervision target detection method and system for operation and maintenance of new energy power station - Google Patents
- Publication number
- CN117911954B
- Authority
- CN
- China
- Prior art keywords
- maintenance
- target detection
- new energy
- energy power
- image block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Testing And Monitoring For Control Systems (AREA)
- Supply And Distribution Of Alternating Current (AREA)
Abstract
Description
Technical Field
The present invention belongs to the technical field of new energy power station operation and maintenance, and specifically relates to a weakly supervised target detection method and system for the operation and maintenance of new energy power stations.
Background Art
The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.
In the operation and maintenance of new energy power stations such as photovoltaic and wind power stations, the complex physical environment leads to problems such as inaccurate positioning during equipment inspection and non-standard manual operation; in addition, the complex targets in such stations are numerous and small. Target detection and recognition in these complex scenes is therefore very important for monitoring and troubleshooting power station equipment efficiently and reasonably.
To the inventors' knowledge, existing target detection for the operation and maintenance of new energy power stations mostly relies on supervised detection methods, which depend heavily on large amounts of data annotation (for example, object bounding boxes) during training. In a complex power station environment, obtaining large amounts of annotated data is time-consuming and laborious, and weakly supervised target detection can address this difficulty. However, existing weakly supervised target detection methods use multiple-instance learning, which has clear drawbacks: the generated object bounding boxes are redundant, which slows inference, and detection tends to focus on the most discriminative region (for example, the head of an animal) rather than the whole target. Transformer-based weakly supervised target detection can alleviate the focus on the most discriminative region to some extent, but it readily overlooks local details and the small targets present in the detection process.
Summary of the Invention
To solve the above problems, the present invention proposes a weakly supervised target detection method and system for the operation and maintenance of new energy power stations. To reduce the labor and material cost of obtaining large-scale data annotation during operation and maintenance, and to minimize the possibility of manual annotation errors, the invention discovers operation and maintenance targets through Transformer-based weakly supervised training so as to detect local detail features. This effectively solves the problem that small target detection for new energy power station operation and maintenance could previously be achieved only with large-scale, precise data annotation, thereby reducing the labor and material resources consumed and improving the efficiency and accuracy of small target detection during operation and maintenance.
According to some embodiments, a first solution of the present invention provides a weakly supervised target detection method for the operation and maintenance of new energy power stations, which adopts the following technical solution:
A weakly supervised target detection method for the operation and maintenance of new energy power stations, comprising:
acquiring operation and maintenance scene pictures of a new energy power station;
performing weakly supervised training on the acquired operation and maintenance scene pictures to generate bounding boxes of several small target operation and maintenance scenes;
constructing a Transformer-based target detection model based on the acquired operation and maintenance scene pictures;
completing weakly supervised target detection for the operation and maintenance of the new energy power station according to the constructed target detection model and the generated bounding boxes of the small target operation and maintenance scenes.
As a further technical limitation, the constructed Transformer-based target detection model adopts a Transformer-based end-to-end target detection network (DETR); the obtained bounding boxes of the several small target operation and maintenance scenes are matched through the loss function of the Transformer-based end-to-end target detection network, completing weakly supervised target detection for the operation and maintenance of the new energy power station.
As a further technical limitation, the constructed Transformer-based target detection model includes a full-category mapping module and a local awakening module, which induce the Transformer model to highlight weak local features in the local view; an image block detail clue feature map is obtained from the full-category mapping module, and an image block detail enhancement feature map $M_{LAM}$ is obtained from the local awakening module.
As a further technical limitation, during weakly supervised training, the acquired operation and maintenance scene picture is divided into several image blocks, the position of each image block within the picture is perceived, and the weights of the image blocks are calculated to obtain an attention heat map.
Further, attention is computed over the divided image blocks to obtain an attention matrix $A$; from this attention matrix and a multilayer perceptron (MLP), the image block feature map $F_L$ is obtained as $F_L = w_i \times A$, where $w_i$ is the weight of each neuron in the multilayer perceptron and $i$ denotes the number of elements in the matrix $A$, that is, the number of neurons in the MLP.
Further, attention is computed over the divided image blocks to obtain the attention matrix of the image blocks, which includes an image block matrix and an image block semantic perception block matrix. The image block matrix and the image block semantic perception block matrix are fused, and the long-range dependencies within the attention matrix are calculated, to obtain the image block detail clue feature map; from the image block semantic perception block matrix and the image block feature map, the image block detail enhancement feature map $M_{LAM}$ is obtained.
Further, the image block detail feature map $M_f$ is calculated from the image block detail clue feature map and the image block detail enhancement feature map $M_{LAM}$; the image block detail feature map and the attention heat map are then summed to obtain image blocks containing initial bounding boxes, that is, the bounding boxes of several small target operation and maintenance scenes.
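The first step of the weakly supervised training described above, dividing a picture into image blocks while keeping track of each block's position, can be sketched as follows. This is only an illustration: the function name `split_into_patches`, the square patch size, and the requirement that the picture size be a multiple of the patch size are assumptions, not details given in the source.

```python
def split_into_patches(height, width, patch):
    """Divide a height x width picture into square image blocks.

    Returns one record per block with its grid position (row, col) and the
    pixel coordinates of its top-left corner. All names here are hypothetical.
    """
    if height % patch or width % patch:
        raise ValueError("picture size must be a multiple of the patch size")
    cols = width // patch
    records = []
    for r in range(height // patch):
        for c in range(cols):
            records.append({
                "index": r * cols + c,   # position in the flattened block sequence
                "row": r,
                "col": c,
                "top": r * patch,
                "left": c * patch,
            })
    return records


blocks = split_into_patches(224, 224, 16)
```

The `index` field corresponds to the order in which the blocks would be fed to a Transformer, while `row` and `col` preserve the position information that the attention heat map is later reshaped onto.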
According to some embodiments, a second solution of the present invention provides a weakly supervised target detection system for the operation and maintenance of new energy power stations, which adopts the following technical solution:
A weakly supervised target detection system for the operation and maintenance of new energy power stations, comprising:
an acquisition module configured to acquire operation and maintenance scene pictures of a new energy power station;
a generation module configured to perform weakly supervised training on the acquired operation and maintenance scene pictures to generate bounding boxes of several small target operation and maintenance scenes;
a construction module configured to construct a Transformer-based target detection model based on the acquired operation and maintenance scene pictures;
a detection module configured to complete weakly supervised target detection for the operation and maintenance of the new energy power station according to the constructed target detection model and the generated bounding boxes of the small target operation and maintenance scenes.
According to some embodiments, a third solution of the present invention provides a computer-readable storage medium, which adopts the following technical solution:
A computer-readable storage medium storing a program which, when executed by a processor, implements the steps of the weakly supervised target detection method for the operation and maintenance of new energy power stations according to the first solution of the present invention.
According to some embodiments, a fourth solution of the present invention provides an electronic device, which adopts the following technical solution:
An electronic device comprising a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the weakly supervised target detection method for the operation and maintenance of new energy power stations according to the first solution of the present invention.
Compared with the prior art, the present invention has the following beneficial effects:
The present invention copes better with weakly supervised target detection: the training data no longer depend on precisely annotated datasets, and any image data containing only category information can be used to train the model. Global information and local detail information are combined to obtain a better representation. To address redundant object bounding boxes and inference speed, a DETR detector predicts object locations through one-to-one matching, which resolves the redundancy of generated bounding boxes and the slow inference. To exploit the interactions and detail information among image blocks in the Transformer model, the detail information inside the image blocks and the global information between the image blocks and the semantic perception block are fused under the guidance of the semantic perception block.
The present invention discovers the operation and maintenance targets of new energy power stations through Transformer-based weakly supervised training so as to detect local detail features. This effectively solves the problem that small target detection for new energy power station operation and maintenance could previously be achieved only with large-scale, precise data annotation, thereby reducing the labor and material resources consumed and improving the efficiency and accuracy of small target detection during operation and maintenance.
Brief Description of the Drawings
The accompanying drawings, which form a part of this embodiment, are provided for further understanding of this embodiment. The schematic embodiments and their descriptions are used to explain this embodiment and do not unduly limit it.
FIG. 1 is a flow chart of the weakly supervised target detection method for the operation and maintenance of new energy power stations in Embodiment 1 of the present invention;
FIG. 2 is a basic framework diagram of the weakly supervised target detection method for the operation and maintenance of new energy power stations in Embodiment 1 of the present invention;
FIG. 3 is a basic architecture diagram of the full-category mapping module in Embodiment 1 of the present invention;
FIG. 4 is a basic architecture diagram of the local awakening module in Embodiment 1 of the present invention;
FIG. 5 is a schematic flow diagram of the weakly supervised training in Embodiment 1 of the present invention;
FIG. 6 is a schematic flow diagram of conventional supervised training;
FIG. 7 is a structural block diagram of the weakly supervised target detection system for the operation and maintenance of new energy power stations in Embodiment 2 of the present invention.
Detailed Description
The present invention is further described below with reference to the accompanying drawings and embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless otherwise specified, all technical and scientific terms used herein have the same meanings as commonly understood by those of ordinary skill in the art to which the present application belongs.
It should be noted that the terms used herein are only for describing specific embodiments and are not intended to limit the exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, the singular form is also intended to include the plural form. It should further be understood that the terms "comprising" and/or "including", when used in this specification, indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
In the present invention, terms such as "upper", "lower", "left", "right", "front", "rear", "vertical", "horizontal", "side", and "bottom" indicate orientations or positional relationships based on those shown in the accompanying drawings. They are relational terms chosen only to facilitate the description of the structural relationships between the components or elements of the present invention; they do not refer specifically to any component or element and should not be construed as limiting the present invention.
In the present invention, terms such as "fixedly connected", "coupled", and "connected" should be understood broadly: the connection may be fixed, integral, or detachable, and it may be direct or indirect through an intermediate medium. Researchers and technicians in this field can determine the specific meaning of these terms in the present invention according to the specific situation, and the terms should not be construed as limiting the present invention.
In the absence of conflict, the embodiments of the present invention and the features of the embodiments may be combined with each other.
Embodiment 1
Embodiment 1 of the present invention describes a weakly supervised target detection method for the operation and maintenance of new energy power stations.
As shown in FIG. 1, a weakly supervised target detection method for the operation and maintenance of new energy power stations comprises:
acquiring operation and maintenance scene pictures of a new energy power station;
performing weakly supervised training on the acquired operation and maintenance scene pictures to generate bounding boxes of several small target operation and maintenance scenes;
constructing a Transformer-based target detection model based on the acquired operation and maintenance scene pictures;
completing weakly supervised target detection for the operation and maintenance of the new energy power station according to the constructed target detection model and the generated bounding boxes of the small target operation and maintenance scenes.
Because weakly supervised training tends to focus detection on the most discriminative region, this embodiment uses a Transformer model, which can model global information, to discover the whole target of interest through end-to-end weakly supervised training. The Transformer's attention mechanism is used to generate high-quality sparse bounding boxes, and a DETR detector is trained to predict object locations through one-to-one matching, which resolves the redundancy of generated object bounding boxes and the slow inference. Through the full-category mapping module and the local awakening module, this embodiment induces the Transformer model to highlight weak local features in the local view and thus retain local details, addressing the neglect of local details in the few earlier methods that used Transformers for weakly supervised target detection.
This embodiment adopts the basic framework of the weakly supervised target detection method for the operation and maintenance of new energy power stations shown in FIG. 2, and implements the detection with a local-detail-aware Transformer. A Transformer-based end-to-end target detection network (DETR) is adopted; the obtained bounding boxes of the several small target operation and maintenance scenes are matched through the loss function of this network, completing weakly supervised target detection for the operation and maintenance of the new energy power station.
In the matching process, this embodiment follows the Hungarian matching algorithm used in DETR, with the L1 loss function and the GIoU loss function as the loss functions. Specifically, among the multiple candidate boxes, the box most similar to a generated bounding box (that is, the one whose overlap with it is as large as possible) is sought: the GIoU and L1 loss functions are used to determine, for each candidate box and generated bounding box, the box with the largest overlapping area, and that box is retained as the successfully matched bounding box. The GIoU loss function is built on the generalized intersection over union $\mathrm{GIoU} = \frac{I}{A_1 + A_2 - I} - \frac{A_c - (A_1 + A_2 - I)}{A_c}$, with the loss taken as $1 - \mathrm{GIoU}$, where $I$ denotes the area of intersection of the two boxes, $A_1$ the area of a candidate box, $A_2$ the area of a generated box, and $A_c$ the area of the smallest rectangle enclosing both boxes. The L1 loss function is $L1_{\mathrm{loss}} = |\mathrm{Box}_1 - \mathrm{Box}_2|$, where $\mathrm{Box}_1$ and $\mathrm{Box}_2$ each contain the coordinates of the four vertices of a box and $|\cdot|$ denotes the absolute value; the L1 loss is also called the absolute error.
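A minimal sketch of this matching cost is given below, using the standard GIoU definition and an L1 distance on box coordinates. Several things are assumptions made only for illustration: boxes are written as `(x1, y1, x2, y2)` corners with positive area, the two cost terms are weighted equally, and the Hungarian algorithm is replaced by an exhaustive search over assignments, which yields the same optimal one-to-one matching on small inputs but does not scale.

```python
from itertools import permutations


def area(box):
    """Area of an (x1, y1, x2, y2) box; zero if the box is empty."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def giou(a, b):
    """Generalized IoU: I / U - (A_c - U) / A_c, with U = A_1 + A_2 - I."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    hull = area((min(a[0], b[0]), min(a[1], b[1]),
                 max(a[2], b[2]), max(a[3], b[3])))
    return inter / union - (hull - union) / hull


def l1(a, b):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(p - q) for p, q in zip(a, b))


def match_cost(candidate, generated):
    # Equal weighting of the two terms is an assumption, not from the source.
    return l1(candidate, generated) + (1.0 - giou(candidate, generated))


def one_to_one_match(candidates, generated):
    """Exhaustive stand-in for Hungarian matching (needs len(candidates) >= len(generated)).

    Returns (candidate_index, generated_index) pairs with minimal total cost.
    """
    best_pairs, best_cost = [], float("inf")
    for perm in permutations(range(len(candidates)), len(generated)):
        cost = sum(match_cost(candidates[i], g) for i, g in zip(perm, generated))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(i, j) for j, i in enumerate(perm)]
    return best_pairs
```

Each generated box ends up paired with exactly one candidate box, which is the one-to-one property that removes redundant boxes; unmatched candidates are simply left out of the result.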
The Transformer-based target detection model constructed in this embodiment includes a full-category mapping module and a local awakening module, which induce the Transformer model to highlight weak local features in the local view; an image block detail clue feature map M*r is obtained from the full-category mapping module, and an image block detail enhancement feature map $M_{LAM}$ is obtained from the local awakening module.
As one or more implementations, as shown in FIG. 2, this embodiment concatenates the image blocks output by the Transformer backbone with the additionally introduced semantic perception blocks and feeds the result into the attention block, so that the semantic perception blocks can fuse the category and position information contained in the image blocks and help identify the different categories appearing in the same image. The weights produced by the attention computation are extracted to generate the attention heat map, and the attention matrix $A_l$ resulting from the attention computation is extracted. Finally, $A_l$ passes through two FFNs to compute the classification loss and through one MLP to produce the feature map $F_L$, which is input to the local awakening module for the next step. In this way, the embodiment perceives the positions of different targets in the image and mines the semantic information shared by the image blocks and the semantic perception blocks.
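The idea of concatenating a semantic perception block with the image blocks and reading an attention heat map from the resulting weights can be sketched as follows. This is a simplified illustration under stated assumptions: a single semantic perception token, no learned query/key projections (the tokens serve as both queries and keys), one attention head, and hypothetical function names throughout.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(tokens):
    """Scaled dot-product attention weights, softmax(Q K^T / sqrt(d)).

    Assumption: tokens are used directly as queries and keys.
    """
    d = len(tokens[0])
    scores = [
        [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in tokens]
        for query in tokens
    ]
    return [softmax(row) for row in scores]


def class_heatmap(semantic_token, patch_tokens, grid_width):
    """Attention from the semantic token to every image block, as a 2D grid."""
    if len(patch_tokens) % grid_width:
        raise ValueError("number of image blocks must fill the grid")
    tokens = [semantic_token] + patch_tokens   # concatenation step
    weights = attention_weights(tokens)
    to_patches = weights[0][1:]                # drop the token's self-attention
    return [to_patches[i:i + grid_width]
            for i in range(0, len(to_patches), grid_width)]


heat = class_heatmap(
    [1.0, 0.0],
    [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [4.0, 0.0]],
    grid_width=2,
)
```

In the example, the last image block is most aligned with the semantic token, so the bottom-right cell of `heat` carries the largest weight.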
As one or more implementations, the attention mechanism computes the global interaction between the semantic perception block and the image blocks but ignores the correlations inside the image blocks, so some detail information is lost. As shown in FIG. 3, the full-category mapping module in this embodiment focuses on the detail information inside the image blocks and models the global representation with more local clues.
Specifically, the attention matrix $A_l$ resulting from the attention computation is extracted in the classification module; $A_l$ contains an image block matrix and a semantic perception block matrix. The semantic perception block matrix reveals the global interaction between the semantic perception block and all image blocks but ignores the correlations among individual image blocks, whereas the image block matrix reflects the local relationships among the image blocks. To make full use of both, the two matrices are fused by element-wise multiplication to obtain the image block detail clue feature map, which captures the long-range dependencies within the attention matrix $A_l$.
It should be noted that the matrices are fused by element-wise multiplication: for two matrices $A$ and $B$ of the same shape, the fused matrix $C$ has entries $C_{ij} = A_{ij} \times B_{ij}$.
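As a small worked illustration of element-wise fusion, the sketch below multiplies two same-shaped matrices entry by entry. The function name `hadamard` and the numeric values are hypothetical and chosen only to make the operation concrete; they are not taken from the source.

```python
def hadamard(a, b):
    """Element-wise product of two matrices given as lists of rows."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical 2 x 2 example: each entry of C is the product of the
# entries at the same position in A and B.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = hadamard(A, B)
```

Unlike a matrix product, this operation keeps the shape of the inputs, so each fused entry still refers to the same pair of positions as the original attention entries.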
As one or more implementations, the local awakening module uses local features to guide the learning tendency of the model and highlight weak local responses, alleviating the Transformer model's tendency to overlook local details. The local awakening module uses a pooling-based method to mask the most salient regions, allowing the model to explore details in greater depth.
As shown in FIG. 4, the semantic perception block matrix is transposed, and the transposed matrix is combined with the feature map $F_L$ output by the classification module through element-wise multiplication and convolution to obtain an intermediate feature map. Then, to eliminate the salient regions, two separate operators are applied to this intermediate feature map: global average pooling (GAP) and global max pooling (GMP). The pooled results are summed, and the Sigmoid function is applied to obtain a weighted convolution kernel. This kernel is applied to $F_L$, together with a convolution operation, to obtain the final detail feature map $M_{LAM}$, which contains all weak-response regions of a picture (for example, object boundaries).
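The pooling step of the local awakening module can be sketched as below: global average pooling and global max pooling are computed for each channel, summed, and passed through a Sigmoid to give a weight per channel. The source does not specify the pooling axes, kernel sizes, or tensor layout, so this sketch assumes a feature map stored as a list of channels (each a flat list of spatial responses), omits the convolutions entirely, and applies the weights by simple scaling. All function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gates(feature):
    """Per-channel weight: sigmoid(GAP(channel) + GMP(channel)).

    feature: list of channels, each a non-empty flat list of responses.
    """
    gates = []
    for channel in feature:
        gap = sum(channel) / len(channel)   # global average pooling
        gmp = max(channel)                  # global max pooling
        gates.append(sigmoid(gap + gmp))
    return gates


def apply_gates(feature, gates):
    """Scale every response in a channel by that channel's weight.

    Assumption: scaling stands in for the convolution described in the text.
    """
    return [[value * gate for value in channel]
            for channel, gate in zip(feature, gates)]


feature_map = [[0.0, 0.0, 0.0], [1.0, 3.0, 2.0]]
weighted = apply_gates(feature_map, channel_gates(feature_map))
```

Because the Sigmoid output lies strictly between 0 and 1, the weights rescale the channels rather than zeroing them; how the actual module suppresses the most salient regions depends on details not given in the text.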
As one or more implementations, once the outputs of the full-category mapping module and the local awakening module (the image block detail clue feature map and $M_{LAM}$) are obtained, they are first multiplied element-wise to obtain $M_f$; $M_f$ is then added element-wise to the attention heat map to obtain image blocks containing initial bounding boxes, that is, the bounding boxes of several small target operation and maintenance scenes.
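The combination step can be sketched as follows: the two module outputs are multiplied element-wise to give $M_f$, and the attention heat map is added element-wise. The source does not describe how a bounding box is read from the resulting map, so the thresholding in `box_from_map` is purely an assumed, illustrative choice, as are the function names and the numeric values.

```python
def fuse_maps(clue, lam, heat):
    """Return M_f + heat, where M_f is the element-wise product clue * lam."""
    return [
        [c * m + h for c, m, h in zip(row_c, row_m, row_h)]
        for row_c, row_m, row_h in zip(clue, lam, heat)
    ]


def box_from_map(score_map, threshold):
    """Tightest (col_min, row_min, col_max, row_max) around cells >= threshold.

    Assumption: thresholding is a stand-in for the unspecified box extraction.
    Returns None when no cell reaches the threshold.
    """
    cells = [(r, c)
             for r, row in enumerate(score_map)
             for c, value in enumerate(row)
             if value >= threshold]
    if not cells:
        return None
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return (min(cols), min(rows), max(cols), max(rows))


# Hypothetical 3 x 3 maps over a grid of image blocks.
clue = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
lam = [[1, 2, 2], [1, 2, 2], [1, 1, 1]]
heat = [[0.1, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1]]
fused = fuse_maps(clue, lam, heat)
initial_box = box_from_map(fused, threshold=1.0)
```

The box is expressed in grid coordinates of image blocks; mapping it back to pixels would multiply by the patch size used when the picture was divided.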
It should be noted that the weakly supervised training used in this embodiment relies on category annotations only, with no bounding-box annotations, whereas training a target detector requires object bounding boxes for supervision, as shown in Figure 5. In supervised target detection, as shown in Figure 6, object bounding boxes must be annotated manually in the new energy operation and maintenance scene data; here, however, the input data contains only category information and no object bounding box information.
Because a traditional Transformer tends to ignore local details, it easily misses small targets when generating object bounding boxes (equivalently, the generated boxes are of low quality), which does not meet the detection requirements of new energy operation and maintenance scenes, where small targets are numerous. This embodiment therefore adds local detail information to the model through the full category mapping module and the local wake-up module, so that bounding boxes can also be generated for small targets.
This embodiment takes full account of the complexity of new energy operation and maintenance scenes. Using simple techniques, it greatly reduces the time cost of manually annotating new energy operation and maintenance scene data and, to a certain extent, avoids detector errors caused by annotation mistakes. Transformer-based weakly supervised training discovers the operation and maintenance targets of the new energy power station and detects their local detail features. This effectively resolves the difficulty that small-target detection for new energy power station operation and maintenance previously required large-scale, precise data annotation, which in turn reduces the manpower and material resources consumed and improves the efficiency and accuracy of small-target detection during operation and maintenance.
This embodiment handles weakly supervised target detection more effectively. The training data no longer depends on precisely annotated datasets: any image data that carries only category information can be used to train the model. Combining global information with local detail information yields a better representation. To address redundant object bounding boxes and slow inference, a DETR detector predicts object locations through one-to-one matching. Finally, to exploit the interactions and detail information among image blocks in the Transformer model, the detail information inside each image block is fused, under the guidance of the semantic perception blocks, with the global information between image blocks and semantic perception blocks.
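The one-to-one matching mentioned above can be illustrated with a small bipartite assignment. DETR-style detectors pair each target box with exactly one prediction by minimizing a total matching cost; the sketch below is a simplified stand-in that uses only an L1 box distance as the cost and a brute-force search instead of the Hungarian algorithm. The function names and the sample boxes are hypothetical.

```python
from itertools import permutations

def l1_cost(box_a, box_b):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))

def match_one_to_one(predictions, targets):
    """Assign each target to a distinct prediction with minimum total L1 cost.

    Brute force over permutations, so only suitable for very small sets.
    Requires len(predictions) >= len(targets).
    Returns (list of (prediction_index, target_index) pairs, total cost).
    """
    best_cost, best_assignment = float("inf"), ()
    for perm in permutations(range(len(predictions)), len(targets)):
        cost = sum(l1_cost(predictions[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_assignment = cost, perm
    return [(p, t) for t, p in enumerate(best_assignment)], best_cost

# Boxes as (x_min, y_min, x_max, y_max); values are illustrative only.
predictions = [(10, 10, 20, 20), (0, 0, 5, 5), (50, 50, 60, 60)]
targets = [(1, 1, 5, 5), (11, 10, 20, 21)]

pairs, total_cost = match_one_to_one(predictions, targets)
```

Because every target is bound to a single prediction, the unmatched third prediction is simply left without a box, which is why this style of matching avoids redundant boxes without a separate suppression step.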
Embodiment 2
Embodiment 2 of the present invention describes a weakly supervised target detection system for the operation and maintenance of new energy power stations.
As shown in Figure 7, the weakly supervised target detection system for the operation and maintenance of new energy power stations includes:
an acquisition module configured to acquire operation and maintenance scene images of a new energy power station;
a generation module configured to perform weakly supervised training on the acquired operation and maintenance scene images and to generate bounding boxes for several small-target operation and maintenance scenes;
a construction module configured to construct a Transformer-based target detection model from the acquired operation and maintenance scene images;
a detection module configured to complete weakly supervised target detection for the operation and maintenance of the new energy power station using the constructed target detection model and the generated bounding boxes of the small-target operation and maintenance scenes.
The detailed steps are the same as those of the weakly supervised target detection method for the operation and maintenance of new energy power stations provided in Embodiment 1 and are not repeated here.
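The four modules listed above can be wired together as a simple pipeline. The following skeleton is only a structural sketch: the class name, the callable signatures, and the use of file names as stand-ins for images are assumptions, and the stub lambdas in the usage example replace the real training and detection logic.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]
Image = str  # placeholder: an image is represented by its file name here
Model = Callable[[Image, List[Box]], List[Box]]

class WeaklySupervisedDetectionSystem:
    """Chains the acquisition, generation, construction and detection modules."""

    def __init__(
        self,
        acquire: Callable[[], List[Image]],
        generate_boxes: Callable[[List[Image]], Dict[Image, List[Box]]],
        build_model: Callable[[List[Image]], Model],
    ) -> None:
        self.acquire = acquire                # acquisition module
        self.generate_boxes = generate_boxes  # generation module
        self.build_model = build_model        # construction module

    def run(self) -> Dict[Image, List[Box]]:
        """Detection module: apply the built model with the generated boxes."""
        images = self.acquire()
        pseudo_boxes = self.generate_boxes(images)
        model = self.build_model(images)
        return {image: model(image, pseudo_boxes[image]) for image in images}

# Usage with stub modules (hypothetical file name and box).
system = WeaklySupervisedDetectionSystem(
    acquire=lambda: ["panel_01.jpg"],
    generate_boxes=lambda images: {image: [(1, 1, 2, 2)] for image in images},
    build_model=lambda images: (lambda image, boxes: boxes),
)
result = system.run()
```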
Embodiment 3
Embodiment 3 of the present invention provides a computer-readable storage medium.
A computer-readable storage medium stores a program which, when executed by a processor, implements the steps of the weakly supervised target detection method for the operation and maintenance of new energy power stations described in Embodiment 1 of the present invention.
The detailed steps are the same as those of the weakly supervised target detection method for the operation and maintenance of new energy power stations provided in Embodiment 1 and are not repeated here.
Embodiment 4
Embodiment 4 of the present invention provides an electronic device.
An electronic device includes a memory, a processor, and a program stored in the memory and executable on the processor. When the processor executes the program, it implements the steps of the weakly supervised target detection method for the operation and maintenance of new energy power stations described in Embodiment 1 of the present invention.
The detailed steps are the same as those of the weakly supervised target detection method for the operation and maintenance of new energy power stations provided in Embodiment 1 and are not repeated here.
The above describes only preferred implementations of this embodiment and is not intended to limit it. Those skilled in the art may make various modifications and variations to this embodiment. Any modification, equivalent replacement, or improvement made within the spirit and principles of this embodiment shall fall within its scope of protection.
Claims (7)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410111517.1A CN117911954B (en) | 2024-01-25 | 2024-01-25 | Weak supervision target detection method and system for operation and maintenance of new energy power station |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410111517.1A CN117911954B (en) | 2024-01-25 | 2024-01-25 | Weak supervision target detection method and system for operation and maintenance of new energy power station |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN117911954A (en) | 2024-04-19 |
| CN117911954B (en) | 2024-08-09 |
Family
ID=90694932
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410111517.1A Active CN117911954B (en) | 2024-01-25 | 2024-01-25 | Weak supervision target detection method and system for operation and maintenance of new energy power station |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117911954B (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114241413A (en) * | 2021-12-16 | 2022-03-25 | 国网河南省电力公司电力科学研究院 | Substation multi-target detection method based on attention mechanism and feature balance |
| CN115359254A (en) * | 2022-07-25 | 2022-11-18 | 华南理工大学 | Vision transform network-based weak supervision instance segmentation method, system and medium |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2017123358A2 (en) * | 2015-12-09 | 2017-07-20 | Flir Systems, Inc. | Unmanned aerial system based thermal imaging and aggregation systems and methods |
| US9811765B2 (en) * | 2016-01-13 | 2017-11-07 | Adobe Systems Incorporated | Image captioning with weak supervision |
| US10198671B1 (en) * | 2016-11-10 | 2019-02-05 | Snap Inc. | Dense captioning with joint interference and visual context |
| EP4330859B1 (en) * | 2021-04-28 | 2025-08-20 | Bayer Aktiengesellschaft | Method and apparatus for processing of multi-modal data |
| CN114882340B (en) * | 2022-04-15 | 2024-09-24 | 西安电子科技大学 | Weakly supervised object detection method based on bounding box regression |
| CN116958742B (en) * | 2023-07-07 | 2025-10-10 | 复旦大学 | A weakly supervised small sample target detection system and method based on positioning pre-training |
Also Published As
| Publication number | Publication date |
|---|---|
| CN117911954A (en) | 2024-04-19 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN113609896B (en) | | Object-level Remote Sensing Change Detection Method and System Based on Dual Correlation Attention |
| CN114863348B (en) | | Video target segmentation method based on self-supervision |
| CN112837315B (en) | | A method for detecting defects in transmission line insulators based on deep learning |
| CN114676776B (en) | | Fine-grained image classification method based on Transformer |
| CN115830392B (en) | | Student behavior recognition method based on improved YOLOv5 |
| CN111325347A (en) | | Automatic danger early warning description generation method based on interpretable visual reasoning model |
| CN113033520A (en) | | Tree nematode disease wood identification method and system based on deep learning |
| CN116258937A (en) | | Small sample segmentation method, device, terminal and medium based on attention mechanism |
| CN110399518A (en) | | A Visual Question Answering Enhancement Method Based on Graph Convolution |
| Lin et al. | | Deep structured scene parsing by learning with image descriptions |
| CN105389589A (en) | | Random-forest-regression-based rib detection method of chest X-ray film |
| Zhou et al. | | Urbench: A comprehensive benchmark for evaluating large multimodal models in multi-view urban scenarios |
| CN118609172B (en) | | A motion posture recognition model and evaluation method |
| Qin et al. | | PointSkelCNN: Deep learning‐based 3D human skeleton extraction from point clouds |
| CN116010578A (en) | | Answer positioning method and device based on weak supervision double-flow visual language interaction |
| CN120448563A (en) | | A semantic understanding-driven cross-modal information fusion and retrieval method and system |
| CN117671426A (en) | | Concept distillation and CLIP-based hintable segmentation model pre-training method and system |
| Zhao et al. | | Rethinking two-stage referring expression comprehension: A novel grounding and segmentation method modulated by point |
| Liu et al. | | D-vpnet: A network for real-time dominant vanishing point detection in natural scenes |
| CN118799716A (en) | | Crab detection and counting method, device, medium and product based on instance segmentation |
| Yang et al. | | Topo2seq: Enhanced topology reasoning via topology sequence learning |
| CN117911954B (en) | | Weak supervision target detection method and system for operation and maintenance of new energy power station |
| Wang et al. | | Semantic segmentation of fire and smoke images based on dual attention mechanism |
| CN115147922A (en) | | Monocular pedestrian detection method, system, device and medium based on embedded platform |
| CN113920424A (en) | | Method and device for extracting visual objects of power transformation inspection robot |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |