CN120219818A - Automatic labeling method, system, storage medium and electronic device for small target detection data - Google Patents
- Publication number: CN120219818A
- Application number: CN202510279404.7A
- Authority
- CN
- China
- Prior art keywords: field, target detection, narrow, wide, view
- Prior art date: 2025-03-10
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V10/764—Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06T7/337—Image registration using feature-based methods involving reference images or patches
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/82—Image or video recognition or understanding using neural networks
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/20092—Interactive image processing based on input by user
- G06T2207/20104—Interactive definition of region of interest [ROI]
- G06V2201/07—Target detection
Abstract
The invention discloses an automatic labeling method, system, storage medium and electronic device for small target detection data. The method comprises: performing target detection on an acquired narrow-field-of-view camera image using a target detection model to obtain large-target detection bounding boxes for the narrow-field-of-view image; performing image registration on the wide- and narrow-field-of-view camera images to obtain a homography matrix that transforms the narrow-field-of-view image to the wide-field-of-view image, the registered narrow-field-of-view camera image, and the unadjusted wide-field-of-view small-target bounding boxes; adjusting the positions of the wide-field-of-view small-target bounding boxes to obtain accurately labeled small-target bounding boxes; and training the target detection model with the accurately labeled small-target bounding boxes to obtain a trained small target detection model. The method improves the labeling precision of small target detection data and reduces missed labels.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an automatic labeling method, system, storage medium and electronic device for small target detection data.
Background
Object detection is an important task in computer vision, aimed at identifying and locating objects in images or videos. With the rapid development of deep learning, object detection has made remarkable progress and is widely applied in fields such as surveillance, autonomous driving, unmanned aerial vehicle analysis and robot vision. In high-altitude security monitoring scenes, target detection is usually performed directly on a wide-field-of-view camera. Because the camera is installed at a considerable height and the ground resolution of the wide-field-of-view image is low, targets in the image are extremely small, and accurately locating and labeling such small targets remains a challenge.
Existing small-target labeling still relies on manual annotation, and manually labeled datasets often suffer from missed or inaccurate small-target labels, which directly degrades the performance of small target detectors.
Regarding large open-vocabulary object detection models, patent publication CN118298250A discloses an intelligent data labeling method and device in which a large open-vocabulary object detection model is used to label targets. However, that method labels well only large targets in high-ground-resolution images and struggles with small targets in low-ground-resolution images.
Therefore, a labeling scheme for small target detection data is needed to solve the problems of missed and inaccurate small-target labels in existing labeling schemes.
Disclosure of Invention
The invention mainly aims to provide an automatic labeling method, system, storage medium and electronic device for small target detection data based on wide/narrow-field-of-view camera image registration and an open-vocabulary object detection model, so as to improve the labeling precision of small target detection data and reduce missed labels.
To achieve this aim, the technical scheme adopted by the invention comprises an automatic labeling method for small target detection data, comprising the following steps:
S1, performing target detection on an acquired narrow-field-of-view camera image using a target detection model to obtain large-target detection bounding boxes for the narrow-field-of-view image;
S2, performing image registration on the wide- and narrow-field-of-view camera images to obtain a homography matrix that transforms the narrow-field-of-view image to the wide-field-of-view image, the registered narrow-field-of-view camera image, and the unadjusted wide-field-of-view small-target bounding boxes;
S3, adjusting the positions of the wide-field-of-view small-target bounding boxes to obtain accurately labeled small-target bounding boxes;
S4, training the target detection model with the accurately labeled small-target bounding boxes to obtain a trained small target detection model.
In a preferred embodiment, in step S1, the target detection model is an open-vocabulary object detection model, and/or step S1 comprises:
S11, extracting word vectors from the vocabulary using a BERT model;
S12, inputting the word vectors and the narrow-field-of-view camera image into the target detection model for feature fusion;
S13, inferring the large-target detection bounding boxes of the narrow-field-of-view image with the target detection model.
In a preferred embodiment, in step S2, the wide- and narrow-field-of-view camera images are registered using an image registration method, and/or step S2 comprises:
S21, extracting feature points of the wide- and narrow-field-of-view camera images;
S22, matching the feature points to obtain a plurality of matching pairs;
S23, eliminating mismatched points, and calculating the homography matrix from the narrow-field-of-view image to the wide-field-of-view image;
S24, transforming the narrow-field-of-view camera image onto the wide-field-of-view camera image using the homography matrix to obtain the registered narrow-field-of-view camera image;
S25, transforming the large-target detection bounding boxes of the narrow-field-of-view image onto the wide-field-of-view camera image using the homography matrix to obtain the wide-field-of-view small-target bounding boxes.
In a preferred embodiment, in S21, the feature points of the narrow-field-of-view camera image are extracted using SuperPoint, and/or, in S22, the feature points are matched using LightGlue, and/or, in S23, the mismatched points are eliminated using the random sample consensus method.
In a preferred embodiment, in step S3, a target tracking method takes as input the registered narrow-field-of-view camera image, the wide-field-of-view camera image, and the wide-field-of-view small-target bounding boxes obtained by image registration in step S2, and iteratively adjusts the positions of the wide-field-of-view small-target bounding boxes to generate accurately labeled small-target bounding boxes, and/or step S3 comprises:
S31, extracting features from the registered narrow-field-of-view camera image and the wide-field-of-view small-target bounding boxes, respectively;
S32, computing a response map of the extracted features according to the kernelized correlation filter method in target tracking;
S33, taking the maximum of the response map as the position adjustment result of the wide-field-of-view small-target bounding box.
In a preferred embodiment, in S31, the registered narrow-field-of-view camera image and the wide-field-of-view small-target bounding box are each enlarged by a scale factor and HOG features are extracted from each, and/or, in S32, the features extracted in step S31 are mapped to a high-dimensional space by a kernel method to train a kernelized correlation filter, and the trained filter performs fast Fourier convolution on the extracted features to obtain the response map, and/or, in S33, the position of the largest value in the response map is the center point of the adjusted wide-field-of-view small-target bounding box.
In another aspect, the technical scheme adopted by the invention comprises an automatic labeling system for small target detection data, comprising:
a large-target detection bounding box obtaining module, configured to perform target detection on the acquired narrow-field-of-view camera image using the target detection model to obtain the large-target detection bounding boxes of the narrow-field-of-view image;
a registration module, configured to perform image registration on the wide- and narrow-field-of-view camera images to obtain the homography matrix that transforms the narrow-field-of-view image to the wide-field-of-view image, the registered narrow-field-of-view camera image, and the unadjusted wide-field-of-view small-target bounding boxes;
a bounding box position adjustment module, configured to adjust the positions of the wide-field-of-view small-target bounding boxes to obtain accurately labeled small-target bounding boxes;
and a model training module, configured to train the target detection model with the accurately labeled small-target bounding boxes to obtain a trained small target detection model.
In a preferred embodiment, the large-target detection bounding box obtaining module performs target detection using an open-vocabulary object detection model, and the registration module performs image registration using an image registration method.
In another aspect, the technical scheme adopted by the invention comprises a readable storage medium storing a computer program which, when run, executes the steps of the above automatic labeling method for small target detection data.
In another aspect, the technical scheme adopted by the invention comprises an electronic device including a memory and a processor, wherein the memory stores a computer program which, when run by the processor, executes the steps of the above automatic labeling method for small target detection data.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides an automatic labeling scheme for small target detection data based on wide/narrow-field-of-view camera image registration and a large open-vocabulary object detection model: large targets in the narrow-field-of-view image are labeled by the large open-vocabulary object detection model, the labels are then mapped back onto the corresponding small targets in the wide field of view through image registration, and automatic labeling of small target detection data is thereby achieved. This scheme improves the labeling precision of small target detection data and reduces missed labels.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required by the embodiments or the description of the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and other drawings may be obtained from them by those skilled in the art without inventive effort.
FIG. 1 is a flow chart of an automatic labeling method for small target detection data provided by the invention;
FIG. 2 is a schematic diagram of the present invention for detecting narrow field of view camera images using an open vocabulary object detection model;
FIG. 3 is a schematic diagram of the wide-narrow field image registration result and the registration bounding box of the present invention;
FIG. 4 is a schematic diagram of a response chart in the wide-narrow field target position correction process according to the present invention;
FIG. 5 is a schematic diagram of the bounding boxes before and after correction of the wide and narrow field of view target position according to the present invention;
Fig. 6 is a schematic diagram of the small target detection data finally obtained by the present invention.
Detailed Description
The invention will be more fully understood from the following detailed description, which should be read in conjunction with the accompanying drawings. Detailed embodiments of the present invention are disclosed herein; however, it is to be understood that the disclosed embodiments are merely exemplary of the invention, which can be embodied in various forms. Therefore, specific functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present invention in virtually any appropriately detailed embodiment.
As shown in FIG. 1, the automatic labeling method for small target detection data disclosed by the invention specifically comprises the following steps:
S1, performing target detection on the acquired narrow-view-field camera image by using a target detection model to obtain a large target detection boundary frame of the narrow-view-field image.
Specifically, a narrow-view field camera image is acquired through a narrow-view field camera from a real environment, then an open vocabulary object detection large model is utilized, and particularly, a Grounding-Dino model (namely an open set detection model) in the open vocabulary object detection large model is adopted to perform object detection on the acquired narrow-view field camera image, so as to obtain a narrow-view field image large-object detection boundary frame, which is marked as B n, as shown in fig. 2, fig. 2 is a schematic diagram of the narrow-view field camera image detection through the open vocabulary object detection model, the left side is a flow of the narrow-view field object detection, the upper left corner in the right side is an effect of failure in detecting the wide-view field image through the open vocabulary object detection model, and the lower left corner in the right side is an effect of success in detecting the narrow-view field image through the open vocabulary object detection model. The success rate of target annotation representing the narrow-field image is higher than that of the wide-field image. Because the ground resolution of the narrow-view-field camera image is high, the target size is large, the detection success rate is high, and the labeling effect is good. In addition, the wide-field camera and the narrow-field camera are synchronously acquired, and the wide-field camera image acquired by the wide-field camera is used for subsequent registration labeling.
The process for obtaining the large-target detection bounding boxes of the narrow-field-of-view image specifically comprises the following steps:
S11, extracting word vectors from the vocabulary using a BERT model (Bidirectional Encoder Representations from Transformers).
Specifically, the vocabulary here consists of words entered by the user, such as "person". Word vectors are extracted from the user-entered words (text data such as "person" and "car"). BERT is a pre-trained language model based on the Transformer architecture that captures the semantics of words through bidirectional context encoding. The extraction process first tokenizes the user-entered vocabulary (e.g. "person", "car") and adds special tokens (such as [CLS] and [SEP]); the tokens are then bidirectionally context-encoded by BERT's multi-layer Transformer encoder to generate a semantic vector for each token; finally, these vectors are taken as the semantic representation of the vocabulary.
S12, inputting the word vectors and the narrow-field-of-view camera image into the target detection model for feature fusion.
Specifically, the word vectors and the image features are first mapped into the same feature space to ensure dimensional consistency. Through a cross-attention mechanism, with the text features as Query and the image features as Key and Value, the attention weights of text over image are computed and semantic information is injected into the image features. The enhanced image features are then fused with the original features by weighted summation or concatenation to generate a multi-modal feature representation, and the fused features are finally fed into the target detection model.
S13, inferring the large-target detection bounding boxes Bn of the narrow-field-of-view image with the target detection model.
Specifically, the fused multi-modal features are input into the target detection model, and the detection bounding boxes Bn of the large targets in the narrow-field-of-view image are obtained through inference by the detection head. The detection head is the core component of an object detection model and typically consists of a classification branch, which predicts target categories (e.g. "person", "car"), and a regression branch, which predicts bounding box coordinates. On top of the image features fused with text semantics, the detection head decodes the features through convolutional or fully connected layers and directly outputs the class probabilities and bounding box positions of the targets.
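To make steps S11 to S13 concrete, the following is a minimal sketch using the Hugging Face transformers port of Grounding DINO, which bundles the BERT text encoder, the cross-modal feature fusion and the detection head described above. The checkpoint name, image path, prompt and thresholds are illustrative assumptions, and the post-processing keyword names may vary across transformers versions.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

model_id = "IDEA-Research/grounding-dino-tiny"   # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id).eval()

image = Image.open("narrow_fov.jpg")             # narrow-field-of-view frame (placeholder path)
text = "person. car."                            # user vocabulary; classes separated by periods

inputs = processor(images=image, text=text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)                    # BERT text features fused with image features

# Decode the large-target boxes Bn for the narrow-field-of-view image.
results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    0.35,                                        # box score threshold (keyword name varies by version)
    0.25,                                        # text threshold
    target_sizes=[image.size[::-1]],
)[0]
print(results["boxes"], results["labels"], results["scores"])
```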
S2, performing image registration on the wide- and narrow-field-of-view camera images to obtain the homography matrix that transforms the narrow-field-of-view image to the wide-field-of-view image, the registered narrow-field-of-view camera image, and the unadjusted wide-field-of-view small-target bounding boxes.
The wide- and narrow-field-of-view camera images here are the acquired images. Specifically, they are registered by an image registration method to obtain the homography matrix that transforms the narrow-field-of-view image to the wide-field-of-view image, denoted Hn2w. From Hn2w, the registered narrow-field-of-view camera image, denoted Ir, is obtained, together with the unadjusted wide-field-of-view small-target bounding boxes, denoted Bw, as shown in FIG. 3. FIG. 3 is a schematic diagram of the wide/narrow-field-of-view registration result and its registered bounding boxes: the region whose hue differs clearly from the background is the registered area, and the red boxes are the registered bounding boxes.
The process of registering the wide- and narrow-field-of-view camera images to obtain the homography matrix, the registered narrow-field-of-view camera image and the unadjusted wide-field-of-view small-target bounding boxes specifically comprises the following steps:
S21, extracting feature points of the wide- and narrow-field-of-view camera images.
In implementation, superPoint (SuperPoint: self-Supervised Interest Point Detection and Description, self-supervised learning super-point feature extraction algorithm) is specifically used to extract feature points of the narrow-field-of-view camera image. SuperPoint is a feature point detector based on deep learning, which extracts salient feature points from an image through a Convolutional Neural Network (CNN) and simultaneously generates a Descriptor (Descriptor) of each feature point. These descriptors are used to characterize local visual information of feature points, providing a basis for subsequent matching.
S22, the extracted feature points are matched to obtain a plurality of matching pairs.
In implementation, LightGlue (from "LightGlue: Local Feature Matching at Light Speed", a lightweight fast local feature matching algorithm) is used to match the feature points. LightGlue is based on a Transformer and a graph neural network; it dynamically adjusts its computational cost by optimizing the cross-attention layers and introducing positional encoding and a match prediction module, achieving efficient and accurate feature matching.
S23, the mismatched points are eliminated, and the homography matrix Hn2w that transforms the narrow-field-of-view image to the wide-field-of-view image is calculated.
In implementation, the random sample consensus (RANSAC) method is used to eliminate mismatched points. RANSAC randomly samples a set of matching point pairs to compute a candidate homography matrix and counts the number of matching points that fit the matrix (the inliers). The process is repeated, the matrix with the most inliers is finally selected as the optimal solution, and the matching points that do not fit it (the outliers, i.e. mismatches) are eliminated. RANSAC handles noise and outliers effectively and improves matching robustness. The homography matrix is a 3×3 matrix describing the perspective transformation from the narrow-field-of-view image to the wide-field-of-view image. After the mismatched points are eliminated, the homography matrix is solved from the remaining matching pairs by least squares or another optimization method. Specifically, the homography matrix satisfies the following relationship:

$$ s \begin{bmatrix} x_w \\ y_w \\ 1 \end{bmatrix} = H_{n2w} \begin{bmatrix} x_n \\ y_n \\ 1 \end{bmatrix} $$

where (x_n, y_n) is a point in the narrow-field-of-view image, (x_w, y_w) is the corresponding point in the wide-field-of-view image, and s is a scale factor. By solving this linear system, the parameters of the homography matrix are obtained.
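As an illustration of the least-squares solution mentioned above, here is a minimal direct linear transform (DLT) sketch in plain NumPy; in practice a robust estimator such as OpenCV's cv2.findHomography with RANSAC (shown later) is used instead. The function name is illustrative.

```python
import numpy as np

def solve_homography_dlt(pts_n, pts_w):
    """Estimate Hn2w from >= 4 matched point pairs by the direct linear transform."""
    A = []
    for (xn, yn), (xw, yw) in zip(pts_n, pts_w):
        # Each correspondence contributes two rows of the linear system A h = 0.
        A.append([xn, yn, 1, 0, 0, 0, -xw * xn, -xw * yn, -xw])
        A.append([0, 0, 0, xn, yn, 1, -yw * xn, -yw * yn, -yw])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)        # least-squares solution: last right singular vector
    return H / H[2, 2]              # normalize so that H[2, 2] == 1
```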
S24, transforming the narrow-field-of-view camera image onto the wide-field-of-view camera image using the homography matrix Hn2w to obtain the registered narrow-field-of-view camera image Ir.
S25, transforming the large-target detection bounding boxes Bn of the narrow-field-of-view image onto the wide-field-of-view camera image using the homography matrix Hn2w to obtain the wide-field-of-view small-target bounding boxes Bw.
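Steps S21 to S25 can be sketched end to end as follows, assuming the reference lightglue package (github.com/cvg/LightGlue) for SuperPoint and LightGlue and OpenCV for RANSAC and warping; the file paths and the example box Bn are placeholders, not values from the patent.

```python
import cv2
import numpy as np
from lightglue import LightGlue, SuperPoint
from lightglue.utils import load_image, rbd

# S21-S22: SuperPoint features matched by LightGlue.
extractor = SuperPoint(max_num_keypoints=2048).eval()
matcher = LightGlue(features="superpoint").eval()
feats_n = extractor.extract(load_image("narrow_fov.jpg"))
feats_w = extractor.extract(load_image("wide_fov.jpg"))
matches01 = matcher({"image0": feats_n, "image1": feats_w})
feats_n, feats_w, matches01 = [rbd(x) for x in (feats_n, feats_w, matches01)]
m = matches01["matches"]                              # (K, 2) index pairs
pts_n = feats_n["keypoints"][m[:, 0]].numpy()
pts_w = feats_w["keypoints"][m[:, 1]].numpy()

# S23: RANSAC rejects mismatches and estimates Hn2w.
Hn2w, inlier_mask = cv2.findHomography(pts_n, pts_w, cv2.RANSAC, 3.0)

# S24: warp the narrow-FOV image into the wide-FOV frame.
wide = cv2.imread("wide_fov.jpg")
narrow = cv2.imread("narrow_fov.jpg")
Ir = cv2.warpPerspective(narrow, Hn2w, (wide.shape[1], wide.shape[0]))

# S25: map a narrow-FOV box Bn = (x1, y1, x2, y2) to a wide-FOV box Bw.
x1, y1, x2, y2 = 100.0, 120.0, 260.0, 300.0           # example Bn
corners = np.float32([[x1, y1], [x2, y1], [x2, y2], [x1, y2]]).reshape(-1, 1, 2)
warped = cv2.perspectiveTransform(corners, Hn2w).reshape(-1, 2)
Bw = (*warped.min(axis=0), *warped.max(axis=0))       # axis-aligned bounding box
```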
S3, adjusting the positions of the wide-field-of-view small-target bounding boxes Bw to obtain accurately labeled small-target bounding boxes.
Because of frame asynchrony between the wide- and narrow-field-of-view cameras and registration error, the wide-field-of-view small-target bounding boxes Bw obtained in step S2 carry position errors on the wide-field-of-view camera image, so their positions need to be adjusted. Specifically, a target tracking method takes as input the registered narrow-field-of-view camera image Ir, the wide-field-of-view camera image and the wide-field-of-view small-target bounding boxes Bw obtained by image registration in step S2, and iteratively adjusts the positions of the initial bounding boxes Bw to generate accurately labeled small-target bounding boxes.
The process of adjusting the positions of the wide-field-of-view small-target bounding boxes Bw to obtain accurately labeled small-target bounding boxes specifically comprises the following steps:
S31, extracting features from the registered narrow-field-of-view camera image Ir and the wide-field-of-view small-target bounding box Bw, respectively.
In implementation, the registered narrow-field-of-view camera image Ir and the wide-field-of-view small-target bounding box Bw are each enlarged by a scale factor, for example 1.5×. The main purpose of this operation is to expand the search range, providing the KCF (Kernelized Correlation Filters) tracker with richer context and avoiding target loss or tracking failure caused by the initial mapping deviation. KCF itself strengthens target capture by expanding the search range; after the image and bounding box are enlarged, the HOG features in Ir and Bw can be extracted more effectively, improving the robustness and accuracy of feature matching. The magnification is preferably kept between 1.2× and 1.8×, which makes the search range large enough without introducing excessive irrelevant background through over-enlargement. After enlargement, the HOG (Histogram of Oriented Gradients) features of the registered narrow-field-of-view camera image Ir and of the wide-field-of-view small-target bounding box Bw are extracted.
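A minimal sketch of this enlargement-plus-HOG step, using OpenCV and scikit-image; the fixed 64×64 template size and the helper name are assumptions for illustration.

```python
import cv2
from skimage.feature import hog

def enlarged_hog(img_bgr, box, scale=1.5):
    """Enlarge `box` = (x, y, w, h) about its center, crop, and extract HOG features (S31)."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    nw, nh = int(w * scale), int(h * scale)
    x0, y0 = max(int(cx - nw / 2), 0), max(int(cy - nh / 2), 0)
    patch = img_bgr[y0:y0 + nh, x0:x0 + nw]
    gray = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, (64, 64))                 # fixed template size (assumption)
    return hog(gray, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
```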
S32, computing the response map of the extracted features according to the kernelized correlation filter method in target tracking.
In implementation, the HOG features extracted in step S31 are used to train a kernelized correlation filter, and the trained filter performs fast Fourier convolution on the extracted HOG features to obtain their response map, as shown in FIG. 4. FIG. 4 is a schematic diagram of the response map during wide/narrow-field-of-view target position correction; the response map covers only the target bounding box region, and the larger a pixel value, the more likely the corrected target lies at that position. The maximum-value position indicated by the arrow in FIG. 4 corresponds to the correction direction of the bounding box in FIG. 5.
S33, taking the maximum of the response map as the position adjustment result of the wide-field-of-view small-target bounding box Bw to obtain the accurately labeled small-target bounding box.
In implementation, the position of the largest pixel value in the response map is taken as the center point of the adjusted wide-field-of-view small-target bounding box Bw. The adjusted box is shown in FIG. 5, a schematic diagram of the bounding box before and after the wide-field-of-view small-target position correction: the left side shows the box before correction and the right side after correction, demonstrating the effectiveness of the position correction.
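The whole of S31 to S33 can be approximated with OpenCV's built-in KCF tracker (opencv-contrib-python), which internally extracts HOG features, trains a kernelized correlation filter and locates the response-map peak. The helper name and scale parameter are assumptions; on some OpenCV builds the factory is cv2.legacy.TrackerKCF_create.

```python
import cv2

def refine_box_with_kcf(Ir, Iw, Bw, scale=1.5):
    """Correct the mapped wide-FOV box by learning a KCF filter on the registered
    narrow-FOV image Ir and locating the response peak on the wide-FOV image Iw."""
    x, y, w, h = Bw
    cx, cy = x + w / 2, y + h / 2
    sw, sh = int(w * scale), int(h * scale)               # enlarged search region (S31)
    roi = (max(int(cx - sw / 2), 0), max(int(cy - sh / 2), 0), sw, sh)

    tracker = cv2.TrackerKCF_create()
    tracker.init(Ir, roi)                                 # train the filter (S32)
    ok, refined = tracker.update(Iw)                      # response-map peak (S33)
    return refined if ok else roi
```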
S4, training the target detection model with the accurately labeled small-target bounding boxes to obtain a trained small target detection model, as shown in FIG. 6, which is a schematic diagram of the small target detection data finally obtained by the method. The top row shows labeled large targets, the middle row labeled medium targets, and the bottom row labeled small targets, indicating that the algorithm can label data in large quantities.
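The patent does not prescribe a label format for the training step; as one common choice, the corrected boxes can be written out as YOLO-style text labels, sketched below (the function name is illustrative).

```python
def save_yolo_labels(path, boxes, class_ids, img_w, img_h):
    """Write boxes (x1, y1, x2, y2) as YOLO lines: class cx cy w h, all normalized."""
    with open(path, "w") as f:
        for (x1, y1, x2, y2), cls in zip(boxes, class_ids):
            cx = (x1 + x2) / 2 / img_w
            cy = (y1 + y2) / 2 / img_h
            bw = (x2 - x1) / img_w
            bh = (y2 - y1) / img_h
            f.write(f"{cls} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}\n")
```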
Correspondingly, the invention also discloses an automatic labeling system for small target detection data, comprising:
a large-target detection bounding box obtaining module, configured to perform target detection on the acquired narrow-field-of-view camera image using the target detection model to obtain the large-target detection bounding boxes of the narrow-field-of-view image;
a registration module, configured to perform image registration on the wide- and narrow-field-of-view camera images to obtain the homography matrix that transforms the narrow-field-of-view image to the wide-field-of-view image, the registered narrow-field-of-view camera image, and the unadjusted wide-field-of-view small-target bounding boxes;
a bounding box position adjustment module, configured to adjust the positions of the wide-field-of-view small-target bounding boxes to obtain accurately labeled small-target bounding boxes;
and a model training module, configured to train the target detection model with the accurately labeled small-target bounding boxes to obtain a trained small target detection model.
For the working principle of each module, reference may be made to the description of the corresponding step of the method, which is not repeated here.
In another aspect, the invention also provides a readable storage medium having stored thereon a computer program which, when executed, implements the steps of the automatic labeling method for small target detection data provided in the above embodiment.
In still another aspect, the invention further provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program which, when run by the processor, executes the steps of the automatic labeling method for small target detection data.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the readable storage medium include an electrical connection with one or more wires (an electronic device), a portable computer diskette (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The readable storage medium may even be paper or another suitable medium on which the program is printed, as the program can be captured electronically, for instance by optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of techniques known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
The invention is advantageous in that it provides an automatic labeling scheme for small target detection data based on wide/narrow-field-of-view camera image registration and a large open-vocabulary object detection model: large targets in the narrow-field-of-view image are labeled by the large open-vocabulary object detection model, the labels are then mapped back onto the corresponding small targets in the wide field of view through image registration, and automatic labeling of small target detection data is thereby achieved. This scheme improves the labeling precision of small target detection data and reduces missed labels.
The various aspects, embodiments, features and examples of the invention are to be considered in all respects as illustrative and not intended to limit the invention, the scope of which is defined solely by the claims. Other embodiments, modifications, and uses will be apparent to those skilled in the art without departing from the spirit and scope of the claimed invention.
The use of headings and sections in this disclosure is not meant to limit the disclosure; each section may apply to any aspect, embodiment, or feature of the disclosure.
Claims (10)
1. An automatic labeling method for small target detection data, characterized in that the method comprises the following steps:
S1, performing target detection on an acquired narrow-field-of-view camera image using a target detection model to obtain large-target detection bounding boxes for the narrow-field-of-view image;
S2, performing image registration on the wide- and narrow-field-of-view camera images to obtain a homography matrix that transforms the narrow-field-of-view image to the wide-field-of-view image, the registered narrow-field-of-view camera image, and the unadjusted wide-field-of-view small-target bounding boxes;
S3, adjusting the positions of the wide-field-of-view small-target bounding boxes to obtain accurately labeled small-target bounding boxes;
S4, training the target detection model with the accurately labeled small-target bounding boxes to obtain a trained small target detection model.
2. The automatic labeling method for small target detection data according to claim 1, wherein in S1 the target detection model is an open-vocabulary object detection model, and/or S1 comprises:
S11, extracting word vectors from the vocabulary using a BERT model;
S12, inputting the word vectors and the narrow-field-of-view camera image into the target detection model for feature fusion;
S13, inferring the large-target detection bounding boxes of the narrow-field-of-view image with the target detection model.
3. The automatic labeling method for small target detection data according to claim 1, wherein in S2 the wide- and narrow-field-of-view camera images are registered using an image registration method, and/or S2 comprises:
S21, extracting feature points of the wide- and narrow-field-of-view camera images;
S22, matching the feature points to obtain a plurality of matching pairs;
S23, eliminating mismatched points, and calculating the homography matrix from the narrow-field-of-view image to the wide-field-of-view image;
S24, transforming the narrow-field-of-view camera image onto the wide-field-of-view camera image using the homography matrix to obtain the registered narrow-field-of-view camera image;
S25, transforming the large-target detection bounding boxes of the narrow-field-of-view image onto the wide-field-of-view camera image using the homography matrix to obtain the wide-field-of-view small-target bounding boxes.
4. The method according to claim 3, wherein in S21 the feature points of the narrow-field-of-view camera image are extracted using SuperPoint, and/or in S22 the feature points are matched using LightGlue, and/or in S23 the mismatched points are eliminated using the random sample consensus method.
5. The automatic labeling method for small target detection data according to claim 1, wherein in step S3 a target tracking method takes as input the registered narrow-field-of-view camera image, the wide-field-of-view camera image and the wide-field-of-view small-target bounding boxes obtained by image registration in step S2, and iteratively adjusts the positions of the wide-field-of-view small-target bounding boxes to generate accurately labeled small-target bounding boxes, and/or step S3 comprises:
S31, extracting features from the registered narrow-field-of-view camera image and the wide-field-of-view small-target bounding boxes, respectively;
S32, computing a response map of the extracted features according to the kernelized correlation filter method in target tracking;
S33, taking the maximum of the response map as the position adjustment result of the wide-field-of-view small-target bounding box.
6. The automatic labeling method for small target detection data according to claim 5, wherein in S31 the registered narrow-field-of-view camera image and the wide-field-of-view small-target bounding box are each enlarged by a scale factor and HOG features are extracted from each, and/or in S32 the features extracted in step S31 are mapped to a high-dimensional space by a kernel method to train a kernelized correlation filter, and the trained filter performs fast Fourier convolution on the extracted features to obtain the response map, and/or in S33 the position of the largest value in the response map is the center point of the adjusted wide-field-of-view small-target bounding box.
7. An automatic labeling system for small target detection data, characterized in that the system comprises:
a large-target detection bounding box obtaining module, configured to perform target detection on the acquired narrow-field-of-view camera image using the target detection model to obtain the large-target detection bounding boxes of the narrow-field-of-view image;
a registration module, configured to perform image registration on the wide- and narrow-field-of-view camera images to obtain the homography matrix that transforms the narrow-field-of-view image to the wide-field-of-view image, the registered narrow-field-of-view camera image, and the unadjusted wide-field-of-view small-target bounding boxes;
a bounding box position adjustment module, configured to adjust the positions of the wide-field-of-view small-target bounding boxes to obtain accurately labeled small-target bounding boxes;
and a model training module, configured to train the target detection model with the accurately labeled small-target bounding boxes to obtain a trained small target detection model.
8. The automatic labeling system for small target detection data according to claim 7, wherein the large-target detection bounding box obtaining module performs target detection using an open-vocabulary object detection model, and the registration module performs image registration using an image registration method.
9. A readable storage medium, characterized in that the readable storage medium stores a computer program which, when executed, performs the steps of the automatic labeling method for small target detection data according to any one of claims 1 to 6.
10. An electronic device, characterized in that the electronic device comprises a memory and a processor, the memory storing a computer program which, when run by the processor, performs the steps of the automatic labeling method for small target detection data according to any one of claims 1 to 6.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510279404.7A | 2025-03-10 | 2025-03-10 | Automatic labeling method, system, storage medium and electronic device for small target detection data |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN120219818A | 2025-06-27 |
Family
ID=96111214
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202510279404.7A (Pending) | Automatic labeling method, system, storage medium and electronic device for small target detection data | 2025-03-10 | 2025-03-10 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN120219818A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120833362A (en) * | 2025-09-19 | 2025-10-24 | 东海实验室 | Image registration method and device based on multi-frame fusion |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |