CN114494736A - A method for outdoor location re-identification based on saliency region detection - Google Patents
- Publication number
- CN114494736A (application CN202210104480.0A)
- Authority
- CN
- China
- Prior art keywords
- feature
- image
- area
- visual
- feature map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F18/22 — Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
- G06F18/23213 — Pattern recognition; Clustering techniques; Non-hierarchical techniques using statistics or function optimisation with a fixed number of clusters, e.g. K-means clustering
- G06F40/216 — Handling natural language data; Natural language analysis; Parsing using statistical methods
- G06N3/045 — Neural networks; Architecture; Combinations of networks
- G06N3/08 — Neural networks; Learning methods
Abstract
Description
TECHNICAL FIELD
The present invention relates to the technical fields of computer vision and deep learning, and in particular to an outdoor place re-identification method based on salient region detection.
BACKGROUND
For an autonomously navigating robot, localization and mapping are the primary goals. For robots that rely on vision sensors, the localization problem is addressed by visual place recognition: given a scene image depicting a specified place, the robot must decide whether that place has been visited before, which requires similarity matching against the keyframes of the path trajectories stored in a database. Because scene images commonly suffer from interference such as illumination changes, viewpoint changes, and pedestrian occlusion, traditional feature-point extraction methods rely too heavily on hand-crafted features; although they perform well in stable indoor environments, their results degrade under the outdoor disturbances described above.
How to solve the above technical problems is the task addressed by the present invention.
SUMMARY OF THE INVENTION
The purpose of the present invention is to provide an outdoor place re-identification method based on salient region detection that extracts global image features more reliably under the various disturbances of outdoor scenes: a visual bag-of-words model built from deep-learning features fuses the local features of salient regions into a global feature, improving matching accuracy.
The inventive idea of the present invention is as follows. The overall pipeline is divided into two parts: the first detects salient regions; the second converts the region features into more robust bag-of-words vectors to obtain a global feature, which is then used for image similarity matching. The invention first analyses the features extracted from the image by a convolutional neural network and detects the salient regions of the image by judging the mean activation value of each region, then extracts the features of those salient regions. In addition, a collection of representative outdoor scene images is gathered and used to train a visual bag-of-words model over deep-learning features; the local features extracted from the salient regions are aggregated into a global feature through this model. The resulting feature is more robust to disturbances such as changes of viewpoint.
The present invention is realized by the following measures: an outdoor place re-identification method based on salient region detection, comprising the following steps.
Step 1: Extraction of the SE-ResNet feature map
In a convolutional neural network, much of the work of the convolution operation goes into enlarging the receptive field, fusing features spatially, or extracting multi-scale spatial information through multiple channels. A conventional convolution fuses all channels of the input feature map by default, whereas SE-Net attends to the relationships between channels, allowing the model to learn the importance of each channel's features automatically. The SE-Net architecture is shown in Figure 1; most current mainstream networks are constructed by repeatedly stacking these two kinds of similar units, so the SE module can be embedded into almost any network structure. Comparative experiments show that embedding SE-Net into ResNet works particularly well; therefore, for feature-map extraction, the present invention applies the SE-ResNet model to the image. For an input image I ∈ R^(W′×H′×3), the convolution yields a feature map F ∈ R^(W×H×C).
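A minimal PyTorch sketch of the SE unit described above (channel squeeze-and-excitation); the reduction ratio of 16 and the layer sizes are common defaults and an assumption here, not values fixed by the patent:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation unit: learns a weight in (0, 1) per channel
    from globally pooled features and rescales the feature map."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))           # squeeze: global average pool -> (B, C)
        w = self.fc(w).view(b, c, 1, 1)  # excitation: per-channel weights
        return x * w                     # reweight channels of the feature map
```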
Step 2: Detection of salient regions
Analysing the characteristics of outdoor scene images reveals that whether two images depict the same outdoor place can be judged from objects such as landmark buildings and road signs. In the feature map F obtained after convolution, regions with high activation values tend to be particularly salient regions of the image, but the size of these objects in the image is not fixed. To accommodate salient regions of different sizes, the present invention determines their locations by detecting connected regions of non-zero values. The following operations are therefore performed on the feature map extracted in Step 1.
(1) Binarizing the feature map
After an image passes through the convolutional layers and activation functions of a convolutional neural network, its spatial texture features are preserved, and the activation values of the feature map reflect the texture strength of the corresponding image regions. To filter out the salient regions, the regions to be examined are first delimited in the feature map of each channel by binarizing it: regions with large activation values are marked 1, indicating regions worth attention, while regions with small activation values are marked 0, indicating low-texture regions not worth attention. During binarization, the present invention uses a threshold δ to decide whether each position is set to 0 or 1.
The binarized feature map F_B is obtained by the following formula, applied per channel c at each position (i, j):

F_B^c(i, j) = 1 if F^c(i, j) > δ, and F_B^c(i, j) = 0 otherwise.
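A one-line sketch of this thresholding, assuming the feature map is a PyTorch tensor of shape (C, W, H); the choice of δ is left open by the patent:

```python
import torch

def binarize_feature_map(F: torch.Tensor, delta: float) -> torch.Tensor:
    """Threshold a (C, W, H) feature map at delta: activations above delta
    become 1 (texture-rich, worth attention), the rest 0."""
    return (F > delta).to(F.dtype)
```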
(2) Dividing regions of interest (ROIs)
It is assumed that salient regions should be independent of one another, or at least non-overlapping, so each individual image region is represented by a connected region of non-zero values.
In the binary feature map F_B, for every position whose value is 1, the values of its 8 neighbouring positions are examined; neighbours that are also 1 are merged into the same region, and the search continues from the remaining elements of that region until every element of the region has been visited. This finally yields multiple regions of interest (ROIs); each channel contributes a varying number of ROIs, for a total of N ROIs. A sketch of this 8-connected labelling is given below.
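The labelling can be sketched with SciPy's `ndimage.label`; using SciPy is an assumed implementation choice, not one named by the patent:

```python
import numpy as np
from scipy import ndimage

def extract_rois(FB: np.ndarray):
    """Find 8-connected regions of ones per channel of the binarized
    feature map FB, shape (C, W, H). Returns (channel, bounding_slices)
    pairs, one per ROI; N is the total count over all channels."""
    eight = np.ones((3, 3), dtype=int)  # 8-neighbourhood structuring element
    rois = []
    for c in range(FB.shape[0]):
        labels, _ = ndimage.label(FB[c], structure=eight)
        for sl in ndimage.find_objects(labels):  # bounding box of each region
            rois.append((c, sl))
    return rois
```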
(3) Determining the locations of salient regions
For the regions of the feature map corresponding to the N ROIs, the mean activation value a_r of each region r is computed as

a_r = (1 / (i_r · j_r)) · Σ_{i=1}^{i_r} Σ_{j=1}^{j_r} F_r(i, j)

where i_r × j_r is the spatial extent of region r and F_r(i, j) its activation values.
The ROIs are then sorted by the value of a_r from high to low, and the m highest-scoring regions are selected as the final salient regions S = {s_i | i ∈ {1, ..., m}}; a sketch of this selection follows.
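A sketch of the ranking step, reusing the `(channel, slices)` ROI format from the previous sketch:

```python
import numpy as np

def top_m_salient_regions(F: np.ndarray, rois, m: int):
    """Score each ROI by the mean activation a_r of F over the region and
    keep the m highest-scoring ones as the salient regions S."""
    scored = sorted(rois, key=lambda roi: F[roi[0]][roi[1]].mean(), reverse=True)
    return scored[:m]
```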
(4) Extracting local features
For a selected salient region s_i with spatial extent W_s × H_s, where 0 < W_s, H_s < min(W, H), the position of s_i is located on the feature map F, and across all channels of that region a local feature D_L of dimension W_s × H_s × C is obtained. Finally, sum pooling yields the pooled local feature D_L ∈ R^(1×1×C):

D_L^c = Σ_{i=1}^{W_s} Σ_{j=1}^{H_s} F^c(i, j)

where D_L^c is the value of the c-th channel of the local feature D_L.
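A sketch of the sum-pooling step, again reusing the ROI format from above:

```python
import numpy as np

def region_descriptor(F: np.ndarray, roi) -> np.ndarray:
    """Sum-pool one salient region over all C channels of F (C, W, H),
    giving the C-dimensional local feature D_L."""
    _, sl = roi                                  # spatial extent W_s x H_s
    return F[:, sl[0], sl[1]].sum(axis=(1, 2))   # D_L^c = sum over the region
```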
Step 3: Training the visual bag-of-words model
A conventional visual bag-of-words model is trained on SIFT features extracted from images. The present invention instead uses the network layers of SE-ResNet to generate feature descriptors, preserving both convolutional information and local features. These descriptors outperform SIFT-like detectors, particularly when SIFT produces many outliers or cannot match a sufficient number of feature points.
In Step 3 of the present invention, training the visual bag-of-words model consists of three parts: image feature extraction, visual vocabulary tree generation, and visual word feature construction. The first part, image feature extraction, was already covered in Steps 1 and 2, so Step 3 mainly describes vocabulary tree generation and visual word feature construction. The main procedure is as follows.
(1) Collecting features for building the vocabulary tree
For vocabulary tree generation, the present invention uses the k-means method. As one of the most widely used clustering methods, k-means is intuitive and is commonly applied to cluster local image features. Before clustering, the present invention collects a certain number of representative outdoor scene images and extracts features from each image following Steps 1 and 2; m salient regions are selected per image, yielding the local features of all salient regions.
(2) Building the vocabulary tree T with k-means
First the root node is built: k-means clusters all features once, producing k classes and their class centers, such that intra-class similarity is high and inter-class similarity is low. The class centers become the children of the root node, completing the first level of the vocabulary tree. k-means clustering is then applied to the class at each first-level node, producing k classes whose centers become that node's children, and so on, until all features are assigned to leaf nodes; the vocabulary tree T is then complete. A recursive sketch follows.
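A sketch of the hierarchical k-means construction; the branching factor k, the depth limit, and the minimum leaf size are assumed hyperparameters, and scikit-learn's `KMeans` stands in for any k-means implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocab_tree(features: np.ndarray, k: int, max_depth: int, min_size: int = 50):
    """Recursively cluster features into a k-ary vocabulary tree; nodes that
    are too small or too deep become leaves (visual words)."""
    node = {"center": features.mean(axis=0), "children": []}
    if max_depth == 0 or len(features) < max(min_size, k):
        return node                      # leaf node = one visual word
    km = KMeans(n_clusters=k, n_init=10).fit(features)
    for i in range(k):
        child = build_vocab_tree(features[km.labels_ == i], k, max_depth - 1, min_size)
        child["center"] = km.cluster_centers_[i]  # class center becomes the child node
        node["children"].append(child)
    return node
```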
(3) Visual word feature vector V_bow
Each leaf node of the vocabulary tree represents a visual word. Assuming the vocabulary tree contains v visual words, the number of times s that each word of the vocabulary appears in the image is counted, representing the image as a vector V_bow of dimension v.
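A sketch of building V_bow; for brevity it assigns each descriptor to the nearest leaf center by a flat nearest-neighbour search rather than descending the tree, which is an implementation shortcut, not the patent's prescribed traversal:

```python
import numpy as np
from collections import Counter

def collect_leaves(node, out=None):
    """Gather the leaf centers of the vocabulary tree in a fixed order;
    each leaf is one visual word."""
    out = [] if out is None else out
    if not node["children"]:
        out.append(node["center"])
    for child in node["children"]:
        collect_leaves(child, out)
    return out

def bow_vector(descriptors, leaf_centers):
    """Count how often each visual word occurs among an image's region
    descriptors, giving the v-dimensional vector V_bow."""
    centers = np.stack(leaf_centers)  # (v, C)
    words = [int(np.argmin(np.linalg.norm(centers - d, axis=1))) for d in descriptors]
    vbow = np.zeros(len(leaf_centers))
    for w, s in Counter(words).items():
        vbow[w] = s  # s = occurrences of word w in this image
    return vbow
```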
(4) Weighted feature vector V_W
When a visual word appears in many or all images of the database, words without real discriminative meaning accumulate large counts, so simply counting how many times each vocabulary word occurs in the image is not enough: because the words differ in importance, the importance of each word must be computed, i.e. the weight of each visual word in the vocabulary. To solve this problem, the present invention uses the TF-IDF (term frequency–inverse document frequency) reweighting method, where TF is the frequency with which a visual word occurs and IDF is the inverse document frequency: the fewer images contain a visual word, the larger its IDF value and the stronger its discriminative power. A larger TF-IDF value indicates that the word is more important to the image. The computation is:

TF_w = s / v,  IDF_w = log(P / P_w),  TFIDF_w = TF_w × IDF_w

where s is the number of occurrences of a visual word in the image, v is the total number of visual-word occurrences (so TF_w is the frequency of word w among all words), P is the total number of images, and P_w is the number of images in which word w appears.
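A sketch of the reweighting, following the variable names of the formula above:

```python
import numpy as np

def tfidf_weight(vbow: np.ndarray, doc_freq: np.ndarray, num_images: int) -> np.ndarray:
    """TF-IDF reweighting of a raw bag-of-words histogram.

    vbow[w]     -- s, occurrences of visual word w in this image
    doc_freq[w] -- P_w, number of database images containing word w
    num_images  -- P, total number of database images
    """
    tf = vbow / max(vbow.sum(), 1)                      # TF_w = s / v
    idf = np.log(num_images / np.maximum(doc_freq, 1))  # IDF_w = log(P / P_w)
    return tf * idf                                     # V_W, the weighted vector
```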
Step 4: Similarity matching between images
For two images I_a and I_b, the global features V_a^W and V_b^W are obtained through the above steps. The present invention measures the distance between the two global feature vectors with the cosine similarity formula:

sim(I_a, I_b) = (V_a^W · V_b^W) / (‖V_a^W‖ · ‖V_b^W‖)
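The similarity measure as a short sketch:

```python
import numpy as np

def cosine_similarity(va: np.ndarray, vb: np.ndarray) -> float:
    """Cosine similarity of two global feature vectors; higher means the
    two images are more likely to depict the same place."""
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb / denom) if denom > 0 else 0.0
```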
Compared with the prior art, the beneficial effects of the present invention are:
1. The present invention proposes a more robust feature extraction method that detects the salient regions in an image, effectively resisting interference from changes of scene viewpoint; in outdoor scenes it extracts more robust global image features and reduces mismatches.
2. The proposed method does not require large amounts of data for training the parameters of the convolutional neural network, saving computational resources and time.
3. Deep-learning features replace traditional features and are combined into the bag-of-words model, improving the accuracy of place re-identification while keeping the feature dimensionality unchanged.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings are provided for a further understanding of the present invention and constitute a part of the specification; together with the embodiments, they serve to explain the present invention and do not limit it.
Figure 1 is a flow chart of the SE-Net network in the present invention.
Figure 2 is a schematic diagram of the overall flow of the present invention.
Figure 3 shows the experimental results of the embodiment provided by the present invention.
DETAILED DESCRIPTION
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here serve only to explain the present invention and do not limit it.
Embodiment 1
Referring to Figures 1 to 3, the present invention proposes an outdoor place re-identification method based on salient region detection. The visual place recognition problem resembles an image retrieval problem: for an input image, the image with the highest similarity to it is retrieved from an image database. In this embodiment, the experiments are run in PyTorch, with network training and testing on an NVIDIA 2070s GPU. During the experiments, the SE-ResNet model is pre-trained and the visual bag-of-words model is generated on the public Place365 dataset. On the Tokyo24/7 dataset, the accuracy of the proposed model is compared with that of the NetVLAD method and the SIFT feature matching method.
1. Model pre-training
Place365 is a dataset for training scene recognition models and contains a wide variety of scenes. Pre-training the SE-ResNet feature extraction network on this dataset makes the resulting model more sensitive to important outdoor information such as street signs and buildings, so the extracted features are more reliable.
2. Generation of the visual bag-of-words model
In the present invention, a certain number of images must be selected for feature extraction to form the dictionary tree of the visual bag-of-words model. In this embodiment, 2000 representative outdoor scene images are selected, all from the Place365 dataset. After the images are selected, the pre-trained SE-ResNet network extracts features from all of them; the features of each image are the output of the last convolutional layer of SE-ResNet. From these convolutional features, salient regions are extracted, giving a fixed number of local features: in this embodiment, the 10 regions with the highest activations are selected per image as its local features, yielding 20000 feature vectors in total. After all feature vectors are normalized, the k-means clustering algorithm is run recursively to build the dictionary tree, and weights are assigned to all leaf nodes according to the TF-IDF formula.
Comparative experiments:
This embodiment verifies the model accuracy on the Tokyo24/7 dataset, which contains 75,000 database images for retrieval and 315 query images, all taken with mobile phone cameras. The query images were taken during the day, in the evening, and at night, whereas the database images were taken only during the day; the illumination therefore differs greatly between the query images and the images in the retrieval database, making the comparison very difficult.
As the criterion for a correct query, this embodiment considers a query successful if, among the top n most similar retrieved images, at least one lies within 5 meters of the query image's position; the distance is computed from the GPS information supplied with each image in the dataset. The percentage of correctly identified queries (recall) is then plotted against different values of n; a sketch of this metric follows.
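A sketch of the recall@n evaluation; using the haversine formula to turn GPS pairs into metres is an assumption, since the patent only says the distance is obtained from GPS:

```python
import math

def dist_m(p, q):
    """Haversine distance in metres between two (lat, lon) pairs
    (an assumed helper, not named by the patent)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(a))

def recall_at_n(query_gps, retrieved_gps, n, radius_m=5.0):
    """Fraction of queries whose top-n retrieved images include at least
    one image within radius_m metres of the query position."""
    hits = sum(
        any(dist_m(q, r) <= radius_m for r in results[:n])
        for q, results in zip(query_gps, retrieved_gps)
    )
    return hits / len(query_gps)
```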
In this embodiment, the NetVLAD method is first compared against the proposed model: the same query images are input, the same n values are set, the success of each query is recorded, and the recall at each n is computed; the resulting recall curves are shown in Figure 3. The figure shows that the recall of the present invention is higher than that of the NetVLAD method at every value of n.
In addition, this embodiment compares the retrieval accuracy of the traditional feature-point extraction method (SIFT) with the feature extraction method of the present invention. The same training image set is used, and SIFT features are extracted from all images; a visual bag-of-words model is then built over all SIFT features, yielding a SIFT dictionary tree T_SIFT. SIFT features extracted from a query image are converted through the dictionary tree into a visual bag-of-words vector and compared against all images in the database, and the top n most similar images are retrieved. The percentage of correctly identified queries (recall) is recorded for different values of n. Figure 3 shows that the features extracted by the convolutional neural network of the present invention perform better for outdoor place re-identification than the SIFT features extracted by the traditional method.
The above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210104480.0A CN114494736B (en) | 2022-01-28 | 2022-01-28 | Outdoor place re-identification method based on salient region detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114494736A true CN114494736A (en) | 2022-05-13 |
CN114494736B CN114494736B (en) | 2024-09-20 |
Family
ID=81476827
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210104480.0A Active CN114494736B (en) | 2022-01-28 | 2022-01-28 | Outdoor place re-identification method based on salient region detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114494736B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150269191A1 (en) * | 2014-03-20 | 2015-09-24 | Beijing University Of Technology | Method for retrieving similar image based on visual saliencies and visual phrases |
CN107357834A (en) * | 2017-06-22 | 2017-11-17 | 浙江工业大学 | Image retrieval method based on visual saliency fusion |
CN112818790A (en) * | 2021-01-25 | 2021-05-18 | 浙江理工大学 | Pedestrian re-identification method based on attention mechanism and space geometric constraint |
Non-Patent Citations (1)
Title |
---|
WANG Lixin; JIANG Jiahe: "Research on image retrieval of salient regions based on deep learning", Applied Science and Technology, No. 06, 13 April 2018 (2018-04-13) *
Also Published As
Publication number | Publication date |
---|---|
CN114494736B (en) | 2024-09-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Jin Kim et al. | Learned contextual feature reweighting for image geo-localization | |
CN111126360B (en) | Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model | |
CN107679078B (en) | Bayonet image vehicle rapid retrieval method and system based on deep learning | |
CN111652934B (en) | Positioning method, map construction method, device, equipment and storage medium | |
EP2054855B1 (en) | Automatic classification of objects within images | |
Cummins et al. | Appearance-only SLAM at large scale with FAB-MAP 2.0 | |
Lynen et al. | Placeless place-recognition | |
WO2020125216A1 (en) | Pedestrian re-identification method, device, electronic device and computer-readable storage medium | |
CN110175615B (en) | Model training method, domain-adaptive visual position identification method and device | |
US20120301014A1 (en) | Learning to rank local interest points | |
Lee et al. | Place recognition using straight lines for vision-based SLAM | |
Derpanis et al. | Classification of traffic video based on a spatiotemporal orientation analysis | |
CN104615986B (en) | The method that pedestrian detection is carried out to the video image of scene changes using multi-detector | |
CN110070066A (en) | A kind of video pedestrian based on posture key frame recognition methods and system again | |
CN113988147B (en) | Multi-label classification method and device for remote sensing image scene based on graph network, and multi-label retrieval method and device | |
CN113407780B (en) | A target retrieval method, device and storage medium | |
CN109785387A (en) | Winding detection method, device and the robot of robot | |
CN111709317A (en) | A pedestrian re-identification method based on multi-scale features under saliency model | |
CN105320764A (en) | 3D model retrieval method and 3D model retrieval apparatus based on slow increment features | |
Dong et al. | A novel loop closure detection method using line features | |
CN116704490B (en) | License plate recognition method, license plate recognition device and computer equipment | |
CN115063831A (en) | A high-performance pedestrian retrieval and re-identification method and device | |
Lynen et al. | Trajectory-based place-recognition for efficient large scale localization | |
Wu et al. | Variant semiboost for improving human detection in application scenes | |
Han et al. | A novel loop closure detection method with the combination of points and lines based on information entropy |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |