CN103778146B - Image clustering device and method - Google Patents

Image clustering device and method

Info

Publication number
CN103778146B
Authority
CN
China
Prior art keywords
subset
image
clustering
unit
cluster
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210406382.9A
Other languages
Chinese (zh)
Other versions
CN103778146A (en
Inventor
刘曦
刘汝杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Application filed by Fujitsu Ltd
Priority to CN201210406382.9A
Publication of CN103778146A
Application granted
Publication of CN103778146B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide an image clustering device and method. The image clustering method includes: performing visual-feature-based clustering on a plurality of images to obtain a first set; performing link-structure-based clustering on the plurality of images to obtain a second set; and fusing the first set and the second set using visual feature information and link structure information to obtain an image clustering result. The embodiments of the present invention can further improve the accuracy of the clustering result and generate classes that are more semantically consistent.

Description

Image clustering device and method

Technical field

The present invention relates to the field of image processing, and in particular to an image clustering device and method.

Background art

With the spread of digital cameras and camera-equipped mobile phones, images have become increasingly easy to capture. In addition, with the rapid development of the Internet and the growing popularity of image-sharing websites, the number of images is growing explosively, so quickly browsing and searching for a desired image has become time-consuming and laborious. Fast browsing currently relies mainly on image tags, but tags suffer from polysemy, ambiguity, and inaccuracy, so they do not solve the problem well.

Self-organization of images based on image content is very important, because it can effectively assist image browsing. Image clustering is an effective way to achieve content-based self-organization: it quickly groups similar images together in some manner. ClustTour is a recently proposed method for clustering and organizing images of city landmarks. It first builds two similarity graphs, one from the tag information of the images and one from their visual information, and then applies a graph clustering method to the two similarity graphs to obtain the final clustering result.

However, the inventors found that the prior art (for example, ClustTour) considers only the link structure between images and uses only a graph-based clustering method; it therefore requires tag information for the images and cannot further improve the clustering result.

Documents useful for understanding the present invention and the conventional art are listed below and are incorporated herein by reference as if fully set forth herein.

[Reference 1] S. Papadopoulos, C. Zigkolis, S. Kapiris, Y. Kompatsiaris and A. Vakali. ClustTour: City Exploration by use of Hybrid Photo Clustering. In Proceedings of ACM Multimedia, 1617-1620, 2010.

[Reference 2] X. W. Xu, N. Yuruk, Z. D. Feng and T. A. J. Schweiger. SCAN: A Structural Clustering Algorithm for Networks. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 824-833, 2007.

Summary of the invention

Embodiments of the present invention provide an image clustering device and method, with the aim of further improving the accuracy of clustering results and generating classes that are more semantically consistent.

According to one aspect of the embodiments of the present invention, an image clustering device is provided, the image clustering device including:

a first clustering unit that performs visual-feature-based clustering on a plurality of images to obtain a first set;

a second clustering unit that performs link-structure-based clustering on the plurality of images to obtain a second set; and

a fusion unit that fuses the first set and the second set using visual feature information and link structure information to obtain an image clustering result.

According to another aspect of the embodiments of the present invention, an image clustering method is provided, the image clustering method including:

performing visual-feature-based clustering on a plurality of images to obtain a first set;

performing link-structure-based clustering on the plurality of images to obtain a second set; and

fusing the first set and the second set using visual feature information and link structure information to obtain an image clustering result.

The beneficial effect of the present invention is that, by fusing clustering based on visual features with clustering based on link structure information, the accuracy of the clustering result can be further improved and more semantically consistent classes can be generated.

Specific embodiments of the invention are disclosed in detail with reference to the following description and drawings, indicating the ways in which the principles of the invention may be employed. It should be understood that the embodiments of the invention are not thereby limited in scope. Within the spirit and terms of the appended claims, the embodiments of the invention include many changes, modifications, and equivalents.

Features described and/or illustrated for one embodiment may be used in the same or a similar way in one or more other embodiments, combined with features of other embodiments, or used in place of features of other embodiments.

It should be emphasized that the term "comprising/including", when used herein, specifies the presence of features, integers, steps, or components, but does not preclude the presence or addition of one or more other features, integers, steps, or components.

Brief description of the drawings

FIG. 1 is a schematic diagram of the structure of the image clustering device of Embodiment 1 of the present invention;

FIG. 2 is a schematic diagram of the structure of the image clustering device of Embodiment 2 of the present invention;

FIG. 3 is a schematic diagram of the structure of the fusion unit of Embodiment 2 of the present invention;

FIG. 4 is a schematic diagram of the structure of the first update unit of Embodiment 2 of the present invention;

FIG. 5 is a flowchart of the image clustering method of Embodiment 3 of the present invention;

FIG. 6 is a flowchart of the image clustering method of Embodiment 4 of the present invention;

FIG. 7 is another schematic diagram of the image clustering method of Embodiment 4 of the present invention;

FIG. 8 is a flowchart of updating the second subset in Embodiment 4 of the present invention.

Detailed description

The foregoing and other features of the present invention will become apparent from the following description with reference to the accompanying drawings. The description and drawings disclose specific embodiments of the invention, which indicate some of the embodiments in which the principles of the invention may be employed. It should be understood that the invention is not limited to the described embodiments; rather, it includes all modifications, variations, and equivalents falling within the scope of the appended claims.

Embodiment 1

An embodiment of the present invention provides an image clustering device. FIG. 1 is a schematic diagram of the structure of the image clustering device of this embodiment. As shown in FIG. 1, the image clustering device 100 includes a first clustering unit 101, a second clustering unit 102, and a fusion unit 103.

The first clustering unit 101 performs visual-feature-based clustering on a plurality of images to obtain a first set; the second clustering unit 102 performs link-structure-based clustering on the plurality of images to obtain a second set; and the fusion unit 103 fuses the first set and the second set using visual feature information and link structure information to obtain an image clustering result.

In this embodiment, a plurality of images may first be given, and the images may be geotagged images. For example, $N$ images $I = \{(x_1, g_1), (x_2, g_2), \ldots, (x_N, g_N)\}$ may be given, with $I_i = (x_i, g_i)$, where $x_i$ is a $d$-dimensional feature vector representing the original features of the $i$-th image, and $g_i$ is an $e$-dimensional feature vector representing additional information about the image, such as GPS information. The aim of the invention is to divide these $N$ images into $m$ classes such that the images within each class are as similar to one another as possible.

In this embodiment, the first clustering unit 101 may perform clustering based on visual features. A conventional clustering method such as k-means or agglomerative clustering may be applied to the visual features of the images to obtain a visual-feature-based clustering result, in which images with high visual similarity are grouped into one class.

In this embodiment, the second clustering unit 102 may construct a K-nearest-neighbor (KNN) graph over the plurality of images based on their visual features, and perform structural clustering on the KNN graph to obtain a clustering result based on link structure information.

For example, a KNN graph may first be constructed over all images based on their visual features. The construction may proceed as follows: compute the distance from each image to every other image according to the visual features, and select the $k_1$ images with the smallest distances as the $k_1$ nearest neighbors of that image; treat each image as a node of the graph, and connect each image to its $k_1$ nearest neighbors to form the edges of the graph; the visual similarity between two images determines the weight of the edge connecting them.
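The construction just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `build_knn_graph`, the Euclidean distance, and the Gaussian edge weight with parameter `sigma` are all assumptions, since the text only says that visual similarity determines the edge weight.

```python
import math

def build_knn_graph(features, k, sigma=1.0):
    """Return {(i, j): weight} with i < j for an undirected k-nearest-neighbor graph."""
    n = len(features)
    edges = {}
    for i in range(n):
        # Distances from image i to every other image, smallest first.
        dists = sorted(
            (math.dist(features[i], features[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            # Assumed weight: Gaussian kernel of the distance (not given in the text).
            edges[(min(i, j), max(i, j))] = math.exp(-(d / sigma) ** 2)
    return edges

edges = build_knn_graph([(0, 0), (0, 1), (5, 5), (5, 6)], k=1)
```

Storing each undirected edge once under the key `(min, max)` keeps the later edge-weight sums from counting an edge twice.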

Then, a structural clustering algorithm such as SCAN [Reference 2] is applied on the KNN graph to cluster the images, yielding a clustering result based on link structure information, in which two images are grouped into one class if they share sufficiently many commonly linked images.
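The idea of grouping images that share many linked images can be illustrated with the structural similarity used by SCAN [Reference 2], which compares the neighborhoods of two nodes (each neighborhood including the node itself). The sketch below covers only that similarity, not the full algorithm with its core and border rules; `adj` (a dict mapping each node to its set of neighbors) and the function name are illustrative assumptions.

```python
import math

def structural_similarity(adj, u, v):
    """Shared-neighborhood similarity of nodes u and v in an undirected graph."""
    gu = adj[u] | {u}  # neighborhood of u, including u itself
    gv = adj[v] | {v}  # neighborhood of v, including v itself
    return len(gu & gv) / math.sqrt(len(gu) * len(gv))

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
same_neighbors = structural_similarity(adj, 0, 1)   # identical neighborhoods
loosely_linked = structural_similarity(adj, 2, 3)   # only partly shared
```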

In this embodiment, the fusion unit 103 may fuse the first set and the second set to obtain the image clustering result. Because both the visual features of the images and the link structure between images are considered, a different clustering method can be used for each modality of information, which yields a good clustering result.

It should be noted that the above describes visual-feature-based clustering and link-structure-based clustering only by way of example. The invention is not limited thereto; for example, other clustering algorithms may be used. Likewise, link-structure-based clustering is not limited to a KNN graph, and other structures may be constructed. The specific implementation may be determined according to the circumstances.

As can be seen from the above embodiment, by fusing clustering based on visual features with clustering based on link structure information, the accuracy of the clustering result can be further improved and more semantically consistent classes can be generated.

Embodiment 2

On the basis of Embodiment 1, this embodiment of the present invention provides a further image clustering device; content identical to Embodiment 1 is not repeated.

FIG. 2 is another schematic diagram of the structure of an image clustering device according to an embodiment of the present invention. As shown in FIG. 2, the image clustering device 200 includes a first clustering unit 201, a second clustering unit 202, and a fusion unit 203. The image clustering device 200 may further include a classification unit 204, which clusters the plurality of images according to classification information in order to screen the images.

In this embodiment, the classification information may be additional image information, for example GPS information; however, the invention is not limited thereto, and other classification information may be used. The classification unit 204 may cluster the images based on the additional image information and filter the images using the clustering result.

For example, a conventional clustering method such as k-means or mean shift may first be applied to the additional image information (such as GPS information) to cluster the images. Each class in the clustering result is then filtered according to a predetermined rule, for example by deleting classes that contain few images or classes whose GPS locations are remote. The images that remain after filtering are used in the subsequent clustering and may be input to the first clustering unit 201 and the second clustering unit 202.
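One of the filtering rules mentioned above, dropping classes with few images, might look like the following. This is a hedged sketch: it assumes the GPS cluster labels have already been produced by some clustering method (k-means, mean shift, or another), and `filter_small_clusters` and `min_size` are hypothetical names, not terms from the patent.

```python
from collections import Counter

def filter_small_clusters(labels, min_size):
    """Return indices of images whose GPS cluster has at least min_size members."""
    counts = Counter(labels)
    return [i for i, lab in enumerate(labels) if counts[lab] >= min_size]

# labels[i] is the GPS cluster of image i; cluster 1 holds a single image.
kept = filter_small_clusters([0, 0, 0, 1, 2, 2], min_size=2)
```

The returned indices identify the images passed on to the visual-feature and link-structure clustering steps.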

In this embodiment, the image clustering device as a whole can be divided into three main parts: (1) clustering the images based on additional image information such as GPS information, and filtering the images based on the clustering result; (2) clustering the images based on visual features and on link structure information; (3) fusing the visual-feature-based clustering result with the link-structure-based clustering result. How the fusion is performed is described by way of example below.

FIG. 3 is a schematic diagram of the structure of the fusion unit of this embodiment. As shown in FIG. 3, the fusion unit 203 may include a selection unit 301 and a processing unit 302. The selection unit 301 takes one of the first set and the second set as the target set and the other as the source set; the processing unit 302 adds elements of the source set to the target set, or updates elements of the target set according to elements of the source set.

Specifically, the processing unit 302 may include a first calculation unit 3021, a merging unit 3022, and a first update unit 3023. For a first subset in the source set, the first calculation unit 3021 computes the overlap between the first subset and each subset of the target set; the merging unit 3022 adds the first subset to the target set when the target set contains no subset whose overlap exceeds a preset threshold; the first update unit 3023 updates a second subset when the target set contains a second subset whose overlap exceeds the preset threshold.

In a specific implementation, either clustering result, the first set or the second set, may be chosen as the target set $C_d$, with the other clustering result serving as the source set $C_s$. Each element $c_{si}$ of the source set $C_s$ (that is, one class of the clustering result) is analyzed to determine whether it is added directly to the target set $C_d$ or used to update certain elements of $C_d$. The analysis may proceed as follows:

Compute the overlap between $c_{si}$ and each element $c_{dj}$ of the target set $C_d$, where the overlap of two elements, $\mathrm{Overlap}(c_{si}, c_{dj})$, is computed by formula (1).

If the target set contains no element whose overlap exceeds the preset threshold $thr$, $c_{si}$ is added to the target set; if the target set contains an element $c_{dj}$ whose overlap with $c_{si}$ exceeds $thr$, the element $c_{si}$ is used to update the element $c_{dj}$ of the target set.
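The add-or-update decision can be sketched as a loop over the source classes. Several points here are assumptions rather than the patent's definitions: formula (1) is not reproduced in this text, so `overlap` uses the shared images relative to the smaller class as a stand-in; when several target classes exceed the threshold, the sketch updates only the one with the highest overlap; and `update` is a caller-supplied placeholder for the update procedure.

```python
def overlap(a, b):
    # Assumed definition: shared images relative to the smaller (non-empty) class.
    return len(a & b) / min(len(a), len(b))

def fuse(source, target, thr, update):
    """Merge classes of `source` into `target` (lists of sets of image ids)."""
    target = [set(c) for c in target]
    for c_s in source:
        scores = [overlap(c_s, c_d) for c_d in target]
        best = max(range(len(target)), key=scores.__getitem__, default=None)
        if best is None or scores[best] <= thr:
            target.append(set(c_s))  # no sufficiently overlapping class: add c_s
        else:
            target[best] = update(c_s, target[best])  # update the overlapping class
    return target

fused = fuse([{1, 2, 3}, {7, 8}], [{2, 3, 4}, {5, 6}], thr=0.5,
             update=lambda a, b: a & b)
```

In the usage line, the placeholder `update` simply keeps the intersection; the device described here refines that intersection further using a clustering measure.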

FIG. 4 is a schematic diagram of the structure of the first update unit of this embodiment. As shown in FIG. 4, the first update unit 3023 may include a first generation unit 401, a second update unit 402, and a first replacement unit 403. The first generation unit 401 takes the intersection of the first subset and the second subset as a third subset; the second update unit 402 updates the third subset based on a clustering measure; the first replacement unit 403 replaces the second subset with the updated third subset.

Specifically, the second update unit 402 may include a second generation unit 4021, a second calculation unit 4022, and a second replacement unit 4023. For each element that does not belong to the third subset but belongs to the first subset or the second subset, the second generation unit 4021 adds that element to the third subset to form a new fourth subset; the second calculation unit 4022 computes the clustering measure of each fourth subset to find the fourth subset with the best clustering measure; the second replacement unit 4023 replaces the third subset with that fourth subset when its clustering measure is better than that of the third subset.

In a specific implementation, the second generation unit 4021, the second calculation unit 4022, and the second replacement unit 4023 may be executed multiple times, repeating the above process until no new set has a clustering measure better than that of the third subset.
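The repeated grow-and-compare procedure amounts to a greedy search starting from the intersection. The sketch below assumes that a larger measure value means a better cluster and takes the measure as a caller-supplied function; `update_class` and `measure` are illustrative names.

```python
def update_class(first, second, measure):
    """Grow the intersection of two classes while the clustering measure improves."""
    third = first & second
    candidates = (first | second) - third
    while candidates:
        # Form one "fourth subset" per remaining element and keep the best one.
        best = max(candidates, key=lambda e: measure(third | {e}))
        fourth = third | {best}
        if measure(fourth) <= measure(third):
            break  # no candidate improves the measure: stop
        third = fourth
        candidates.discard(best)
    return third

# Toy measure (an assumption for the demo): prefer subsets whose ids sum to 9.
result = update_class({1, 2, 3}, {2, 3, 4}, lambda s: -abs(sum(s) - 9))
```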

In this embodiment, the image clustering device 200 may further include a third calculation unit (not shown in the figures), which computes the clustering measure of a given set of images. The clustering measure may include one of, or a combination of, a global visual correlation value, a local visual correlation value, a global link correlation value, and a local link correlation value.

The following takes the set to be an arbitrary class $c$ as an example. The visual feature similarity $S_{v,ij}$ of two images $I_i$ and $I_j$ may be defined by formula (2), where $\sigma = \operatorname{mean}(\lVert x_i - x_j \rVert_2,\ 1 \le i \ne j \le N)$ is the average visual feature distance computed over all pairs of images.
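As a rough illustration, the scale $\sigma$ and a similarity built on it can be computed as below. The mean pairwise distance follows the definition in the text; the Gaussian form of `visual_similarity` is only an assumption, because formula (2) is not reproduced here, and both function names are hypothetical.

```python
import math
from itertools import combinations

def mean_pairwise_distance(features):
    """sigma: average Euclidean distance over all pairs of feature vectors."""
    pairs = list(combinations(features, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def visual_similarity(x_i, x_j, sigma):
    # Assumed form: Gaussian kernel scaled by sigma (not taken from formula (2)).
    return math.exp(-math.dist(x_i, x_j) ** 2 / sigma ** 2)

features = [(0, 0), (3, 4), (6, 8)]
sigma = mean_pairwise_distance(features)
s_01 = visual_similarity(features[0], features[1], sigma)
```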

Each image is treated as a graph node. For each image $i$, its feature similarity to the other images can be computed by formula (2); the $k_2$ images with the largest feature similarity are found and denoted $N_{vk_2}(i)$, and the image is connected to these $k_2$ images to form the edges of the graph, thereby constructing a k-nearest-neighbor graph. The edge weights of the graph are computed by formula (3).

In one implementation, the global visual correlation value may be computed as follows: based on the visual features, compute the visual similarity between every two images in class $c$; the average of all these similarities is the global visual correlation value, as in formula (4).

$$\mathrm{Dist}_{gv}(c) = \operatorname{mean}(S_{v,ij}), \quad i, j \in c \tag{4}$$
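Formula (4) translates almost directly into code. The sketch assumes a precomputed similarity matrix `sim` (indexable as `sim[i][j]`) and averages over distinct unordered pairs; whether the original also counts pairs with $i = j$ is not stated, so that choice is an assumption.

```python
from itertools import combinations

def global_visual_correlation(cluster, sim):
    """Mean similarity over all distinct image pairs in `cluster` (formula (4))."""
    pairs = list(combinations(sorted(cluster), 2))
    return sum(sim[i][j] for i, j in pairs) / len(pairs)

sim = [[1.0, 0.8, 0.2],
       [0.8, 1.0, 0.4],
       [0.2, 0.4, 1.0]]
dist_gv = global_visual_correlation({0, 1, 2}, sim)
```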

In another implementation, the local visual correlation value may be computed as follows: based on the visual features, for each image in class $c$, find the $k_2$ images in $c$ most similar to it and compute its visual similarity to each of them; the average of all these similarities is the local visual correlation value, as in formula (5).

$$\mathrm{Dist}_{lv}(c) = \operatorname{mean}(S_{v,ij}), \quad i \in c,\ j \in N_{Nvk_2}(i) \tag{5}$$
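A sketch of formula (5), again assuming a precomputed similarity matrix `sim`: for each image in the class, only its `k` most similar fellow members contribute to the average. The function name and the handling of classes with fewer than `k + 1` members (all available members are used) are assumptions.

```python
def local_visual_correlation(cluster, sim, k):
    """Mean similarity of each image to its k most similar images within `cluster`."""
    members = sorted(cluster)
    values = []
    for i in members:
        # Similarities from i to the other members, largest first, top k kept.
        nearest = sorted((sim[i][j] for j in members if j != i), reverse=True)[:k]
        values.extend(nearest)
    return sum(values) / len(values)

sim = [[1.0, 0.8, 0.2],
       [0.8, 1.0, 0.4],
       [0.2, 0.4, 1.0]]
dist_lv = local_visual_correlation({0, 1, 2}, sim, k=1)
```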

In another implementation, the global link correlation value may be computed as follows: construct a k-nearest-neighbor graph over all images based on visual features; find the edges of the graph whose two connected images are both in class $c$, and sum their weights to obtain a first sum; find the edges of the graph for which at least one of the connected images is in class $c$, and sum their weights to obtain a second sum; dividing the first sum by the second sum gives the global link correlation value, as in formula (6).
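Formula (6) does not appear in this text. From the verbal description it can plausibly be written as the ratio below; the name $\mathrm{Dist}_{gl}$ is assumed by analogy with formulas (4), (5), and (7), $E$ is assumed notation for the edge set of the k-nearest-neighbor graph, and $W_{ij}$ is the edge weight.

```latex
\mathrm{Dist}_{gl}(c) =
  \frac{\sum_{(i,j) \in E,\; i \in c,\; j \in c} W_{ij}}
       {\sum_{(i,j) \in E,\; i \in c \,\lor\, j \in c} W_{ij}}
\tag{6}
```

The numerator is the first sum (edges entirely inside $c$) and the denominator is the second sum (edges touching $c$), so the value lies between 0 and 1 for non-negative weights.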

In a specific implementation, the third calculation unit may include a fourth calculation unit, a fifth calculation unit, and a sixth calculation unit. The fourth calculation unit finds, on the constructed K-nearest-neighbor graph, the one or more edges whose connected images are both in the set, and sums the weights of these edges to obtain the first sum; the fifth calculation unit finds, on the K-nearest-neighbor graph, the one or more edges for which at least one of the connected images is in the set, and sums the weights of these edges to obtain the second sum; the sixth calculation unit divides the first sum by the second sum to obtain the global link correlation value.
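The three calculation steps map onto two sums and a division. In this sketch the graph is assumed to be a dict `edges` mapping an undirected edge `(i, j)` to its weight; the function name and the return value of 0.0 for a class that touches no edge are assumptions.

```python
def global_link_correlation(cluster, edges):
    """Weight of edges inside `cluster` divided by weight of edges touching it."""
    # First sum: edges whose two endpoints are both in the class.
    inside = sum(w for (i, j), w in edges.items() if i in cluster and j in cluster)
    # Second sum: edges with at least one endpoint in the class.
    touching = sum(w for (i, j), w in edges.items() if i in cluster or j in cluster)
    return inside / touching if touching else 0.0

edges = {(0, 1): 1.0, (1, 2): 0.5, (2, 3): 2.0}
dist_gl = global_link_correlation({0, 1}, edges)
```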

In another implementation, the local link correlation value may be computed as follows: determine the link weight of every two images in class $c$; the average of all link weights is the local link correlation value. Here a k-nearest-neighbor graph may be constructed over all images based on visual features; the link weight of two images is the weight of the graph edge connecting them, or 0 if no edge connects them. This is expressed in formula (7).

$$\mathrm{Dist}_{ll}(c) = \operatorname{mean}(W_{ij}), \quad i, j \in c \tag{7}$$
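Formula (7) can be sketched the same way. The graph is assumed to be a dict `edges` keyed by `(i, j)` with `i < j`, and the average is taken over distinct unordered pairs in the class; both conventions and the function name are assumptions.

```python
from itertools import combinations

def local_link_correlation(cluster, edges):
    """Mean link weight over all distinct image pairs in `cluster` (formula (7))."""
    pairs = list(combinations(sorted(cluster), 2))
    # Pairs without a connecting edge contribute a link weight of 0.
    return sum(edges.get((i, j), 0.0) for i, j in pairs) / len(pairs)

edges = {(0, 1): 1.0, (1, 2): 0.5}
dist_ll = local_link_correlation({0, 1, 2}, edges)
```

Unlike the global link value, this measure penalizes pairs inside the class that are not linked at all, since each such pair adds a zero to the average.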

在具体实施时,第三计算单元可以包括:第七计算单元和第八计算单元。其中,第七计算单元求出该集合中任意两个图像的链接权重;第八计算单元对所有链接权重求平均值以得到局部链接相关值。During specific implementation, the third computing unit may include: a seventh computing unit and an eighth computing unit. Wherein, the seventh calculation unit calculates the link weights of any two images in the set; the eighth calculation unit averages all link weights to obtain local link correlation values.

在本实施例中,可以以某种一致的方式融合两种聚类结果:基于视觉特征的聚类和基于链接信息的聚类,因此两类信息可以同时被考虑,从而生成更加语义一致的类。并且,可以同时考虑四种不同的聚类测量值,能够更加有效地评估聚类,为一致地融合两种聚类结果提供很好基础。此外,组合的聚类测量值也可用于其他应用如对聚类结果中的各类进行排序等。In this embodiment, two types of clustering results can be fused in a consistent manner: clustering based on visual features and clustering based on link information, so two types of information can be considered at the same time to generate more semantically consistent clusters . Moreover, four different clustering measurements can be considered at the same time, which can evaluate clustering more effectively and provide a good basis for consistent fusion of two clustering results. Furthermore, the combined cluster measures can also be used in other applications such as ranking classes in the cluster results, etc.

实施例3Example 3

本发明实施例提供一种图像聚合方法,对应于实施例1中的图像聚合方法,相同的内容不再赘述。图5是本发明实施例的图像聚类方法的一个流程图,如图5所示,该图像聚类方法包括:An embodiment of the present invention provides an image aggregation method, which corresponds to the image aggregation method in Embodiment 1, and the same content will not be repeated here. Fig. 5 is a flow chart of the image clustering method of the embodiment of the present invention, as shown in Fig. 5, this image clustering method comprises:

步骤501,对多个图像进行基于视觉特征的聚类以获得第一集合;Step 501, performing clustering based on visual features on multiple images to obtain a first set;

步骤502,对多个图像进行链接结构的聚类以获得第二集合;Step 502, performing link structure clustering on multiple images to obtain a second set;

步骤503,通过视觉特征信息和链接结构信息融合第一集合和第二集合,来获得图像聚类的结果。Step 503, merging the first set and the second set by using visual feature information and link structure information to obtain an image clustering result.

在本实施例中,步骤502对多个图像进行链接结构的聚类以获得第二集合具体可以包括:基于图像视觉特征对多个图像构建KNN图,并在KNN图上进行结构化聚类以得到基于链接结构信息的聚类结果。但本发明不限于此。In this embodiment, step 502 performing link structure clustering on multiple images to obtain the second set may specifically include: constructing a KNN graph for multiple images based on image visual features, and performing structured clustering on the KNN graph to A clustering result based on link structure information is obtained. But the present invention is not limited thereto.

As can be seen from the above embodiment, fusing the clustering based on visual features with the clustering based on link structure information can further improve the accuracy of the clustering result and generate more semantically consistent clusters.

Embodiment 4

This embodiment of the present invention provides an image clustering method corresponding to Embodiment 2; content already described there is not repeated. Fig. 6 is a flowchart of the image clustering method of this embodiment. As shown in Fig. 6, the image clustering method includes:

Step 601: clustering a plurality of images according to classification information in order to screen them;

Step 602: performing visual-feature-based clustering on the screened images to obtain a first set;

Step 603: performing link-structure clustering on the screened images to obtain a second set;

Step 604: fusing the first set and the second set by using visual feature information and link structure information to obtain an image clustering result.

Fig. 7 is another schematic diagram of the image clustering method of this embodiment. As shown in Fig. 7, additional image information (for example, GPS information) can be used for screening to filter out noisy images, which makes the clustering result more accurate.

In this embodiment, fusing the first set and the second set by using visual feature information and link structure information may specifically include: taking one of the first set and the second set as a target set and the other as a source set; and adding elements of the source set to the target set, or updating elements of the target set according to elements of the source set.

In this embodiment, adding elements of the source set to the target set, or updating elements of the target set according to elements of the source set, may specifically include: for each first subset in the source set, calculating the degree of overlap between the first subset and each subset of the target set; when no subset of the target set has a degree of overlap greater than a preset threshold, adding the first subset to the target set; and when the target set contains a second subset whose degree of overlap is greater than the preset threshold, updating the second subset.
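The add-or-update rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the degree of overlap is taken to be the Jaccard index, the subset with the highest overlap is chosen when several exceed the threshold, and the `update` callback is a hypothetical hook for the update of the second subset. None of these choices is fixed by the description.

```python
def overlap(a, b):
    """Assumed overlap degree: Jaccard index of two sets of image IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuse(source, target, threshold, update):
    """Fuse the clusters of `source` into a copy of `target`.

    `update(first, second)` returns the replacement for `second`;
    it is a hypothetical callback, not a name from the description.
    """
    result = [set(c) for c in target]
    for first in source:
        first = set(first)
        scores = [overlap(first, c) for c in result]
        # Assumption: pick the most-overlapping target subset.
        best = max(range(len(result)), key=scores.__getitem__, default=None)
        if best is None or scores[best] <= threshold:
            result.append(first)  # no sufficiently overlapping subset: add
        else:
            result[best] = update(first, result[best])  # update second subset
    return result


if __name__ == "__main__":
    src = [{1, 2, 3}, {7, 8}]
    tgt = [{1, 2, 3, 4}, {5, 6}]
    # Simplest possible update: keep only the shared images.
    print(fuse(src, tgt, threshold=0.5, update=lambda a, b: a & b))
```

In the usage example the update simply keeps the intersection; a measurement-driven update can be passed in its place.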

In a specific implementation, updating the second subset may include: taking the intersection of the first subset and the second subset as a third subset; updating the third subset based on a cluster measurement value; and replacing the second subset with the updated third subset.

Fig. 8 is a flowchart of updating the second subset according to this embodiment. As shown in Fig. 8, updating the second subset may specifically include:

Step 801: taking the intersection of the first subset and the second subset as a third subset;

Step 802: for each element that does not belong to the third subset but belongs to the first subset or the second subset, adding the element to the third subset to form a new fourth subset;

In this embodiment, if a new fourth subset is generated, step 803 is performed; if no new fourth subset is generated, step 805 is performed.

Step 803: calculating a cluster measurement value for each new fourth subset to obtain the fourth subset with the best cluster measurement value, and determining whether the cluster measurement value of that fourth subset is better than that of the third subset;

In this embodiment, if the cluster measurement value of that fourth subset is better than that of the third subset, step 804 is performed; otherwise, step 805 is performed.

Step 804: replacing the third subset with the fourth subset, and then returning to step 802.

Step 805: replacing the second subset with the updated third subset.

That is, in a specific implementation, the images shared by c_si and c_dj can form a cluster c_com. Each image that belongs to c_si ∪ c_dj but not to c_com is added to c_com in turn, each addition generating a new candidate cluster. If no new cluster is generated, c_dj is replaced with c_com. Otherwise, the cluster measurement value of each new cluster is calculated and the new cluster with the best cluster measurement value is selected. If its cluster measurement value is better than that of c_com, c_com is replaced with this new cluster and the preceding steps are repeated to generate new clusters again; otherwise, c_dj is replaced with c_com.
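Steps 801 to 805 amount to a greedy growth of the intersection of the first and second subsets. The Python sketch below illustrates that loop. It assumes that a larger cluster measurement value is better and that `measure` is a caller-supplied function; both are illustrative assumptions, and the toy measure in the usage example is not from the description.

```python
def update_second_subset(first, second, measure):
    """Return the replacement for `second` (steps 801 to 805).

    `measure(cluster)` is a hypothetical scoring function;
    this sketch assumes that higher values are better.
    """
    common = first & second  # step 801: third subset
    while True:
        # Step 802: one candidate (fourth subset) per addable image.
        candidates = [common | {x} for x in (first | second) - common]
        if not candidates:
            break  # no new fourth subset: go to step 805
        best = max(candidates, key=measure)  # step 803: best candidate
        if measure(best) <= measure(common):
            break  # not better than the third subset: go to step 805
        common = best  # step 804: replace and repeat from step 802
    return common  # step 805: this set replaces the second subset


if __name__ == "__main__":
    # Toy measure: sum of made-up per-image scores.
    scores = {1: 5, 2: 3, 3: 3, 4: -1}
    toy_measure = lambda cluster: sum(scores[x] for x in cluster)
    print(update_second_subset({1, 2, 3}, {2, 3, 4}, toy_measure))
```

The loop terminates because each accepted step adds one image from a finite pool and strictly improves the measurement value.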

In this embodiment, the image clustering method may further include: for a set of images among the plurality of images, calculating a cluster measurement value of the set, where the cluster measurement value includes one or a combination of: a global visual correlation value, a local visual correlation value, a global link correlation value, and a local link correlation value.

In one implementation, calculating the global link correlation value specifically includes: finding, on the constructed K-nearest-neighbor graph, the one or more edges whose connected images are both in the set, and summing the weights of these edges to obtain a first sum; finding, on the K-nearest-neighbor graph, the one or more edges for which at least one of the connected images is in the set, and summing the weights of these edges to obtain a second sum; and dividing the first sum by the second sum to obtain the global link correlation value.
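A minimal Python sketch of this ratio follows. It assumes the K-nearest-neighbor graph is stored as a dictionary that maps an image-ID pair `(i, j)` to an edge weight, and it returns 0.0 when no edge touches the set; the representation, the function name, and the zero fallback are illustrative assumptions.

```python
def global_link_correlation(edges, cluster):
    """Ratio of edge weight inside `cluster` to edge weight touching it.

    `edges` maps (i, j) image-ID pairs to weights (assumed layout).
    """
    # First sum: edges whose two endpoints are both in the cluster.
    inside = sum(w for (i, j), w in edges.items() if i in cluster and j in cluster)
    # Second sum: edges with at least one endpoint in the cluster.
    touching = sum(w for (i, j), w in edges.items() if i in cluster or j in cluster)
    return inside / touching if touching else 0.0


if __name__ == "__main__":
    knn_edges = {(0, 1): 1.0, (1, 2): 0.5, (2, 3): 0.5, (3, 4): 2.0}
    print(global_link_correlation(knn_edges, {0, 1, 2}))
```

A value near 1 means that almost all link weight touching the set stays inside it, which suggests a well-separated cluster on the graph.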

In another implementation, calculating the local link correlation value specifically includes: obtaining the link weight between every pair of images in the set; and averaging all of these link weights to obtain the local link correlation value.
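The pairwise average can be sketched in Python as below. The sketch assumes that a pair of images with no edge between them has link weight 0 and that a set with fewer than two images scores 0.0; the description does not state how such cases are handled, and the dictionary layout and function name are likewise illustrative.

```python
from itertools import combinations


def local_link_correlation(edges, cluster):
    """Average link weight over all unordered image pairs in `cluster`.

    `edges` maps (i, j) image-ID pairs to weights (assumed layout);
    a pair without an edge is assumed to have weight 0.
    """
    pairs = list(combinations(sorted(cluster), 2))
    if not pairs:
        return 0.0  # assumed value for sets with fewer than two images
    total = sum(edges.get((i, j), edges.get((j, i), 0.0)) for i, j in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    knn_edges = {(0, 1): 1.0, (1, 2): 0.5}
    print(local_link_correlation(knn_edges, {0, 1, 2}))
```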

As can be seen from the above embodiment, the two clustering results, one based on visual features and one based on link information, can be fused in a consistent manner, so both kinds of information are taken into account at the same time and the resulting clusters are more semantically consistent. In addition, four different cluster measurement values can be considered together, which evaluates clusters more effectively and provides a sound basis for fusing the two clustering results consistently. The combined cluster measurement value can also be used in other applications, such as ranking the clusters in a clustering result.

The above devices and methods of the present invention may be implemented by hardware, or by hardware in combination with software. The present invention relates to a computer-readable program that, when executed by a logic component, enables the logic component to implement the devices or constituent components described above, or to carry out the methods or steps described above. The present invention also relates to a storage medium for storing such a program, such as a hard disk, a magnetic disk, an optical disc, a DVD, or a flash memory.

The present invention has been described above with reference to specific implementations, but those skilled in the art should understand that these descriptions are exemplary and do not limit the scope of protection of the present invention. Those skilled in the art may make various variations and modifications to the present invention in accordance with its spirit and principles, and these variations and modifications also fall within the scope of the present invention.

With regard to the implementations including the above embodiments, the following supplementary notes are also disclosed:

(Supplementary Note 1) An image clustering device, comprising:

a first clustering unit that performs visual-feature-based clustering on a plurality of images to obtain a first set;

a second clustering unit that performs link-structure clustering on the plurality of images to obtain a second set;

a fusion unit that fuses the first set and the second set by using visual feature information and link structure information to obtain an image clustering result.

(Supplementary Note 2) The image clustering device according to Supplementary Note 1, wherein the image clustering device further comprises:

a classification unit that clusters the plurality of images according to classification information in order to screen the plurality of images.

(Supplementary Note 3) The image clustering device according to Supplementary Note 1 or 2, wherein the second clustering unit constructs a KNN graph over the plurality of images based on their visual features, and performs structural clustering on the KNN graph to obtain a clustering result based on link structure information.

(Supplementary Note 4) The image clustering device according to any one of Supplementary Notes 1 to 3, wherein the fusion unit comprises:

a selection unit that takes one of the first set and the second set as a target set and the other as a source set;

a processing unit that adds elements of the source set to the target set, or updates elements of the target set according to elements of the source set.

(Supplementary Note 5) The image clustering device according to Supplementary Note 4, wherein the processing unit comprises:

a first calculation unit that, for a first subset in the source set, calculates the degree of overlap between the first subset and each subset of the target set;

a merging unit that adds the first subset to the target set when no subset of the target set has a degree of overlap greater than a preset threshold;

a first updating unit that, when the target set contains a second subset whose degree of overlap is greater than the preset threshold, updates the second subset.

(Supplementary Note 6) The image clustering device according to Supplementary Note 5, wherein the first updating unit comprises:

a first generation unit that takes the intersection of the first subset and the second subset as a third subset;

a second updating unit that updates the third subset based on a cluster measurement value;

a first replacement unit that replaces the second subset with the updated third subset.

(Supplementary Note 7) The image clustering device according to Supplementary Note 6, wherein the second updating unit comprises:

a second generation unit that, for each element that does not belong to the third subset but belongs to the first subset or the second subset, adds the element to the third subset to form a new fourth subset;

a second calculation unit that calculates a cluster measurement value for each fourth subset to obtain the fourth subset with the best cluster measurement value;

a second replacement unit that replaces the third subset with the fourth subset when the cluster measurement value of the fourth subset is better than that of the third subset.

(Supplementary Note 8) The image clustering device according to any one of Supplementary Notes 1 to 7, wherein the image clustering device further comprises:

a third calculation unit that, for a set of images among the plurality of images, calculates a cluster measurement value of the set, the cluster measurement value including one or a combination of: a global visual correlation value, a local visual correlation value, a global link correlation value, and a local link correlation value.

(Supplementary Note 9) The image clustering device according to Supplementary Note 8, wherein the third calculation unit comprises:

a fourth calculation unit that finds, on the constructed K-nearest-neighbor graph, the one or more edges whose connected images are both in the set, and sums the weights of these edges to obtain a first sum;

a fifth calculation unit that finds, on the K-nearest-neighbor graph, the one or more edges for which at least one of the connected images is in the set, and sums the weights of these edges to obtain a second sum;

a sixth calculation unit that divides the first sum by the second sum to obtain the global link correlation value.

(Supplementary Note 10) The image clustering device according to Supplementary Note 8, wherein the third calculation unit comprises:

a seventh calculation unit that obtains the link weight between every pair of images in the set;

an eighth calculation unit that averages all of the link weights to obtain the local link correlation value.

(Supplementary Note 11) An image clustering method, comprising:

performing visual-feature-based clustering on a plurality of images to obtain a first set;

performing link-structure clustering on the plurality of images to obtain a second set;

fusing the first set and the second set by using visual feature information and link structure information to obtain an image clustering result.

(Supplementary Note 12) The image clustering method according to Supplementary Note 11, wherein the image clustering method further comprises:

clustering the plurality of images according to classification information in order to screen the plurality of images;

and wherein the visual-feature-based clustering is performed on the screened images to obtain the first set, and the link-structure clustering is performed on the screened images to obtain the second set.

(Supplementary Note 13) The image clustering method according to Supplementary Note 11 or 12, wherein performing link-structure clustering on the plurality of images to obtain the second set specifically comprises:

constructing a KNN graph over the plurality of images based on their visual features, and performing structural clustering on the KNN graph to obtain a clustering result based on link structure information.

(Supplementary Note 14) The image clustering method according to any one of Supplementary Notes 11 to 13, wherein fusing the first set and the second set by using visual feature information and link structure information specifically comprises:

taking one of the first set and the second set as a target set and the other as a source set;

adding elements of the source set to the target set, or updating elements of the target set according to elements of the source set.

(Supplementary Note 15) The image clustering method according to Supplementary Note 14, wherein adding elements of the source set to the target set, or updating elements of the target set according to elements of the source set, specifically comprises:

for a first subset in the source set, calculating the degree of overlap between the first subset and each subset of the target set;

when no subset of the target set has a degree of overlap greater than a preset threshold, adding the first subset to the target set; and when the target set contains a second subset whose degree of overlap is greater than the preset threshold, updating the second subset.

(Supplementary Note 16) The image clustering method according to Supplementary Note 15, wherein updating the second subset specifically comprises:

taking the intersection of the first subset and the second subset as a third subset;

updating the third subset based on a cluster measurement value;

replacing the second subset with the updated third subset.

(Supplementary Note 17) The image clustering method according to Supplementary Note 16, wherein updating the third subset based on the cluster measurement value specifically comprises:

for each element that does not belong to the third subset but belongs to the first subset or the second subset, adding the element to the third subset to form a new fourth subset;

calculating a cluster measurement value for each fourth subset to obtain the fourth subset with the best cluster measurement value;

replacing the third subset with the fourth subset when the cluster measurement value of the fourth subset is better than that of the third subset.

(Supplementary Note 18) The image clustering method according to any one of Supplementary Notes 11 to 17, wherein the image clustering method further comprises:

for a set of images among the plurality of images, calculating a cluster measurement value of the set, the cluster measurement value including one or a combination of: a global visual correlation value, a local visual correlation value, a global link correlation value, and a local link correlation value.

(Supplementary Note 19) The image clustering method according to Supplementary Note 18, wherein calculating the global link correlation value specifically comprises:

finding, on the constructed K-nearest-neighbor graph, the one or more edges whose connected images are both in the set, and summing the weights of these edges to obtain a first sum;

finding, on the K-nearest-neighbor graph, the one or more edges for which at least one of the connected images is in the set, and summing the weights of these edges to obtain a second sum;

dividing the first sum by the second sum to obtain the global link correlation value.

(Supplementary Note 20) The image clustering method according to Supplementary Note 18, wherein calculating the local link correlation value specifically comprises:

obtaining the link weight between every pair of images in the set;

averaging all of the link weights to obtain the local link correlation value.

(Supplementary Note 21) A computer-readable program which, when executed in a computer, causes the computer to perform the image clustering method according to any one of Supplementary Notes 11 to 20.

(Supplementary Note 22) A storage medium storing a computer-readable program, wherein the computer-readable program causes a computer to perform the image clustering method according to any one of Supplementary Notes 11 to 20.

Claims (8)

1. An image clustering device, comprising:
a first clustering unit that performs visual-feature-based clustering on a plurality of images to obtain a first set;
a second clustering unit that performs link-structure clustering on the plurality of images to obtain a second set; and
a fusion unit that fuses the first set and the second set by using visual feature information and link structure information to obtain an image clustering result,
wherein the fusion unit comprises:
a selection unit that takes one of the first set and the second set as a target set and the other as a source set; and
a processing unit that adds elements of the source set to the target set, or updates elements of the target set according to elements of the source set,
and wherein the processing unit comprises:
a first calculation unit that, for a first subset in the source set, calculates the degree of overlap between the first subset and each subset of the target set;
a merging unit that adds the first subset to the target set when no subset of the target set has a degree of overlap greater than a preset threshold; and
a first updating unit that, when the target set contains a second subset whose degree of overlap is greater than the preset threshold, updates the second subset.
2. The image clustering device according to claim 1, wherein the image clustering device further comprises:
a classification unit that clusters the plurality of images according to classification information in order to screen the plurality of images.
3. The image clustering device according to claim 1, wherein the first updating unit comprises:
a first generation unit that takes the intersection of the first subset and the second subset as a third subset;
a second updating unit that updates the third subset based on a cluster measurement value;
a first replacement unit that replaces the second subset with the updated third subset.
4. The image clustering device according to claim 3, wherein the second updating unit comprises:
a second generation unit that, for each element that does not belong to the third subset but belongs to the first subset or the second subset, adds the element to the third subset to form a new fourth subset;
a second calculation unit that calculates a cluster measurement value for each fourth subset to obtain the fourth subset with the best cluster measurement value;
a second replacement unit that replaces the third subset with the fourth subset when the cluster measurement value of the fourth subset is better than that of the third subset.
5. The image clustering device according to any one of claims 1 to 4, wherein the image clustering device further comprises:
a third calculation unit that, for a set of images among the plurality of images, calculates a cluster measurement value of the set, the cluster measurement value including one or a combination of: a global visual correlation value, a local visual correlation value, a global link correlation value, and a local link correlation value.
6. The image clustering device according to claim 5, wherein the third calculation unit comprises:
a fourth calculation unit that finds, on the constructed K-nearest-neighbor graph, the one or more edges whose connected images are both in the set, and sums the weights of these edges to obtain a first sum;
a fifth calculation unit that finds, on the K-nearest-neighbor graph, the one or more edges for which at least one of the connected images is in the set, and sums the weights of these edges to obtain a second sum;
a sixth calculation unit that divides the first sum by the second sum to obtain the global link correlation value.
7. The image clustering device according to claim 5, wherein the third calculation unit comprises:
a seventh calculation unit that obtains the link weight between every pair of images in the set;
an eighth calculation unit that averages all of the link weights to obtain the local link correlation value.
8. An image clustering method, comprising:
performing visual-feature-based clustering on a plurality of images to obtain a first set;
performing link-structure clustering on the plurality of images to obtain a second set; and
fusing the first set and the second set by using visual feature information and link structure information to obtain an image clustering result,
wherein fusing the first set and the second set by using visual feature information and link structure information comprises:
taking one of the first set and the second set as a target set and the other as a source set; and
adding elements of the source set to the target set, or updating elements of the target set according to elements of the source set,
and wherein adding elements of the source set to the target set, or updating elements of the target set according to elements of the source set, comprises:
for a first subset in the source set, calculating the degree of overlap between the first subset and each subset of the target set;
when no subset of the target set has a degree of overlap greater than a preset threshold, adding the first subset to the target set; and
when the target set contains a second subset whose degree of overlap is greater than the preset threshold, updating the second subset.
CN201210406382.9A 2012-10-23 2012-10-23 Image clustering device and method Active CN103778146B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210406382.9A CN103778146B (en) 2012-10-23 2012-10-23 Image clustering device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210406382.9A CN103778146B (en) 2012-10-23 2012-10-23 Image clustering device and method

Publications (2)

Publication Number Publication Date
CN103778146A CN103778146A (en) 2014-05-07
CN103778146B true CN103778146B (en) 2017-03-01

Family

ID=50570389

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210406382.9A Active CN103778146B (en) 2012-10-23 2012-10-23 Image clustering device and method

Country Status (1)

Country Link
CN (1) CN103778146B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105654102A (en) * 2014-11-10 2016-06-08 富士通株式会社 Data processing device and data processing method
CN105956631A (en) * 2016-05-19 2016-09-21 南京大学 On-line progressive image classification method facing electronic image base
CN106997371B (en) * 2016-10-28 2020-06-23 华数传媒网络有限公司 Method for constructing single-user intelligent map
CN108805148B (en) * 2017-04-28 2022-01-11 富士通株式会社 Method of processing image and apparatus for processing image
CN109978006B (en) * 2019-02-25 2021-02-19 北京邮电大学 Face image clustering method and device
CN110348521A (en) * 2019-07-12 2019-10-18 创新奇智(重庆)科技有限公司 Image procossing clustering method and its system, electronic equipment
CN113553461B (en) * 2020-04-26 2024-08-20 北京搜狗科技发展有限公司 Picture clustering method and related device
CN112070144B (en) * 2020-09-03 2024-08-27 Oppo广东移动通信有限公司 Image clustering method, device, electronic equipment and storage medium
CN114764437A (en) * 2021-01-04 2022-07-19 阿里巴巴集团控股有限公司 User intention identification method and device and electronic equipment
CN113326880A (en) * 2021-05-31 2021-08-31 南京信息工程大学 Unsupervised image classification method based on community division

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101295360A (en) * 2008-05-07 2008-10-29 清华大学 A Semi-supervised Image Classification Method Based on Weighted Graph
CN101706806A (en) * 2009-11-11 2010-05-12 北京航空航天大学 Text classification method by mean shift based on feature selection
CN102004944A (en) * 2009-08-27 2011-04-06 Sap股份公司 Planogram compliance using automated item-tracking
CN102509107A (en) * 2011-10-13 2012-06-20 西北工业大学 A Locally Global Consistency Classification Method Based on Sparsely Decomposed l0 Graph

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8121358B2 (en) * 2009-03-06 2012-02-21 Cyberlink Corp. Method of grouping images by face
WO2012140315A1 (en) * 2011-04-15 2012-10-18 Nokia Corporation Method, apparatus and computer program product for providing incremental clustering of faces in digital images

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
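A selective weighted clustering ensemble algorithm; Fan Xiaoping et al.; Computer Engineering and Applications; 20110729; Vol. 48, No. 22; 195-200 *
Rotating grid: a new clustering ensemble method; Cao Qiaoling et al.; Computer Science; 20110715; Vol. 38, No. 7; 157-161 *
Research on clustering ensemble algorithms and their applications; Weng Fangfei; China Master's Theses Full-text Database, Information Science and Technology series; 20090815; Sections 2.2 and 2.4, Figures 2.2, 2.4, 2.7 *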

Also Published As

Publication number Publication date
CN103778146A (en) 2014-05-07

Similar Documents

Publication Publication Date Title
CN103778146B (en) Image clustering device and method
Hu et al. Predicting drug-target interactions from drug structure and protein sequence using novel convolutional neural networks
CN107784598A (en) Network community discovery method
Zhang et al. Protein complex prediction in large ontology attributed protein-protein interaction networks
CN112182306B (en) Uncertain graph-based community discovery method
CN113255895A (en) Graph neural network representation learning-based structure graph alignment method and multi-graph joint data mining method
Pourabbasi et al. A new single-chromosome evolutionary algorithm for community detection in complex networks by combining content and structural information
Cho et al. Latent space model for multi-modal social data
Wang et al. Detecting overlapping protein complexes in PPI networks based on robustness
Su et al. Multi-view heterogeneous molecular network representation learning for protein–protein interaction prediction
CN110515986A (en) Method, device, and storage medium for processing a social network graph
CN103400299B (en) Method for detecting network overlapped communities based on overlapped point identification
CN108647487A (en) Prediction method and prediction system for G protein-coupled receptor-ligand interactions
CN110493045A (en) A kind of directed networks link prediction method merging multimode body information
Cinaglia PyMulSim: a method for computing node similarities between multilayer networks via graph isomorphism networks
CN108287866A (en) Community discovery method based on node density in large-scale networks
Chehreghani Efficient computation of pairwise minimax distance measures
Luo et al. A novel graph neural network based approach for influenza-like illness nowcasting: exploring the interplay of temporal, geographical, and functional spatial features
Qin et al. An ontology-independent representation learning for similar disease detection based on multi-layer similarity network
Rui et al. DB-NMS: improving non-maximum suppression with density-based clustering
CN110853763B (en) Fusion attribute-based miRNA-disease association identification method and system
CN111026919A (en) Adaptive two-stage weighted target community discovery and detection method based on double views
Zhou et al. Protein complex identification based on heterogeneous protein information network
He et al. Efficient and accurate greedy search methods for mining functional modules in protein interaction networks
Abbou et al. Logistic matrix factorisation and generative adversarial neural network-based method for predicting drug-target interactions

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant