
CN110472657B - Image classification method based on trust function theory - Google Patents

Image classification method based on trust function theory

Info

Publication number
CN110472657B
Authority
CN
China
Prior art keywords
initial
class
images
composite
new
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910599618.7A
Other languages
Chinese (zh)
Other versions
CN110472657A (en
Inventor
刘准钆
张作伟
潘泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN201910599618.7A
Publication of CN110472657A
Application granted
Publication of CN110472657B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an image classification method based on trust function theory. All images in an image set X to be classified are divided into a plurality of initial subclasses, and the density value of each initial subclass is calculated. According to the relationship between the density values of each initial composite class and its corresponding initial single classes, the images in the initial composite class and the corresponding initial single classes are merged to generate new single classes and new composite classes, until the total number of new single classes generated by merging and initial single classes is c. The images in the initial composite classes are then assigned to a new single class, an initial single class or a new composite class, which gives the classification result for the images in X. The invention uses the densities of the different classes to merge composite classes and single classes step by step under a given rule, stopping when the number of single classes after merging exactly equals the true number of image classes. This effectively reduces the risk caused by misassigned data and avoids wrong or strongly biased decisions.

Figure 201910599618

Description

Image Classification Method Based on Trust Function Theory

【Technical Field】

The invention belongs to the technical field of image classification and recognition, and in particular relates to an image classification method based on trust function theory.

【Background】

Imbalanced data clustering is an important branch of pattern recognition and unsupervised machine learning, and is widely applied in many fields.

Image recognition of aerial targets is an important research direction for aviation military surveillance systems. Because of deliberate countermeasures, complex and changeable battlefield environments, and limits on sensor performance, the images that an image sensor captures of moving aerial targets are often imbalanced across target classes: the number of captured images differs greatly from one class of moving target to another. In addition, images of different targets can look very similar from certain viewing angles, which makes the acquired target images highly uncertain.

When classical clustering methods are applied to such imbalanced samples for unsupervised classification (clustering), reliable results are hard to obtain, and a "balance effect" can even appear. That is, even when the numbers of images of different targets differ greatly (for example, when two targets are photographed at the same time and one yields very few images while the other yields very many), the clustering still splits the image set into two target classes of roughly equal size, which leads to a high classification error rate. Moreover, classical clustering methods do not take into account that images of different targets can be very similar from certain viewing angles, that is, that images of different targets may resemble each other in some regions. When images belonging to different targets are highly similar, these overlapping images are usually hard to cluster accurately.

【Summary of the Invention】

The purpose of the invention is to provide an image classification method based on trust function theory that, when classifying images of different targets, uses the density of each class of images as the classification criterion and thereby improves classification accuracy.

The invention adopts the following technical solution: an image classification method based on trust function theory, comprising the following steps:

Dividing all images in the image set X to be classified into a plurality of initial subclasses, wherein the initial subclasses consist of N initial single classes and a plurality of initial composite classes, each initial composite class is composed of two initial single classes, N ≥ c, and c denotes the true number of classes of the images in X;

Calculating the density value of each initial subclass, wherein the density is the reciprocal of the mean Euclidean distance of all images in the initial subclass;

According to the relationship between the density values of each initial composite class and its corresponding initial single classes, merging the images in the initial composite class and the corresponding initial single classes to generate new single classes and new composite classes, until the total number of new single classes generated by merging and initial single classes is c;

Assigning the images in the initial composite classes to a new single class, an initial single class or a new composite class, to obtain the classification result for the images in the image set X.

Further, the relationship between the density values of an initial composite class and its corresponding initial single classes falls into one of the following three cases:

Case 1: ρ_{k,t} ≥ ρ_k and ρ_{k,t} ≥ ρ_t;

Case 2: ρ_k ≤ ρ_{k,t} ≤ ρ_t;

Case 3: ρ_{k,t} ≤ ρ_k and ρ_{k,t} ≤ ρ_t;

where ρ_k is the density value of the k-th initial single class, 0 ≤ k ≤ N; ρ_t is the density value of the t-th initial single class, 0 ≤ t ≤ N; and ρ_{k,t} is the density value of the initial composite class formed by the k-th and t-th initial single classes.

Further, when the images in the initial composite classes and the corresponding initial single classes are merged, the cases are processed in the order Case 1, Case 2, Case 3.
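The three cases and their processing order can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `density_case` is hypothetical, treating k and t symmetrically through `min`/`max` is an assumption, and so is resolving boundary ties (where the inequalities overlap) in favor of the lowest-numbered case.

```python
def density_case(rho_k: float, rho_t: float, rho_kt: float) -> int:
    """Return 1, 2 or 3 for the density relationship of a composite class.

    rho_k, rho_t -- densities of the two initial single classes
    rho_kt       -- density of the composite class they form
    """
    low, high = min(rho_k, rho_t), max(rho_k, rho_t)
    if rho_kt >= high:
        return 1  # Case 1: composite at least as dense as both single classes
    if rho_kt >= low:
        return 2  # Case 2: composite density lies between the two
    return 3      # Case 3: composite sparser than both single classes


# Composite classes are processed case by case: Case 1, then 2, then 3.
# Each tuple is (rho_k, rho_t, rho_kt) with made-up illustrative values.
composites = {"A12": (1.0, 3.0, 2.0), "A34": (1.0, 2.0, 3.0), "A13": (2.0, 3.0, 1.0)}
order = sorted(composites, key=lambda name: density_case(*composites[name]))
```

Here `order` lists the composite classes in the order their cases would be handled; ordering inside one case needs the separate priority rule based on density distance.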

Further, the specific method for merging the images in the initial composite classes and the corresponding initial single classes is:

Calculating the density distance between the two initial single classes corresponding to each initial composite class, sorting the initial composite classes by density distance from smallest to largest, and, in that order, merging the images of each initial composite class and its corresponding initial single classes into a new single class;

When the density distances between the two initial single classes of several initial composite classes are equal, merging those initial composite classes and their corresponding initial single classes into new single classes in descending order of the density values of the initial composite classes;

where the density distance is the difference between the ordinal positions of the two initial single classes in the initial single-class density set D_s, D_s is the set of the density values of the N initial single classes, and D_s is sorted by density value in descending order.

Further, after a new single class is generated and before the next initial composite class is merged, it is determined whether the new single class shares any image with another new single class; if so, all images in the two new single classes are merged into one new single class.
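The re-merging of new single classes that share images can be illustrated with a short Python sketch. It assumes each class is represented as a set of image identifiers; the function name `merge_overlapping` is hypothetical and the code only illustrates the "non-empty intersection" condition described above.

```python
def merge_overlapping(classes):
    """Merge sets that share at least one element until all are disjoint."""
    merged = []  # invariant: the sets in `merged` are pairwise disjoint
    for cls in classes:
        current = set(cls)
        disjoint = []
        for other in merged:
            if current & other:
                current |= other      # absorb every class that overlaps
            else:
                disjoint.append(other)
        disjoint.append(current)
        merged = disjoint
    return merged


# Image ids 2 and 3 link the first three classes together.
result = merge_overlapping([{1, 2}, {3, 4}, {2, 3}, {5}])
```

Because the sets kept in `merged` are always pairwise disjoint, absorbing one of them can never create a new overlap with the others, so a single pass per incoming class is enough.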

Further, the specific method for assigning the images in the initial composite classes to a new single class, an initial single class or a new composite class is:

Selecting an image x_i in an initial composite class, finding the K_2 images in the image set X with the smallest Euclidean distance to it as its neighbor images, generating K_2 vectors, and generating the weights of the K_2 vectors from the Euclidean distances between x_i and its neighbor images;

Calculating the sum of the K_2 vectors as the first sum vector;

For each initial single class or new single class, calculating the sum of those of the K_2 vectors whose end points correspond to neighbor images belonging to that class, as a second sum vector;

Calculating the cosine of the angle between each second sum vector and the first sum vector, and assigning x_i to the initial single class or new single class corresponding to the second sum vector with the smallest cosine;

Repeating the above steps until all images in the initial composite classes have been assigned.

Further, when the difference between the cosines of the angles that two second sum vectors form with the first sum vector is less than a threshold, the image is assigned to a new composite class formed by the initial single classes or new single classes corresponding to those two second sum vectors.
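The vector construction in these steps can be sketched in Python as below. Several points are assumptions made only for illustration: images are represented by numeric feature vectors, each vector runs from x_i to a neighbor, the weight is exp(-distance) (the text only says the weights are derived from the Euclidean distances), and a zero-length sum vector is given cosine 0. The function name `neighbor_cosines` is hypothetical, and the K_2 neighbors are assumed to have been selected already.

```python
import math


def neighbor_cosines(x, neighbors, labels):
    """Cosine between each per-class sum vector and the overall sum vector.

    x         -- feature vector of the image being assigned
    neighbors -- feature vectors of its K2 nearest neighbors
    labels    -- single-class label of each neighbor
    """
    total = [0.0] * len(x)          # first sum vector
    per_class = {}                  # second sum vectors, one per class
    for nb, label in zip(neighbors, labels):
        weight = math.exp(-math.dist(x, nb))  # ASSUMED weighting scheme
        vec = [weight * (b - a) for a, b in zip(x, nb)]
        total = [s + v for s, v in zip(total, vec)]
        acc = per_class.setdefault(label, [0.0] * len(x))
        per_class[label] = [s + v for s, v in zip(acc, vec)]

    def cosine(u, v):
        nu, nv = math.hypot(*u), math.hypot(*v)
        if nu == 0.0 or nv == 0.0:
            return 0.0  # ASSUMED convention when a sum vector vanishes
        return sum(a * b for a, b in zip(u, v)) / (nu * nv)

    return {label: cosine(vec, total) for label, vec in per_class.items()}


cosines = neighbor_cosines(
    [0.0, 0.0],
    [[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]],
    ["a", "a", "b"],
)
```

The sketch stops at the cosines. The decision step described in the text (choose the class by comparing the cosines, or form a new composite class when two cosines differ by less than a threshold) is left to the caller.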

Further, the c-means clustering algorithm is used to divide all images in the image set X to be classified into the plurality of initial subclasses.

Further, the image set X to be classified is an image set of asymmetric data.

Further, the density of each initial subclass is calculated using the formula:

Figure BDA0002118756180000041

where ρ_j is the density of the j-th initial subclass A_j; the quantity shown in Figure BDA0002118756180000042 is the mean Euclidean distance between image x_i in A_j and its K_1 nearest neighbors in the image set X; K_1 is a constant; n_j is the number of images in A_j; and i is the index of the image within A_j.
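The formula images are not reproduced in this text. One reading that is consistent with the surrounding definitions (density as the reciprocal of the mean, over the n_j images of A_j, of each image's mean distance to its K_1 nearest neighbors) is sketched below. The symbol names \bar{d}_i and d_{ik} are assumptions chosen for illustration, and the exact typeset form in the patent may differ.

```latex
\rho_j = \left( \frac{1}{n_j} \sum_{i=1}^{n_j} \bar{d}_i \right)^{-1},
\qquad
\bar{d}_i = \frac{1}{K_1} \sum_{k=1}^{K_1} d_{ik}
```

Here d_{ik} stands for the Euclidean distance between x_i and its k-th nearest neighbor in X.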

The beneficial effects of the invention are as follows: images that are locally highly similar and hard to classify are placed in new composite classes, and the densities of the different classes are used to merge composite classes and single classes step by step under a given rule, stopping when the number of new single classes after merging exactly equals the true number of image classes. This effectively reduces the risk caused by misassigned data and avoids wrong or strongly biased decisions.

【Brief Description of the Drawings】

FIG. 1 is a flowchart of the invention;

FIG. 2 is an example diagram of a target sample x_i and its K_2 neighbor vectors in an embodiment of the invention.

【Detailed Description】

The invention is described in detail below with reference to the accompanying drawings and specific embodiments.

Clustering asymmetric data is a very challenging problem. Within the framework of trust function theory, the invention proposes a new trust clustering method (CClu) for handling asymmetric data. For an image set containing c classes of images, the trust c-means clustering algorithm (CCM) is first used to divide the image set into more classes than the true number c (for example N classes, with N >> c). These classes include both initial single classes and initial composite classes, where a composite class consists of images lying in the overlapping, inseparable region of different classes, that is, images that resemble both one class and another. In this way, images that originally belong to the same class are clustered together. The densities of the different classes are then used to merge these N classes step by step under a given rule, stopping when the total number of new single classes and initial single classes after merging is exactly c. Finally, a weighted vector method based on the K nearest neighbors of each image is used to reassign the initial composite classes that were not merged to new single classes or new composite classes.

Images in overlapping regions that are hard to assign are placed in new composite classes, in order to reduce the risk of misclassification. Several experiments on synthetic and real data sets tested the performance of CClu against other related methods. The results show that CClu handles multi-class asymmetric image sets well and that, thanks to the composite classes, it effectively reduces errors by correctly modelling the imprecision of samples.

Within the framework of trust function theory, the new CClu method is used to handle asymmetric images. For a c-class problem, CClu first uses CCM to generate a number of initial subclasses larger than the true class number c (N initial single classes and several composite classes), where an initial composite class may contain only two initial single classes. Then the densities of these initial subclasses are computed from the K nearest neighbors of the images they contain, and a given merging rule based on the density relationships merges each initial composite class with the initial single classes it contains into a new single class, until the number of new single classes is exactly c.

Finally, the K nearest neighbors in the new single classes are used to assign the images in the unmerged initial composite classes to a new single class or a new composite class. Images in an initial composite class that are hard to assign accurately are placed by CClu in a new composite class made up of new single classes, which effectively lowers the classification error rate. Images in a new composite class are hard to assign accurately under current conditions, but they can eventually be assigned using new techniques or additional information. CClu thus guards against wrong decisions by producing this special form of clustering result, which is important in every application.

The image classification method based on trust function theory of the invention, as shown in FIG. 1, comprises the following steps:

The c-means clustering algorithm is used to divide all images in the image set X to be classified into a plurality of initial subclasses, where X is an image set of asymmetric data. The initial subclasses consist of N initial single classes and a plurality of initial composite classes, each initial composite class is composed of two initial single classes, N ≥ c, and c denotes the true number of classes of the images in X.

Consider an image set containing images of c classes. Under the multi-clustering strategy, CCM clusters all images in the image set X into different classes. For example, when N = 2c, up to 2^{2c} classes would be generated, comprising initial single classes, initial composite classes and a noise class; as N grows, this brings a heavy computational burden. In practice, however, it is rare for an image to lie in a region where more than two initial single classes overlap. In other words, when a sample is hard to assign to one initial single class, the ambiguity is in most cases between two initial single classes. Therefore, in the invention an initial composite class is only allowed to contain two initial single classes. In this case, when N = 2c, CCM generates at most

Figure BDA0002118756180000061

classes, where 2c is the number of initial single classes, the quantity shown in Figure BDA0002118756180000062 is the number of initial composite classes, and 1 accounts for the noise class ∅. These classes can be represented by a subset SΩ of the power set 2^Ω, where SΩ is defined as

Figure BDA0002118756180000071

and t_c is the threshold on the number of initial single classes that an initial composite class is allowed to contain. The composite class of the initial single classes ω_i and ω_j is denoted A_{i,j}, so when N = 2c,

Figure BDA0002118756180000077

For example, for a problem with c = 2 classes, when N = 2c = 4, Ω = {ω_1, ω_2, ω_3, ω_4}, and CCM in the 2^Ω framework generates at most

Figure BDA0002118756180000072

image classes, namely

Figure BDA0002118756180000073
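The class counts discussed here can be checked with a short Python sketch. The count formula itself is an image that is not reproduced in this text, so the sketch relies on an assumption that follows from the stated restriction: if every composite class contains exactly two single classes, there are "N choose 2" possible composite classes, plus N single classes and one noise class. The function names are hypothetical.

```python
from itertools import combinations
from math import comb


def restricted_class_count(n_single: int) -> int:
    """Single classes + pairwise composite classes + one noise class."""
    return n_single + comb(n_single, 2) + 1


def restricted_classes(singles):
    """Enumerate the restricted frame as frozensets of single-class names."""
    classes = [frozenset()]                                   # noise class
    classes += [frozenset([s]) for s in singles]              # single classes
    classes += [frozenset(p) for p in combinations(singles, 2)]  # composites
    return classes


# c = 2, N = 2c = 4: the full power set has 2**4 = 16 elements,
# while the restricted frame has 4 + 6 + 1 = 11 classes.
frame = restricted_classes(["w1", "w2", "w3", "w4"])
```

The gap between the two counts widens quickly: for N = 10 the power set has 1024 elements, while the restricted frame under this assumption has 56.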

The density value of each initial subclass is calculated, where the density is the reciprocal of the mean Euclidean distance of all images in the initial subclass. To avoid accidental errors, the mean Euclidean distance to the K_1 nearest neighbors of an image x_i ∈ A_j is used as the nearest-neighbor distance of x_i, and the density of each initial subclass is calculated using the formula:

Figure BDA0002118756180000074

where ρ_j is the density of the j-th initial subclass A_j; the quantity shown in Figure BDA0002118756180000075 is the mean Euclidean distance between image x_i in A_j and its K_1 nearest neighbors in the image set X, given by

Figure BDA0002118756180000076

d_ik is the Euclidean distance between image x_i and its k-th nearest neighbor x_k in the image set; K_1 is a constant, the number of neighbors of x_i; n_j is the number of images in A_j; and i is the index of the image within A_j.

As the formula shows, the more dispersed the images in class A_j, the smaller the value of ρ_j. Because ρ_j describes how dispersed the images in A_j are, it can serve as the basis for decisions about A_j.
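The density computation can be illustrated with a small Python sketch. It assumes that each image is represented by a numeric feature vector and that the density is the number of images in the subclass divided by the sum of their mean K_1-nearest-neighbor distances, which is one reading of the verbal definition above (the formula image itself is not reproduced here). The function name `subclass_density` is hypothetical.

```python
import math


def subclass_density(X, members, k1):
    """Density of a subclass as the reciprocal of its mean neighbor distance.

    X       -- feature vectors of all images in the image set
    members -- indices (into X) of the images in subclass A_j
    k1      -- number of nearest neighbors, searched in the whole set X
    """
    mean_dists = []
    for i in members:
        dists = sorted(math.dist(X[i], X[m]) for m in range(len(X)) if m != i)
        mean_dists.append(sum(dists[:k1]) / k1)  # mean distance to k1 neighbors
    # Raises ZeroDivisionError if every neighbor distance is zero.
    return len(members) / sum(mean_dists)


# A tight cluster and a copy of it stretched by a factor of 4.
X = [[0, 0], [1, 0], [0, 1], [10, 10], [14, 10], [10, 14]]
tight = subclass_density(X, [0, 1, 2], k1=2)
loose = subclass_density(X, [3, 4, 5], k1=2)
```

In this toy example the stretched cluster has one quarter of the density of the tight one, which matches the observation that more dispersed classes have smaller ρ_j.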

The density relationship between an initial composite class and the initial single classes it contains can be used to merge these initial subclasses. In the invention an initial composite class is regarded as a transition class between different initial single classes. An image in an initial composite class is considered to belong to one of the initial single classes that the composite class contains, so membership in the composite class is an imprecise representation.

If images that originally belong to one class are split among several classes (initial single classes and initial composite classes) because of multi-clustering, the densities of those initial single classes should differ little, and the density of the initial composite class should be greater than or equal to that of the initial single classes it contains, because the images in the initial composite class should lie near the center of the original class. Note that not every initial composite class satisfies this principle: in some special cases an initial composite class contains very few images, or no initial composite class exists, which means that the data distribution is unusual.

Therefore, according to the relationship between the density values of each initial composite class and its corresponding initial single classes, the images in the initial composite class and the corresponding initial single classes are merged to generate new single classes and/or new composite classes, until the total number of new single classes generated by merging and initial single classes is c. The relationship between the density values falls into one of the following three cases:

Case 1: the density of the initial composite class is greater than or equal to the densities of the initial single classes it contains, that is, ρ_{k,t} ≥ ρ_k and ρ_{k,t} ≥ ρ_t.

Case 2: the density of the initial composite class lies between the densities of the initial single classes it contains, that is, ρ_k ≤ ρ_{k,t} ≤ ρ_t.

Case 3: the density of the initial composite class is less than or equal to the densities of the initial single classes it contains, that is, ρ_{k,t} ≤ ρ_k and ρ_{k,t} ≤ ρ_t.

Here ρ_k is the density value of the k-th initial single class, 0 ≤ k ≤ N; ρ_t is the density value of the t-th initial single class, 0 ≤ t ≤ N; and ρ_{k,t} is the density value of the initial composite class formed by the k-th and t-th initial single classes.

Under this principle, the initial single classes and initial composite class that satisfy Case 1 are merged into a new single class first, followed by those that satisfy Case 2, and finally those that satisfy Case 3. That is, the merging order is Case 1, Case 2, Case 3. A new single class is defined as:

Figure BDA0002118756180000091

In a specific problem, only one of the cases may need to be analyzed, or two of them, or all three at once. If many initial single classes and initial composite classes satisfy the same case (for example Case 1), priority must also be considered when merging. To avoid ambiguity among merges within the same case, the invention defines a priority rule:

Figure BDA0002118756180000092

where D_d is the density distance set, R(ρ_i) is the ordinal position of ρ_i in the set D_s, and D_s is the set of density values of the initial single classes sorted in descending order, defined as D_s = sort{ρ_1, ..., ρ_c, ..., ρ_{2c} | 1 ≤ i ≤ 2c}. The set D_o is the set of densities of the initial composite classes sorted in descending order, defined as D_o = sort{ρ_{1,2}, ..., ρ_{1,2c}, ..., ρ_{2c-1,2c} | 1 ≤ i, j ≤ 2c}. Within the same case, when density distances are equal, D_o is used to decide the merge priority. y_{i,j} is the absolute value of the difference between the quantities shown in Figure BDA0002118756180000093 and Figure BDA0002118756180000094, that is, the density distance.

When the density distance between the two initial single classes contained in an initial composite class is small, CClu merges that composite class and its two single classes into a new single class first. In other words, within the same case, the closer the density values of the two initial single classes (the smaller the density distance), the earlier they are merged. Within a case, density distance rather than density value is the primary criterion, because density values are more easily affected by the data distribution or the clustering result and therefore vary widely. This prevents initial single classes with very different densities from being merged into one new single class, and it also prevents classes from being merged merely because partial data overlap gives the initial composite class a large density value. Within the same case, when the density distances are equal, the initial composite class with the larger density value, together with the initial single classes it contains, is merged first.

The images in an initial composite class and its corresponding initial single classes are merged as follows: compute the density distance between the two initial single classes of each initial composite class, sort the initial composite classes by density distance in ascending order, and, in that order, merge the images of each initial composite class and its initial single classes into a new single class.

When several initial composite classes have equal density distances, they are merged in descending order of their own density values. Here, the density distance is the difference between the ordinal positions of the two initial single classes in the initial single-class density set D_s, where D_s is the set of the density values of the N initial single classes, sorted in descending order.
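The density distance described above can be sketched in a few lines of Python. This is only an illustration: the function names `density_ranks` and `density_distance` are hypothetical, and the densities are the single-class values listed in Case 1 of this description.

```python
def density_ranks(single_density):
    """Map each single-class index to its 1-based position in D_s
    (densities sorted in descending order)."""
    ordered = sorted(single_density, key=lambda k: single_density[k], reverse=True)
    return {k: pos for pos, k in enumerate(ordered, start=1)}


def density_distance(ranks, k, t):
    """y_{k,t}: absolute difference of the positions of rho_k and rho_t in D_s."""
    return abs(ranks[k] - ranks[t])


# Single-class densities taken from Case 1 of this description.
rho = {1: 2.17, 2: 3.2, 3: 1.85, 4: 1.59, 5: 1.05, 6: 1.23}
ranks = density_ranks(rho)

# Composite classes present in Case 1, identified by their two single classes.
pairs = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 4), (2, 6), (4, 5), (4, 6)]
D_d = {pair: density_distance(ranks, *pair) for pair in pairs}
print(ranks)
print(D_d)
```

Working on ranks rather than on raw density values keeps the criterion insensitive to the absolute scale of the densities, which is the motivation given in the text.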

After a new single class is generated and before the next initial composite class is merged, it is checked whether the new single class shares any image with another new single class. If it does, all images of the two new single classes are merged into one new single class. In other words, new single classes produced by merging may themselves need to be merged further: whenever two new single classes exist, the first thing to decide is whether they must be merged again, and the condition for merging is that their intersection is not empty. This is called the transitivity of the merge rule, that is,

Figure BDA0002118756180000101
where
Figure BDA0002118756180000102
denotes the new single class produced by the transitivity of the merge rule, and
Figure BDA0002118756180000103
and
Figure BDA0002118756180000104
denote two different new single classes.
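The transitivity condition (merge two new single classes whenever their intersection is not empty) can be illustrated with plain Python sets. This is a minimal sketch: `merge_transitively` is a hypothetical helper, and the image ids in the example are made up.

```python
def merge_transitively(classes):
    """Union any new single classes that share at least one image.

    `classes` is an iterable of sets of image ids; the result is a list of
    pairwise-disjoint sets.
    """
    merged = []
    for cls in classes:
        cls = set(cls)
        disjoint = []
        # Absorb every already-merged class that overlaps with `cls`.
        for other in merged:
            if cls & other:
                cls |= other
            else:
                disjoint.append(other)
        merged = disjoint + [cls]
    return merged


# Made-up image ids: the first, second and fourth classes are linked by
# shared images and collapse into one class; the third stays separate.
new_classes = [{1, 2, 3}, {3, 4}, {7, 8}, {4, 5}]
result = merge_transitively(new_classes)
print(result)
```

Because the classes kept in `merged` are always pairwise disjoint, a single pass over them is enough for each incoming class.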

A simple example is given below to further illustrate the merge rule.

Case 1: Consider a 3-class problem (c = 3) and assume N = 2c = 6, Ω = {ω_1, ω_2, ω_3, ω_4, ω_5, ω_6}, and S^Ω = {ω_1, ω_2, ω_3, ω_4, ω_5, ω_6, ω_{1,2}, ω_{1,3}, ω_{1,4}, ω_{1,5}, ω_{2,4}, ω_{2,6}, ω_{4,5}, ω_{4,6}}. In H^Ω, ρ_1 = 2.17, ρ_2 = 3.2, ρ_3 = 1.85, ρ_4 = 1.59, ρ_5 = 1.05, ρ_6 = 1.23, ρ_{1,2} = 2.77, ρ_{1,3} = 2.02, ρ_{1,4} = 2.26, ρ_{1,5} = 1.45, ρ_{2,4} = 2.66, ρ_{2,6} = 1.97, ρ_{4,5} = 0.9, ρ_{4,6} = 0.99. It is easy to see that the three cases correspond to different initial composite classes and their initial single classes; calculation gives:

Case one is satisfied by: ρ_{1,4};

Case two is satisfied by: ρ_{1,2}, ρ_{1,3}, ρ_{1,5}, ρ_{2,4}, ρ_{2,6};

Case three is satisfied by: ρ_{4,5}, ρ_{4,6};

Then D_d = {y_{1,2} = 1, y_{1,3} = 1, y_{1,4} = 2, y_{1,5} = 4, y_{2,4} = 3, y_{2,6} = 4, y_{4,5} = 2, y_{4,6} = 1}, D_s = {ρ_2 > ρ_1 > ρ_3 > ρ_4 > ρ_6 > ρ_5}, and D_o = {ρ_{1,2} > ρ_{2,4} > ρ_{1,4} > ρ_{1,3} > ρ_{2,6} > ρ_{1,5} > ρ_{4,6} > ρ_{4,5}}.
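The assignment of each initial composite class to one of the three cases (stated in claim 2) can be checked with a short Python sketch. The helper name `merge_case` is hypothetical, and resolving boundary ties in favour of the earlier case is an assumption, since the three conditions overlap when densities are equal.

```python
def merge_case(rho_k, rho_t, rho_kt):
    """Return 1, 2 or 3 depending on how the composite density rho_kt
    compares with the densities of its two single classes."""
    lo, hi = min(rho_k, rho_t), max(rho_k, rho_t)
    if rho_kt >= hi:
        return 1      # composite at least as dense as both single classes
    if rho_kt >= lo:
        return 2      # composite density lies between the two
    return 3          # composite less dense than both single classes


# Densities taken from Case 1 of this description.
rho = {1: 2.17, 2: 3.2, 3: 1.85, 4: 1.59, 5: 1.05, 6: 1.23}
rho_pair = {(1, 2): 2.77, (1, 3): 2.02, (1, 4): 2.26, (1, 5): 1.45,
            (2, 4): 2.66, (2, 6): 1.97, (4, 5): 0.9, (4, 6): 0.99}

cases = {p: merge_case(rho[p[0]], rho[p[1]], d) for p, d in rho_pair.items()}
for case in (1, 2, 3):
    print(case, [p for p, c in cases.items() if c == case])
```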

According to the merge rule, the classes that satisfy case one are merged first, and only ρ_{1,4} satisfies it. So ω_{1,4}, ω_1 and ω_4 are merged to produce a new single class. The number of classes still does not satisfy N′ = 3, so merging continues with the classes that satisfy case two.

Five initial composite classes, together with their initial single classes, satisfy case two, so the density distance is used to decide the merge order. Since y_{1,2} = y_{1,3} = 1 is the smallest density distance among them, ω_{1,2}, ω_{1,3} and their initial single classes are merged first, with ω_{1,2} before ω_{1,3} because ρ_{1,2} > ρ_{1,3} in D_o. The subsequent merge order is ω_{2,4}, ω_{2,6}, ω_{1,5}.

Clearly, among the initial composite classes that satisfy case three, ω_{4,6} and its initial single classes are merged first, because y_{4,6} < y_{4,5} in D_d.

Note that although the merge order of all initial composite classes and their initial single classes has been analyzed for the three cases in turn, not every part of this analysis is actually used, as the following merge steps show:
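The full merge priority (case first, then smaller density distance, then larger composite density) can be expressed as a single sort key. The sketch below recomputes it for the Case 1 densities; the names `case_of` and `priority` are hypothetical, and ties at case boundaries are resolved toward the earlier case as an assumption.

```python
# Densities taken from Case 1 of this description.
rho = {1: 2.17, 2: 3.2, 3: 1.85, 4: 1.59, 5: 1.05, 6: 1.23}
rho_pair = {(1, 2): 2.77, (1, 3): 2.02, (1, 4): 2.26, (1, 5): 1.45,
            (2, 4): 2.66, (2, 6): 1.97, (4, 5): 0.9, (4, 6): 0.99}

# 1-based position of each single-class density in D_s (descending order).
rank = {k: i for i, k in enumerate(sorted(rho, key=rho.get, reverse=True), start=1)}


def case_of(k, t):
    """Case 1, 2 or 3 for the composite class built from single classes k and t."""
    lo, hi = sorted((rho[k], rho[t]))
    d = rho_pair[(k, t)]
    return 1 if d >= hi else 2 if d >= lo else 3


def priority(pair):
    """Sort key: case, then density distance, then descending composite density."""
    k, t = pair
    return (case_of(k, t), abs(rank[k] - rank[t]), -rho_pair[pair])


merge_order = sorted(rho_pair, key=priority)
print(merge_order)
```

Encoding the three criteria as one tuple lets Python's lexicographic tuple comparison apply them in the stated order of precedence.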

Step 1: In case one,
Figure BDA0002118756180000111
N′ > 3.

Step 2: In case two,
Figure BDA0002118756180000112
which then gives
Figure BDA0002118756180000113
and
Figure BDA0002118756180000114
N′ > 3.

Step 3: In case two,
Figure BDA0002118756180000115
which then gives
Figure BDA0002118756180000116
and
Figure BDA0002118756180000117
N′ = 3.

Therefore, the new single classes produced by merging are:

Figure BDA0002118756180000121

Figure BDA0002118756180000122

Figure BDA0002118756180000123
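The three merge steps can be replayed with a small simulation. This sketch makes two simplifying assumptions: each class is tracked only by the indices of the initial single classes it contains (not by its images), and transitivity is modelled by two composites sharing a single class. The function name `run_merging` is hypothetical.

```python
def run_merging(singles, ordered_pairs, target):
    """Merge single classes pair by pair until only `target` classes remain."""
    classes = [{s} for s in singles]      # every single class starts on its own
    for k, t in ordered_pairs:
        if len(classes) <= target:
            break                         # N' has reached the target
        ck = next(c for c in classes if k in c)
        ct = next(c for c in classes if t in c)
        if ck is ct:
            continue                      # both already in the same new single class
        classes = [c for c in classes if c is not ck and c is not ct]
        classes.append(ck | ct)           # transitivity: overlapping classes fuse
    return classes


# Merge order for Case 1: case, then density distance, then composite density.
order = [(1, 4), (1, 2), (1, 3), (2, 4), (2, 6), (1, 5), (4, 6), (4, 5)]
final = run_merging(range(1, 7), order, target=3)


def class_of(s):
    """Return the final class that contains single class s."""
    return next(c for c in final if s in c)


# Composite classes whose two single classes ended up in different classes.
unmerged = [p for p in order if class_of(p[0]) is not class_of(p[1])]
print(final)
print(unmerged)
```

Under these assumptions the simulation stops after three merges, leaving three classes and four composite classes whose images still have to be assigned.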

Note that four composite classes, {ω_{1,5}, ω_{2,6}, ω_{4,5}, ω_{4,6}}, have not been merged into the new single classes
Figure BDA0002118756180000124
The images in these four initial composite classes therefore have to be assigned either to a new single class or to a new composite class formed from the three new single classes (for example
Figure BDA0002118756180000125
), where
Figure BDA0002118756180000126
is likewise regarded as the new composite class formed by
Figure BDA0002118756180000127
and
Figure BDA0002118756180000128
and lies in the overlapping region of these two new single classes.

The images in the initial composite classes are assigned to a new single class, an initial single class or a new composite class, which yields the classification result for the images in the image set X to be classified.

The images in an unmerged initial composite class are assigned to one of the new single classes related to that composite class, or to a new composite class formed from those new single classes. For this purpose, a vector weighted cosine distance method based on the K nearest neighbors of a sample is proposed.

In this method, an image x_i in an initial composite class is selected first, and its K_2 nearest neighbors are searched for in the new single classes and defined as
Figure BDA0002118756180000129
where
Figure BDA00021187561800001210
indicates that the class of x_j is
Figure BDA00021187561800001211
For example, in Case 1 the composite class ω_{1,5} contains the two initial single classes ω_1 and ω_5, where ω_1 has been merged into the new single class
Figure BDA00021187561800001212
and ω_5 into the new single class
Figure BDA00021187561800001213
So an image x_i in the initial composite class ω_{1,5} should be assigned to the new single class
Figure BDA00021187561800001214
or
Figure BDA00021187561800001215
or to the new composite class formed from them,
Figure BDA00021187561800001216
An image x_i in ω_{1,5} cannot, however, be assigned to the new single class
Figure BDA00021187561800001217
or to a new composite class related to it. The labels of the K_2 nearest neighbors of an image x_i in ω_{1,5} can therefore only be
Figure BDA00021187561800001218
or
Figure BDA00021187561800001219
that is,
Figure BDA00021187561800001220
or
Figure BDA00021187561800001221

An image x_i in an initial composite class is selected, the K_2 images in the image set X to be classified that have the smallest Euclidean distance to it are taken as its neighbor images, and K_2 vectors are generated. Because the Euclidean distances from an image x_i in an unmerged initial composite class to its K_2 nearest neighbors
Figure BDA0002118756180000131
are usually different, a weighting strategy is needed when the vector method is used: the weights of the K_2 vectors are generated from the Euclidean distances between x_i and its neighbor images. In general, the farther a neighbor is from the image, the less reliable that neighbor is, so a larger distance d_{ij} leads to a lower reliability weight λ_{ij}. A simple and reasonable method that is widely used in many fields is chosen here to estimate the reliability weights:
Figure BDA0002118756180000132
where d_{ij} is the Euclidean distance between the image x_i and the neighbor
Figure BDA0002118756180000133
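The exact weight formula is given in a figure that is not reproduced in this text, so it cannot be restated here. Purely as an illustration of a weight that decreases with distance, the sketch below assumes an exponential decay λ_{ij} = exp(-γ·d_{ij}); both this functional form and the parameter `gamma` are assumptions, not the formula of the patent.

```python
import math


def reliability_weights(distances, gamma=1.0):
    """Assumed weighting: lambda_ij = exp(-gamma * d_ij).

    Any monotonically decreasing function of the distance would match the
    qualitative behaviour described in the text; the exponential form is
    only an illustrative choice.
    """
    return [math.exp(-gamma * d) for d in distances]


# Made-up Euclidean distances from an image to its neighbours.
weights = reliability_weights([0.0, 0.5, 1.0, 2.0])
print(weights)
```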

The sum of the K_2 vectors is computed and taken as the first sum vector;

For each initial single class/new single class, the sum of those of the K_2 vectors whose end points are neighbor images belonging to that class is computed and taken as a second sum vector;

The cosine of the angle between each second sum vector and the first sum vector is computed, and the image x_i is assigned to the initial single class/new single class whose second sum vector has the smallest cosine value.

Once the K_2 nearest neighbors have been obtained, K_2 vectors
Figure BDA0002118756180000134
can be formed, where
Figure BDA0002118756180000135
denotes the vector whose start point is the image x_i in the unmerged initial composite class and whose end point is the neighbor x_j among the K_2 nearest neighbors. The core idea of the vector weighted cosine distance method is to compute, for each class, the sum of all vectors whose neighbors among the K_2 nearest neighbors of x_i belong to that class, and to compare each of these class sum vectors with the sum vector of all K_2 vectors
Figure BDA0002118756180000136
The class whose sum vector has the smaller cosine value is taken as the final class of the image x_i. That is, the smaller the cosine of the angle between a class sum vector and the sum vector of all vectors, the more likely it is that x_i belongs to that class. When the cosines for different classes are very close, it is difficult to assign x_i to one particular new single class, and x_i is instead assigned to the new composite class formed from those new single classes. These steps are repeated until the images in all initial composite classes have been assigned.

When the difference between the cosine values of the angles that two second sum vectors form with the first sum vector is smaller than a threshold, the image is assigned to the new composite class formed from the initial single classes/new single classes corresponding to those two second sum vectors.
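The assignment rule can be sketched in plain Python as follows. The function name `weighted_cosine_assign` and the example points are hypothetical; the reliability weights are passed in as given, and the decision follows the wording of the text literally (smallest cosine wins, and two classes closer than the threshold ξ form a composite class).

```python
import math


def weighted_cosine_assign(x, neighbours, weights, xi):
    """neighbours: list of (point, label); weights: one weight per neighbour.

    Returns a single label, or a tuple of two labels for a new composite class.
    """
    dim = len(x)
    total = [0.0] * dim        # first sum vector (all neighbours)
    per_class = {}             # second sum vectors, one per class label
    for (pt, label), w in zip(neighbours, weights):
        vec = [w * (p - q) for p, q in zip(pt, x)]   # weighted vector x -> neighbour
        total = [a + b for a, b in zip(total, vec)]
        acc = per_class.get(label, [0.0] * dim)
        per_class[label] = [a + b for a, b in zip(acc, vec)]

    def cosine(u, v):
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        if nu == 0.0 or nv == 0.0:
            return 0.0         # degenerate sum vector: treat as orthogonal
        return sum(a * b for a, b in zip(u, v)) / (nu * nv)

    scored = sorted((cosine(vec, total), label) for label, vec in per_class.items())
    if len(scored) > 1 and scored[1][0] - scored[0][0] < xi:
        return (scored[0][1], scored[1][1])   # cosines too close: composite class
    return scored[0][1]                       # smallest cosine, as stated in the text


# x is surrounded by "A" neighbours, while the "B" neighbours lie to one side.
x = (0.0, 0.0)
neighbours = [((1.0, 0.0), "A"), ((-1.0, 0.2), "A"), ((0.0, -1.0), "A"),
              ((3.0, 0.0), "B"), ((3.0, 1.0), "B")]
label = weighted_cosine_assign(x, neighbours, [1.0] * 5, xi=0.2)
print(label)
```

In this example the vectors toward the surrounding class largely cancel, so their sum points away from the overall sum vector and yields the smaller cosine.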

Figure 2 briefly illustrates the calculation process of the vector weighted cosine distance method. In the figure, it is assumed that K_2 = 5,
Figure BDA0002118756180000141
and d_{ik} denotes the norm of the vector
Figure BDA0002118756180000142
that is, the Euclidean distance between the image x_i and the corresponding neighbor. The calculation steps of the vector weighted cosine distance method are as follows:

Step 1: Compute Γ, Γ_1 and Γ_2;

Step 2: Compute
Figure BDA0002118756180000143
and
Figure BDA0002118756180000144
Figure BDA0002118756180000145

Step 3: Using the initial composite class parameter ξ, decide which new single class, or which new composite class formed from new single classes, the image x_i belongs to.

Parameter setting rules: based on the experimental results, N = 2c is recommended as the default. If N is too small, the performance of CClu degrades; if N is too large, many classes are generated, which greatly increases the computational cost. ξ ∈ [0, 0.4] is the default range, and the value should be chosen according to the accuracy the user can accept in the specific application. Since values of K_1 and K_2 in [5, 10] have almost no effect on the experimental results, the default value of both K_1 and K_2 is set to 5.
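The recommended defaults can be collected in a small configuration object. This is only a sketch: the class name `CCluParams` and its field names are hypothetical, and because the text gives a range rather than a single default for ξ, the value has to be supplied explicitly (0.2 in the example is an arbitrary pick).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCluParams:
    c: int          # true number of classes
    xi: float       # composite-class threshold, recommended range [0, 0.4]
    k1: int = 5     # neighbours used for the density estimate
    k2: int = 5     # neighbours used for the vector weighted cosine method

    @property
    def n_initial(self) -> int:
        """Recommended number of initial single classes, N = 2c."""
        return 2 * self.c

    def xi_in_recommended_range(self) -> bool:
        return 0.0 <= self.xi <= 0.4


params = CCluParams(c=3, xi=0.2)
print(params.n_initial, params.k1, params.k2, params.xi_in_recommended_range())
```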

In CClu, the true number of classes c is assumed to be known from prior knowledge or experience. If no prior knowledge is available, it can be obtained from
Figure BDA0002118756180000146
where 0 ≤ N*(c) ≤ 1.

Claims (9)

1. An image classification method based on trust function theory, characterized by comprising the following steps:
dividing all images in the image set X to be classified into a plurality of initial subclasses, wherein the initial subclasses consist of N initial single classes and a plurality of initial composite classes, each initial composite class consists of two of the initial single classes, N ≥ c, and c denotes the true number of classes of the images in the image set X;
computing the density value of each initial subclass, wherein the density is the reciprocal of the average of the Euclidean distances between the images in the initial subclass;
the density of each initial subclass being computed by the formula:
Figure FDA0003166439890000011
where ρ_j is the density of the j-th initial subclass A_j,
Figure FDA0003166439890000012
denotes the average Euclidean distance between the image x_i in the initial subclass A_j and its K_1 nearest neighbors in the image set X, K_1 is a constant, n_j is the number of images in the initial subclass A_j, and i is the index of an image in the initial subclass A_j;
merging, according to the relationship between the density values of the initial composite classes and of the corresponding initial single classes, the images in the initial composite classes and the corresponding initial single classes to generate new single classes and new composite classes, until the sum of the number of new single classes generated by merging and the number of initial single classes is c;
assigning the images in the initial composite classes to the new single classes, the initial single classes or the new composite classes, to obtain the classification result of the images in the image set X to be classified.

2. The image classification method based on trust function theory according to claim 1, characterized in that the relationship between the density values of an initial composite class and of the corresponding initial single classes consists of the following three cases:
Case one: ρ_{k,t} ≥ ρ_k and ρ_{k,t} ≥ ρ_t;
Case two: ρ_k ≤ ρ_{k,t} ≤ ρ_t;
Case three: ρ_{k,t} ≤ ρ_k and ρ_{k,t} ≤ ρ_t;
where ρ_k is the density value of the k-th initial single class, 0 ≤ k ≤ N, ρ_t is the density value of the t-th initial single class, 0 ≤ t ≤ N, and ρ_{k,t} is the density value of the initial composite class formed by the k-th and the t-th initial single classes.

3. The image classification method based on trust function theory according to claim 2, characterized in that, when the images in the initial composite classes and the corresponding initial single classes are merged, the merge order is: case one, case two, case three.

4. The image classification method based on trust function theory according to claim 3, characterized in that the images in the initial composite classes and the corresponding initial single classes are merged as follows:
computing the density distance between the two initial single classes corresponding to each initial composite class, sorting the initial composite classes by density distance in ascending order, and merging, in that order, the images in each initial composite class and its corresponding initial single classes into a new single class;
when the density distances between the two initial single classes corresponding to several initial composite classes are equal, merging the images in each of those initial composite classes and its corresponding initial single classes into a new single class in descending order of the density values of the initial composite classes;
wherein the density distance is the difference between the ordinal positions of the two initial single classes in the initial single-class density set D_s, the initial single-class density set D_s is the set of the density values of the N initial single classes, and D_s is sorted in descending order of the density values of the initial single classes.

5. The image classification method based on trust function theory according to claim 4, characterized in that, after a new single class is generated and before the next initial composite class is merged, it is determined whether the new single class and any other new single class contain the same image, and, if so, all images in the new single class and in the new single class that shares an image with it are merged to generate one new single class.

6. The image classification method based on trust function theory according to any one of claims 2 to 5, characterized in that the images in the initial composite classes are assigned to the new single classes, the initial single classes or the new composite classes as follows:
selecting an image x_i in an initial composite class, finding the K_2 images in the image set X to be classified that have the smallest Euclidean distance to it as its neighbor images, generating K_2 vectors, and generating the weights of the K_2 vectors from the Euclidean distances between the image x_i and its neighbor images;
computing the sum of the K_2 vectors as the first sum vector;
computing, among the K_2 vectors, the sum of the vectors whose end points are neighbor images belonging to the same initial single class/new single class, as a second sum vector;
computing the cosine of the angle between each second sum vector and the first sum vector, and assigning the image x_i to the initial single class/new single class corresponding to the second sum vector with the smallest cosine value;
repeating the above steps until the images in all initial composite classes have been assigned.

7. The image classification method based on trust function theory according to claim 6, characterized in that, when the difference between the cosine values of the angles that two second sum vectors form with the first sum vector is smaller than a threshold, the image is assigned to the new composite class formed by the initial single classes/new single classes corresponding to the two second sum vectors.

8. The image classification method based on trust function theory according to claim 7, characterized in that all images in the image set X to be classified are divided into the plurality of initial subclasses by means of the c-means clustering algorithm.

9. The image classification method based on trust function theory according to claim 7 or 8, characterized in that the image set X to be classified is an image set of asymmetric data.
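The density defined in claim 1 can be sketched as follows. The exact formula is contained in a figure that is not reproduced in this text, so the sketch assumes the reading suggested by the surrounding wording: ρ_j is the reciprocal of the mean, over the n_j images of subclass A_j, of each image's average Euclidean distance to its K_1 nearest neighbors in X. The function names and the example points are hypothetical.

```python
import math


def knn_mean_distance(i, X, k1):
    """Average Euclidean distance from X[i] to its k1 nearest neighbours in X."""
    dists = sorted(math.dist(X[i], X[j]) for j in range(len(X)) if j != i)
    return sum(dists[:k1]) / k1


def subclass_density(indices, X, k1=5):
    """Assumed density: reciprocal of the mean k-NN distance over the subclass."""
    mean_d = sum(knn_mean_distance(i, X, k1) for i in indices) / len(indices)
    return 1.0 / mean_d


# Made-up 2-D feature vectors: a tight group and a more spread-out group.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
     (10.0, 10.0), (10.0, 12.0), (12.0, 10.0), (12.0, 12.0)]
tight, loose = [0, 1, 2, 3], [4, 5, 6, 7]
print(subclass_density(tight, X, k1=2), subclass_density(loose, X, k1=2))
```

Subclasses are passed as index lists into X so that an image is never counted as its own neighbor, even when X contains duplicate points.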
CN201910599618.7A 2019-07-04 2019-07-04 Image classification method based on trust function theory Active CN110472657B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910599618.7A CN110472657B (en) 2019-07-04 2019-07-04 Image classification method based on trust function theory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910599618.7A CN110472657B (en) 2019-07-04 2019-07-04 Image classification method based on trust function theory

Publications (2)

Publication Number Publication Date
CN110472657A CN110472657A (en) 2019-11-19
CN110472657B true CN110472657B (en) 2021-09-03

Family

ID=68507393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910599618.7A Active CN110472657B (en) 2019-07-04 2019-07-04 Image classification method based on trust function theory

Country Status (1)

Country Link
CN (1) CN110472657B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550714A (en) * 2015-12-30 2016-05-04 国家电网公司 Cluster fusion method for warning information in heterogeneous network environment
CN105930791A (en) * 2016-04-19 2016-09-07 重庆邮电大学 Road traffic sign identification method with multiple-camera integration based on DS evidence theory
CN107368854A (en) * 2017-07-20 2017-11-21 华北电力大学(保定) A kind of circuit breaker failure diagnostic method based on improvement evidence theory

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080228699A1 (en) * 2007-03-16 2008-09-18 Expanse Networks, Inc. Creation of Attribute Combination Databases

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A new pattern classification improvement method with local quality matrix based on K-NN;Zhun-ga Liu et al.;《Knowledge-Based Systems》;20181107;1-27 *
基于置信函数理论的不确定数据分类与决策融合;焦连猛;《中国博士学位论文全文数据库 工程科技Ⅱ辑》;20170815;C028-3 *

Also Published As

Publication number Publication date
CN110472657A (en) 2019-11-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant