KR102803941B1

KR102803941B1 - Method for classifying multi-label in a partially labeled environment and an apparatus for performing the same

Info

Publication number: KR102803941B1
Application number: KR1020210173776A
Authority: KR
Inventors: 서창호; 강민근; 이기원
Original assignee: 한국과학기술원
Priority date: 2021-12-07
Filing date: 2021-12-07
Publication date: 2025-05-09
Anticipated expiration: 2041-12-07
Also published as: KR20230085529A

Abstract

A method for multi-label classification in an environment where labels are only partially given, and an apparatus for performing the same, are disclosed. A multi-label classification method according to one embodiment may include an operation of receiving an image and partial label data of the image, an operation of obtaining a feature vector of the image by inputting the image to a convolutional neural network, an operation of generating a first matrix based on the feature vector of the image and the partial label data of the image, and an operation of obtaining a second matrix including estimated label data of the image by inputting the first matrix to an autoencoder.

Description

{METHOD FOR CLASSIFYING MULTI-LABEL IN A PARTIALLY LABELED ENVIRONMENT AND AN APPARATUS FOR PERFORMING THE SAME}

The disclosure below relates to a method for multi-label classification in an environment where labels are only partially given, and to an apparatus for performing the same.

Recently, research on multi-label classification has been actively conducted in the field of deep learning. Multi-label classification is a category of machine learning that predicts the label values of multiple classes for an arbitrary image.

Neural network-based discriminators achieve excellent discriminative performance when trained on a large amount of labeled data. In practice, however, data labeling takes a lot of time, and there is a shortage of personnel to perform it. This problem is even more severe in multi-label classification projects.

The goal of multi-label classification is to identify the presence or absence of every object class in a single image. Compared to single-label classification, which aims to recognize only one object class, multi-label classification requires far more label annotation.

A technique is therefore required for performing multi-label classification in an environment where only a portion of the label data of an image is given.

According to an embodiment, the full label data of an image can be obtained from the image and the partial label data of the image, based on matrix completion by an autoencoder.

However, the technical challenges are not limited to those described above, and other technical challenges may exist.

A multi-label classification method according to one embodiment may include an operation of receiving an image and partial label data of the image, an operation of obtaining a feature vector of the image by inputting the image to a convolutional neural network, an operation of generating a first matrix based on the feature vector of the image and the partial label data of the image, and an operation of obtaining a second matrix including estimated label data of the image by inputting the first matrix to an autoencoder.

The image may include objects matching a plurality of object classes, the partial label data may be data in which only some of the plurality of object classes are labeled, and the estimated label data may be data in which all of the plurality of object classes are labeled.

The operation of generating the first matrix may include an operation of sequentially inputting the feature vector and the partial label data into one column of the first matrix.

The multi-label classification method may further include an operation of jointly training the convolutional neural network and the autoencoder based on the image and the first matrix.

The joint training operation may include an operation of calculating a first loss function based on the partial label data.
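
As an illustration of a loss that depends only on the partial label data, the following is a minimal sketch. It assumes a mean squared error restricted to the observed entries; the document does not state the form of the first loss function, and the names `masked_label_loss`, `estimated`, and `partial` are hypothetical.

```python
import numpy as np

def masked_label_loss(estimated, partial):
    """Mean squared error over observed entries only (partial != 0).

    Assumption: squared error is used here purely for illustration;
    the source does not specify the form of the first loss function.
    """
    mask = partial != 0                 # 0 marks a class with no label information
    if not mask.any():
        return 0.0
    diff = estimated[mask] - partial[mask]
    return float(np.mean(diff ** 2))

# One image, five classes; only the 2nd and 5th classes are labeled.
partial = np.array([[0, -1, 0, 0, 1]])
estimated = np.array([[0.3, -0.5, 0.9, -0.2, 1.0]])
loss = masked_label_loss(estimated, partial)
```

Entries whose label is unknown contribute nothing to this loss, so the estimates for those classes are left free to be shaped by the other loss terms.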

The joint training operation may further include an operation of generating, based on the estimated label data, a similarity graph representing the similarity between a plurality of images, and an operation of calculating a second loss function based on the similarity graph.

The similarity graph may be a cosine similarity graph.
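
A cosine similarity graph over images can be sketched as below. This is a minimal example under the assumption that each image is represented by one row of estimated label values; the function name `cosine_similarity_graph` and the sample values are hypothetical.

```python
import numpy as np

def cosine_similarity_graph(label_estimates):
    """Return the m x m matrix of cosine similarities between rows."""
    norms = np.linalg.norm(label_estimates, axis=1, keepdims=True)
    unit = label_estimates / np.clip(norms, 1e-12, None)  # avoid division by zero
    return unit @ unit.T

# Three images, three classes (illustrative estimated label values).
est = np.array([[0.9, -0.8, 0.7],
                [0.8, -0.9, 0.6],
                [-0.7, 0.9, -0.8]])
graph = cosine_similarity_graph(est)
```

In this example the first two images have nearly parallel label estimates and so receive a similarity close to 1, while the third image points in roughly the opposite direction and receives a negative similarity.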

The joint training operation may further include an operation of calculating a third loss function based on the estimated label data and the estimated feature vector included in the second matrix.

The joint training operation may further include an operation of calculating a fourth loss function based on the first loss function, the second loss function, and the third loss function, and an operation of jointly training the convolutional neural network and the autoencoder so as to minimize the fourth loss function.

A multi-label classification apparatus according to one embodiment includes a memory storing one or more instructions and a processor for executing the instructions, wherein, when the instructions are executed, the processor may receive an image and partial label data of the image, obtain a feature vector of the image by inputting the image to a convolutional neural network, generate a first matrix based on the feature vector of the image and the partial label data of the image, and obtain a second matrix including estimated label data of the image by inputting the first matrix to an autoencoder.

The processor may sequentially input the feature vector and the partial label data into one column of the first matrix.

The processor may jointly train the convolutional neural network and the autoencoder based on the image and the first matrix.

The processor may calculate a first loss function based on the partial label data.

The processor may generate, based on the estimated label data, a similarity graph representing the similarity between a plurality of images, and calculate a second loss function based on the similarity graph.

The processor may calculate a third loss function based on the estimated label data and the estimated feature vector included in the second matrix.

The processor may calculate a fourth loss function based on the first loss function, the second loss function, and the third loss function, and jointly train the convolutional neural network and the autoencoder so as to minimize the fourth loss function.

Figure 1 shows an example of an image and its label data.
Figure 2 is a simplified block diagram of a multi-label classification apparatus according to one embodiment.
Figure 3 shows a neural network structure for multi-label classification.
Figure 4 shows an example of the autoencoder illustrated in Figure 3.
Figure 5 shows another example of the autoencoder illustrated in Figure 3.
Figure 6 shows an example of the performance of the multi-label classification apparatus illustrated in Figure 2.
Figure 7 shows another example of the performance of the multi-label classification apparatus illustrated in Figure 2.
Figure 8 is a flowchart of a multi-label classification method according to one embodiment.

Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only and may be modified and implemented in various forms. Accordingly, the actual implementation is not limited to the specific embodiments disclosed, and the scope of this specification includes modifications, equivalents, and alternatives that fall within the technical idea described in the embodiments.

Although terms such as first or second may be used to describe various components, such terms should be construed only as distinguishing one component from another. For example, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component.

When a component is said to be "connected" to another component, it should be understood that it may be directly connected or coupled to that other component, but that other components may also exist in between.

Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to specify the presence of a described feature, number, step, operation, component, part, or combination thereof, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art. Terms defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and are not to be interpreted in an idealized or overly formal sense unless explicitly so defined herein.

Hereinafter, embodiments are described in detail with reference to the attached drawings. In the description referring to the attached drawings, identical components are given the same reference numerals regardless of the drawing number, and redundant descriptions thereof are omitted.

Figure 1 shows an example of an image and its label data.

Multi-label classification may mean predicting the label values of multiple classes for an image. For example, multi-label classification may mean predicting which objects are present and which objects are absent in an image containing a plurality of objects.

Figure 1 (a) may be data before multi-label classification is performed, and Figure 1 (b) may be data after multi-label classification is performed.

Referring to Figure 1 (a), the label data of the image may include information that there is no bicycle (-1) and information that there is a person (1), and may carry no information (0) about mountain, cow, horse, and so on. The image label data of Figure 1 (a) may be data in which only some of a plurality of object classes (e.g., bicycle, person) are labeled.

Referring to Figure 1 (b), the label data of the image may include information that there is no mountain, bicycle, or horse (-1) and information that there are a cow and a person (1). The image label data of Figure 1 (b) may be data in which all of the plurality of object classes (e.g., mountain, bicycle, horse, cow, person) are labeled.

Data D, in which each image is matched with its label data, can be expressed by Equation 1.

[Equation 1]

$$D = \{(x^{(i)}, y^{(i)})\}_{i=1}^{m}, \qquad y^{(i)} = (y^{(i)}_1, \ldots, y^{(i)}_c) \in \{-1, 0, 1\}^{c}$$

In Equation 1, $x^{(i)}$ is the data of the i-th image, $y^{(i)}$ is the label data of the i-th image, m is the number of images, $y^{(i)}_j$ is the label value of the j-th class of the i-th image, and c may be the number of classes.

When $y^{(i)}_j = 1$, an object corresponding to the j-th class may be present in the i-th image; when $y^{(i)}_j = -1$, no object corresponding to the j-th class may be present in the i-th image; and when $y^{(i)}_j = 0$, no information about an object corresponding to the j-th class may be available for the i-th image.
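
The label encoding described above (1 for present, -1 for absent, 0 for no information) can be illustrated with a short sketch. The class list and the `known` dictionary are hypothetical and only mirror the example of Figure 1 (a).

```python
import numpy as np

# Hypothetical class list and one partially labeled image (cf. Figure 1 (a)).
classes = ["mountain", "bicycle", "horse", "cow", "person"]
known = {"bicycle": -1, "person": 1}     # only these classes were annotated

# 1 = present, -1 = absent, 0 = no information for that class.
y = np.array([known.get(name, 0) for name in classes])

# Boolean mask of the classes for which a label was actually observed.
observed = y != 0
```

Keeping the mask alongside the label vector makes it straightforward to restrict any supervised term to the annotated classes.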

A technique is currently required for performing multi-label classification of images when only some of the label values of the object classes are given, rather than all of them.

One method for multi-label classification is to incrementally update the unobserved labels. This method can train a discriminator progressively by first adding samples predicted with high confidence as new labels and then gradually adding more difficult samples as new labels. Only predictions with a high confidence level are treated as new labels (e.g., pseudo labeling). The newly annotated labels are included in the label set, and the discriminator is trained repeatedly. This method has the drawback that it uses only the correlation between different labels without capturing the similarity (correlation) between different images, and it may perform poorly because of inaccurate labeling.
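
The confidence-based update described above can be sketched as follows. This is a minimal illustration, assuming the discriminator outputs a presence probability per class and that a single symmetric threshold is used; the function name `add_pseudo_labels`, the threshold value, and the scores are hypothetical.

```python
import numpy as np

def add_pseudo_labels(labels, scores, threshold=0.9):
    """Fill unknown entries (0) whose score is confidently high or low.

    Assumption: `scores` are presence probabilities in [0, 1].
    """
    updated = labels.copy()
    unknown = labels == 0
    updated[unknown & (scores >= threshold)] = 1        # confidently present
    updated[unknown & (scores <= 1 - threshold)] = -1   # confidently absent
    return updated

labels = np.array([0, -1, 0, 0, 1])
scores = np.array([0.95, 0.10, 0.50, 0.03, 0.99])
new_labels = add_pseudo_labels(labels, scores)
```

Here the first and fourth classes receive pseudo labels, while the third stays unknown because its score is not confident in either direction; a wrong confident prediction would be written into the label set just the same, which is the inaccurate-labeling risk noted above.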

Another multi-label classification method considers the correlation between images based on a graph (e.g., a similarity graph). This method may use a compressed sensing (CS) algorithm called Orthogonal Matching Pursuit (OMP). Although it performs quite well, it constructs the graph independently every time the discriminator model parameters are updated and therefore incurs significant computational overhead. Because OMP is performed for each individual image, memory usage may increase further.

In an environment where labels are partially given, another multi-label classification method performs multi-label classification through a matrix completion algorithm. This method forms a matrix by stacking the image feature vector and the partially given label data of each image. Because the image feature vectors change every time the discriminator model parameters are updated, and the matrix completion algorithm is run again at every such update, this method may have considerable computational complexity.

Figure 2 is a simplified block diagram of a multi-label classification apparatus according to one embodiment.

The multi-label classification apparatus (100) may perform multi-label classification of an image based on matrix completion by an autoencoder.

The multi-label classification apparatus (100) may obtain the full label data of an image from the image and the partial label data of the image, based on matrix completion by the autoencoder.

The multi-label classification apparatus (100) may generate an input matrix based on the feature vector of an image and the partial label data of the image, and the autoencoder may generate an output matrix including the full label data of the image by performing matrix completion on the input matrix.

The multi-label classification apparatus (100) may perform multi-label classification with excellent accuracy based on the autoencoder, and may perform multi-label classification with low complexity and high performance even in an environment where the total number of classes is small.

The multi-label classification apparatus (100) may generate a cosine similarity graph based on the output matrix of the autoencoder, and may reduce training time by training the autoencoder and the convolutional neural network simultaneously based on the cosine similarity graph.

The multi-label classification apparatus (100) may reduce the cost and time required for labeling.

The multi-label classification apparatus (100) may train a neural network. The multi-label classification apparatus (100) may perform inference based on the trained neural network.

A neural network (or artificial neural network) may include a statistical learning algorithm, used in machine learning and cognitive science, that mimics biological neurons. A neural network may refer to any model in which artificial neurons (nodes), connected into a network through synapses, acquire problem-solving ability by changing the strength of the synaptic connections through learning.

A neuron of a neural network may include a combination of weights or biases. A neural network may include one or more layers, each consisting of one or more neurons or nodes. A neural network can infer a desired result from an arbitrary input by changing the weights of its neurons through learning.

The neural network may include a deep neural network. The neural network may include a convolutional neural network (CNN), a recurrent neural network (RNN), a perceptron, a multilayer perceptron, a feed-forward (FF) network, a radial basis function (RBF) network, a deep feed-forward (DFF) network, a long short-term memory (LSTM), a gated recurrent unit (GRU), an autoencoder (AE), a variational autoencoder (VAE), a denoising autoencoder (DAE), a sparse autoencoder (SAE), a Markov chain (MC), a Hopfield network (HN), a Boltzmann machine (BM), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a deep convolutional network (DCN), a deconvolutional network (DN), a deep convolutional inverse graphics network (DCIGN), a generative adversarial network (GAN), a liquid state machine (LSM), an extreme learning machine (ELM), an echo state network (ESN), a deep residual network (DRN), a differentiable neural computer (DNC), a neural Turing machine (NTM), a capsule network (CN), a Kohonen network (KN), and an attention network (AN).

The multi-label classification apparatus (100) may be implemented as a printed circuit board (PCB) such as a motherboard, an integrated circuit (IC), or a system on chip (SoC). For example, the multi-label classification apparatus (100) may be implemented as an application processor.

In addition, the multi-label classification apparatus (100) may be implemented in a personal computer (PC), a data server, or a portable device.

The portable device may be implemented as a laptop computer, a mobile phone, a smartphone, a tablet PC, a mobile internet device (MID), a personal digital assistant (PDA), an enterprise digital assistant (EDA), a digital still camera, a digital video camera, a portable multimedia player (PMP), a personal or portable navigation device (PND), a handheld game console, an e-book, or a smart device. The smart device may be implemented as a smart watch, a smart band, or a smart ring.

The multi-label classification apparatus (100) may include a memory (110) and a processor (130).

The memory (110) may store instructions (or programs) executable by the processor (130). For example, the instructions may include instructions for executing operations of the processor and/or operations of each component of the processor.

The memory (110) may be implemented as a volatile memory device or a nonvolatile memory device.

The volatile memory device may be implemented as dynamic random access memory (DRAM), static random access memory (SRAM), thyristor RAM (T-RAM), zero capacitor RAM (Z-RAM), or twin transistor RAM (TTRAM).

The nonvolatile memory device may be implemented as electrically erasable programmable read-only memory (EEPROM), flash memory, magnetic RAM (MRAM), spin-transfer torque MRAM (STT-MRAM), conductive bridging RAM (CBRAM), ferroelectric RAM (FeRAM), phase change RAM (PRAM), resistive RAM (RRAM), nanotube RRAM, polymer RAM (PoRAM), nano floating gate memory (NFGM), holographic memory, a molecular electronic memory device, or insulator resistance change memory.

The processor (130) may process data stored in the memory (110). The processor (130) may execute computer-readable code (e.g., software) stored in the memory (110) and instructions triggered by the processor (130).

The processor (130) may be a hardware-implemented data processing device having a circuit with a physical structure for executing desired operations. For example, the desired operations may include code or instructions included in a program.

For example, the hardware-implemented data processing device may include a microprocessor, a central processing unit, a processor core, a multi-core processor, a multiprocessor, an application-specific integrated circuit (ASIC), and a field programmable gate array (FPGA).

The processor (130) may receive an image and partial label data of the image. The image may include objects matching a plurality of object classes, and the partial label data may be data in which only some of the plurality of object classes are labeled.

The processor (130) may obtain a feature vector of the image by inputting the received image to a convolutional neural network.

The processor (130) may generate a first matrix based on the feature vector of the image and the partial label data of the image. The processor (130) may generate the first matrix by sequentially inputting the feature vector and the partial label data into one column.
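
The column layout of the first matrix can be sketched as follows. This is a minimal illustration in which random numbers stand in for the convolutional neural network outputs; the sizes `m`, `d`, and `c` and all variable names are assumptions, not values from the document.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, c = 4, 8, 5    # images, feature dimension, classes (all assumed)

features = rng.normal(size=(m, d))             # stand-in for CNN feature vectors
labels = rng.choice([-1, 0, 1], size=(m, c))   # partial labels, 0 = unknown

# Column i holds the feature vector of image i followed by its partial labels,
# so the matrix has d + c rows and one column per image.
first_matrix = np.concatenate([features, labels], axis=1).T
```

With this layout, the zero entries in the lower c rows are exactly the positions that matrix completion is expected to fill in.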

The processor (130) may obtain a second matrix including estimated label data of the image by inputting the first matrix to the autoencoder. The second matrix may include an estimated feature vector of the image and the estimated label data of the image, and the estimated label data may be data in which all of the plurality of object classes are labeled.
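
Reading the result out of the second matrix can be sketched as below. The example assumes the second matrix keeps the same row layout as the first matrix (feature rows on top, label rows below) and that a label is decided by the sign of its estimate; both points, along with the sample values, are assumptions made for illustration.

```python
import numpy as np

d, c = 3, 2    # feature dimension and number of classes (assumed)

# Hypothetical autoencoder output for two images, one column per image.
second_matrix = np.array([[0.1, 0.4],
                          [0.2, 0.5],
                          [0.3, 0.6],
                          [0.8, -0.7],
                          [-0.9, 0.2]])

est_features = second_matrix[:d]    # estimated feature vectors
est_labels = second_matrix[d:]      # estimated label values for every class

# Assumed decision rule: non-negative estimate means present (1), else absent (-1).
predicted = np.where(est_labels >= 0, 1, -1)
```

After this step every class of every image has a value of 1 or -1, which corresponds to the fully labeled data of Figure 1 (b).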

The processor (130) may jointly train the convolutional neural network and the autoencoder based on the image and the first matrix.

The processor (130) may train a neural network (e.g., the convolutional neural network or the autoencoder) through supervised learning or unsupervised learning.

Supervised learning is a method of feeding input data together with the corresponding output data to a neural network and updating the connection weights so that the output data corresponding to the input data is produced.

For example, the processor (130) may update the connection weights between artificial neurons through the delta rule, error backpropagation learning, and the like.

Error backpropagation learning is a method that estimates the error for given training data through forward computation, propagates the estimated error backward from the output layer toward the hidden layers and the input layer, and updates the connection weights in the direction that reduces the error.

Processing in a neural network proceeds from the input layer through the hidden layers to the output layer, whereas in error backpropagation learning the connection weights are updated from the output layer through the hidden layers to the input layer.

프로세서(130)는 현재 설정된 연결 가중치들이 얼마나 최적에 가까운지를 측정하기 위한 목적 함수(objective function)를 정의하고, 목적 함수의 결과에 기초하여 연결 가중치들을 계속 변경하고, 학습을 반복적으로 수행할 수 있다.The processor (130) can define an objective function to measure how close the currently set connection weights are to the optimum, continuously change the connection weights based on the result of the objective function, and repeatedly perform learning.

예를 들어, 목적 함수는 뉴럴 네트워크가 학습 데이터에 기초하여 실제 출력한 출력 값과 출력되기로 원하는 기대 값 간의 오류를 계산하기 위한 오류 함수일 수 있다. 프로세서(130)는 오류 함수의 값을 줄이는 방향으로 연결 가중치들을 업데이트할 수 있다. 이하에서는 프로세서(130)가 뉴럴 네트워크를 학습시키는 동작에 대하여 자세히 설명하도록 한다.For example, the objective function may be an error function for calculating the error between the actual output value output by the neural network based on the training data and the expected value to be output. The processor (130) may update the connection weights in a direction to reduce the value of the error function. Hereinafter, the operation of the processor (130) for training the neural network will be described in detail.
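As a minimal sketch of the weight update described above, the following Python snippet fits a single linear unit by gradient descent on a squared-error objective, which is the delta rule in its simplest form. The function name `train_linear_unit`, the learning rate, the number of epochs, and the toy data are assumptions made for illustration and are not taken from the disclosure.

```python
import numpy as np

def train_linear_unit(x, t, lr=0.5, epochs=500):
    """Fit weights w so that x @ w approximates t (illustrative delta rule)."""
    w = np.zeros(x.shape[1])
    for _ in range(epochs):
        y = x @ w                   # forward computation
        err = y - t                 # error between actual and expected output
        grad = x.T @ err / len(t)   # gradient of 0.5 * mean squared error
        w -= lr * grad              # update in the direction that reduces the error
    return w

# Toy data (hypothetical): the targets are generated by weights [2, -1].
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
t = np.array([2.0, -1.0, 1.0])
w = train_linear_unit(x, t)
```

In a multi-layer network the same update is applied layer by layer, with the error propagated backward from the output layer.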

Figure 3 shows a neural network structure for multi-label classification.

The processor (the processor (130) of Figure 2) may jointly train the convolutional neural network and the autoencoder based on the image and the first matrix.

The processor (130) may calculate a first loss function and a second loss function for training the convolutional neural network, and may calculate a third loss function for training the autoencoder. The processor (130) may calculate a fourth loss function based on the first loss function, the second loss function, and the third loss function, and may jointly train the convolutional neural network and the autoencoder so as to minimize the fourth loss function.

The convolutional neural network may be defined in terms of h and l (the defining expressions are not reproduced in this text). Here, h may extract the feature vector of the image, and l may be a fully-connected layer combined with a sigmoid.

The processor (130) may train the convolutional neural network based on a first loss function (e.g., a supervised loss) that is based on the partial label data and a second loss function (e.g., a regularization loss) that is based on a similarity graph.

The processor (130) may select and transform values of the partial label data in order to calculate the first loss function. For example, the processor (130) may select the label values (e.g., -1 and 1) and transform them through Equation 2.

[Equation 2]

The symbols in Equation 2 denote, in order, the transformed label value of the j-th class of the i-th image, the label value of the j-th class of the i-th image, and the set of classes for which a label value exists (i.e., the label value is not 0) in the i-th image.

The processor (130) may calculate the first loss function (e.g., a supervised loss) through Equation 3.

[Equation 3]

The symbols in Equation 3 denote, in order, the transformed label value of the j-th class of the i-th image and the predicted conditional probability of the j-th class of the i-th image.
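Equations 2 and 3 are not reproduced in this text, so the following is only a sketch of one common way to realize a supervised loss over partially observed labels. It assumes that the transformation maps the label values -1 and +1 to 0 and 1, and that the loss is a binary cross-entropy averaged over the observed (nonzero) entries. The function name `supervised_loss`, the normalization, and the `eps` guard are assumptions of this sketch, not the claimed formula.

```python
import numpy as np

def supervised_loss(y, p, eps=1e-12):
    """Masked binary cross-entropy over observed labels only (illustrative).

    y: (m, c) array with values in {-1, 0, +1}; 0 marks a missing label.
    p: (m, c) array of predicted probabilities in (0, 1).
    """
    y = np.asarray(y, dtype=float)
    observed = y != 0                      # classes for which a label exists
    y_t = (y + 1.0) / 2.0                  # assumed transform: -1 -> 0, +1 -> 1
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    bce = -(y_t * np.log(p) + (1.0 - y_t) * np.log(1.0 - p))
    n_obs = max(int(observed.sum()), 1)    # avoid division by zero
    return float((bce * observed).sum() / n_obs)

# One image, three classes; the third label is missing and is ignored.
y = np.array([[1, -1, 0]])
p = np.array([[0.5, 0.5, 0.9]])
loss = supervised_loss(y, p)
```

Because the mask removes unobserved entries, the prediction for a missing class does not contribute to this loss.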

The processor (130) may calculate the second loss function based on the similarity graph, and may train the convolutional neural network based on the second loss function.

Through the second loss function, the processor (130) may embed neighboring data points close together in a low-dimensional feature space, and may take into account not only distinct images but also distinct labels. Correlations across distinct labels may be captured implicitly in the similarity graph.

In multi-label classification, the accuracy of the similarity graph may significantly affect the discriminative performance of the neural network. The multi-label classification device (100) may improve the accuracy of the similarity graph through matrix completion performed by the autoencoder.

A matrix completion algorithm may be an algorithm that fills in the missing label values (e.g., entries with the value 0) in the partial label data. To construct the similarity graph W, conventional methods may use an existing matrix completion algorithm or an OMP-based graph, which may require very large computational overhead. Conventional methods also need to recompute the similarity graph W every time the neural network is updated, which may entail considerable computational complexity and time.

The multi-label classification device (100) may solve the matrix completion problem using the autoencoder and construct the similarity graph based on the predicted result. The multi-label classification device (100) may also greatly improve training speed by training the convolutional neural network and the autoencoder simultaneously.

The operation of calculating the second loss function and the third loss function based on the matrix completion of the autoencoder is described below.

Figure 4 shows an example of the autoencoder illustrated in Figure 3.

The autoencoder (e.g., a denoising autoencoder) may output a completed matrix by performing matrix completion on the matrix Z. Based on the matrix Z generated from the feature vector X of the image and the partial label data Y of the image, the autoencoder may output a matrix that includes an estimated feature vector and estimated label data. The partial label data Y may be data in which only some of the plurality of object classes are labeled, and the estimated label data may be data in which all of the plurality of object classes are labeled.

The feature vector X and the partial label data Y may satisfy Equation 4, and the processor (130) may generate the matrix Z by stacking the feature vector X and the partial label data Y in a single column.

[Equation 4]

The symbols in Equation 4 denote the label data of the m-th image and the feature vector of the m-th image; c may be the number of classes and m may be the number of images. The partial label data Y, being data in which only some of the plurality of object classes are labeled, may include classes having the value 0. The processor (130) may predict the label values of the classes having the value 0 based on the matrix completion of the denoising autoencoder.
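The stacking step can be sketched as follows. This is a minimal illustration that assumes each image occupies one column, with the feature vector on top and the partial label vector below it; the function name `build_first_matrix` and the array shapes are hypothetical.

```python
import numpy as np

def build_first_matrix(features, partial_labels):
    """Stack features (d, m) on top of partial labels (c, m) to form Z (d + c, m).

    partial_labels holds values in {-1, 0, +1}, where 0 marks a missing label.
    """
    features = np.asarray(features, dtype=float)
    partial_labels = np.asarray(partial_labels, dtype=float)
    if features.shape[1] != partial_labels.shape[1]:
        raise ValueError("features and labels must describe the same number of images")
    return np.vstack([features, partial_labels])

# Two images (m = 2), three feature dimensions (d = 3), two classes (c = 2).
X = np.arange(6.0).reshape(3, 2)
Y = np.array([[1, 0],
              [-1, 1]])
Z = build_first_matrix(X, Y)
```

Each column of `Z` then carries everything known about one image, which is the form a matrix-completion model needs in order to fill in the entries that are 0.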

To enable the autoencoder to predict label values accurately, the processor (130) may train the autoencoder based on the third loss function. The processor (130) may calculate the third loss function through Equation 5.

[Equation 5]

Equation 5 may include a hyperparameter that balances the reconstruction loss of the label data against the reconstruction loss of the feature vector, and a mask matrix that accounts for the weights of positive and negative labels (e.g., because a data set more often has negative (absent) label values).
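Equation 5 is not reproduced in this text. The sketch below shows one plausible shape for such a reconstruction loss, assuming a squared error on the observed labels weighted by a mask, plus a squared error on the features scaled by a balancing hyperparameter. The names `make_mask`, `reconstruction_loss`, `alpha`, `pos_weight`, and `neg_weight`, and the choice to place the hyperparameter on the feature term, are assumptions of this sketch.

```python
import numpy as np

def make_mask(y, pos_weight=1.0, neg_weight=1.0):
    """Weight observed positive and negative labels; missing labels (0) get weight 0."""
    y = np.asarray(y)
    mask = np.zeros(y.shape, dtype=float)
    mask[y > 0] = pos_weight
    mask[y < 0] = neg_weight
    return mask

def reconstruction_loss(x, y, x_hat, y_hat, mask, alpha):
    """Masked label reconstruction error plus alpha times feature reconstruction error."""
    label_term = np.sum(mask * (np.asarray(y) - np.asarray(y_hat)) ** 2)
    feature_term = np.sum((np.asarray(x) - np.asarray(x_hat)) ** 2)
    return float(label_term + alpha * feature_term)

# One image with two feature dimensions and three classes (third label missing).
x, x_hat = np.array([[1.0, 2.0]]), np.array([[1.0, 1.0]])
y, y_hat = np.array([[1, -1, 0]]), np.array([[0.5, -1.0, 0.7]])
mask = make_mask(y, pos_weight=2.0, neg_weight=1.0)
loss = reconstruction_loss(x, y, x_hat, y_hat, mask, alpha=0.1)
```

Giving positive labels a larger weight than negative ones is one way to compensate for data sets in which absent labels dominate.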

The processor (130) may generate a similarity graph representing the similarity between images based on the estimated label data output by the autoencoder, and may calculate, based on the similarity graph, the second loss function for training the convolutional neural network.

The processor (130) may generate the similarity graph based on the estimated label data.

The similarity graph may be generated based on a majority vote scheme. In the majority vote scheme, if the number of identical label pairs (absent-absent or present-present) exceeds c/2, the i-th image and the j-th image are judged to be similar and the graph entry for that pair is set to the value indicating similarity; otherwise, the i-th image and the j-th image are judged to be dissimilar and the entry is set to the value indicating dissimilarity. However, the majority vote scheme may fail to properly capture the correlation between images, and using a similarity graph generated by the majority vote scheme may degrade the final performance.
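The majority vote scheme can be illustrated with a short sketch. The values returned for similar and dissimilar pairs are not reproduced in this text, so the sketch assumes 1 and 0; it also assumes that rows index images and that a positive label value means "present". The function name `majority_vote_graph` is hypothetical.

```python
import numpy as np

def majority_vote_graph(labels):
    """Binary similarity graph from per-image label vectors (illustrative).

    labels: (m, c) array; positive entries mean present, others mean absent.
    Entry (i, j) is 1 when images i and j agree on more than c / 2 classes.
    """
    present = np.asarray(labels) > 0
    c = present.shape[1]
    # Count classes on which each pair agrees (present-present or absent-absent).
    agree = (present[:, None, :] == present[None, :, :]).sum(axis=2)
    return (agree > c / 2).astype(int)

labels = np.array([[1, 1, -1, -1],
                   [1, 1, -1, 1],
                   [-1, -1, 1, 1]])
graph = majority_vote_graph(labels)
```

Because every agreement counts equally and the result is binary, two images that agree on 3 of 4 classes and two that agree on all 4 receive the same entry, which is one way to see why this scheme can miss finer correlations.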

The processor (130) may generate a cosine similarity graph based on the estimated label data obtained from the autoencoder. The estimated label data may not lie within the range [-1, +1], so the processor (130) may obtain clipped label data by restricting the range of the estimated label data to [-1, +1].

The processor (130) may generate the cosine similarity graph based on the clipped label data, and the cosine similarity graph may satisfy Equation 6.

[Equation 6]

The symbols in Equation 6 denote, in order, the clipped label data of the i-th image and the clipped label data of the j-th image.
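A cosine similarity graph over clipped label estimates can be sketched as follows. The sketch assumes the standard cosine similarity (dot product divided by the product of Euclidean norms) and that rows index images; the function name `cosine_similarity_graph` and the `eps` guard against zero-length vectors are assumptions.

```python
import numpy as np

def cosine_similarity_graph(y_hat, eps=1e-12):
    """Clip estimated labels to [-1, +1] and return pairwise cosine similarities.

    y_hat: (m, c) array of estimated label values, one row per image.
    Returns an (m, m) matrix W where W[i, j] is the cosine similarity of rows i and j.
    """
    y_clip = np.clip(np.asarray(y_hat, dtype=float), -1.0, 1.0)
    norms = np.linalg.norm(y_clip, axis=1, keepdims=True)
    unit = y_clip / np.maximum(norms, eps)   # normalize each row to unit length
    return unit @ unit.T

# Three images, two classes; the estimates fall outside [-1, +1] before clipping.
y_hat = np.array([[2.0, 0.0],
                  [0.5, 0.0],
                  [0.0, -3.0]])
W = cosine_similarity_graph(y_hat)
```

Unlike a majority vote, the resulting entries are continuous, so the graph can express degrees of similarity between images.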

The processor (130) may calculate the second loss function (e.g., a regularization loss) through Equation 7.

[Equation 7]

The symbols in Equation 7 denote the feature vector of the i-th image, the feature vector of the j-th image, and the (i, j) element of the cosine similarity graph, which is a value representing the similarity between the i-th image and the j-th image. In addition, margin is a predefined value serving as a threshold that controls the distance between feature vectors, and a further threshold is applied to the image similarity.

If the (i, j) similarity value is greater than the image-similarity threshold, the processor (130) may judge the i-th image and the j-th image to be close, and may train the convolutional neural network so that the i-th image and the j-th image are embedded close to each other in the feature vector space.
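Equation 7 is not reproduced in this text. The behavior described above resembles a contrastive loss, so the sketch below assumes that form: pairs whose similarity exceeds the threshold are pulled together by their squared distance, and the remaining pairs are pushed apart until they are at least `margin` away. The function name `regularization_loss`, the parameter name `theta`, and the averaging over off-diagonal pairs are assumptions, not the claimed formula.

```python
import numpy as np

def regularization_loss(feats, w, margin, theta):
    """Contrastive-style loss driven by a similarity graph (illustrative).

    feats: (m, d) feature vectors, one row per image.
    w: (m, m) similarity graph; w[i, j] > theta marks images i and j as similar.
    """
    feats = np.asarray(feats, dtype=float)
    diff = feats[:, None, :] - feats[None, :, :]
    dist = np.linalg.norm(diff, axis=2)            # pairwise Euclidean distances
    similar = np.asarray(w) > theta
    pull = np.where(similar, dist ** 2, 0.0)       # similar pairs: shrink distance
    push = np.where(~similar, np.maximum(0.0, margin - dist) ** 2, 0.0)
    off_diag = ~np.eye(feats.shape[0], dtype=bool) # ignore self-pairs
    return float((pull + push)[off_diag].sum() / max(int(off_diag.sum()), 1))

feats = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
w = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
loss = regularization_loss(feats, w, margin=2.0, theta=0.5)
```

In this toy case images 0 and 1 are marked similar but lie 5 units apart, so they dominate the loss, while images 1 and 2 are dissimilar and already farther apart than the margin, so they contribute nothing.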

Through the second loss function, the processor (130) may embed neighboring data points close together in a low-dimensional feature space, and may take into account not only distinct images but also distinct labels. Correlations across distinct labels may be captured implicitly in the similarity graph.

The processor (130) may calculate, through Equation 8, the fourth loss function based on the first loss function, the second loss function, and the third loss function.

[Equation 8]

Equation 8 may include a hyperparameter that scales the second loss function and a hyperparameter that scales the third loss function.
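Equation 8 is not reproduced in this text. Given that it combines three losses and includes one hyperparameter for the second loss and one for the third, a weighted sum of the following form is a natural reading. The symbols $\mathcal{L}_1$ to $\mathcal{L}_4$, $\lambda_1$, and $\lambda_2$ are placeholders chosen for this sketch and may differ from the notation of the original equation.

```latex
\mathcal{L}_4 \;=\; \mathcal{L}_1 \;+\; \lambda_1\,\mathcal{L}_2 \;+\; \lambda_2\,\mathcal{L}_3
```

Under this reading, $\lambda_1$ controls how strongly the similarity-graph regularization shapes the feature space, and $\lambda_2$ controls how much weight the autoencoder's reconstruction receives during joint training.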

The processor (130) may jointly train the convolutional neural network and the autoencoder so as to minimize the fourth loss function. By training the convolutional neural network and the autoencoder jointly, the processor (130) may greatly reduce computational complexity. The performance of the multi-label classification device (100) is described below.

Figure 5 shows another example of the autoencoder illustrated in Figure 3.

To verify the performance of the multi-label classification device (100), a data set of 11,788 images covering various bird species may be used. The image data set may be divided into training, validation, and test data of 5000, 2300, and 5000 images, respectively. Each image may have a 312-dimensional multi-label describing attributes such as beak shape, color, and length, and the label data may be selected at random so that every class has the same number of labels.

The multi-label classification device (100) may use preset hyperparameter values for matrix completion (the specific values are not reproduced in this text). Referring to Figure 5, the autoencoder that performs matrix completion may consist of four hidden layers in total, with [2048, 2048, 1024, 1024] units, respectively. In the autoencoder, the encoder part may use a non-linear activation and the decoder part may use a linear activation. The autoencoder is not limited to the structure shown in Figure 5.
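The layer layout described above can be sketched as a plain forward pass. The text does not say which of the four hidden layers belong to the encoder, which non-linearity is used, or how weights are initialized, so this sketch assumes that the first two layers form the encoder with a tanh activation and that the remaining layers are linear. The names `init_autoencoder`, `forward`, and `n_encoder` are hypothetical, and the example uses small layer sizes to keep it quick to run.

```python
import numpy as np

def init_autoencoder(in_dim, hidden=(2048, 2048, 1024, 1024), n_encoder=2, seed=0):
    """Create (weight, bias, is_nonlinear) tuples for a fully-connected autoencoder."""
    rng = np.random.default_rng(seed)
    sizes = [in_dim, *hidden, in_dim]          # output has the same size as the input
    layers = []
    for i, (fan_in, fan_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        weight = rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out))
        layers.append((weight, np.zeros(fan_out), i < n_encoder))
    return layers

def forward(layers, z):
    """Apply each layer; encoder layers use tanh, decoder layers stay linear."""
    h = np.asarray(z, dtype=float)
    for weight, bias, is_nonlinear in layers:
        h = h @ weight + bias
        if is_nonlinear:
            h = np.tanh(h)
    return h

# Small stand-in sizes: 10 input dimensions (features plus labels), 3 images as rows.
layers = init_autoencoder(in_dim=10, hidden=(8, 8, 4, 4))
z_hat = forward(layers, np.ones((3, 10)))
```

The output has the same width as the input, so its label portion can be read off as the estimated label data once the model is trained.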

Figure 6 shows an example of the performance of the multi-label classification device illustrated in Figure 2, and Figure 7 shows another example of the performance of the multi-label classification device illustrated in Figure 2.

Referring to Figure 6, the mAP (%) performance as a function of the label missing ratio can be seen. The missing ratio refers to the proportion of the 312 labels that are randomly dropped. The multi-label classification device (100) may achieve the best performance at every label missing ratio.

Referring to (a) of Figure 7, the computational complexity, training time, and number of parameters can be seen. The computational complexity of the multi-label classification device (100) may be proportional to the product of the autoencoder input size d+l and the output dimension of the encoder of the autoencoder. When m is very large, the conventional technique may have a much higher time complexity because it relies on an OMP operation for each individual image. The framework of the multi-label classification device (100) may provide approximately 5 times faster training. The multi-label classification device (100) may train the discriminator and the autoencoder simultaneously, and may therefore save considerable time compared with the conventional technique, in which graph construction and feature extraction are carried out separately at every iteration. The multi-label classification device (100) requires more model parameters because it introduces the autoencoder, but the number of added parameters is much smaller than the number of parameters of the convolutional neural network and may therefore be negligible.

Referring to (b) of Figure 7, the performance of the multi-label classification device according to the loss functions can be seen. The multi-label classification device (100) may achieve the best performance by jointly training the convolutional neural network and the autoencoder so as to minimize the fourth loss function calculated based on the first loss function, the second loss function, and the third loss function.

Figure 8 is a flowchart of a multi-label classification method according to one embodiment.

Referring to Figure 8, in operation 810, the processor (the processor (130) of Figure 2) may receive an image and partial label data of the image. The image may include objects matching a plurality of object classes, and the partial label data may be data in which only some of the plurality of object classes are labeled.

In operation 830, the processor (130) may obtain a feature vector of the image by inputting the received image to a convolutional neural network.

In operation 850, the processor (130) may generate a first matrix based on the feature vector of the image and the partial label data of the image. The processor (130) may generate the first matrix by placing the feature vector and the partial label data sequentially in a single column.

In operation 870, the processor (130) may obtain a second matrix including estimated label data of the image by inputting the first matrix to an autoencoder. The second matrix may include an estimated feature vector of the image and the estimated label data of the image, and the estimated label data may be data in which all of the plurality of object classes are labeled.

The multi-label classification device (100) may perform multi-label classification of an image based on the matrix completion of the autoencoder.

The multi-label classification device (100) may generate an input matrix (e.g., the first matrix) based on the feature vector of the image and the partial label data of the image, and the autoencoder may generate an output matrix (e.g., the second matrix) including the full label data of the image by performing matrix completion on the input matrix.

The embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware components and software components. For example, the devices, methods, and components described in the embodiments may be implemented using a general-purpose computer or a special-purpose computer, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described, but those of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may also be distributed over network-connected computer systems and stored or executed in a distributed manner. The software and data may be stored on a computer-readable recording medium.

The method according to the embodiment may be implemented in the form of program instructions executable through various computer means and recorded on a computer-readable medium. The computer-readable medium may store program instructions, data files, data structures, and the like, alone or in combination, and the program instructions recorded on the medium may be specially designed and configured for the embodiment or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of drawings, those of ordinary skill in the art may apply various technical modifications and variations based on them. For example, appropriate results may be achieved even if the described techniques are performed in an order different from that described, and/or the described components of a system, structure, device, circuit, and the like are coupled or combined in a form different from that described, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims set out below.

Claims

An operation for receiving an image and partial label data of said image;
An operation of obtaining a feature vector of the image by inputting the image into a convolutional neural network;
An operation of generating a first matrix based on a feature vector of the image and partial label data of the image;
An operation of obtaining a second matrix including estimated label data of the image by inputting the first matrix into an autoencoder; and
An operation of jointly training the convolutional neural network and the autoencoder based on the image and the first matrix,
The above joint learning actions are:
An operation of calculating a first loss function based on the above partial label data;
An operation of generating a similarity graph representing the similarity between multiple images based on the estimated label data;
An operation of calculating a second loss function based on the above similarity graph; and
An operation of calculating a third loss function based on the estimated label data and the estimated feature vector included in the second matrix.
A multi-label classification method, including:

In claim 1,
The image above is,
Contains objects that match multiple object classes,
The above partial label data is,
Only some of the above multiple object classes are labeled data,
The above estimated label data is,
The above multiple object classes are all labeled data,
Multi-label classification methods.

In claim 1,
The operation of generating the above first matrix is:
An operation of sequentially inputting the feature vector and the partial label data into one column of the first matrix.
A multi-label classification method, including:

delete

In claim 1,
The above similarity graph is,
Cosine similarity graph,
Multi-label classification methods.

delete

In claim 1,
The above joint learning actions are:
An operation of calculating a fourth loss function based on the first loss function, the second loss function, and the third loss function; and
An operation of jointly training the convolutional neural network and the autoencoder to minimize the fourth loss function.
A multi-label classification method that further includes.

A computer program stored on a computer-readable recording medium for executing the method of any one of claims 1 to 3, 7, and 9 in combination with hardware.

Memory that stores one or more instructions; and
Processor for executing the above instructions
Including,
When the above instruction is executed, the processor,
Receiving image and partial label data of said image,
By inputting the above image into a convolutional neural network, a feature vector of the above image is obtained,
Generate a first matrix based on the feature vector of the image and the partial label data of the image,
By inputting the first matrix to an autoencoder, a second matrix including estimated label data of the image is obtained,
Jointly train the convolutional neural network and the autoencoder based on the above image and the first matrix,
Compute the first loss function based on the above partial label data,
A similarity graph representing the similarity between multiple images is generated based on the estimated label data,
Compute the second loss function based on the above similarity graph,
Computing a third loss function based on the estimated label data and the estimated feature vector included in the second matrix,
Multi-label classifier.

In claim 11,
The image above is,
Contains objects that match multiple object classes,
The above partial label data is,
Only some of the above multiple object classes are labeled data,
The above estimated label data is,
The above multiple object classes are all labeled data,
Multi-label classifier.

In claim 11,
The above processor,
Sequentially inputting the feature vector and the partial label data into one column of the first matrix.
Multi-label classifier.

delete

In claim 11,
The above similarity graph is,
Cosine similarity graph,
Multi-label classifier.

delete

In claim 11,
The above processor,
Compute a fourth loss function based on the first loss function, the second loss function, and the third loss function,
Jointly training the convolutional neural network and the autoencoder to minimize the fourth loss function.
Multi-label classifier.