KR102803897B1

KR102803897B1 - Method and apparatus for training deep learning model to infer rule from visual information

Info

Publication number: KR102803897B1
Application number: KR1020230193663A
Authority: KR
Inventors: 최승규; 김광수
Original assignee: 성균관대학교산학협력단
Priority date: 2023-12-27
Filing date: 2023-12-27
Publication date: 2025-05-02
Anticipated expiration: 2043-12-27

Abstract

A method for training a deep learning model for visual inference according to one embodiment of the present invention may include: obtaining first problem information including first rule image information and first correct answer candidate image information; determining, from the first problem information and using second correct answer candidate image information included in second problem information, first learning matrix information including first correct answer matrix information, first correct answer candidate matrix information, and first incorrect answer matrix information; determining first pseudo-label information based on the first learning matrix information; and training, based on the first pseudo-label information, the deep learning model to receive the first learning matrix information and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information.

Description

{METHOD AND APPARATUS FOR TRAINING DEEP LEARNING MODEL TO INFER RULE FROM VISUAL INFORMATION}

The present invention relates to a method and apparatus for training a deep learning model to infer rules from visual information.

Raven's Progressive Matrices (RPM) is a psychological test for assessing matrix reasoning ability and is used to measure human reasoning, problem solving, and abstract thinking. Because the test does not rely heavily on language ability, it can be used across age and language groups, and it is mainly used in intelligence testing and in assessing a person's potential in educational and occupational settings.

Recently, Raven's Progressive Matrices has also been used to measure the visual reasoning ability of artificial intelligence models, that is, their ability to discover the rule in a given problem. Accordingly, many attempts have been made to improve the visual reasoning ability of artificial intelligence models.

However, conventional techniques do not consider the domain difference between input data. As a result, the numbers of correct-answer labels and incorrect-answer labels are imbalanced during training, and the artificial intelligence model tends to over-predict.

Accordingly, there is a need for a technique that mitigates over-prediction by a deep learning model and improves its generalization ability when inferring rules contained in visual information.

Korean Patent Laid-Open Publication No. 10-2023-0163614 (2023.12.01)

The problem to be solved by the present invention is to improve the generalization ability of a deep learning model for visual inference by using label smoothing that reflects domain differences, thereby securing stability in fields that are highly sensitive to predictions, such as medical image classification and autonomous driving.

However, the problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood from the description below by a person having ordinary skill in the art to which the present invention pertains.

A method for training a deep learning model to infer rules from visual information according to one embodiment of the present invention may include: obtaining first problem information including first rule image information and first correct answer candidate image information; determining, from the first problem information and using second correct answer candidate image information included in second problem information, first learning matrix information including first correct answer matrix information, first correct answer candidate matrix information, and first incorrect answer matrix information; determining first pseudo-label information based on the first learning matrix information; and training, based on the first pseudo-label information, the deep learning model to receive the first learning matrix information and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information.

In addition, the first rule image information may include six first rule image panels representing rules regarding objects, attributes, and relationships, and the first problem information may include a 3x3 first problem matrix configured so that, based on the six first rule image panels, a first target image panel that follows two first problem image panels is to be inferred.

Here, the first problem matrix may be configured such that the first target image panel is determined as one of a plurality of first correct answer candidate image panels included in the first correct answer candidate image information.
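The layout described above can be pictured with a small data structure. The following is only an illustrative sketch: the `Problem` class, the use of strings as stand-ins for image panels, the `domain` tag, and the assumption that the six rule panels fill the first two rows are not taken from the specification.

```python
from dataclasses import dataclass
from typing import List

Panel = str  # hypothetical stand-in for an image panel (e.g. an image id)


@dataclass
class Problem:
    rule_panels: List[Panel]      # six panels, assumed to fill rows 1 and 2
    question_panels: List[Panel]  # two panels that start row 3
    candidates: List[Panel]       # correct answer candidate panels
    domain: str                   # hypothetical domain tag

    def matrix(self, target: Panel = "?") -> List[List[Panel]]:
        """Lay the panels out as a 3x3 matrix with `target` in the last cell."""
        if len(self.rule_panels) != 6 or len(self.question_panels) != 2:
            raise ValueError("expected 6 rule panels and 2 problem panels")
        r = self.rule_panels
        return [r[0:3], r[3:6], list(self.question_panels) + [target]]


problem = Problem(
    rule_panels=["a1", "a2", "a3", "b1", "b2", "b3"],
    question_panels=["c1", "c2"],
    candidates=["c3", "x", "y", "z"],
    domain="shapes",
)
grid = problem.matrix()
```

Here `grid[2]` holds the two problem panels followed by the open target cell, which is the position one of the candidate panels is meant to fill.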

Meanwhile, determining the first learning matrix information may include changing the first correct answer candidate image information by replacing, based on a preset value, at least some of the plurality of first correct answer candidate image panels with at least some of a plurality of second correct answer candidate image panels included in the second correct answer candidate image information.

In addition, determining the first learning matrix information may include determining a first learning matrix by placing each of the first correct answer candidate image panels and second correct answer candidate image panels included in the changed first correct answer candidate image information into the position of the first target image panel.
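The two steps above (swapping in candidates from a second problem, then filling the target position with each candidate) might look like the sketch below. It assumes that the "preset value" is simply the number of panels to swap and that the swapped positions are chosen at random; the specification quoted here does not fix either point, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Panel = str                 # hypothetical stand-in for an image panel
Tagged = Tuple[Panel, str]  # (panel, "first" or "second") records the source problem


def mix_candidates(first: Sequence[Panel], second: Sequence[Panel],
                   num_swapped: int, rng: random.Random) -> List[Tagged]:
    """Replace `num_swapped` first-problem candidates with second-problem ones."""
    if not 0 <= num_swapped <= min(len(first), len(second)):
        raise ValueError("num_swapped is out of range")
    swap_at = rng.sample(range(len(first)), num_swapped)  # positions to overwrite
    borrowed = rng.sample(list(second), num_swapped)      # panels to bring in
    mixed: List[Tagged] = [(p, "first") for p in first]
    for pos, panel in zip(swap_at, borrowed):
        mixed[pos] = (panel, "second")
    return mixed


def fill_target(question_panels: Sequence[Panel],
                mixed: Sequence[Tagged]) -> List[Tuple[Tuple[Panel, ...], str]]:
    """Build one candidate row per panel by placing it in the target position."""
    return [(tuple(question_panels) + (panel,), source) for panel, source in mixed]


rng = random.Random(0)
mixed = mix_candidates(["c3", "x", "y", "z"], ["p", "q", "r", "s"],
                       num_swapped=2, rng=rng)
rows = fill_target(["c1", "c2"], mixed)
```

Keeping the source tag next to each panel is a design choice of this sketch; it makes it easy to tell later which rows contain a panel from the other problem.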

Meanwhile, determining the first pseudo-label information may include: extracting a feature for each row included in the first learning matrix; determining binarized hard labels based on the feature for each row included in the first learning matrix; and determining, from the binarized hard labels and using label smoothing, the first pseudo-label information including a soft label for the first correct answer matrix information and a soft label for the first correct answer candidate matrix information.
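As a rough illustration of turning binarized hard labels into soft pseudo-labels, the sketch below assumes that rows made only of rule panels get hard label 1 and all other rows get 0, and that smoothing moves a fixed amount `eps` away from the rule-only rows and toward the same-domain candidate rows. The passage above does not give the smoothing formula or how the hard labels follow from row features, so `eps`, the row kinds, and both functions are hypothetical.

```python
from typing import List, Sequence


def hard_labels(kinds: Sequence[str]) -> List[float]:
    """Binarized labels: 1 for rule-only ("correct") rows, 0 otherwise (assumed)."""
    return [1.0 if kind == "correct" else 0.0 for kind in kinds]


def smooth(kinds: Sequence[str], hard: Sequence[float], eps: float = 0.1) -> List[float]:
    """Soften labels of correct and same-domain candidate rows (assumed scheme)."""
    if not 0.0 <= eps <= 0.5:
        raise ValueError("eps should lie in [0, 0.5]")
    soft = []
    for kind, label in zip(kinds, hard):
        if kind == "correct":
            soft.append(label - eps)   # 1 -> 1 - eps
        elif kind == "candidate":
            soft.append(label + eps)   # 0 -> eps, raised for same-domain rows
        else:
            soft.append(label)         # other-domain rows stay at 0
    return soft


kinds = ["correct", "correct", "candidate", "candidate", "incorrect", "incorrect"]
pseudo = smooth(kinds, hard_labels(kinds), eps=0.1)
```

Under these assumptions the same-domain candidate rows end up with a label above that of the other-domain rows, in line with the stated aim of raising label values for candidates in the same domain.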

In addition, training the deep learning model may include training the deep learning model through backpropagation so as to minimize a loss function determined using the feature for each row included in the first learning matrix and the first pseudo-label information.
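The passage does not name the loss function. As one plausible choice, the sketch below uses binary cross-entropy between a per-row score and its soft pseudo-label, and takes a single gradient step directly on the scores to show the loss going down. In a real model the gradient would be backpropagated into the network weights; the per-row logits, the learning rate, and the helper names are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def soft_bce(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy between sigmoid(logit) and a soft target."""
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(logits)


def grad_logits(logits: Sequence[float], targets: Sequence[float]) -> List[float]:
    """Gradient of soft_bce with respect to each logit: (sigmoid(z) - t) / n."""
    n = len(logits)
    return [(sigmoid(z) - t) / n for z, t in zip(logits, targets)]


targets = [0.9, 0.1, 0.0]   # soft pseudo-labels for three rows (example values)
logits = [0.0, 0.0, 0.0]    # hypothetical per-row scores before the update
loss_before = soft_bce(logits, targets)

learning_rate = 1.0
step = grad_logits(logits, targets)
logits = [z - learning_rate * g for z, g in zip(logits, step)]
loss_after = soft_bce(logits, targets)
```

With soft targets the loss never pushes a score toward certainty for rows labeled 0.9 or 0.1, which is the usual motivation for label smoothing.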

In addition, the first correct answer matrix information may include information about those rows of the first learning matrix that contain only first rule image panels, the first correct answer candidate matrix information may include information about those rows of the first learning matrix that contain a first correct answer candidate image panel, and the first incorrect answer matrix information may include information about those rows of the first learning matrix that contain a second correct answer candidate image panel.
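The three row categories defined above can be expressed as a small classification helper. This is a sketch only: panels are represented by hypothetical string ids, and the function name `row_kind` and the category strings are not from the specification.

```python
from typing import Sequence, Set

Panel = str  # hypothetical stand-in for an image panel


def row_kind(row: Sequence[Panel], rule_panels: Set[Panel],
             first_candidates: Set[Panel], second_candidates: Set[Panel]) -> str:
    """Classify a row of the learning matrix by where its panels come from."""
    if all(panel in rule_panels for panel in row):
        return "correct"      # row contains only first rule image panels
    if any(panel in second_candidates for panel in row):
        return "incorrect"    # row contains a panel from the second problem
    if any(panel in first_candidates for panel in row):
        return "candidate"    # row contains a first-problem candidate panel
    raise ValueError("row contains a panel of unknown origin")


rules = {"a1", "a2", "a3", "b1", "b2", "b3"}
first = {"c3", "x"}
second = {"p", "q"}

kinds = [
    row_kind(("a1", "a2", "a3"), rules, first, second),
    row_kind(("c1", "c2", "c3"), rules, first, second),
    row_kind(("c1", "c2", "p"), rules, first, second),
]
```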

Meanwhile, a visual inference method using a pre-trained deep learning model according to one embodiment of the present invention may include: receiving problem information including rule image information and correct answer candidate image information; and determining, using the pre-trained deep learning model, a panel corresponding to a target image panel from among correct answer candidate image panels included in the correct answer candidate image information, wherein the deep learning model may have been pre-trained by a training process including: obtaining first problem information including first rule image information and first correct answer candidate image information; determining, from the first problem information and using second correct answer candidate image information included in second problem information, first learning matrix information including first correct answer matrix information, first correct answer candidate matrix information, and first incorrect answer matrix information; determining first pseudo-label information based on the first learning matrix information; and training, based on the first pseudo-label information, the deep learning model to receive the first learning matrix information and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information.
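At inference time, choosing the panel that corresponds to the target position can be sketched as scoring each candidate-completed row and taking the best one. The scoring function below is a toy stand-in for the pre-trained model, and selecting by highest score is an assumption; the passage only states that the model determines the matching panel.

```python
from typing import Callable, Sequence, Tuple

Panel = str  # hypothetical stand-in for an image panel
Row = Tuple[Panel, ...]


def pick_answer(question_panels: Sequence[Panel], candidates: Sequence[Panel],
                score_row: Callable[[Row], float]) -> Panel:
    """Return the candidate whose completed row gets the highest score."""
    if not candidates:
        raise ValueError("at least one candidate panel is required")
    scores = [score_row(tuple(question_panels) + (c,)) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]


def toy_score(row: Row) -> float:
    """Toy scorer standing in for the trained model (illustration only)."""
    return 1.0 if row == ("c1", "c2", "c3") else 0.0


answer = pick_answer(["c1", "c2"], ["x", "c3", "y", "z"], toy_score)
```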

An apparatus for training a deep learning model to infer rules from visual information according to another embodiment of the present invention may include: a memory in which a deep learning model training program is stored; and a processor that loads the deep learning model training program from the memory and executes it, wherein the processor obtains first problem information including first rule image information and first correct answer candidate image information, determines, from the first problem information and using second correct answer candidate image information included in second problem information, first learning matrix information including first correct answer matrix information, first correct answer candidate matrix information, and first incorrect answer matrix information, determines first pseudo-label information based on the first learning matrix information, and trains, based on the first pseudo-label information, the deep learning model to receive the first learning matrix information and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information.

Meanwhile, the processor may change the first correct answer candidate image information by replacing, based on a preset value, at least some of the plurality of first correct answer candidate image panels with at least some of a plurality of second correct answer candidate image panels included in the second correct answer candidate image information.

In addition, the processor may determine a first learning matrix by placing each of the first correct answer candidate image panels and second correct answer candidate image panels included in the changed first correct answer candidate image information into the position of the first target image panel.

Meanwhile, the processor may extract a feature for each row included in the first learning matrix, determine binarized hard labels based on the feature for each row included in the first learning matrix, and determine, from the binarized hard labels and using label smoothing, the first pseudo-label information including a soft label for the first correct answer matrix information and a soft label for the first correct answer candidate matrix information.

In addition, the processor may train the deep learning model through backpropagation so as to minimize a loss function determined using the feature for each row included in the first learning matrix and the first pseudo-label information.

A visual inference apparatus using a pre-trained deep learning model according to another embodiment of the present invention may include: a memory in which a visual inference program is stored; and a processor that loads the visual inference program from the memory and executes it, wherein the processor receives problem information including rule image information and correct answer candidate image information, and determines, using the pre-trained deep learning model, a panel corresponding to a target image panel from among correct answer candidate image panels included in the correct answer candidate image information, and wherein the deep learning model may have been pre-trained by a training process including: obtaining first problem information including first rule image information and first correct answer candidate image information; determining, from the first problem information and using second correct answer candidate image information included in second problem information, first learning matrix information including first correct answer matrix information, first correct answer candidate matrix information, and first incorrect answer matrix information; determining first pseudo-label information based on the first learning matrix information; and training, based on the first pseudo-label information, the deep learning model to receive the first learning matrix information and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information.

A computer-readable recording medium storing a computer program according to yet another embodiment of the present invention may include instructions for causing a processor to perform a method of training a deep learning model to infer rules from visual information, the method including: obtaining first problem information including first rule image information and first correct answer candidate image information; determining, from the first problem information and using second correct answer candidate image information included in second problem information, first learning matrix information including first correct answer matrix information, first correct answer candidate matrix information, and first incorrect answer matrix information; determining first pseudo-label information based on the first learning matrix information; and training, based on the first pseudo-label information, the deep learning model to receive the first learning matrix information and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information.

A computer program stored in a computer-readable recording medium according to yet another embodiment of the present invention may include instructions for causing a processor to perform a method of training a deep learning model to infer rules from visual information, the method including: obtaining first problem information including first rule image information and first correct answer candidate image information; determining, from the first problem information and using second correct answer candidate image information included in second problem information, first learning matrix information including first correct answer matrix information, first correct answer candidate matrix information, and first incorrect answer matrix information; determining first pseudo-label information based on the first learning matrix information; and training, based on the first pseudo-label information, the deep learning model to receive the first learning matrix information and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information.

According to an embodiment of the present invention, the deep learning model is trained, using label smoothing, to raise the label values of correct answer candidates that belong to the same domain, which secures flexibility in visual inference using the deep learning model and prevents over-prediction.

FIG. 1 is a block diagram showing a deep learning model training device according to an embodiment of the present invention.
FIG. 2 is a block diagram conceptually illustrating the functions of a deep learning model training program according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a deep learning model training method according to one embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of training a deep learning model using label smoothing according to one embodiment of the present invention.
FIG. 5 is a block diagram conceptually illustrating the functions of a visual inference program according to one embodiment of the present invention.
FIG. 6 is a flowchart illustrating a visual inference method according to one embodiment of the present invention.
FIG. 7 is a diagram illustrating an example of inferring rules from given visual information using a pre-trained deep learning model according to one embodiment of the present invention.

The advantages and features of the present invention, and the methods for achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure of the present invention complete and to fully inform those skilled in the art of the scope of the invention, and the present invention is defined only by the scope of the claims.

In describing embodiments of the present invention, detailed descriptions of known functions or configurations are omitted where they could unnecessarily obscure the gist of the present invention. The terms used below are defined in consideration of their functions in the embodiments of the present invention and may vary depending on the intention or custom of users or operators. Their definitions should therefore be based on the content of this specification as a whole.

FIG. 1 is a block diagram showing a deep learning model training device according to an embodiment of the present invention.

Referring to FIG. 1, the deep learning model training device (100) may include a processor (110), an input/output device (120), and a memory (130).

The processor (110) may control the overall operation of the deep learning model training device (100).

The processor (110) may receive problem information (for example, first problem information and second problem information) through the input/output device (120).

In the present invention, visual information is information that visually represents objects, attributes, and relationships for measuring human intelligence, and may include a plurality of image panels. Here, an image panel is an image of one or more figures contained within a panel (or plate) of a fixed size, and the form, size, color, position, and number of the figures may be set according to the purpose.

In addition, in the present invention, problem information may include rule image information and correct answer candidate image information. Here, the rule image information may include a plurality of image panels representing rules regarding objects, attributes, and relationships, and the correct answer candidate image information may include a plurality of image panels, each of which may be the correct answer to the problem.

Specifically, the problem information may include a problem matrix of a predetermined size. The problem matrix may consist of a plurality of rule image panels and a plurality of problem image panels, and may be configured so that a target image panel that follows the plurality of problem image panels is to be inferred. Here, the problem matrix may be configured such that a common rule holds in every row.

Although the first problem information and the second problem information are described here as being input through the input/output device (120), the present invention is not limited thereto. That is, depending on the embodiment, the deep learning model training device (100) may include a transceiver (not shown) and may receive at least one of the first problem information and the second problem information using the transceiver, or at least one of the first problem information and the second problem information may be generated within the deep learning model training device (100).

Here, the first problem information and the second problem information each represent a problem for inferring a rule from given visual information; the first problem and the second problem belong to different domains and may carry different visual information.

The processor (110) may obtain first problem information including first rule image information and first correct answer candidate image information, determine, from the first problem information and using second correct answer candidate image information included in the second problem information, first learning matrix information including first correct answer matrix information, first correct answer candidate matrix information, and first incorrect answer matrix information, determine first pseudo-label information based on the first learning matrix information, and train, based on the first pseudo-label information, the deep learning model to receive the first learning matrix information and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information.

The input/output device (120) may include one or more input devices and/or one or more output devices. For example, the input devices may include a microphone, a keyboard, a mouse, a touch screen, and the like, and the output devices may include a display, a speaker, and the like.

The memory (130) may store the deep learning model training program (200) and information required for executing the deep learning model training program (200).

In this specification, the deep learning model training program (200) may refer to software including instructions for receiving problem information, which includes rule image information and correct answer candidate image information, and training a deep learning model using label information determined for inferring rules from given visual information.

To execute the deep learning model training program (200), the processor (110) may load the deep learning model training program (200) and the information required for its execution from the memory (130).

By executing the deep learning model training program (200), the processor (110) may receive first problem information including first rule image information and first correct answer candidate image information, and may train the deep learning model to classify correct answer matrix information, correct answer candidate matrix information, and incorrect answer matrix information based on first label information determined using second correct answer candidate image information included in second problem information.

The functions and/or operations of the deep learning model training program (200) are described in detail with reference to FIG. 2.

FIG. 2 is a block diagram conceptually illustrating the functions of a deep learning model training program according to an embodiment of the present invention.

Referring to FIG. 2, the deep learning model training program (200) may include a problem information acquisition unit (210), a learning matrix information determination unit (220), and a deep learning model training unit (230).

The problem information acquisition unit (210), the learning matrix information determination unit (220), and the deep learning model training unit (230) shown in FIG. 2 are a conceptual division of the functions of the deep learning model training program (200), made for ease of explanation, and the program is not limited to this division. Depending on the embodiment, the functions of these units may be merged or separated, and may also be implemented as a series of instructions included in a single program.

First, the problem information acquisition unit (210) may obtain first problem information including first rule image information and first correct answer candidate image information.

Specifically, the first rule image information may include six first rule image panels representing rules regarding objects, attributes, and relationships, and the first correct answer candidate image information may include a plurality of first correct answer candidate image panels.

In addition, the first problem information may include a first problem matrix of size 3×3, configured so that a first target image panel, which follows two first problem image panels, is inferred based on the six first rule image panels. The first problem matrix may be expressed as in Equation 1 below.

Here, the symbols of Equation 1 denote, respectively, the first rule image panels, the first problem image panels, and the first target image panel.
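
Equation 1 is not legible in this text. One layout consistent with the description (six rule image panels filling the first two rows, followed by a row holding the two problem image panels and the target image panel) can be sketched as below. The symbols $P_1$, $r_i$, $q_j$, and $t$ are illustrative assumptions and are not the notation of the original equation, and the row-major ordering of the rule panels is likewise assumed.

```latex
P_1 =
\begin{bmatrix}
r_1 & r_2 & r_3 \\
r_4 & r_5 & r_6 \\
q_1 & q_2 & t
\end{bmatrix}
```

Under this reading, $r_1,\dots,r_6$ are the first rule image panels, $q_1$ and $q_2$ are the first problem image panels, and $t$ is the first target image panel to be chosen from the candidates.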

Specifically, the first problem matrix may be configured so that the first target image panel is determined to be one of the plurality of first correct answer candidate image panels included in the first correct answer candidate image information.

Meanwhile, a common rule may hold for each row of the first problem matrix. A rule according to an embodiment of the present invention may include at least one of: a constant rule, under which an attribute value of an object stays constant across the row; a progression rule, under which the attribute value increases or decreases toward the right image panel of the row; an arithmetic rule, under which adding or subtracting the attribute values of the left and center image panels yields the attribute value of the right image panel; and a distribute three rule, under which three values are selected for a specific attribute and placed in the panels of each row in a different order.
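
The four rules above can be illustrated with small predicates over attribute values. This is a minimal sketch, assuming each panel is summarized by a single integer attribute value; the function names and the integer encoding are hypothetical and do not come from the source, which works on images rather than on explicit attribute values.

```python
from typing import Sequence

Row = Sequence[int]  # attribute values of the left, center, and right panels


def is_constant(row: Row) -> bool:
    """Constant rule: the attribute value is the same across the row."""
    return row[0] == row[1] == row[2]


def is_progression(row: Row) -> bool:
    """Progression rule: the value strictly increases or decreases to the right."""
    return row[0] < row[1] < row[2] or row[0] > row[1] > row[2]


def is_arithmetic(row: Row) -> bool:
    """Arithmetic rule: left + center or left - center equals right."""
    left, center, right = row
    return left + center == right or left - center == right


def is_distribute_three(rows: Sequence[Row]) -> bool:
    """Distribute three rule: every row is a permutation of the same three values."""
    if not rows:
        return False
    value_sets = [frozenset(row) for row in rows]
    return len(value_sets[0]) == 3 and all(s == value_sets[0] for s in value_sets)
```

Note that the distribute three check needs several rows at once, because the rule is defined by the same three values recurring in each row in a different order.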

Next, the learning matrix information determination unit (220) may determine first learning matrix information from the first problem information, using second correct answer candidate image information included in second problem information.

Specifically, based on a preset value, the learning matrix information determination unit (220) may change the first correct answer candidate image information by replacing at least some of the plurality of first correct answer candidate image panels with at least some of the plurality of second correct answer candidate image panels included in the second correct answer candidate image information.

For example, when the preset value is 4 and there are eight first correct answer candidate image panels, the learning matrix information determination unit (220) may change the first correct answer candidate image information by replacing four of the first correct answer candidate image panels with four second correct answer candidate image panels. The changed first correct answer candidate image information then includes four first correct answer candidate image panels and four second correct answer candidate image panels.
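
The replacement step can be sketched as follows. The source does not say how the replaced panels are chosen, so uniform random selection is an assumption here, as are the function name `swap_in_candidates` and the use of strings as stand-ins for image panels.

```python
import random
from typing import List, Optional, Sequence, Tuple


def swap_in_candidates(
    first: Sequence[str],
    second: Sequence[str],
    num_replaced: int,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[str]]:
    """Return (kept first-problem panels, borrowed second-problem panels).

    num_replaced plays the role of the preset value: that many panels of the
    first problem are dropped and the same number is taken from the second.
    """
    if not 0 <= num_replaced <= min(len(first), len(second)):
        raise ValueError("num_replaced exceeds the number of available panels")
    rng = rng or random.Random()
    kept = rng.sample(list(first), len(first) - num_replaced)
    borrowed = rng.sample(list(second), num_replaced)
    return kept, borrowed
```

With eight panels per problem and `num_replaced=4`, the result holds four panels from each problem, matching the example above.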

Meanwhile, the preset value according to an embodiment of the present invention is merely an example and may be changed in various ways within a range in which the purpose of the present invention can be achieved.

In addition, the learning matrix information determination unit (220) may determine a first learning matrix by placing each of the first correct answer candidate image panels and second correct answer candidate image panels included in the changed first correct answer candidate image information into the position of the first target image panel. The first learning matrix may be expressed as in Equation 2 below.

Here, N is the number of rows of the first learning matrix, M is the number of first correct answer candidate image panels included in the changed first correct answer candidate image information, and the remaining symbols of Equation 2 denote the first correct answer candidate image panels and the second correct answer candidate image panels, respectively.

For example, when the changed first correct answer candidate image information includes a total of eight first and second correct answer candidate image panels, the number of rows of the first learning matrix may be determined to be 10 (two rows of rule image panels plus one row per candidate panel).

In this case, the first problem image panels in the first column of the first learning matrix are all identical, and the first problem image panels in the second column are all identical.
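
The construction of the first learning matrix can be sketched as below. Panels are represented by string identifiers purely for illustration, and the function name `build_learning_matrix` is hypothetical; the row order (rule rows, then first-candidate rows, then second-candidate rows) follows the description of the row groups in this specification.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, str, str]


def build_learning_matrix(
    rule_panels: Sequence[str],
    problem_panels: Sequence[str],
    first_candidates: Sequence[str],
    second_candidates: Sequence[str],
) -> List[Row]:
    """Stack two rule rows, then one row per candidate placed in the target slot."""
    if len(rule_panels) != 6 or len(problem_panels) != 2:
        raise ValueError("expected six rule panels and two problem panels")
    rows: List[Row] = [
        (rule_panels[0], rule_panels[1], rule_panels[2]),
        (rule_panels[3], rule_panels[4], rule_panels[5]),
    ]
    q1, q2 = problem_panels
    # Every candidate row shares the same two problem panels in columns 1 and 2.
    for candidate in list(first_candidates) + list(second_candidates):
        rows.append((q1, q2, candidate))
    return rows
```

With four first candidates and four second candidates, the function returns 2 + 8 = 10 rows, which agrees with the example row count given above.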

Meanwhile, the first learning matrix information is information about each row of the first learning matrix (that is, information about three image panels), and may include first correct answer matrix information, first correct answer candidate matrix information, and first incorrect answer matrix information.

More specifically, the first correct answer matrix information may refer to information about those rows of the first learning matrix that contain only first rule image panels.

The first correct answer candidate matrix information may refer to information about those rows of the first learning matrix that contain a first correct answer candidate image panel.

The first incorrect answer matrix information may refer to information about those rows of the first learning matrix that contain a second correct answer candidate image panel.

Meanwhile, the learning matrix information determination unit (220) may arrange the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information included in the first learning matrix information in that order.

For example, when the changed first correct answer candidate image information includes a total of eight first and second correct answer candidate image panels, the first correct answer matrix information may refer to the first and second rows of the first learning matrix, the first correct answer candidate matrix information to the third through sixth rows, and the first incorrect answer matrix information to the seventh through tenth rows.
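
The row ranges in this example follow directly from the number of panels of each kind. A small sketch, in which the function name `row_groups` and the group keys are hypothetical, and rows are numbered from 1 as in the text:

```python
from typing import Dict


def row_groups(num_first: int, num_second: int) -> Dict[str, range]:
    """Map each group name to its 1-based row numbers in the learning matrix."""
    first_start = 3  # rows 1 and 2 hold only rule image panels
    second_start = first_start + num_first
    end = second_start + num_second
    return {
        "correct": range(1, first_start),
        "candidate": range(first_start, second_start),
        "incorrect": range(second_start, end),
    }
```

Calling `row_groups(4, 4)` gives rows 1 to 2, 3 to 6, and 7 to 10 for the three groups.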

Next, the deep learning model training unit (230) may determine first pseudo-label information based on the first learning matrix information. Here, the first pseudo-label information may be determined in an unsupervised manner.

Specifically, the deep learning model training unit (230) may extract features for each row of the first learning matrix. For example, it may extract a feature vector for each row using a deep learning model (for example, ResNet-18).

In addition, the features for each row of the first learning matrix may be normalized based on the features of the first correct answer matrix included in the first learning matrix.

For example, the deep learning model training unit (230) may perform the normalization by applying an operation (for example, subtraction) between the feature vector of each row of the first learning matrix and the average of the feature vectors of the first row and the second row.
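
The subtraction-based normalization can be sketched with plain Python lists. The function name `normalize_rows` is hypothetical, and feature vectors are assumed to have already been extracted by some backbone; the direction of the subtraction (row feature minus reference) is also an assumption, since the source only names subtraction as an example operation.

```python
from typing import List, Sequence

Vector = List[float]


def normalize_rows(features: Sequence[Sequence[float]]) -> List[Vector]:
    """Subtract the mean of the first two rows' features from every row."""
    if len(features) < 2:
        raise ValueError("need at least the two rule rows")
    # Reference vector: element-wise average of the two rule-row features.
    reference = [(a + b) / 2.0 for a, b in zip(features[0], features[1])]
    return [[x - r for x, r in zip(row, reference)] for row in features]
```

For features `[[1.0, 3.0], [3.0, 5.0], [4.0, 4.0]]` the reference is `[2.0, 4.0]`, so the normalized rows are `[[-1.0, -1.0], [1.0, 1.0], [2.0, 0.0]]`.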

In addition, the deep learning model training unit (230) may determine binarized hard labels based on the features for each row of the first learning matrix.

For example, the deep learning model training unit (230) may set the label value for the first correct answer matrix information to 1 and the label values for the first correct answer candidate matrix information and the first incorrect answer matrix information to 0.

Meanwhile, the deep learning model training unit (230) may use label smoothing to determine, from the binarized hard labels, first pseudo-label information including a soft label for the first correct answer matrix information and a soft label for the first correct answer candidate matrix information.

Specifically, the deep learning model training unit (230) may determine the soft label for the first correct answer matrix information and the soft label for the first correct answer candidate matrix information in consideration of a smoothing value set by a user and the number of first correct answer candidate image panels that are replaced. Here, the label value for the first incorrect answer matrix information may remain 0.

For example, the soft label for the first correct answer matrix information and the soft label for the first correct answer candidate matrix information may be expressed as in Equation 3 below.

Here, the symbols of Equation 3 denote, respectively, the soft label value for the first correct answer matrix information, the soft label value for the first correct answer candidate matrix information, the smoothing value, and the number of first correct answer candidate image panels that are replaced.

In addition, based on the first pseudo-label information, the deep learning model training unit (230) may train the deep learning model to receive the first learning matrix information and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information.

Specifically, the deep learning model training unit (230) may train the deep learning model through backpropagation so as to minimize a loss function determined using the feature vector of each row of the first learning matrix and the first pseudo-label information.
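
The source does not name the loss function. As one possible instance, the sketch below uses binary cross-entropy between a per-row prediction and its soft label; this choice, the function name `soft_label_bce`, and the assumption that each row's feature vector has already been mapped to a probability in [0, 1] are all illustrative. Backpropagation itself would be handled by whatever training framework is used and is not shown.

```python
import math
from typing import Sequence


def soft_label_bce(predictions: Sequence[float], soft_labels: Sequence[float]) -> float:
    """Mean binary cross-entropy between per-row predictions and soft labels."""
    if not predictions or len(predictions) != len(soft_labels):
        raise ValueError("one prediction is required per label")
    eps = 1e-7  # keeps log() finite when a prediction reaches 0 or 1
    total = 0.0
    for p, y in zip(predictions, soft_labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(predictions)
```

With soft targets, the loss is smallest when each prediction equals its label, but it does not reach zero, which is one way such labels discourage extreme outputs.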

In this way, by training the deep learning model with a lowered label value for the first correct answer matrix information and a raised label value for the first correct answer candidate matrix information, flexibility of visual inference using the deep learning model can be secured and over-prediction can be prevented.

FIG. 3 is a flowchart illustrating a deep learning model training method according to an embodiment of the present invention.

Referring to FIGS. 2 and 3, the problem information acquisition unit (210) may obtain first problem information including first rule image information and first correct answer candidate image information (S310).

Next, the learning matrix information determination unit (220) may determine, from the first problem information and using second correct answer candidate image information included in second problem information, first learning matrix information including first correct answer matrix information, first correct answer candidate matrix information, and first incorrect answer matrix information (S320).

Next, the deep learning model training unit (230) may determine first pseudo-label information based on the first learning matrix information (S330). Here, the first pseudo-label information is determined in an unsupervised manner and may include a soft label for the first correct answer matrix information and a soft label for the first correct answer candidate matrix information.

Next, based on the first pseudo-label information, the deep learning model training unit (230) may train the deep learning model to receive the first learning matrix information and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information (S340).

FIG. 4 is a diagram illustrating an example of training a deep learning model using label smoothing according to an embodiment of the present invention.

Referring to FIGS. 2 and 4, first problem information (410) is shown, which includes a plurality of first rule image panels (identification numbers 1 to 6), a plurality of first correct answer candidate image panels (identification numbers 9 to 12), and a plurality of second correct answer candidate image panels (identification numbers 13 to 16).

Here, the learning matrix information determination unit (220) may determine the first learning matrix information (411) by placing each of the first correct answer candidate image panels (identification numbers 9 to 12) and the second correct answer candidate image panels (identification numbers 13 to 16) into the position of the first target image panel.

In addition, the deep learning model training unit (230) may input each row of the first learning matrix into the deep learning model (420) to extract features for each row.

In addition, the deep learning model training unit (230) may determine binarized hard labels (431) based on the features for each row of the first learning matrix. Here, the label value corresponding to the first correct answer matrix information is 1, and the label values corresponding to the first correct answer candidate matrix information and the first incorrect answer matrix information are 0.

In addition, first pseudo-label information (430) including soft labels (432) for the first correct answer matrix information and the first correct answer candidate matrix information may be determined using label smoothing. Here, the label value corresponding to the first correct answer matrix information is 0.95, the label value corresponding to the first correct answer candidate matrix information is 0.05, and the label value corresponding to the first incorrect answer matrix information is 0.
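
The label layout of this example can be sketched as below. Because Equation 3, which derives the soft values from the smoothing value and the number of replaced panels, is not legible in this text, the sketch takes the two soft values as plain parameters rather than computing them; the defaults are the values quoted for FIG. 4, and the function name `pseudo_labels` is hypothetical.

```python
from typing import List


def pseudo_labels(
    num_first: int,
    num_second: int,
    correct_value: float = 0.95,
    candidate_value: float = 0.05,
) -> List[float]:
    """Soft labels ordered as rule rows, first-candidate rows, second-candidate rows."""
    num_rule_rows = 2
    return (
        [correct_value] * num_rule_rows
        + [candidate_value] * num_first
        + [0.0] * num_second  # incorrect-answer rows keep the hard label 0
    )
```

Passing `correct_value=1.0` and `candidate_value=0.0` reproduces the binarized hard labels, so the same helper covers both label sets of the figure.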

Meanwhile, based on the first pseudo-label information (430), the deep learning model training unit (230) may train the deep learning model (420) to receive the first learning matrix information (411) and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information.

FIG. 5 is a block diagram conceptually illustrating the functions of a visual inference program according to an embodiment of the present invention.

The visual inference program (300) according to an embodiment of the present invention may be executed on a visual inference device (not shown) for knowledge distillation, and the visual inference device may include a processor (not shown) and a memory (not shown).

The processor (not shown) may control the overall operation of the visual inference device (not shown).

The processor (not shown) may obtain problem information including rule image information and correct answer candidate image information, and may input a problem matrix included in the problem information into a pre-trained deep learning model to determine, among the correct answer candidate image panels included in the correct answer candidate image information, the panel corresponding to the target image panel.

Here, the pre-trained deep learning model may be a model trained, based on the first pseudo-label information, to receive the first learning matrix information and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information.

The memory (not shown) may store the visual inference program (300) and information required to execute it.

In this specification, the visual inference program (300) may refer to software including instructions for obtaining problem information that includes rule image information and correct answer candidate image information, inputting the problem matrix included in the problem information into a deep learning model that has been pre-trained, based on the first pseudo-label information, to receive the first learning matrix information and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information, and thereby determining, among the correct answer candidate image panels included in the correct answer candidate image information, the panel corresponding to the target image panel.

To execute the visual inference program (300), the processor (not shown) may load the program and the information required for its execution from the memory (not shown).

By executing the visual inference program (300), the processor (not shown) may determine, among the correct answer candidate image panels, the panel corresponding to the target image panel, in order to infer a rule from given visual information.

The functions and/or operations of the visual inference program (300) are described in detail below with reference to FIG. 5.

The problem information acquisition unit (310) and the visual inference unit (320) shown in FIG. 5 are a conceptual division of the functions of the visual inference program (300), made for ease of explanation, and the program is not limited to this division. Depending on the embodiment, the functions of the problem information acquisition unit (310) and the visual inference unit (320) may be merged or separated, and may also be implemented as a series of instructions included in a single program.

First, the problem information acquisition unit (310) may receive problem information including rule image information and correct answer candidate image information. Here, unlike in the training of the deep learning model, the correct answer candidate image information is not changed.

Next, the visual inference unit (320) may use the pre-trained deep learning model to determine, among the correct answer candidate image panels included in the correct answer candidate image information, the panel corresponding to the target image panel.

Specifically, the visual inference unit (320) may determine a rule inference matrix by placing each of the plurality of correct answer candidate image panels included in the problem information into the problem matrix. The visual inference unit (320) may then determine, as the panel corresponding to the target image panel, the correct answer candidate image panel for which the pre-trained deep learning model, receiving the rule inference matrix, outputs the highest predicted value.
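
The selection step can be sketched as an argmax over candidate rows. In this sketch `score_row` is a hypothetical stand-in for the pre-trained deep learning model, panels are represented by string identifiers, and only the candidate rows of the rule inference matrix are scored, since the rule rows do not correspond to any candidate.

```python
from typing import Callable, Sequence, Tuple

Row = Tuple[str, str, str]


def select_target_panel(
    problem_panels: Sequence[str],
    candidates: Sequence[str],
    score_row: Callable[[Row], float],
) -> str:
    """Return the candidate whose completed row receives the highest score."""
    if len(problem_panels) != 2 or not candidates:
        raise ValueError("expected two problem panels and at least one candidate")
    q1, q2 = problem_panels
    # Each candidate is placed in the target slot and the resulting row is scored.
    return max(candidates, key=lambda candidate: score_row((q1, q2, candidate)))
```

Because the candidates are not replaced at inference time, all of the problem's own candidate panels are passed in unchanged.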

FIG. 6 is a flowchart illustrating a visual inference method according to an embodiment of the present invention.

Referring to FIGS. 5 and 6, the problem information acquisition unit (310) may receive problem information including rule image information and correct answer candidate image information (S610).

Next, the visual inference unit (320) may input the problem matrix included in the problem information into the pre-trained deep learning model to determine, among the correct answer candidate image panels included in the correct answer candidate image information, the panel corresponding to the target image panel (S620).

Here, the deep learning model may have been pre-trained by: obtaining first problem information including first rule image information and first correct answer candidate image information; determining, from the first problem information and using second correct answer candidate image information included in second problem information, first learning matrix information including first correct answer matrix information, first correct answer candidate matrix information, and first incorrect answer matrix information; determining first pseudo-label information based on the first learning matrix information; and training the model, based on the first pseudo-label information, to receive the first learning matrix information and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information.

FIG. 7 is a diagram illustrating an example of inferring a rule from visual information using a pre-trained deep learning model according to an embodiment of the present invention.

Referring to FIGS. 5 and 7, the problem information acquisition unit (310) may receive problem information (710) including a plurality of rule image panels (identification numbers 1 to 6) and a plurality of correct answer candidate image panels (identification numbers 9 to 16).

The visual inference unit (320) may determine a rule inference matrix (711) by placing each of the plurality of correct answer candidate image panels included in the problem information (710) into the problem matrix.

In addition, the visual inference unit (320) may input each row of the rule inference matrix (711) into the pre-trained deep learning model (720) to output a predicted value for rule inference from the given visual information.

Specifically, the visual inference unit (320) may determine, as the panel corresponding to the target image panel, the correct answer candidate image panel for which the pre-trained deep learning model (720), receiving the rule inference matrix (711), outputs the highest predicted value.

Here, because the predicted value (721) for the third row of the rule inference matrix (711) is the highest at 0.87, the visual inference unit (320) may determine the correct answer candidate image panel corresponding to identification number 9 as the target image panel for inferring the rule from the given visual information.

Combinations of the blocks of the block diagrams and the steps of the flowcharts attached to the present invention may be performed by computer program instructions. These computer program instructions may be loaded onto an encoding processor of a general-purpose computer, a special-purpose computer, or other programmable data processing equipment, so that the instructions, when executed by that processor, create means for performing the functions described in each block of the block diagram or each step of the flowchart. These computer program instructions may also be stored in a computer-usable or computer-readable memory that can direct a computer or other programmable data processing equipment to implement functions in a particular manner, so that the instructions stored in the computer-usable or computer-readable memory can produce an article of manufacture containing instruction means that perform the functions described in each block of the block diagram or each step of the flowchart. The computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps is performed on the computer or other programmable data processing equipment to create a computer-executed process, and the instructions that run on the computer or other programmable data processing equipment can provide steps for executing the functions described in each block of the block diagram and each step of the flowchart.

In addition, each block or step may represent a module, segment, or portion of code that includes one or more executable instructions for performing the specified logical function(s). It should also be noted that, in some alternative embodiments, the functions mentioned in the blocks or steps may occur out of order. For example, two blocks or steps shown in succession may in fact be performed substantially concurrently, or may sometimes be performed in reverse order, depending on the functions involved.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications and variations without departing from the essential characteristics of the present invention. Accordingly, the embodiments disclosed in the present invention are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be construed according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present invention.

100: Deep learning model training device
200: Deep learning model training program
210: Problem information acquisition unit
220: Learning matrix information determination unit
230: Deep learning model training unit
300: Visual inference program
310: Problem information acquisition unit
320: Visual inference unit

Claims

A method for training a deep learning model to infer rules from visual information, the method comprising:
obtaining first problem information including first rule image information and first correct answer candidate image information;
determining, from the first problem information, first learning matrix information including first correct answer matrix information, first correct answer candidate matrix information, and first incorrect answer matrix information, by using second correct answer candidate image information included in second problem information;
determining first pseudo-label information based on the first learning matrix information; and
training, based on the first pseudo-label information, the deep learning model to receive the first learning matrix information as input and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information,
wherein determining the first learning matrix information comprises:
changing the first correct answer candidate image information by replacing, based on a preset value, at least some of a plurality of first correct answer candidate image panels with at least some of a plurality of second correct answer candidate image panels included in the second correct answer candidate image information, and
wherein the second problem information has visual information of a domain different from that of the first problem information.
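
The candidate replacement recited above can be sketched in Python as follows. This is only an illustration under stated assumptions: the preset value is treated as the number of panels to replace, panels are represented by string identifiers, and the function name `replace_candidates` and the panel names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def replace_candidates(
    first_candidates: Sequence[str],
    second_candidates: Sequence[str],
    preset_value: int,
    rng: random.Random,
) -> Tuple[List[str], List[str]]:
    """Swap `preset_value` first-domain candidates for second-domain ones.

    Returns (kept_first, borrowed_second). Treating the preset value as a
    replacement count is an assumption of this sketch.
    """
    limit = min(len(first_candidates), len(second_candidates))
    if not 0 <= preset_value <= limit:
        raise ValueError("preset_value must be between 0 and %d" % limit)
    # Positions in the first candidate set that will be replaced.
    replaced = set(rng.sample(range(len(first_candidates)), preset_value))
    kept_first = [p for i, p in enumerate(first_candidates) if i not in replaced]
    # Panels borrowed from the second (different-domain) problem.
    borrowed_second = rng.sample(list(second_candidates), preset_value)
    return kept_first, borrowed_second


first = ["A%d" % i for i in range(8)]   # hypothetical first-domain panels
second = ["B%d" % i for i in range(8)]  # hypothetical second-domain panels
kept, borrowed = replace_candidates(first, second, 3, random.Random(0))
```

The changed candidate set is then the union of `kept` and `borrowed`, so its size matches the original candidate set.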

The method of claim 1,
wherein the first rule image information includes six first rule image panels representing rules regarding objects, properties, and relationships,
wherein the first problem information includes a first problem matrix of size 3×3 configured such that a first target image panel to be located next to two first problem image panels is inferred based on the six first rule image panels, and
wherein the first problem matrix is expressed by Mathematical Formula 1.
[Mathematical formula 1]

(Here, is the first rule image panel, is the first problem image panel, refers to the first target image panel, and there is a common rule for each row of the first problem matrix.)

The method of claim 2,
wherein the first problem matrix is configured such that the first target image panel is determined to be one of a plurality of first correct answer candidate image panels included in the first correct answer candidate image information.

(Deleted)

The method of claim 3,
wherein determining the first learning matrix information comprises determining a first learning matrix by inserting, into the position of the first target image panel, each of the first correct answer candidate image panels and second correct answer candidate image panels included in the changed first correct answer candidate image information, and
wherein the first learning matrix is expressed by Mathematical Formula 2.
[Mathematical formula 2]

(Here, N is the number of rows of the first learning matrix, refers to the first correct answer candidate image panel, means the second correct answer candidate image panel, M is the number of first correct answer candidate image panels included in the changed first correct answer candidate image information, all first problem image panels in column 1 are the same, and all first problem image panels in column 2 are the same.)
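
Because the formula itself does not appear in this text, the following Python sketch shows only one layout consistent with the surrounding wording: the rule rows come first, followed by one row per candidate panel in which the same two first problem image panels occupy columns 1 and 2 and the candidate fills the target position. The function name `build_learning_matrix` and the string panel labels are hypothetical, and the row ordering is an assumption.

```python
from typing import List, Sequence


def build_learning_matrix(
    rule_rows: Sequence[Sequence[str]],
    problem_panels: Sequence[str],
    first_candidates: Sequence[str],
    second_candidates: Sequence[str],
) -> List[List[str]]:
    """Stack rule rows and one candidate-completed row per candidate panel."""
    if len(problem_panels) != 2 or any(len(r) != 3 for r in rule_rows):
        raise ValueError("expected 3-panel rule rows and 2 problem panels")
    matrix = [list(row) for row in rule_rows]
    # Each candidate is placed in the target (third) position; columns 1 and 2
    # hold the same two problem panels in every candidate row.
    for candidate in list(first_candidates) + list(second_candidates):
        matrix.append([problem_panels[0], problem_panels[1], candidate])
    return matrix


learning_matrix = build_learning_matrix(
    rule_rows=[["r1", "r2", "r3"], ["r4", "r5", "r6"]],
    problem_panels=["q1", "q2"],
    first_candidates=["a1", "a2", "a3"],  # M = 3 first-domain candidates
    second_candidates=["b1", "b2"],       # borrowed second-domain candidates
)
```

In this illustrative layout the matrix has seven rows: two rule rows, three rows completed by first-domain candidates, and two rows completed by second-domain candidates.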

The method of claim 5,
wherein determining the first pseudo-label information comprises:
extracting features for each row included in the first learning matrix;
determining a binarized hard label based on the features for each row included in the first learning matrix; and
determining, from the binarized hard label using label smoothing, the first pseudo-label information including a soft label for the first correct answer matrix information and a soft label for the first correct answer candidate matrix information.
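
Label smoothing converts hard 0/1 labels into soft labels by moving a small amount of probability mass away from the hard target. The Python sketch below uses the common two-class formulation, soft = hard × (1 − ε) + ε / 2. The smoothing factor ε = 0.1, the example labels, and the function name `smooth_labels` are assumptions; the claim does not state which variant or value is used.

```python
from typing import List, Sequence


def smooth_labels(hard_labels: Sequence[int], epsilon: float = 0.1) -> List[float]:
    """Apply two-class label smoothing to binarized hard labels."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    if any(label not in (0, 1) for label in hard_labels):
        raise ValueError("hard labels must be 0 or 1")
    # A hard 1 becomes 1 - epsilon/2 and a hard 0 becomes epsilon/2.
    return [label * (1.0 - epsilon) + epsilon / 2.0 for label in hard_labels]


# Hypothetical hard labels for five rows of a learning matrix.
soft = smooth_labels([1, 1, 0, 1, 0])
```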

The method of claim 6,
wherein training the deep learning model comprises:
training the deep learning model through backpropagation so as to minimize a loss function determined using the features for each row included in the first learning matrix and the first pseudo-label information.
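
As a toy illustration of minimizing a loss against soft pseudo-labels by gradient descent, the Python sketch below trains a single logistic unit on hypothetical per-row feature vectors with a binary cross-entropy loss. The model, the choice of loss, the learning rate, and the feature values are all assumptions made for illustration; the claim does not specify the network or the form of the loss function.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(weights: Sequence[float], bias: float,
             features: Sequence[Sequence[float]],
             soft_labels: Sequence[float]) -> float:
    """Mean binary cross-entropy between predictions and soft labels."""
    total = 0.0
    for x, y in zip(features, soft_labels):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(features)


def train_step(weights: Sequence[float], bias: float,
               features: Sequence[Sequence[float]],
               soft_labels: Sequence[float],
               lr: float = 0.5) -> Tuple[List[float], float]:
    """One gradient-descent step; for sigmoid + BCE, dLoss/dLogit = p - y."""
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for x, y in zip(features, soft_labels):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        error = p - y
        for j, xi in enumerate(x):
            grad_w[j] += error * xi / len(features)
        grad_b += error / len(features)
    new_weights = [w - lr * g for w, g in zip(weights, grad_w)]
    return new_weights, bias - lr * grad_b


# Hypothetical row features and soft pseudo-labels.
row_features = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.8], [0.2, 0.9]]
soft_labels = [0.95, 0.95, 0.05, 0.05]
weights, bias = [0.0, 0.0], 0.0
initial_loss = bce_loss(weights, bias, row_features, soft_labels)
for _ in range(200):
    weights, bias = train_step(weights, bias, row_features, soft_labels)
final_loss = bce_loss(weights, bias, row_features, soft_labels)
```

In a real deep network the same gradient would be propagated backward through every layer by an automatic differentiation framework rather than written out by hand.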

The method of claim 7,
wherein the first correct answer matrix information includes information about rows, among the rows included in the first learning matrix, that contain only first rule image panels,
wherein the first correct answer candidate matrix information includes information about rows, among the rows included in the first learning matrix, that contain a first correct answer candidate image panel, and
wherein the first incorrect answer matrix information includes information about rows, among the rows included in the first learning matrix, that contain a second correct answer candidate image panel.

A visual inference method using a pre-trained deep learning model, the method comprising:
obtaining problem information including rule image information and correct answer candidate image information; and
determining, using the pre-trained deep learning model, a panel corresponding to a target image panel from among correct answer candidate image panels included in the correct answer candidate image information,
wherein the deep learning model is pre-trained by a training process comprising:
obtaining first problem information including first rule image information and first correct answer candidate image information;
determining, from the first problem information, first learning matrix information including first correct answer matrix information, first correct answer candidate matrix information, and first incorrect answer matrix information, by using second correct answer candidate image information included in second problem information;
determining first pseudo-label information based on the first learning matrix information; and
training, based on the first pseudo-label information, the deep learning model to receive the first learning matrix information as input and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information,
wherein determining the first learning matrix information comprises:
changing the first correct answer candidate image information by replacing, based on a preset value, at least some of a plurality of first correct answer candidate image panels with at least some of a plurality of second correct answer candidate image panels included in the second correct answer candidate image information, and
wherein the second problem information has visual information of a domain different from that of the first problem information.

A device for training a deep learning model to infer rules from visual information, the device comprising:
a memory storing a deep learning model training program; and
a processor configured to load the deep learning model training program from the memory and execute the deep learning model training program,
wherein the processor is configured to:
obtain first problem information including first rule image information and first correct answer candidate image information;
determine, from the first problem information, first learning matrix information including first correct answer matrix information, first correct answer candidate matrix information, and first incorrect answer matrix information, by using second correct answer candidate image information included in second problem information;
determine first pseudo-label information based on the first learning matrix information; and
train, based on the first pseudo-label information, the deep learning model to receive the first learning matrix information as input and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information,
wherein the processor is configured to change the first correct answer candidate image information by replacing, based on a preset value, at least some of a plurality of first correct answer candidate image panels with at least some of a plurality of second correct answer candidate image panels included in the second correct answer candidate image information, and
wherein the second problem information has visual information of a domain different from that of the first problem information.

The device of claim 10,
wherein the first rule image information includes six first rule image panels representing rules regarding objects, properties, and relationships,
wherein the first problem information includes a first problem matrix of size 3×3 configured such that a first target image panel to be located next to two first problem image panels is inferred based on the six first rule image panels, and
wherein the first problem matrix is expressed by Mathematical Formula 1.
[Mathematical formula 1]

(Here, is the first rule image panel, is the first problem image panel, refers to the first target image panel, and there is a common rule for each row of the first problem matrix.)

The device of claim 11,
wherein the first problem matrix is configured such that the first target image panel is determined to be one of a plurality of first correct answer candidate image panels included in the first correct answer candidate image information.

(Deleted)

The device of claim 12,
wherein the processor is configured to determine a first learning matrix by inserting, into the position of the first target image panel, each of the first correct answer candidate image panels and second correct answer candidate image panels included in the changed first correct answer candidate image information, and
wherein the first learning matrix is expressed by Mathematical Formula 2.
[Mathematical formula 2]

(Here, N is the number of rows of the first learning matrix, refers to the first correct answer candidate image panel, means the second correct answer candidate image panel, M is the number of first correct answer candidate image panels included in the changed first correct answer candidate image information, all first problem image panels in column 1 are the same, and all first problem image panels in column 2 are the same.)

The device of claim 14,
wherein the processor is configured to:
extract features for each row included in the first learning matrix;
determine a binarized hard label based on the features for each row included in the first learning matrix; and
determine, from the binarized hard label using label smoothing, the first pseudo-label information including a soft label for the first correct answer matrix information and a soft label for the first correct answer candidate matrix information.

The device of claim 15,
wherein the processor is configured to train the deep learning model through backpropagation so as to minimize a loss function determined using the features for each row included in the first learning matrix and the first pseudo-label information.

The device of claim 16,
wherein the first correct answer matrix information includes information about rows, among the rows included in the first learning matrix, that contain only first rule image panels,
wherein the first correct answer candidate matrix information includes information about rows, among the rows included in the first learning matrix, that contain a first correct answer candidate image panel, and
wherein the first incorrect answer matrix information includes information about rows, among the rows included in the first learning matrix, that contain a second correct answer candidate image panel.

A visual inference device using a pre-trained deep learning model, the device comprising:
a memory storing a visual inference program; and
a processor configured to load the visual inference program from the memory and execute the visual inference program,
wherein the processor is configured to:
obtain problem information including rule image information and correct answer candidate image information; and
determine, using the pre-trained deep learning model, a panel corresponding to a target image panel from among correct answer candidate image panels included in the correct answer candidate image information,
wherein the deep learning model is pre-trained by a training process comprising:
obtaining first problem information including first rule image information and first correct answer candidate image information;
determining, from the first problem information, first learning matrix information including first correct answer matrix information, first correct answer candidate matrix information, and first incorrect answer matrix information, by using second correct answer candidate image information included in second problem information;
determining first pseudo-label information based on the first learning matrix information; and
training, based on the first pseudo-label information, the deep learning model to receive the first learning matrix information as input and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information,
wherein determining the first learning matrix information comprises:
changing the first correct answer candidate image information by replacing, based on a preset value, at least some of a plurality of first correct answer candidate image panels with at least some of a plurality of second correct answer candidate image panels included in the second correct answer candidate image information, and
wherein the second problem information has visual information of a domain different from that of the first problem information.

A computer-readable recording medium storing a computer program,
wherein the computer program includes instructions that, when executed by a processor, cause the processor to perform a method of training a deep learning model to infer rules from visual information, the method comprising:
obtaining first problem information including first rule image information and first correct answer candidate image information;
determining, from the first problem information, first learning matrix information including first correct answer matrix information, first correct answer candidate matrix information, and first incorrect answer matrix information, by using second correct answer candidate image information included in second problem information;
determining first pseudo-label information based on the first learning matrix information; and
training, based on the first pseudo-label information, the deep learning model to receive the first learning matrix information as input and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information,
wherein determining the first learning matrix information comprises:
changing the first correct answer candidate image information by replacing, based on a preset value, at least some of a plurality of first correct answer candidate image panels with at least some of a plurality of second correct answer candidate image panels included in the second correct answer candidate image information, and
wherein the second problem information has visual information of a domain different from that of the first problem information.

A computer program stored on a computer-readable recording medium,
wherein the computer program includes instructions that, when executed by a processor, cause the processor to perform a method of training a deep learning model to infer rules from visual information, the method comprising:
obtaining first problem information including first rule image information and first correct answer candidate image information;
determining, from the first problem information, first learning matrix information including first correct answer matrix information, first correct answer candidate matrix information, and first incorrect answer matrix information, by using second correct answer candidate image information included in second problem information;
determining first pseudo-label information based on the first learning matrix information; and
training, based on the first pseudo-label information, the deep learning model to receive the first learning matrix information as input and classify the first correct answer matrix information, the first correct answer candidate matrix information, and the first incorrect answer matrix information,
wherein determining the first learning matrix information comprises:
changing the first correct answer candidate image information by replacing, based on a preset value, at least some of a plurality of first correct answer candidate image panels with at least some of a plurality of second correct answer candidate image panels included in the second correct answer candidate image information, and
wherein the second problem information has visual information of a domain different from that of the first problem information.