KR102698908B1

KR102698908B1 - Methods for quickly detecting information within incoming documents in security solutions

Info

Publication number: KR102698908B1
Application number: KR1020240033689A
Authority: KR
Inventors: 나우성
Original assignee: 시큐레터 주식회사
Priority date: 2024-03-11
Filing date: 2024-03-11
Publication date: 2024-08-26
Anticipated expiration: 2044-03-11

Abstract

The present specification relates to a method by which a server pre-filters an incoming document by inspecting its internal information, the method comprising: receiving an analysis target file; inspecting the analysis target file with a static analysis engine; inspecting the analysis target file with a filter engine based on the inspection result of the static analysis engine being normal; checking, through the filter engine, whether a filter item exists in the analysis target file; and inspecting the analysis target file with a dynamic analysis engine based on the filter item existing, wherein the analysis target file may be in the HWP file format.

Description

METHODS FOR QUICKLY DETECTING INFORMATION WITHIN INCOMING DOCUMENTS IN SECURITY SOLUTIONS

This specification relates to a method and device for inspecting the internal information of an incoming document so that a security solution can perform detection quickly.

In an Advanced Persistent Threat (APT) attack, an attacker selects a specific target and, in order to extract the desired information, applies advanced attack techniques and continuously uses various forms of malicious code.

In particular, APT attacks often go undetected at the initial intrusion stage, and they frequently use non-executable (Non-PE: Non-Portable Executable) files containing malicious code.

Although most non-executable files in actual use are benign, security solutions inspect every file with every detection method in order to catch the small proportion of malicious documents. In particular, sandbox-based dynamic analysis is more accurate than static analysis but takes relatively longer.

The purpose of this specification is to propose a method for applying pre-filtering before dynamic analysis of HWP files.

The technical problems to be solved by this specification are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood from the detailed description below by a person having ordinary skill in the technical field to which this specification belongs.

One aspect of this specification is a method by which a server pre-filters an incoming document by inspecting its internal information, the method comprising: receiving an analysis target file; inspecting the analysis target file with a static analysis engine; inspecting the analysis target file with a filter engine based on the inspection result of the static analysis engine being normal; checking, through the filter engine, whether a filter item exists in the analysis target file; and inspecting the analysis target file with a dynamic analysis engine based on the filter item existing, wherein the analysis target file may be in the HWP file format.

In addition, the step of checking whether the filter item exists may include determining that the filter item exists based on the size of the HWP file exceeding a reference value.

In addition, the step of checking whether the filter item exists may include determining that the filter item exists based on a file being embedded inside the HWP file.

In addition, the step of determining that the filter item exists based on a file being embedded inside the HWP file may include: identifying the BinData storage; and identifying a stream having the .OLE extension.

In addition, the step of checking whether the filter item exists may include determining that the filter item exists based on a script being embedded inside the HWP file.

In addition, the step of determining that the filter item exists based on a script being embedded inside the HWP file may include: identifying the BinData storage; and identifying a stream having the .eps or .ps extension.

In addition, the step of checking whether the filter item exists may include determining, using a preset list of vulnerability-related objects, that the filter item exists based on an object related to a vulnerability being present inside the HWP file.

In addition, the list of vulnerability-related objects may include Flash image files with the .swf extension.
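
The BinData checks in the preceding paragraphs can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes the storage/stream listing of the HWP file has already been obtained as a list of path components (for example, from a compound-file parser), and it assumes that vulnerability-related objects such as .swf files are also looked up under BinData. The function name `find_filter_items` and the rule names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence

# Hypothetical rule table: filter item name -> stream extensions (lowercase).
# The extensions come from the description; the item names are illustrative.
FILTER_RULES: Dict[str, Sequence[str]] = {
    "embedded_file": (".ole",),
    "embedded_script": (".eps", ".ps"),
    "vulnerable_object": (".swf",),
}


def find_filter_items(stream_paths: Iterable[Sequence[str]]) -> List[str]:
    """Return the names of filter items found under the BinData storage."""
    found = set()
    for path in stream_paths:
        # Only streams directly inside the BinData storage are considered.
        if len(path) != 2 or path[0] != "BinData":
            continue
        stream_name = path[1].lower()
        for item, extensions in FILTER_RULES.items():
            if stream_name.endswith(tuple(extensions)):
                found.add(item)
    return sorted(found)


# Example listing: each entry is [storage, ..., stream].
listing = [
    ["FileHeader"],
    ["BodyText", "Section0"],
    ["BinData", "BIN0001.OLE"],
    ["BinData", "BIN0002.eps"],
]
print(find_filter_items(listing))  # prints ['embedded_file', 'embedded_script']
```

An empty result would mean that no extension-based filter item was found, so this particular rule set would not by itself send the file on to dynamic analysis.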

In addition, the step of checking whether the filter item exists may further include: reading the analysis target file as binary data; and determining that the filter item exists based on shellcode related to the vulnerability being present in the binary data.
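
A binary scan of this kind can be sketched as a plain byte-pattern search. The specification does not list the shellcode patterns it uses, so the single pattern below (a run of x86 NOP bytes) is a hypothetical placeholder, and the function names are illustrative only.

```python
from typing import Iterable

# Hypothetical byte patterns for illustration only; real patterns would be
# derived from analysis of specific vulnerabilities.
EXAMPLE_PATTERNS = (
    b"\x90" * 16,  # a run of sixteen x86 NOP bytes (a simple "NOP sled")
)


def contains_shellcode_pattern(
    data: bytes, patterns: Iterable[bytes] = EXAMPLE_PATTERNS
) -> bool:
    """Return True if any of the given byte patterns occurs in the data."""
    return any(pattern in data for pattern in patterns)


def file_has_shellcode_pattern(path: str) -> bool:
    """Read the file as raw bytes and scan it for the example patterns."""
    with open(path, "rb") as handle:
        return contains_shellcode_pattern(handle.read())
```

A match only marks the file as containing a filter item; the verdict itself is left to the later dynamic analysis.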

In addition, the step of checking whether the filter item exists may further include determining that the filter item exists based on the information entropy of the analysis target file exceeding a reference value.
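
As an illustration of the entropy check, the sketch below computes the Shannon entropy of the file bytes, H = -Σ p_i log2(p_i), where p_i is the relative frequency of byte value i. The result ranges from 0 to 8 bits per byte. The specification does not state the reference value, so `ENTROPY_THRESHOLD` is a hypothetical number chosen only for the example.

```python
import math
from collections import Counter

# Hypothetical reference value in bits per byte; not taken from the specification.
ENTROPY_THRESHOLD = 7.5


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of the data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # byte value -> number of occurrences
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def exceeds_entropy_threshold(
    data: bytes, threshold: float = ENTROPY_THRESHOLD
) -> bool:
    """Return True if the entropy of the data is above the reference value."""
    return shannon_entropy(data) > threshold
```

High entropy is typical of compressed or encrypted content, which is why it can serve as a cheap signal that a file deserves further analysis.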

In addition, the step of checking whether the filter item exists may further include: extracting an image from the analysis target file; extracting text from the image through OCR; and determining that the filter item exists based on the text being related to a public document.

Another aspect of this specification is a server for pre-filtering an incoming document by inspecting its internal information, the server comprising: a communication unit; a memory including a static analysis engine, a filter engine, and a dynamic analysis engine; and a processor that functionally controls the communication unit and the memory, wherein the processor receives an analysis target file, inspects the analysis target file with the static analysis engine, inspects the analysis target file with the filter engine based on the inspection result of the static analysis engine being normal, checks, through the filter engine, whether a filter item exists in the analysis target file, and inspects the analysis target file with the dynamic analysis engine based on the filter item existing, and wherein the analysis target file may be in the HWP file format.

According to embodiments of this specification, filtering can be performed in order to sanitize HWP files.

The effects obtainable from this specification are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by a person having ordinary skill in the technical field to which this specification belongs.

FIG. 1 is a diagram illustrating a server or client related to this specification.
FIG. 2 is an example of an abnormal input to which this specification can be applied.
FIG. 3 illustrates an analysis method to which this specification can be applied.
The accompanying drawings, which are included as part of the detailed description to aid understanding of this specification, provide embodiments of this specification and, together with the detailed description, explain its technical features.

Hereinafter, the embodiments disclosed in this specification are described in detail with reference to the accompanying drawings. Identical or similar components are given the same reference numerals regardless of the drawing, and redundant descriptions of them are omitted. The suffixes "module" and "unit" used for components in the following description are assigned or used interchangeably only for ease of drafting the specification and do not in themselves have distinct meanings or roles. In addition, when a detailed description of a related known technology would obscure the gist of the embodiments disclosed in this specification, that description is omitted. The accompanying drawings are intended only to make the disclosed embodiments easier to understand; they do not limit the technical idea disclosed in this specification, which should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope.

Terms that include ordinal numbers, such as first and second, may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. On the other hand, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between.

Singular expressions include plural expressions unless the context clearly indicates otherwise.

In this specification, terms such as "comprises" or "has" are intended to specify the presence of a feature, number, step, operation, component, part, or combination thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In addition, the term "unit" used in the specification means a software or hardware component, and a "unit" performs certain roles. However, a "unit" is not limited to software or hardware. A "unit" may be configured to reside on an addressable storage medium and may be configured to execute on one or more processors. Thus, by way of example, a "unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functionality provided within components and "units" may be combined into a smaller number of components and "units" or further separated into additional components and "units."

In addition, according to one embodiment of this specification, a "unit" may be implemented as a processor and a memory. The term "processor" should be interpreted broadly to include a general-purpose processor, a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a controller, a microcontroller, a state machine, and the like. In some environments, "processor" may also refer to an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a field-programmable gate array (FPGA), and the like. The term "processor" may also refer to a combination of processing devices, such as a combination of a DSP and a microprocessor, a combination of a plurality of microprocessors, a combination of one or more microprocessors in conjunction with a DSP core, or any other such configuration.

The term "memory" should be interpreted broadly to include any electronic component capable of storing electronic information. The term may refer to various types of processor-readable media, such as random access memory (RAM), read-only memory (ROM), non-volatile random access memory (NVRAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable PROM (EEPROM), flash memory, magnetic or optical data storage, and registers. A memory is said to be in electronic communication with a processor if the processor can read information from and/or write information to the memory. Memory integrated into a processor is in electronic communication with the processor.

As used in this specification, a "non-executable file" is the opposite of an executable file: a file that does not run on its own. For example, a non-executable file may be a document file such as a PDF, Hangul (HWP), or Word file, an image file such as a JPG file, a video file, a JavaScript file, or an HTML file, but is not limited to these.

Below, embodiments are described in detail with reference to the accompanying drawings so that a person having ordinary skill in the technical field to which this specification belongs can readily practice them. To describe the present disclosure clearly, parts of the drawings unrelated to the description may be omitted.

FIG. 1 is a diagram illustrating a server or client related to this specification.

In this specification, a server (or cloud server) or a client may include a control unit (100) and a communication unit (130). The control unit (100) may include a processor (110) and a memory (120). The processor (110) may execute instructions stored in the memory (120). The processor (110) may control the communication unit (130).

The processor (110) may control the operation of the server or client based on the instructions stored in the memory (120). The server or client may include one processor or a plurality of processors. When the server or client includes a plurality of processors, at least some of them may be physically located apart from one another. The server or client is not limited to this configuration and may be implemented in various known ways.

The communication unit (130) may include one or more modules that enable wireless communication between the server or client and a wireless communication system, between the server or client and another server or client, or between the server or client and an external server. In addition, the communication unit (130) may include one or more modules that connect the server or client to one or more networks.

The control unit (100) may control at least some of the components of the server or client in order to run an application program stored in the memory (120). Furthermore, the control unit (100) may operate two or more of the components included in the server or client in combination in order to run the application program.

In this specification, the server may include a reversing engine and/or a CDR engine that provides a CDR service.

Reversing Engine

A reversing engine is an analysis and diagnosis engine that automates the reverse engineering (reversing) process for malicious non-executable files.

For example, a reversing engine may perform the following steps.

1. File analysis: This step analyzes the outward attributes of the non-executable file itself (for example, properties, author, creation date, and file type). Like a general antivirus program, it can diagnose whether the file is malicious from information about the file alone.

2. Static analysis: This step extracts and analyzes the data inside a non-executable file to determine whether it is benign or malicious. Without executing the file, it extracts internal data according to the file structure and diagnoses maliciousness through comparative analysis. This may be suitable for analyses such as macro and URL extraction.

3. Dynamic analysis: This step executes the non-executable file and monitors it, analyzing its behavior to determine whether it is malicious. It readily detects malicious behavior that abuses normal functions such as macros, hyperlinks, and DDE.

4. Debugging analysis: This step executes and debugs the non-executable file to analyze vulnerabilities, exploits, and the like. It is suitable for detecting application vulnerabilities that are exploited through the body text, tables, fonts, pictures, and so on within a document, as well as through macros, hyperlinks, and DDE.

The reversing engine may include a debugging engine that can be used for debugging analysis. The debugging engine can diagnose vulnerabilities that arise in the document input, processing, and output stages by debugging the process of opening a non-executable file. Here, exploiting a vulnerability means taking advantage of an error or bug that occurs when an application receives an input value that the code (logic) written by its developer did not anticipate. Through a vulnerability, an attacker can carry out malicious actions such as denial of service caused by abnormal termination, or remote code execution.

Contents Disarm and Reconstruction (CDR)

A CDR service is a solution that disassembles a non-executable file, removes malicious or unnecessary files, and creates a new file whose content is as close to the original as possible.

That is, Contents Disarm and Reconstruction (CDR) refers to a service that disarms and reconstructs the content within a document to create a safe document and provide it to the customer. The files to be disarmed may be any non-executable files (for example, Word, Excel, PowerPoint, Hangul, HWP), and the content to be disarmed may be active content (for example, macros, hyperlinks, and OLE objects).

FIG. 2 is an example of an abnormal input to which this specification can be applied.

Referring to FIG. 2, when an application receives an abnormal value through a non-executable file (for example, an input value exceeding 2, the limit of the normal range), the execution flow may change to one the developer did not intend, and a vulnerability may be triggered. The debugging engine can automatically debug the document-opening process, set breakpoints at specific points related to a vulnerability, and check specific values related to the input to determine whether the input is a value that triggers the vulnerability, thereby diagnosing whether the file is malicious.

In more detail, the debugging engine can identify the non-executable file and start debugging by launching the application used to open it. When a module is loaded while the non-executable file is being opened, the debugging engine can check whether that module is an analysis target and, if so, set a breakpoint at a designated address.

For example, a malicious non-executable file may have branch points at which, if certain conditions such as the application version or operating system environment are not met, it terminates the application or branches to a flow in which no malicious behavior occurs. The server can set breakpoints at branch points that analysts have identified in advance as having this possibility.

In addition, the server can set conditions, associated with each such branch point, that keep the application running instead of terminating or steer execution toward the flow in which malicious behavior may occur.

If the application process stops at such a breakpoint during execution, the server can detect whether a vulnerability is present according to its detection logic and then store the result in an analysis report.

The automated reversing engine included in the server performs the steps described above automatically and, using diagnostic algorithms researched and developed by analysts, can diagnose and block malicious non-executable files.

FIG. 3 illustrates an analysis method to which this specification can be applied.

Referring to FIG. 3, the server may include, in memory, a static analysis engine, a filter engine to which filter rules are applied, and a dynamic analysis engine.

The static analysis engine can identify the characteristics of malicious code by analyzing the internal structure or code of a file. For example, it can uncover malicious behavior by examining the headers, sections, strings, binary code, and so on of a non-executable file. Because static analysis can detect malicious code without executing the file, the server in this specification uses the static analysis engine to quickly detect, first, whatever malicious code static analysis is able to find.

For example, the static analysis engine can identify unique signatures or malicious patterns in a non-executable file. Because static analysis can operate in real time, it can be performed before the filter engine or the dynamic analysis engine. That is, because static analysis does not execute the analysis target file, it produces results faster than dynamic analysis and can therefore be performed first in the overall analysis process.

The filter engine can use filter rules to check whether filter items are present inside the target file. If the check detects a filter item inside the file, the server can proceed with additional analysis through the dynamic analysis engine.

The dynamic analysis engine can detect malicious code by executing the analysis target file and observing its behavior during execution. For example, the dynamic analysis engine may include the reversing engine and/or CDR engine described above.

The dynamic analysis engine can execute the analysis target file, monitor whether malicious behavior appears, and analyze the behavior patterns of malicious code.

The server receives a non-executable file as the analysis target file (S3010). For example, to determine the document format of the non-executable file, the server can open the file and check the signature type in its binary content.

For example, each non-executable file format is distinct, and the basis of the format is the file signature. A file signature is a specific sequence of bytes located at the very beginning of the file and can be used to distinguish the file format.

For example, a PDF file may have the signature "25 50 44 46", and an HWP file may have the signature "D0 CF 11 E0 A1 B1 1A E1".
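
Signature-based format detection can be sketched as a comparison of the first bytes of the file against known values. The widely documented leading bytes are 25 50 44 46 (ASCII "%PDF") for PDF and D0 CF 11 E0 A1 B1 1A E1 for the Compound File Binary (CFB) container that HWP files use. The CFB signature is shared with other compound-file formats such as legacy Microsoft Office documents, so a real implementation would need a further check (for instance, on the contents of the FileHeader stream) before concluding that a file is HWP. The function names below are hypothetical.

```python
from typing import Optional

# Well-known leading byte signatures ("magic numbers").
PDF_SIGNATURE = bytes.fromhex("25504446")          # ASCII "%PDF"
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary container


def detect_container(header: bytes) -> Optional[str]:
    """Classify a file by its leading bytes; return None if unrecognized."""
    if header.startswith(PDF_SIGNATURE):
        return "pdf"
    if header.startswith(CFB_SIGNATURE):
        # HWP 5.x files use this container, but so do other CFB formats.
        return "cfb"
    return None


def detect_file(path: str) -> Optional[str]:
    """Read only the first eight bytes of the file and classify them."""
    with open(path, "rb") as handle:
        return detect_container(handle.read(8))
```

Reading only the first eight bytes keeps this step cheap, which suits its role as the first gate before any engine runs.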

If the server determines that the non-executable file is an HWP file, it inspects the analysis target file with the static analysis engine (S3020).

When a CFB-family file such as an HWP file is unpacked, its directory and file structure becomes visible. In this structure, a directory is called a storage and a file is called a stream.

Table 1 below illustrates the structure of a typical HWP file.

For example, through the static analysis engine, the server can identify the HWP file structure from the data stream of the HWP file and analyze the file contents in real time to detect malicious code, suspicious patterns, and the like. In more detail, the server can analyze the contents of the HWP file without executing them. For example, the server can inspect the code, macros, scripts contained in the document, and embedded objects of the HWP file to identify known malicious code signatures, suspicious patterns, or vulnerabilities.

If the static analysis engine finds nothing abnormal, the server inspects the analysis target file with the filter engine (S3030). For example, the filter engine can check whether filter items are present inside the analysis target file based on preset filter rules.

The document formats of non-executable files offer many functions implemented for user convenience, along with various elements that carry out those functions. Among these, the server can identify the functions that malicious code mainly exploits and apply filtering rules that inspect the related elements. With a mechanism that uses these filtering rules to classify which files need further analysis, additional analysis is performed only when a filter item is present in the document, and file analysis completes quickly when none is present.

The server uses the filter engine to check whether a filter item exists in the analysis target file (S3040). If the server detects a filter item in the analysis target file through the filter engine, it can perform an additional inspection of the file with the dynamic analysis engine.

For example, the HWP filter items that the server checks using its filter rules are as follows:

1) File size

The server can check the size of the file. For example, a malicious HWP file can embed or pack malicious files or scripts to bypass detection by existing security solutions, which can make the file larger. Therefore, if the size of the HWP file checked through the filter engine exceeds a reference value, the server can perform additional analysis. For example, if the HWP file is 1 MB or larger, the server can perform additional analysis.

2) Objects embedded in the file

The server can check whether a file is embedded inside the HWP file. The embedded file can be an executable file or a non-executable file such as a document file. For example, a malicious HWP file can carry an embedded malicious file to bypass detection by a security solution and can induce that file to run when the HWP document is opened.

To check this, for example, the server can identify the “BinData” storage through the filter engine. This storage can contain images, executable files, or other non-executable files. The server can confirm whether the “BinData” storage exists and extract its data to check whether an embedded file is present.

In addition, the server can identify streams that contain objects embedded with OLE (Object Linking and Embedding) technology. For example, such streams may have the '.OLE' extension. Through the filter engine, the server can extract and analyze OLE objects from streams with the '.OLE' extension to check whether an embedded file is present.
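The “BinData” and '.OLE' checks above can be sketched as a simple scan over the stream paths of the container. This is only an illustrative sketch: the function name `find_embedded_objects` and the sample stream names are hypothetical, and the input is assumed to be a list of path-component lists as a CFB parser might return (for instance, the third-party `olefile` package exposes such a list through `OleFileIO.listdir()`).

```python
def find_embedded_objects(stream_paths):
    """Return (BinData stream paths, subset of those ending in '.OLE').

    stream_paths is assumed to be a list of path-component lists, e.g.
    [["BinData", "BIN0001.OLE"]]; the names are illustrative only.
    """
    # A stream lives inside the BinData storage when its first path
    # component is "BinData" and at least one more component follows.
    bindata = [p for p in stream_paths if len(p) > 1 and p[0] == "BinData"]
    # Streams carrying OLE-embedded objects are recognised by extension.
    ole = [p for p in bindata if p[-1].upper().endswith(".OLE")]
    return bindata, ole
```

A non-empty first list means the storage exists and holds data. A non-empty second list would mark the file for the additional analysis described above.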

3) Scripts inside the file

The server can check whether a script is embedded in the HWP file and inspect its data. For example, PostScript for rendering graphic images may be embedded for convenience when viewing an HWP document. A malicious HWP file can embed PostScript to carry out malicious actions.

To check this, for example, the server can identify the “BinData” storage through the filter engine. The server can also use the filter engine to identify streams with the '.eps' or '.ps' extension and then analyze the PostScript code in the identified streams.
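As a rough sketch of the script check, the same kind of stream-path list can be filtered for PostScript extensions. The function name `find_postscript_streams` and the constant `SCRIPT_EXTENSIONS` are assumptions made for this example, and analysis of the PostScript code itself is out of scope here.

```python
# Extensions named in the text; matching is done case-insensitively,
# which is an assumption of this sketch.
SCRIPT_EXTENSIONS = (".eps", ".ps")


def find_postscript_streams(stream_paths):
    """Return BinData streams whose names end in '.eps' or '.ps'.

    stream_paths is assumed to be a list of path-component lists,
    e.g. [["BinData", "BIN0001.eps"]].
    """
    return [
        p for p in stream_paths
        if len(p) > 1
        and p[0] == "BinData"
        and p[-1].lower().endswith(SCRIPT_EXTENSIONS)
    ]
```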

4) Objects whose vulnerabilities are frequently exploited in HWP documents

Through the filter engine, the server can check whether the HWP file contains objects whose vulnerabilities are frequently exploited. For example, among the various objects in an HWP document, some are especially prone to exposed vulnerabilities. The server can compile a list of those objects and, if any of them is present, perform additional analysis of the HWP file based on that list.

For example, the server can generate a list of objects in the HWP file format in which known vulnerabilities frequently occur, and identify those objects. The list may include vulnerable Flash images (.swf files), outdated OLE objects, executable scripts, macros, and the like.

5) Shellcode within the HWP document structure

Through the filter engine, the server can check for the presence of shellcode, which is the instruction code executed when a vulnerability is triggered. Shellcode consists of a contiguous run of binary data. From the shellcode it has collected, the server can select the frequently exploited data as filter items and perform additional analysis if any part of the document data matches them.

For example, the server can detect shellcode set as a filter item. To this end, the server can select frequently exploited shellcode as filter items. In more detail, such shellcode can contain specific patterns or binary signatures that are typically used for malicious purposes. Through the filter engine, the server can look for parts of the HWP document data that match the shellcode. For example, after reading the HWP file as binary data, it can search for matching parts to determine whether shellcode corresponding to a filter item is present.

For example, the filter engine can include the following shellcode as a filter item:

“33 C9 64 A1 30 00 00 00 8B 40 0C 8B 70 14 AD 96 AD 8B 58 10 8B 53 3C 03 D3 8B 52 78 03 D3 8B 72 20 03 F3 33 C9 41 AD 03 C3 81 38 47 65 74 50 75 F4 81 78 04 72 6F 63 41 75 EB 81 78 08 64 64 72 65 75 E2 8B 72 24 03 F3 66 8B 0C 4E 49 8B 72 1C 03 F3 8B 14 8E 03 D3 33 F6 8B F2 33 C9 51 68 61 72 79 41 68 4C 69 62 72 68 4C 6F 61 64 8B CC 51 53 FF D2 50 33 C9 B9 64 6C 6C 00 51 68 6C 33 32 2E 68 73 68 65 6C 8B CC 51 FF D0 50 33 C9 66 B9 6C 6C 51 68 72 74 2E 64 68 6D 73 76 63 8B CC 51 8B 54 24 20 FF D2 50 E8 8B 05 00 00 73 74 72 6C 65 6E 00 77 00 00 90 90 90 90 90 90 5C 32 5F 4D 57 41 4C 52 44 45”

After reading the HWP analysis target file as binary data, the server can perform additional analysis if shellcode defined as a filter item is present.
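The binary match described above can be sketched as a plain byte-substring search. The names `contains_shellcode` and `EXAMPLE_PREFIX` are hypothetical, and only the first few bytes of the example shellcode are used to keep the sketch short. A real filter would presumably match longer or multiple patterns.

```python
def contains_shellcode(data: bytes, patterns) -> bool:
    """Return True if any filter-item byte pattern occurs in the data."""
    return any(pattern in data for pattern in patterns)


# Leading bytes of the example shellcode quoted in the text, shortened
# for illustration only; the choice of prefix length is arbitrary.
EXAMPLE_PREFIX = bytes.fromhex("33 C9 64 A1 30 00 00 00 8B 40 0C 8B 70 14")
```

In use, the file would be read in binary mode (for example with `open(path, "rb")`) and passed in together with the list of filter-item patterns.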

6) HWP document data entropy measurement

HWP files usually contain various types of data, such as text, images, and graphics, and this combination gives an HWP file a certain level of entropy. Entropy is an indicator of the degree of disorder in data: the less predictable the data, the higher the entropy.

Malicious HWP files usually obfuscate malicious code or malicious scripts to evade security solutions. Obfuscated code is harder to predict than ordinary text or images, which can raise the entropy.

In more detail, an HWP file has an average information content. A malicious HWP file may contain malicious files or scripts, and because these malicious objects are obfuscated to avoid detection, the average information content can be far higher than that of a normal file. The server can therefore apply the information entropy formula, measured in bits, to compute the average information content and perform additional analysis when it is at or above a certain value.

Table 2 below is an example of information entropy to which this specification can be applied.

Referring to Table 2, the information content of an event k decreases as the probability p of that event increases (in Shannon's definition it is the logarithm of 1/p). The information entropy is calculated by multiplying the information content of each event by its probability and summing the results. In other words, the less likely an outcome is, the larger its information content, and the more often it occurs, the smaller its information content. In an HWP file, each bit represents information in the file, so the information content of the file can be expressed as the sum of the information content of its bits. The server can therefore calculate the bit-level information content of the HWP file and use it to measure entropy. In more detail, the server can convert the HWP file into binary data, generate data streams, and analyze these streams to estimate the probability that each bit is 0 or 1, from which the information content is calculated.

For example, if the entropy of the HWP file measured through the filter engine reaches a certain level (for example, 7 or more), the server can perform additional analysis to check for malicious code or malicious scripts.
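The entropy measurement can be sketched with the standard Shannon formula, H = -Σ p_k · log2(p_k). One assumption is made loudly here: although the text speaks of per-bit probabilities, a threshold such as 7 only fits a scale whose maximum is 8, which is what results from treating each byte value (0 to 255) as a symbol. This sketch therefore computes entropy in bits per byte. The function names and the default threshold parameter are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0).

    Assumption of this sketch: symbols are byte values, not single bits.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of p * log2(1/p) over every byte value that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def exceeds_entropy_threshold(data: bytes, threshold: float = 7.0) -> bool:
    """Return True when the data is at or above the example threshold of 7."""
    return shannon_entropy(data) >= threshold
```

Uniformly distributed bytes approach the maximum of 8, while repetitive data approaches 0. Compressed or encrypted content tends toward the high end of this scale, so a high value is a reason for further inspection rather than proof of obfuscation.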

7) Extraction and inspection of text in images inside HWP documents

HWP files are widely used by public institutions, and malicious documents disguised as government documents are sometimes used in attacks. For example, such a malicious document may contain strings related to official documents or strings on a specific topic, and use them to trick users into executing malicious code or leaking information.

In more detail, through the filter engine, the server can extract images from the HWP document and apply OCR to the extracted images to identify the characters in them and convert them into text. If that text contains a text pattern commonly used in official documents or a specific keyword (for example, "북한 신년사 평가", meaning "evaluation of North Korea's New Year's address"), the server can perform additional analysis.

8) Inspection of text content inside HWP documents

In addition, through the filter engine, the server can extract and inspect the text data inside the HWP file to check whether specific strings commonly used in official documents are present. For example, if a string such as "북한 신년사 평가" ("evaluation of North Korea's New Year's address") exists in the document, the server can perform additional analysis.
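The keyword checks in items 7) and 8) can be sketched as a substring match over text that has already been extracted, whether by OCR or from the document body. Text extraction itself is not shown. The names `WATCH_KEYWORDS` and `matches_watch_keywords` are hypothetical, the single keyword is the example quoted in the text, and the whitespace normalization is an assumption added because OCR output often contains irregular spacing.

```python
# Example keyword taken from the text; a real list would be maintained
# by the operator of the filter engine.
WATCH_KEYWORDS = ("북한 신년사 평가",)


def matches_watch_keywords(text: str, keywords=WATCH_KEYWORDS) -> bool:
    """Return True if any watch keyword occurs in the extracted text."""
    # Collapse runs of whitespace so line breaks or double spaces in
    # OCR output do not hide a match.
    normalized = " ".join(text.split())
    return any(keyword in normalized for keyword in keywords)
```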

If the server detects a filter item, it inspects the analysis target file with the dynamic analysis engine (S3050). After performing the additional analysis with the dynamic analysis engine, the server can end the file inspection. Dynamic analysis examines the target file by executing it, so it takes relatively longer than static analysis but can produce more accurate results.
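The overall flow of steps S3020 to S3050 can be sketched as below. This is a hedged outline rather than the actual server logic: the function `inspect_file`, the callable parameters, and the verdict strings are all hypothetical. The text does not state what happens when static analysis flags a file, so this sketch assumes the file is reported as malicious at that point.

```python
from typing import Callable, Iterable

Scan = Callable[[bytes], bool]  # returns True when something is flagged


def inspect_file(data: bytes, static_scan: Scan,
                 filter_rules: Iterable[Scan], dynamic_scan: Scan) -> str:
    """Run static analysis, then filter rules, then dynamic analysis if needed."""
    # S3020: static analysis runs first because it is the fastest stage.
    if static_scan(data):
        return "malicious (static)"  # assumption: stop here when flagged
    # S3030/S3040: the filter engine checks each rule for a filter item.
    if not any(rule(data) for rule in filter_rules):
        return "clean (no filter item)"  # dynamic analysis is skipped
    # S3050: only files with a filter item reach the slow dynamic stage.
    return "malicious (dynamic)" if dynamic_scan(data) else "clean (dynamic)"
```

A filter rule here could be as simple as `lambda d: len(d) >= 1024 * 1024` for the 1 MB size example. The point of the design is that `dynamic_scan` is never called for files in which no rule fires.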

The existing inspection method detects malicious files through a static analysis engine and then performs dynamic analysis on all files. Because every file undergoes dynamic analysis, this takes a long time. However, malicious behavior mainly arises through specific objects, so subjecting every file to dynamic analysis is inefficient.

Therefore, in this specification, the server can apply the filter engine to exclude files that contain no malicious objects from dynamic analysis. This shortens inspection time and thus reduces the detection time of the solution. It can improve the performance and availability of the security solution, which can benefit the real-time communication of key business applications. A security solution that applies the filter engine of this specification can therefore improve business continuity and security at the same time.

The specification described above can be implemented as computer-readable code on a medium on which a program is recorded. The computer-readable medium includes all kinds of recording devices that store data readable by a computer system. Examples of the computer-readable medium include an HDD (Hard Disk Drive), SSD (Solid State Disk), SDD (Silicon Disk Drive), ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage device, and also include implementations in the form of a carrier wave (for example, transmission over the Internet). Therefore, the above detailed description should not be construed as limiting in any respect and should be considered illustrative. The scope of this specification should be determined by a reasonable interpretation of the appended claims, and all changes within the equivalent scope of this specification are included in its scope.

In addition, although the above description has focused on services and embodiments, these are merely examples and do not limit this specification. Those with ordinary knowledge in the field to which this specification pertains will recognize that various modifications and applications not illustrated above are possible without departing from the essential characteristics of the services and embodiments. For example, each component specifically shown in the embodiments can be modified and implemented. Differences related to such modifications and applications should be interpreted as being included in the scope of this specification as defined in the appended claims.

Claims

A method for pre-filtering, by a server, by inspecting internal information of an incoming document, the method comprising:
receiving an analysis target file;
inspecting the analysis target file with a static analysis engine;
inspecting the analysis target file with a filter engine based on the inspection result of the static analysis engine being normal;
checking, through the filter engine, whether a filter item exists in the analysis target file; and
inspecting the analysis target file with a dynamic analysis engine based on the filter item existing,
wherein the analysis target file includes an HWP file format, and
wherein checking whether the filter item exists comprises:
determining that the filter item exists based on an object related to a vulnerability being present within the HWP file, according to a preset list of objects related to the vulnerability;
reading the analysis target file as binary;
determining that the filter item exists based on shellcode related to the vulnerability being present in the binary; and
determining that the filter item exists based on the information entropy of the analysis target file exceeding a reference value.

The pre-filtering method of claim 1,
wherein checking whether the filter item exists comprises:
determining that the filter item exists based on the size of the HWP file exceeding a reference value.

The pre-filtering method of claim 1,
wherein checking whether the filter item exists comprises:
determining that the filter item exists based on a file embedded inside the HWP file being present.

The pre-filtering method of claim 3,
wherein determining that the filter item exists based on a file embedded inside the HWP file being present comprises:
identifying a BinData storage; and
identifying a stream with an .OLE extension.

The pre-filtering method of claim 4,
wherein checking whether the filter item exists comprises:
determining that the filter item exists based on a script embedded inside the HWP file being present.

The pre-filtering method of claim 5,
wherein determining that the filter item exists based on a script embedded inside the HWP file being present comprises:
identifying a BinData storage; and
identifying a stream with an .eps or .ps extension.

(Deleted)

The pre-filtering method of claim 1,
wherein the list of objects related to the vulnerability
comprises a Flash image file having a .swf extension.

(Deleted)

The pre-filtering method of claim 1,
wherein checking whether the filter item exists further comprises:
extracting an image from the analysis target file;
extracting text from the image through OCR; and
determining that the filter item exists based on the text being related to an official document.

A server for pre-filtering by inspecting internal information of an incoming document, the server comprising:
a communication unit;
a memory containing a static analysis engine, a filter engine, and a dynamic analysis engine; and
a processor functionally controlling the communication unit and the memory,
wherein the processor
receives an analysis target file, inspects the analysis target file with the static analysis engine, inspects the analysis target file with the filter engine based on the inspection result of the static analysis engine being normal, checks, through the filter engine, whether a filter item exists in the analysis target file, and inspects the analysis target file with the dynamic analysis engine based on the filter item existing,
wherein the analysis target file includes an HWP file format, and, to check whether the filter item exists, the processor determines that the filter item exists based on an object related to a vulnerability being present within the HWP file, according to a preset list of objects related to the vulnerability, reads the analysis target file as binary,
determines that the filter item exists based on shellcode related to the vulnerability being present in the binary, and
determines that the filter item exists based on the information entropy of the analysis target file exceeding a reference value.