KR101521862B1

KR101521862B1 - System and method for classifying patent document

Info

Publication number: KR101521862B1
Application number: KR1020140180252A
Authority: KR
Inventors: 송인석; 고병열; 윤혜성
Original assignee: 한국과학기술정보연구원
Priority date: 2014-12-15
Filing date: 2014-12-15
Publication date: 2015-05-21
Anticipated expiration: 2034-12-15
Also published as: WO2016099019A1

Abstract

The present invention relates to a system and method for classifying patent documents. The system comprises: a concept element extraction unit that analyzes each patent document to extract concept elements and assigns a function attribute to each concept element; a concept structure generation unit that groups the concept elements of each patent document by function attribute, calculates concept element similarity values, and generates, for each patent document, a concept structure including the calculated similarity values; and a classification unit that obtains concept structure similarity values between patent documents and classifies the patent documents based on the obtained values.

Description

SYSTEM AND METHOD FOR CLASSIFYING PATENT DOCUMENT

The present invention relates to a system and method for classifying patent documents, and more particularly, to a patent document classification system and method that analyze patent documents to extract concept structures and classify the patent documents according to the mutual semantic relations between them, as identified through function attribute analysis and similarity measurement.

With the conclusion of the Korea-US Free Trade Agreement (FTA), patent protection has been strengthened through the extended term of intellectual property rights, which consist of patents, trademarks, copyrights, and the like, so that attention to patent information, which is directly linked to national industrial competitiveness, is required more than ever.

Patent information is information related to industrial property rights, and refers to information on the technical content of a patent application and the matters claimed as rights, personal details such as the applicant and the inventor, and other bibliographic details. As industry becomes more advanced, complex, and diversified, an enormous amount of patented technology information is being produced, and companies must properly reflect this information in their management strategies in order to survive in a changing industrial society.

Today, patent documents not only formally grant inventors their intellectual property rights. In an environment of unlimited global technology competition, the patent DB also occupies a very important position as one of the essential information resources for the investigation and analysis that support decision making, underpinning corporate research and development planning and national science and technology policy. In addition, as in other fields, the volume of patent documents continues to grow worldwide, including in emerging countries such as China, and expectations for the level of information to be obtained through newly emerging big data analysis are also rising. Therefore, fast and accurate access to the required information and securing a more in-depth analysis environment are particularly important tasks.

In general, investigation and analysis of a patent DB is carried out by retrieving documents through keyword selection, search query construction, and use of classification codes, and then reviewing details such as abstracts, drawings, and claims to select the analysis targets. Once experience is accumulated and these steps are mastered, a certain level of quality can be secured based on know-how. However, this is effective mainly for investigation and analysis of an individual, specific topic, and removing the unsuitable results (noise) that inevitably arise in moving from one step to the next still requires considerable intellectual effort and time from experts, so technical supplementation and improvement are needed.

In addition, the number of documents to classify is considerable, and accurate classification requires understanding the entire detailed description, including the claims, which places considerable stress on the classifier.

Accordingly, there is a need for a method of automatically and accurately classifying patent documents.

Korean Patent No. 1,179,613, entitled "Method for Automatic Classification of Patent Documents Using Frequent Items and Association Rules"

An object of the present invention is to provide a patent document classification system and method that analyze patent documents to extract concept structures and classify the patent documents according to the mutual semantic relations between them, as identified through function attribute analysis and similarity measurement.

To achieve the above objects, according to one aspect of the present invention, there is provided a patent document classification system comprising: a concept element extraction unit that analyzes each patent document to extract concept elements and assigns a function attribute to each concept element; a concept structure generation unit that clusters the concept elements of each patent document by function attribute, calculates concept element similarity values, and generates, for each patent document, a concept structure including the calculated concept element similarity values; and a classification unit that obtains concept structure similarity values between patent documents and classifies the patent documents based on the obtained concept structure similarity values.

The patent document classification system may further include an entity name dictionary database in which entity names of products and technologies are stored, and a function attribute classification database in which rules for classifying the function attributes of entities are set.

The concept element extraction unit may include: a candidate sentence identification module that identifies candidate sentences in predefined regions of a patent document with reference to the entity name dictionary database; a string extraction module that extracts, from the identified candidate sentences, strings representing concept elements through dependency-grammar-based parsing; and a function attribute assignment module that assigns function attributes to the extracted strings with reference to the function classification database, wherein the extracted strings may be concept elements.

The concept structure generation unit may include: a concept element clustering module that clusters, for each patent document, the concept elements assigned the same function attribute; a similarity calculation module that calculates similarity values between concept element entities for each function attribute using a predefined similarity calculation and analysis model; and a concept structure generation module that generates, for each patent document, a concept structure including the function attributes, the concept elements, and the similarity values of the concept elements.

The classification unit may include: a concept element similarity calculation module that obtains similarity values of the concept elements constituting the concept structures between patent documents; a function attribute similarity calculation module that obtains function attribute similarity values between patent documents; a concept structure similarity calculation module that obtains concept structure similarity values between patent documents using the obtained concept element similarity values or function attribute similarity values; and a classification module that classifies the patent documents based on the concept structure similarity values between the patent documents.

According to another aspect of the present invention, there is provided a patent document classification method of a patent document classification system, comprising the steps of: (a) analyzing each patent document to extract concept elements and assigning a function attribute to each concept element; (b) clustering the concept elements of each patent document by function attribute, calculating concept element similarity values, and generating, for each patent document, a concept structure including the calculated concept element similarity values; and (c) obtaining concept structure similarity values between patent documents and classifying the patent documents based on the obtained concept structure similarity values.

Step (a) may include: identifying candidate sentences in predefined regions of a patent document with reference to the entity name dictionary database; extracting, from the identified candidate sentences, strings representing concept elements through dependency-grammar-based parsing; and assigning function attributes to the extracted strings with reference to the function classification database, wherein the extracted strings may be concept elements.

Step (b) may include: clustering, for each patent document, the concept elements assigned the same function attribute; calculating similarity values between concept element entities for each function attribute using a predefined similarity calculation and analysis model; and generating, for each patent document, a concept structure including the function attributes, the concept elements, and the similarity values of the concept elements.

Step (c) may include: obtaining similarity values of the concept elements constituting the concept structures between patent documents; obtaining function attribute similarity values between patent documents; obtaining concept structure similarity values between patent documents using the obtained concept element similarity values or function attribute similarity values; and classifying the patent documents based on the concept structure similarity values between the patent documents.

According to still another aspect of the present invention, there is provided a computer-readable recording medium storing a program that, when executed by a patent document classification system, performs a patent document classification method comprising the steps of: (a) analyzing each patent document to extract concept elements and assigning a function attribute to each concept element; (b) clustering the concept elements of each patent document by function attribute, calculating concept element similarity values, and generating, for each patent document, a concept structure including the calculated concept element similarity values; and (c) obtaining concept structure similarity values between patent documents and classifying the patent documents based on the obtained concept structure similarity values.

According to the present invention, concept structures are identified in patent documents through text mining based on natural language processing, and patents are classified by analyzing the relations between patent concept structures through analysis of the function attributes and similarity of the concept elements. This minimizes the noise that inevitably arises from a one-dimensional, keyword-centered approach, reduces intellectual effort and time cost, and allows in-depth analysis from a topical perspective to be performed efficiently.

FIG. 1 is a diagram showing a system for classifying patent documents according to an embodiment of the present invention.
FIG. 2 is a block diagram schematically showing the configuration of a patent document classification system according to an embodiment of the present invention.
FIG. 3 is a diagram showing the configuration of the concept element extraction unit shown in FIG. 2.
FIG. 4 is a diagram showing the configuration of the concept structure generation unit shown in FIG. 2.
FIG. 5 is a diagram showing the configuration of the classification unit shown in FIG. 2.
FIG. 6 is a flowchart showing a patent document classification method according to an embodiment of the present invention.
FIG. 7 is a flowchart showing a method of extracting concept elements from a patent document and assigning function attributes according to an embodiment of the present invention.
FIG. 8 is a flowchart showing a method of generating a concept structure according to an embodiment of the present invention.

Details of the foregoing objects, technical configuration, and resulting effects of the present invention will be more clearly understood from the following detailed description based on the drawings attached to this specification.

Hereinafter, the patent document classification system and method according to the present invention will be described in detail with reference to the accompanying drawings. The embodiments described are provided so that those skilled in the art can easily understand the technical idea of the present invention, and the present invention is not limited by them. In addition, the matters shown in the accompanying drawings are schematic representations intended to make the embodiments easy to explain, and may differ from the forms actually implemented.

Each component described below is merely an example for implementing the present invention. Accordingly, other components may be used in other implementations without departing from the spirit and scope of the present invention. Each component may be implemented purely in hardware or purely in software, or as a combination of various hardware and software components performing the same function. Two or more components may also be implemented together by a single piece of hardware or software.

In addition, the expression "including" certain components is an open-ended expression that merely indicates that those components are present, and should not be understood as excluding additional components.

FIG. 1 is a diagram showing a system for classifying patent documents according to an embodiment of the present invention.

Referring to FIG. 1, in the system for classifying patent documents, a patent providing server 100 and a patent document classification system 200 are connected through a communication network.

The patent providing server 100 includes a patent database in which patent documents and the like are stored, and provides patent documents to the patent document classification system 200.

The patent document classification system 200 collects patent documents from the patent providing server 100 and analyzes the collected patent documents to classify them.

That is, the patent document classification system 200 analyzes each patent document collected from the patent providing server 100 to extract concept elements, assigns a function attribute to each concept element, calculates concept element similarity values for each function attribute of each patent document, and generates, for each patent document, a concept structure including the calculated concept element similarity values. The patent document classification system 200 then obtains concept structure similarity values between patent documents and classifies the patent documents based on those values. Here, a concept structure is the minimum set of entities required for a document to stand as a patent document; the minimum entities include the inventor's own solution to the problem to be solved, that is, a technology, and the target entity to which the technology is applied, that is, a product. The function attributes of technologies and products are defined as combinations of action and object attributes, based on the function-oriented search model (Litvin, 2004), which is founded on the TRIZ (a problem-solving, analysis and forecasting tool derived from patterns of invention in the global literature) methodology (Altshuler, 1946), one of the patent analysis methodologies.
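
The data handled in the paragraph above can be pictured with a minimal sketch. It is illustrative only: the class names `ConceptElement` and `ConceptStructure`, the example function codes, and the use of Jaccard overlap as the function attribute similarity are all assumptions made here, since the text refers only to a "predefined" similarity model without specifying it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConceptElement:
    text: str      # extracted string, e.g. "anti-piracy system"
    kind: str      # "technology" or "product"
    function: str  # action-object function attribute code (hypothetical values below)


@dataclass
class ConceptStructure:
    doc_id: str
    elements: list = field(default_factory=list)

    def by_function(self):
        """Group this document's concept elements by function attribute."""
        groups = {}
        for element in self.elements:
            groups.setdefault(element.function, []).append(element)
        return groups


def function_similarity(a, b):
    """Jaccard overlap of two documents' function attribute codes.

    Jaccard is a stand-in chosen for this sketch, not the model of the source.
    """
    fa = {e.function for e in a.elements}
    fb = {e.function for e in b.elements}
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


doc1 = ConceptStructure("doc1", [
    ConceptElement("anti-piracy system", "product", "h-i"),
    ConceptElement("license check", "technology", "c-i"),
])
doc2 = ConceptStructure("doc2", [
    ConceptElement("copy protection device", "product", "h-i"),
])
print(function_similarity(doc1, doc2))
```

Documents whose pairwise similarity exceeds a chosen threshold could then be placed in the same class; the threshold and the grouping rule are likewise left open here.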

The patent document classification system 200 may be implemented by an electronic device capable of communicating with other electronic devices through various communication standards and of performing various data processing operations. For example, the patent document classification system 200 may be implemented in the form of a server device, or in the form of various other electronic devices. The patent document classification system 200 may also be implemented as a single electronic device or as a combination of two or more electronic devices.

The patent document classification system 200 is described in detail with reference to FIG. 2.

Although the patent providing server 100 that provides patent documents is described here as being external to the patent document classification system 200, the patent document classification system 200 may instead include an internal database in which patent documents are stored.

FIG. 2 is a block diagram schematically showing the configuration of a patent document classification system according to an embodiment of the present invention, FIG. 3 is a diagram showing the configuration of the concept element extraction unit shown in FIG. 2, FIG. 4 is a diagram showing the configuration of the concept structure generation unit shown in FIG. 2, and FIG. 5 is a diagram showing the configuration of the classification unit shown in FIG. 2.

Referring to FIG. 2, the patent document classification system 200 includes a database 210, an interface unit 220, a concept element extraction unit 230, a concept structure generation unit 240, a classification unit 250, and a control unit 260.

The database 210 includes an entity name dictionary database 212, a function attribute classification database 214, and a concept structure database 216.

The entity name dictionary database 212 stores entity names of products and technologies.

The entity name dictionary database 212 may store string entities and attributes identified and extracted from external public or commercial terminology databases, such as a trademark database or WordNet.

In the entity name dictionary database 212, new entity strings may be registered, modified, or deleted based on text mining results and entity name identification results.

The function attribute classification database 214 stores rules for classifying the function attributes of entities.

The function attribute classification database 214 includes a function classification matrix composed of TRIZ-based action and object entities (instances) and their combinations, and entity data collected or identified and built for each type. Here, the actions may include move, add, remove, hold, deflect, change, inclusion, operation, and the like, and the objects may include substances, fields (properties), information/concepts, products, and the like.

The function attribute classification database 214 includes a function classification matrix such as that shown in Table 1.

Action \ Object | Substance | Field | Information/concept (info) | Product (artifact)
----------------|-----------|-------|----------------------------|-------------------
Move            | m-s       | m-f   | m-i                        | m-a
Add             | a-s       | a-f   | a-i                        | a-a
Remove          | r-s       | r-f   | r-i                        | r-a
Hold            | h-s       | f-f   | h-i                        | h-a
Deflect         | d-s       | d-f   | d-i                        | d-a
Change          | c-s       | c-f   | c-i                        | c-a
Inclusion       | I-s       | I-f   | I-i                        | I-a
Operation       | o-s       | o-f   | o-i                        | o-p
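
As an illustration of how the cells of Table 1 combine an action with an object, the following minimal sketch generates the codes from one-letter prefixes. The dictionaries and the function name `function_code` are assumptions made for this example. Note that the sketch follows the regular pattern throughout, whereas Table 1 as printed shows f-f for hold/field and o-p for operation/product; whether those two cells are intentional is not stated in the source.

```python
# One-letter prefixes per action and object, following the pattern of Table 1.
ACTION_PREFIX = {
    "move": "m", "add": "a", "remove": "r", "hold": "h",
    "deflect": "d", "change": "c", "inclusion": "I", "operation": "o",
}
OBJECT_SUFFIX = {"substance": "s", "field": "f", "info": "i", "artifact": "a"}


def function_code(action, obj):
    """Return the action-object code, e.g. ("move", "substance") -> "m-s"."""
    if action not in ACTION_PREFIX or obj not in OBJECT_SUFFIX:
        raise ValueError(f"unknown action/object pair: {action!r}, {obj!r}")
    return f"{ACTION_PREFIX[action]}-{OBJECT_SUFFIX[obj]}"


# The full 8 x 4 matrix, keyed by (action, object).
FUNCTION_MATRIX = {
    (action, obj): function_code(action, obj)
    for action in ACTION_PREFIX
    for obj in OBJECT_SUFFIX
}
```

Keeping the matrix as a lookup keyed by (action, object) makes it straightforward to add rows or columns if the matrix is subdivided or extended.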

The action-object-based function classification matrix is built on the TRIZ function-oriented search, but, as with any classification matrix, it may be subdivided or extended according to the required level of analysis.

The concept structure database 216 stores information on the concept structures generated by the concept structure generation unit 240. That is, the concept structure database 216 stores concept elements (technologies, products), concept element similarity values, concept structures, concept structure similarity matrices, and the like.

The interface unit 220 collects patent documents from the patent providing server through the communication network.

The interface unit 220 receives inputs such as an entity type, a function attribute type, and a similarity value, so that patent documents satisfying the conditions can be searched.

The concept element extraction unit 230 analyzes each patent document to extract concept elements and assigns a function attribute to each concept element. Because the concept elements include products and technologies, the concept element extraction unit 230 can identify product and technology entities in a patent document using pattern analysis of concept element expressions. In doing so, the concept element extraction unit 230 may extract concept elements from the patent document using natural language processing methods, mechanical algorithm processing methods, and the like.

Meanwhile, the concept element extraction unit 230 may receive and set, through the interface unit 220, environment variable settings such as the number of entity name collection runs, the number of documents per processing unit, a threshold for the identification rate per processing unit, and the target document regions (e.g., title, abstract, detailed description, drawings, claims). In this case, the concept element extraction unit 230 selects an arbitrary document set of the specified number of documents according to the environment variable settings and loads the specified document regions. The concept element extraction unit 230 then loads the entity name dictionary and the pattern recognition rules defined for each entity type, and identifies or recognizes and extracts entity names from the specified document regions. The concept element extraction unit 230 runs the entity name collection process the specified number of times, referring to the number of documents in which no entity name was identified or recognized and to the identification rate, and stops if the minimum identification rate is not met. An administrator can review the unidentified documents, register new entity names or recognition rules, and rerun the process on document sets that did not meet the minimum recognition rate.
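
The collection loop described above might be sketched as follows. This is a simplified illustration under several assumptions: the function and parameter names are hypothetical, batches are taken sequentially rather than as arbitrary document sets, and plain substring matching against the dictionary stands in for the per-type pattern recognition rules.

```python
def collect_entity_names(documents, dictionary, batch_size, max_runs, min_rate):
    """Process up to max_runs batches of (doc_id, text) pairs.

    Stops early when a batch's identification rate falls below min_rate,
    so that unidentified documents can be reviewed before a rerun.
    """
    identified = {}     # doc_id -> entity names found in the document
    unidentified = []   # doc_ids in which no entity name was found
    for run in range(max_runs):
        batch = documents[run * batch_size:(run + 1) * batch_size]
        if not batch:
            break
        hits = 0
        for doc_id, text in batch:
            lowered = text.lower()
            names = [name for name in sorted(dictionary) if name in lowered]
            if names:
                identified[doc_id] = names
                hits += 1
            else:
                unidentified.append(doc_id)
        if hits / len(batch) < min_rate:
            break  # minimum identification rate not met
    return identified, unidentified


docs = [
    ("d1", "A fuel cell membrane"),
    ("d2", "An unrelated text"),
    ("d3", "Another text"),
    ("d4", "More text"),
    ("d5", "A fuel cell stack"),
]
found, missed = collect_entity_names(docs, {"fuel cell"}, batch_size=2, max_runs=3, min_rate=0.5)
```

In this example the second batch identifies nothing, so the loop stops before the third batch; the unidentified documents are returned for review.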

Referring to FIG. 3, the concept element extraction unit 230 includes a candidate sentence identification module 232, a string extraction module 234, and a function attribute assignment module 236.

The candidate sentence identification module 232 identifies candidate sentences in predefined regions of a patent document with reference to the entity name dictionary database 212.

The candidate sentence identification module 232 identifies candidate sentences in the specified document regions with reference to the entity name dictionary database 212. For example, it analyzes regions such as the title, abstract, detailed description, and first paragraph of the claims to identify candidate sentences containing strings registered in the entity name dictionary. The candidate sentence identification module 232 may also identify candidate sentences over the whole document or over an arbitrary range specified according to the document structure.

The candidate sentence identification module 232 may sequentially load patent documents from the patent providing server based on the preset environment variable settings and identify candidate sentences. New environment variable values may be specified at this point. The candidate sentence identification module 232 then identifies, as candidate sentences, the sentences in which an entity name is identified or recognized within the document regions specified by the environment variable settings.
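
A minimal sketch of candidate sentence identification is given below, assuming a document represented as a mapping from region name to text and a dictionary given as a set of lowercase entity names. The function name, the region keys, and the naive punctuation-based sentence splitter are assumptions made for this example.

```python
import re


def candidate_sentences(document, dictionary, regions=("title", "abstract", "claims")):
    """Return (region, sentence, matched_names) for sentences containing a dictionary entry."""
    candidates = []
    for region in regions:
        text = document.get(region, "")
        # Naive splitter: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            sentence = sentence.strip()
            if not sentence:
                continue
            lowered = sentence.lower()
            matched = [name for name in sorted(dictionary) if name in lowered]
            if matched:
                candidates.append((region, sentence, matched))
    return candidates


doc = {
    "title": "Anti-piracy system for protecting distributed software applications from unauthorized use",
    "abstract": "A method is disclosed. The anti-piracy system checks licenses.",
}
result = candidate_sentences(doc, {"anti-piracy system"})
```

Here the title and the second sentence of the abstract are returned as candidates, while the first abstract sentence is skipped because it contains no registered entity name.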

문자열 추출모듈(234)은 후보문장 식별모듈(232)에서 식별된 후보 문장에서 의존문법기반의 구문분석을 통해 개념요소를 나타내는 문자열을 추출한다. 이때 추출된 문자열이 개념요소일 수 있고, 문자열 추출모듈(234)은 텍스트 마이닝, 자연어 처리기법, 형태소 분석 등 다양한 방법을 이용하여 문자열을 추출할 수 있다. 즉, 문자열 추출모듈(234)은 개념요소(개체명) 식별 및 인식을 위해 개체명 사전 데이터베이스(212) 및 불용어 사전(미도시)을 참조하여 주 문장의 비 의존 명사구 표제어를 식별하고, 최장일치 분석을 통해 개체명을 식별하거나 인식한다. 이때, 구문구조 분석은 의존관계 분석을 지원하는 스탠포드 파서(Stanford Parser)와 같은 오픈소스를 활용할 수 있다. The string extracting module 234 extracts a character string representing a concept element from the candidate sentence identified by the candidate sentence identifying module 232 through parsing based on the dependency grammar. At this time, the extracted character string may be a conceptual element, and the character string extraction module 234 may extract a character string using various methods such as text mining, natural language processing, and morphological analysis. That is, the string extracting module 234 identifies the independent noun phrase of the main sentence by referring to the entity name dictionary database 212 and the stop word dictionary (not shown) for identification and recognition of the concept element (entity name) Identify or recognize entity names through analysis. At this time, syntax analysis can utilize open source such as Stanford Parser which supports dependency analysis.

That is, the string extraction module 234 analyzes the dependency-grammar-based syntactic structure of a candidate sentence and identifies the noun phrase of the main sentence and the head word of that noun phrase, or performs entity name recognition. For example, for a descriptive noun phrase such as 'anti-piracy system for protecting distributed software applications from unauthorized use', the following syntactic analysis allows 'anti-piracy system' to be identified as the product.

(ROOT (NP (NP (JJ anti-piracy) (NN system)) (PP (IN for) (S (VP (VBG protecting) (NP (VBN distributed) (NN software) (NNS applications)) (PP (IN from) (NP (JJ unauthorized) (NN use)))))) (. .)))

As another example, for a simple noun phrase such as 'solid polymer electrolyte membrane', syntactic analysis yields '(ROOT (NP (NP (JJ solid) (NN polymer)) (NP (NN electrolyte) (NN membrane))))', which can be identified as an entity.
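The two examples above can be reproduced with a small reader for bracketed parse strings. This is a sketch under stated assumptions: it does not call the Stanford Parser, it only reads parser output that is already in bracketed form, and the rule in `head_noun_phrase` (take the first NP when a PP post-modifier follows, otherwise keep the whole phrase) is a simplification chosen to match these two examples.

```python
def parse_tree(text):
    """Parse a bracketed parse string into nested (label, children) tuples; leaves are str."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()


def leaves(node):
    """Collect the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1]:
        words.extend(leaves(child))
    return words


def head_noun_phrase(tree):
    """Return the entity string: the first NP if a PP modifier follows, else the whole phrase."""
    node = tree[1][0]                 # first child of ROOT
    labels = [c[0] for c in node[1] if not isinstance(c, str)]
    if "PP" in labels and labels[0] == "NP":
        node = node[1][0]
    return " ".join(w for w in leaves(node) if w != ".")
```

The trees shown in the text are phrase-structure bracketings, so the sketch works on constituents (NP, PP) rather than on dependency arcs.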

The functional attribute assignment module 236 refers to the functional attribute classification database 214 and assigns a functional attribute to the string extracted by the string extraction module 234.

With reference to the functional attribute classification database 214, the functional attribute assignment module 236 analyzes the syntactic structure and identifies the functional attribute of the entity concerned.

For example, for 'a system for reporting (add) security information (information) relating to a mobile device', the following dependency-grammar-based syntactic analysis result can be obtained.

(ROOT (NP (NP (DT a) (NN system)) (PP (IN for) (S (VP (VBG reporting) (NP (NP (NN security) (NN information)) (VP (VBG relating) (PP (TO to) (NP (DT a) (JJ mobile) (NN device)))))))) (. .)))

In this case, the functional attribute assignment module 236 refers to the functional attribute classification matrix, recognizes 'reporting' as the provide (add) type and 'security information' as the information type, takes 'reporting security information' as the functional attribute of the whole string, and assigns it the provide-information (a-i) type.
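The 'reporting security information' example can be sketched as a two-table lookup. The tables below contain only the single action and object named in the text; their layout, the one-letter codes as tuple fields, and the assumption that the last word of the object phrase is its head word are all illustrative choices, not the actual contents of the functional attribute classification database 214.

```python
# Hypothetical excerpt of an action-object classification matrix (database 214).
# Each entry maps a word to (type name, one-letter code).
ACTION_TYPES = {"reporting": ("add", "a")}
OBJECT_TYPES = {"information": ("information", "i")}


def assign_functional_attribute(action_word, object_phrase):
    """Combine the action type and the object head word's type into a code such as 'a-i'."""
    words = object_phrase.lower().split()
    if not words:
        return None
    action = ACTION_TYPES.get(action_word.lower())
    obj = OBJECT_TYPES.get(words[-1])   # assumption: the last word is the head word
    if action is None or obj is None:
        return None                      # no rule in the matrix for this pair
    return f"{action[1]}-{obj[1]}"
```

A pair with no matching rule returns `None`, leaving it to the caller to decide whether the entity is stored without a functional attribute.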

The functional attribute assignment module 236 stores the entity string and its functional attribute in the conceptual structure database 216 together with the corresponding patent document information.

The conceptual structure generation unit 240 clusters the conceptual elements of each patent document by functional attribute, calculates conceptual element similarity values, and generates for each patent document a conceptual structure that includes the calculated values. That is, the conceptual structure generation unit 240 calculates the semantic similarity between the entities extracted by the conceptual element extraction unit 230 and generates a similarity matrix between conceptual structures for patent documents that contain entities with the same functional attribute. In other words, the conceptual structure generation unit 240 separates product-type entities and technology-type entities by type and measures similarity within each functional attribute type of the action-object classification matrix.

The conceptual structure generation unit 240 selects a calculation model for measuring entity similarity. At this point, the environment variable for the similarity threshold may be specified or changed.

The conceptual structure generation unit 240 calculates and stores similarity values for the entity strings stored in the conceptual structure database 216. It then generates a similarity matrix between conceptual structures, that is, between document-level entity sets, and stores the matrix in the conceptual structure database 216.

Referring to FIG. 4, the conceptual structure generation unit 240 includes a conceptual element clustering module 242, a similarity calculation module 244, and a conceptual structure generation module 246.

The conceptual element clustering module 242 clusters, for each patent document, the conceptual elements to which the same functional attribute has been assigned.
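Clustering by functional attribute within each document amounts to a two-level grouping. The sketch below assumes, purely for illustration, that the stored records are `(document id, entity string, functional attribute)` tuples; the record layout and the sample attribute codes in the usage note are hypothetical.

```python
from collections import defaultdict


def cluster_by_attribute(records):
    """Group (doc_id, entity, attribute) records as {doc_id: {attribute: [entities]}}."""
    clusters = defaultdict(lambda: defaultdict(list))
    for doc_id, entity, attribute in records:
        clusters[doc_id][attribute].append(entity)
    # Convert to plain dicts so the result compares and prints cleanly.
    return {doc: dict(groups) for doc, groups in clusters.items()}
```

For example, two records of document `P1` that share the attribute `a-i` end up in the same list, while a record of document `P2` with the same attribute is kept in a separate group for `P2`.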

The similarity calculation module 244 calculates similarity values between conceptual element entities for each functional attribute using a predefined similarity calculation model. A conceptual element is an entity with attribute values, and the similarity between two entities can be calculated using dependency tree analysis, which separates the head word from its modifiers, lemmatization of each word, the presence of stop words, and string length. The similarity of two entity strings a and b is defined as the sum of their entity similarity and their functional attribute similarity. Each similarity is calculated with reference to the syntactic structure attributes and functional attributes, using the Damerau-Levenshtein distance, a representative minimum edit distance measure, as in the following equations.

That is, the entity similarity can be obtained using Equations 1 to 4.

Here, LD denotes the Damerau-Levenshtein distance, that is, the minimum edit distance between two strings: the similarity of strings a and b is computed by counting the insertions, deletions, and substitutions needed to make the two strings identical.

C(S) is the entity class of string S, H(S) is the head word of string S, E' is the entity set, and F is the set of stop words or of words that, taken alone, have no discriminative meaning for expressing an entity. The equations cover the cases where the head word identified by syntactic analysis does or does not belong to F, and where the head words are the same or different.

The edit distance d can be obtained using Equation 5.

However, the same edit distance implies a different degree of similarity depending on string length, so the distance is normalized by string length as in Equation 6.
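Because Equations 5 and 6 are not reproduced in this text, the following is only a sketch of the general technique. `damerau_levenshtein` implements the optimal string alignment variant of the Damerau-Levenshtein distance, which adds adjacent transposition to the insertion, deletion, and substitution operations named above. The normalization in `normalized_similarity` (one minus the distance divided by the longer string's length) is an assumption and may differ from Equation 6.

```python
def damerau_levenshtein(a, b):
    """Edit distance with insertion, deletion, substitution and adjacent transposition
    (optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                   # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j                   # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,          # deletion
                d[i][j - 1] + 1,          # insertion
                d[i - 1][j - 1] + cost,   # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def normalized_similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - damerau_levenshtein(a, b) / longest
```

Under this normalization, one edit in a four-character string gives a similarity of 0.75, while one edit in a forty-character string gives 0.975, which reflects the point that equal edit distances do not mean equal similarity.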

The conceptual structure generation module 246 generates, for each patent document, a conceptual structure that includes the functional attributes, the conceptual elements, and the similarity values of the conceptual elements.

The classification unit 250 obtains conceptual structure similarity values between patent documents and classifies the patent documents based on the obtained values.

Referring to FIG. 5, the classification unit 250 includes a conceptual element similarity calculation module 252, a functional attribute similarity calculation module 254, a conceptual structure similarity calculation module 256, and a classification module 258.

The conceptual element similarity calculation module 252 obtains similarity values for the conceptual elements that make up the conceptual structures of different patent documents. The method it uses is the same as the method by which the similarity calculation module 244 calculates conceptual element similarity values, so its description is omitted.

The functional attribute similarity calculation module 254 obtains functional attribute similarity values between patent documents. The module calculates the similarity of target entities that have the same action attribute using Equations 7 to 10.

Here, C(S) is the entity class of string S, H(S) is the head word of string S, O is the set of target entities of the functional attribute, and F is the set of stop words or of words that, taken alone, have no discriminative meaning for expressing an entity. The equations cover the cases where the head word identified by syntactic analysis does or does not belong to F, and where the head words are the same or different.

The conceptual structure similarity calculation module 256 obtains the conceptual structure similarity value between patent documents using the conceptual element similarity values obtained by the conceptual element similarity calculation module 252 or the functional attribute similarity values obtained by the functional attribute similarity calculation module 254. The value may be computed separately for entities and for functional attributes, or as the total sum of the two conceptual element similarities.

The conceptual structure similarity calculation module 256 obtains the conceptual structure similarity value between patent documents using Equation 11.

Here, N_E(P) is the number of entities contained in patent P, and N_E(P_i, P_j) is the number of entity pairs with a similarity value of 1 between the product and technology entities contained in patents P_i and P_j. Likewise, N_F(P) is the number of functional attributes of the entities in patent P, and N_F(P_i, P_j) is the number of pairs with a similarity value of 1 between the functional attributes of the product and technology entities contained in patents P_i and P_j. If both similarity values are 1, the conceptual structures of the two patents can be regarded as identical.
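The counts defined above can be computed as in the sketch below. Equation 11 itself is not reproduced in this text, so the sketch stops at the counts and does not guess how the equation combines them. The function names and the exact-match similarity used in the usage note are hypothetical; the same helper applies to functional attributes for N_F.

```python
def count_matching_pairs(items_i, items_j, similarity):
    """N(P_i, P_j): number of cross-document pairs whose similarity value equals 1."""
    return sum(1 for a in items_i for b in items_j if similarity(a, b) == 1)


def structure_counts(entities_i, entities_j, similarity):
    """Return N_E(P_i), N_E(P_j) and N_E(P_i, P_j) for two documents' entity lists."""
    return {
        "n_i": len(entities_i),
        "n_j": len(entities_j),
        "n_ij": count_matching_pairs(entities_i, entities_j, similarity),
    }
```

With an exact-match similarity such as `lambda a, b: 1 if a == b else 0`, two documents that share only the entity 'mobile device' yield a pair count of 1.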

The classification module 258 classifies patent documents based on the conceptual structure similarity values between them. That is, the classification module 258 may classify patent documents whose similarity value is equal to or greater than a preset value as documents with the same or a similar conceptual structure.
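Threshold-based classification can be sketched as follows. The text only states that documents at or above a preset similarity value are classified together; treating the relation as transitive (if A matches B and B matches C, all three share a group) is an assumption of this sketch, implemented with a small union-find structure. The function name and the sample similarity table in the usage note are hypothetical.

```python
def classify_by_threshold(doc_ids, similarity, threshold):
    """Group documents so that any pair with similarity >= threshold shares a group."""
    parent = {d: d for d in doc_ids}

    def find(d):
        # Follow parent links to the group representative, compressing the path.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for idx, a in enumerate(doc_ids):
        for b in doc_ids[idx + 1:]:
            if similarity(a, b) >= threshold:
                parent[find(b)] = find(a)   # merge b's group into a's group

    groups = {}
    for d in doc_ids:
        groups.setdefault(find(d), []).append(d)
    return list(groups.values())
```

For three documents where only `P1` and `P2` reach a threshold of 0.8, the result is one group containing `P1` and `P2` and a second group containing `P3` alone.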

Meanwhile, the conceptual element extraction unit 230, the conceptual structure generation unit 240, and the classification unit 250 may each be implemented by a processor or similar component required to execute a program on a computing device. These units may be implemented as physically separate components or as functionally distinct parts within a single processor.

The control unit 260 controls the operation of the various components: the database 210, the interface unit 220, the conceptual element extraction unit 230, the conceptual structure generation unit 240, and the classification unit 250.

The control unit 260 may include at least one computing device, which may be a general-purpose central processing unit (CPU), a programmable device (CPLD, FPGA) implemented for a specific purpose, an application-specific integrated circuit (ASIC), or a microcontroller chip.

The components that the patent document classification system 200 may include can be implemented in hardware, software, or a combination of the two, and two or more components may be implemented together by a single piece of hardware or software.

FIG. 6 is a flowchart illustrating a patent document classification method according to an embodiment of the present invention.

Referring to FIG. 6, the patent document classification system analyzes each patent document to extract conceptual elements and assigns a functional attribute to each conceptual element (S602). Step S602 is described in detail with reference to FIG. 7.

The patent document classification system calculates the conceptual element similarity values of each patent document and generates a conceptual structure for each patent document (S604). Step S604 is described in detail with reference to FIG. 8.

The patent document classification system then obtains conceptual structure similarity values between patent documents and classifies the patent documents based on the obtained values (S606). That is, the system obtains the similarity values of the conceptual elements that make up the conceptual structures of the patent documents and the functional attribute similarity values between the patent documents. It then uses the conceptual element similarity values or the functional attribute similarity values to obtain the conceptual structure similarity value between patent documents, and classifies the patent documents based on that value.

FIG. 7 is a flowchart illustrating a method of extracting conceptual elements from a patent document and assigning functional attributes according to an embodiment of the present invention.

Referring to FIG. 7, the patent document classification system refers to the entity name dictionary database and identifies candidate sentences in a predefined area of the patent document (S702).

The patent document classification system then extracts a character string representing a conceptual element from each identified candidate sentence through dependency-grammar-based parsing (S704).

The patent document classification system refers to the functional classification database and assigns a functional attribute to the extracted string (S706). The system then stores the conceptual elements, functional attributes, and related data for each patent document.

FIG. 8 is a flowchart illustrating a method of generating a conceptual structure according to an embodiment of the present invention.

Referring to FIG. 8, the patent document classification system clusters, for each patent document, the conceptual elements to which the same functional attribute has been assigned (S802).

The patent document classification system then calculates similarity values between conceptual element entities for each functional attribute using a predefined similarity calculation model (S804), and generates, for each patent document, a conceptual structure that includes the functional attributes, the conceptual elements, and the similarity values of the conceptual elements (S804).

This patent document classification method can be written as a program, and the codes and code segments constituting the program can be readily inferred by a programmer skilled in the art. The program may be stored in an information storage medium readable by an electronic device (readable media) and read and executed by the electronic device.

Those skilled in the art to which the present invention pertains will understand that the invention may be embodied in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims below rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: Patent providing server 200: Patent document classification system
210: Database 220: Interface unit
230: Conceptual element extraction unit 240: Conceptual structure generation unit
250: Classification unit 260: Control unit

Claims

A conceptual element extraction unit for extracting conceptual elements by analyzing each patent document and assigning a functional attribute to each conceptual element;
A conceptual structure generation unit for calculating conceptual element similarity values by clustering the conceptual elements of each patent document by functional attribute, and generating for each patent document a conceptual structure including the calculated conceptual element similarity values; and
A classification unit for obtaining conceptual structure similarity values between patent documents and classifying the patent documents based on the obtained conceptual structure similarity values;
A patent document classification system comprising the above.

The system according to claim 1, further comprising:
An entity name dictionary database for products and technologies; and
A functional attribute classification database in which rules for classifying the functional attributes of entities are set.

The system according to claim 1,
Wherein the conceptual element extraction unit comprises a candidate sentence identification module for identifying candidate sentences in a predefined area of the patent document by referring to the entity name dictionary database;
A string extraction module for extracting a character string representing a conceptual element from the identified candidate sentences through dependency-grammar-based parsing; and
A functional attribute assignment module for assigning a functional attribute to the extracted string by referring to the functional classification database,
Wherein the extracted string is a conceptual element.

The system according to claim 1,
Wherein the conceptual structure generation unit comprises:
A conceptual element clustering module for clustering, for each patent document, the conceptual elements to which the same functional attribute has been assigned;
A similarity calculation module for calculating similarity values between conceptual element entities for each functional attribute using a predefined similarity calculation model; and
A conceptual structure generation module for generating, for each patent document, a conceptual structure including the functional attributes, the conceptual elements, and the similarity values of the conceptual elements.

The system according to claim 1,
Wherein the classification unit comprises:
A conceptual element similarity calculation module for obtaining similarity values of the conceptual elements constituting the conceptual structures of patent documents;
A functional attribute similarity calculation module for obtaining functional attribute similarity values between patent documents;
A conceptual structure similarity calculation module for obtaining conceptual structure similarity values between patent documents using the obtained conceptual element similarity values or functional attribute similarity values; and
A classification module for classifying the patent documents based on the conceptual structure similarity values between the patent documents.

(a) analyzing each patent document to extract conceptual elements, and assigning a functional attribute to each conceptual element;
(b) calculating conceptual element similarity values by clustering the conceptual elements of each patent document by functional attribute, and generating for each patent document a conceptual structure including the calculated conceptual element similarity values; and
(c) obtaining conceptual structure similarity values between patent documents, and classifying the patent documents based on the obtained conceptual structure similarity values;
A patent document classification method comprising the above steps.

The method according to claim 6,
Wherein step (a) comprises:
Identifying candidate sentences in a predefined area of the patent document by referring to the entity name dictionary database;
Extracting a character string representing a conceptual element from the identified candidate sentences through dependency-grammar-based parsing; and
Assigning a functional attribute to the extracted string by referring to the functional classification database,
Wherein the extracted string is a conceptual element.

The method according to claim 6,
Wherein step (b) comprises:
Clustering, for each patent document, the conceptual elements to which the same functional attribute has been assigned;
Calculating similarity values between conceptual element entities for each functional attribute using a predefined similarity calculation model; and
Generating, for each patent document, a conceptual structure including the functional attributes, the conceptual elements, and the similarity values of the conceptual elements.

The method according to claim 6,
Wherein step (c) comprises:
Obtaining similarity values of the conceptual elements constituting the conceptual structures of patent documents;
Obtaining functional attribute similarity values between patent documents;
Obtaining conceptual structure similarity values between patent documents using the obtained conceptual element similarity values or functional attribute similarity values; and
Classifying the patent documents based on the conceptual structure similarity values between the patent documents.

When implemented by a patent document classification system,
(a) analyzing each patent document to extract conceptual elements, and assigning a functional attribute to each conceptual element;
(b) calculating conceptual element similarity values by clustering the conceptual elements of each patent document by functional attribute, and generating for each patent document a conceptual structure including the calculated conceptual element similarity values; and
(c) obtaining conceptual structure similarity values between patent documents, and classifying the patent documents based on the obtained conceptual structure similarity values.