KR102806164B1

KR102806164B1 - Method and apparatus of performing tagging process for recommending policy to individual based on natural language-processed policy data

Info

Publication number: KR102806164B1
Application number: KR1020220034072A
Authority: KR
Inventors: 김유리안나
Original assignee: 주식회사 웰로
Priority date: 2022-03-18
Filing date: 2022-03-18
Publication date: 2025-05-14
Anticipated expiration: 2042-03-18
Also published as: KR20230136396A

Abstract

The present invention relates to a method, performed by a server, of performing a tagging process for recommending a policy to an individual based on natural-language-processed policy data. The method comprises: (a) receiving keywords extracted from a policy announcement through a preset policy data natural language processing learning model; (b) selecting, through a named entity recognition module and in consideration of weights, keywords that correspond to preset category items from among the keywords extracted in step (a), and generating an identifier indicating the category item corresponding to each selected keyword; and (c) generating a classification result table by entering each selected keyword, based on the identifier, into the category item corresponding to that keyword. In step (b), the identifier is generated for keywords, among those extracted in step (a), that relate to at least one of the residence, type of business, income, age, gender, and number of children of the individual who is the target of policy recommendation.

Description

{METHOD AND APPARATUS OF PERFORMING TAGGING PROCESS FOR RECOMMENDING POLICY TO INDIVIDUAL BASED ON NATURAL LANGUAGE-PROCESSED POLICY DATA}

The present invention relates to a method and apparatus for performing a tagging process for recommending policies to individuals based on natural-language-processed policy data, and more specifically, to tagging and classifying policy data that has been processed by a policy data natural language processing learning model so that it can be recommended to individuals.

Conventionally, a user wishing to apply for a policy implemented by the state has had to read the policy announcements and search for applicable policies personally. In this case, there was a problem in that, because the announcement was insufficiently understood, users were given unnecessary information about policies for which they were not eligible.

In particular, because the drafting method and governing statutes of policy announcements differ by institution, department, and person in charge, individuals have difficulty determining whether a policy is aimed at businesses or at individuals.

Even when refined data is extracted from a policy announcement issued by a government agency using natural language processing, text processing, and named entity recognition techniques, about 7,000 characters of sentence components are still extracted, so there is a need for technology that further refines this output and turns it into structured data.

The present invention is intended to solve the problems of the prior art described above, and one technical object is to provide an automated tagging method for recommending policies to individuals using natural-language-processed policy data.

Another technical object of the present invention is to make collected policy data more convenient to use for policy recommendation than conventional approaches, by tagging keywords, which are the result of natural language processing of policy data extracted from policy announcements, with identifiers and classifying them into various category items on that basis.

In addition, in the process of tagging and classifying policy data, when a policy is intended for individuals, policy data meeting the relevant conditions can be tagged and classified as suitable for individual users based on weights assigned according to a preset algorithm.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood from the description below.

As a technical means for achieving the above technical objects, a method according to one embodiment of the present invention, performed by a device, of performing a tagging process for recommending a policy to an individual based on natural-language-processed policy data comprises: (a) receiving keywords extracted from a policy announcement through a preset policy data natural language processing learning model; (b) selecting, through a named entity recognition module and in consideration of weights, keywords that correspond to preset category items from among the keywords extracted in step (a), and generating an identifier indicating the category item corresponding to each selected keyword; and (c) generating a classification result table by entering each selected keyword, based on the identifier, into the category item corresponding to that keyword, wherein in step (b), the identifier may be generated for keywords, among those extracted in step (a), that relate to at least one of the residence, type of business, income, age, gender, and number of children of the individual who is the target of policy recommendation.
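Steps (b) and (c) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the category lexicon `CATEGORY_TERMS`, the identifier strings, the `min_weight` floor, and the sample keywords are all hypothetical, and a simple set lookup stands in for the named entity recognition module.

```python
from collections import defaultdict

# Hypothetical category lexicon: category identifier -> terms that belong to it.
# A real system would rely on a named entity recognition module instead.
CATEGORY_TERMS = {
    "RESIDENCE": {"Seoul", "Busan", "resident"},
    "AGE": {"youth", "senior"},
    "INCOME": {"low-income", "median income"},
}


def tag_keywords(weighted_keywords, min_weight=0.5):
    """Step (b): keep keywords that match a category item and meet the
    (assumed) weight floor, pairing each with its category identifier."""
    tagged = []
    for keyword, weight in weighted_keywords.items():
        for identifier, terms in CATEGORY_TERMS.items():
            if keyword in terms and weight >= min_weight:
                tagged.append((keyword, identifier))
    return tagged


def build_result_table(tagged):
    """Step (c): enter each keyword under the category item named by its identifier."""
    table = defaultdict(list)
    for keyword, identifier in tagged:
        table[identifier].append(keyword)
    return dict(table)


# Keywords with weights, as if received from step (a).
keywords = {"Seoul": 0.9, "youth": 0.8, "senior": 0.7, "facility": 0.9, "low-income": 0.2}
table = build_result_table(tag_keywords(keywords))
print(table)
```

In this sketch "facility" is dropped because it matches no category item, and "low-income" is dropped because its weight falls below the assumed floor. Two keywords sharing the AGE identifier end up in the same category item.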

In addition, in step (a), the policy data natural language processing learning model may be one trained to extract noun-form sentence components from the policy announcement using morphological analysis.

In addition, the weights may be set based on a plurality of nouns extracted from the policy announcement by the policy data natural language processing learning model, such that, based on the frequency of the extracted nouns, a noun extracted more frequently receives a greater weight than a noun extracted less frequently.

In addition, step (a) may include: (a-1) matching noun-vector pairs by setting, according to a preset algorithm, the same vector value for a plurality of nouns determined to have the same meaning, and setting, for another noun determined to have a similar meaning, a neighboring vector value within a preset difference from that vector value; (a-2) classifying, according to a preset algorithm, nouns extracted from the policy announcement whose vector values are identical or within the preset difference into a single keyword; and (a-3) performing training by setting the plurality of noun-vector pairs generated in step (a-1) and the keywords classified in step (a-2) as input values among the training values of the policy data natural language processing learning model, and setting the keywords classified in step (a-2) as output values among the training values of the model.
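The grouping in step (a-2) can be sketched as follows. The specification does not name the algorithm, so this example assumes Euclidean distance and a greedy grouping against the first member of each group; the vectors and the `max_distance` value are made up for illustration.

```python
import math


def group_nouns(noun_vectors, max_distance):
    """Greedily group nouns whose vectors lie within max_distance of the
    first noun of an existing group (that noun acts as the representative)."""
    groups = []
    for noun, vec in noun_vectors.items():
        for group in groups:
            representative = noun_vectors[group[0]]
            if math.dist(vec, representative) <= max_distance:
                group.append(noun)
                break
        else:
            # No existing group is close enough: start a new keyword group.
            groups.append([noun])
    return groups


# Hypothetical noun-vector pairs, as if produced by step (a-1).
noun_vectors = {
    "child": (1.0, 1.0),
    "kid": (1.0, 1.0),     # same meaning: identical vector
    "infant": (1.1, 0.9),  # similar meaning: neighboring vector
    "farm": (5.0, 4.0),    # unrelated meaning: distant vector
}
groups = group_nouns(noun_vectors, max_distance=0.5)
print(groups)
```

Each resulting group corresponds to one keyword. A greedy pass is order-dependent, so a production system would more likely use a proper clustering method.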

In addition, the weights may further include a penalty weight, and step (b) may evaluate, based on the penalty weight, the relevance between a keyword extracted in step (a) and the category item corresponding to that keyword.
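The specification does not say how the penalty weight enters the relevance evaluation. One possible reading, shown here purely as an assumption, is that the penalty is subtracted from the keyword's weight and the result is compared against a threshold. The `relevance` function, the `THRESHOLD` value, and the sample data are hypothetical.

```python
def relevance(weight, penalty):
    """Assumed relevance score: base weight reduced by the penalty, floored at 0."""
    return max(0.0, weight - penalty)


# Hypothetical inputs: (keyword, category identifier) -> (weight, penalty weight).
candidates = {
    ("youth", "AGE"): (0.8, 0.0),
    # Penalised on the assumption that the term signals a business-targeted policy.
    ("corporation", "OCCUPATION"): (0.75, 0.5),
}

THRESHOLD = 0.5  # assumed cut-off for accepting a keyword-category match
accepted = [pair for pair, (w, p) in candidates.items() if relevance(w, p) >= THRESHOLD]
print(accepted)
```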

In addition, step (c) may generate the classification result table by classifying, according to a preset algorithm, keywords related to the residence, type of business, income, age, gender, and number of children of the individual who is the target of policy recommendation into any one of a responsible-agency category, a policy-name category, and a support-target category of the policy corresponding to the policy announcement.

In addition, in step (b), when no keyword corresponding to a preset category item is selected, the context of the policy announcement may be re-examined using the named entity recognition module and another keyword corresponding to that category item may be selected.

In addition, when a plurality of keywords correspond to the same identifier among the generated identifiers, the plurality of keywords may be matched to a single category item in step (c).

In addition, in step (c), the policy data natural language processing learning model may match each category item with its corresponding keyword according to a preset algorithm based on the identifier and enter the matched keyword into that category item to generate the classification result table, and may also match a keyword to a single item that combines at least two category items and enter the matched keyword into that item to generate the classification result table.

According to one embodiment of the present invention, an apparatus for performing a tagging process for recommending a policy to an individual based on natural-language-processed policy data includes: a memory storing a program that performs the method of performing the tagging process; and a processor that executes the program. The processor may be configured to receive keywords extracted from a policy announcement through a preset policy data natural language processing learning model; select, using a named entity recognition module and in consideration of weights, keywords that correspond to preset category items from among the extracted keywords; generate an identifier indicating the category item corresponding to each selected keyword; generate a classification result table by entering each selected keyword, based on the identifier, into the category item corresponding to that keyword; and generate the identifier for keywords, among the extracted keywords, that relate to at least one of the residence, type of business, income, age, gender, and number of children of the individual who is the target of policy recommendation.

According to the present invention, data standardization can be achieved by classifying a large amount of natural-language-processed policy data into a plurality of categories.

In addition, according to the present invention, keywords are selected from the policy data based on weights corresponding to a plurality of category items, and through a tagging process that generates identifiers indicating the category items for the policy data, the extracted policy data can be processed into a form suitable for use in fields such as policy recommendation and the production of statistics.

Furthermore, according to the present invention, when the policy target, that is, the user, is an individual, items aimed at individuals are weighted during the tagging process so that the policy data contained in a policy announcement is classified with a focus on policy data for individuals.

FIG. 1 is a structural diagram of a system that automates a tagging process for recommending policies to individuals based on natural-language-processed policy data according to one embodiment of the present invention.
FIG. 2 is a block diagram showing the internal configuration of a server according to one embodiment of the present invention.
FIG. 3a is an example diagram of a reference column among policy data according to one embodiment of the present invention.
FIG. 3b is an example diagram of a target column among policy data according to one embodiment of the present invention.
FIG. 4 is an example diagram of category items according to one embodiment of the present invention.
FIG. 5 is an example diagram of a policy announcement for which identifiers have been generated according to one embodiment of the present invention.
FIG. 6 is an example diagram of category items in which keywords have been entered according to one embodiment of the present invention.
FIG. 7 is an example diagram of an input UI for receiving user data according to one embodiment of the present invention.
FIG. 8 is an example diagram of a group of recommendation candidates displayed on a user terminal according to one embodiment of the present invention.
FIG. 9a is a flowchart of a method for performing natural language processing on policy data according to one embodiment of the present invention.
FIG. 9b is a flowchart of a method for performing a tagging process for recommending a policy to an individual or a business based on natural-language-processed policy data according to one embodiment of the present invention.
FIG. 9c is a flowchart of a method for recommending policies to individuals or businesses using tagged policy data according to one embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the attached drawings so that those skilled in the art can readily practice the invention. However, the present invention may be implemented in various different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted in order to describe the present invention clearly, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between. Also, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

In this specification, a 'unit' includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware. A '~unit' is not limited to software or hardware; it may be configured to reside in an addressable storage medium or configured to run on one or more processors. Accordingly, as an example, a '~unit' includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functionality provided within components and '~units' may be combined into a smaller number of components and '~units' or further separated into additional components and '~units'. In addition, components and '~units' may be implemented to run on one or more CPUs within a device or a secure multimedia card.

The "terminal" mentioned below can be implemented as a computer or portable terminal that can access a server or another terminal via a network. Here, the computer can include, for example, a notebook, desktop, or laptop equipped with a web browser, or a VR HMD (e.g., HTC VIVE, Oculus Rift, GearVR, DayDream, PSVR, etc.). The VR HMD includes PC models (e.g., HTC VIVE, Oculus Rift, FOVE, Deepon, etc.), mobile models (e.g., GearVR, DayDream, Baofeng Mojing, Google Cardboard, etc.), console models (PSVR), and independently implemented stand-alone models (e.g., Deepon, PICO, etc.). The portable terminal is, for example, a wireless communication device that ensures portability and mobility, and may include not only smartphones, tablet PCs, and wearable devices, but also various devices equipped with communication modules such as Bluetooth (BLE, Bluetooth Low Energy), NFC, RFID, ultrasonic, infrared, WiFi, and LiFi.
In addition, "network" refers to a connection structure that enables information exchange between nodes such as terminals and servers, and includes a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, and wired and wireless television communication networks. Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), WIMAX (World Interoperability for Microwave Access), Wi-Fi, Bluetooth, infrared communication, ultrasonic communication, visible light communication (VLC), and LiFi.

The present invention relates to a method and apparatus that perform natural language processing on policy data, perform tagging for recommending policies to individuals or businesses based on the processed data, and then recommend policies to individuals or businesses using the tagged policy data. It is a technology that collects predetermined information from policy announcements and analyzes and classifies the collected information according to a preset algorithm, thereby making it easier for users to reach the specific policies that apply to them or for which they are eligible.

Hereinafter, with reference to FIGS. 1 to 9c, a method of natural language processing of policy data, a method of tagging and classifying the natural-language-processed policy data, and a method and apparatus for recommending a policy to an individual or a business based on the classified policy data, according to one embodiment of the present invention, are described in turn.

Referring to FIG. 1, a system according to one embodiment of the present invention may consist of a server (100), a policy agency server (200), and a user terminal (300).

Referring to FIG. 2, the server (100) according to one embodiment of the present invention may be a device including a memory that stores a program (or application) performing at least one of a method of performing natural language processing on policy data, a method of performing a tagging process for recommending a policy to an individual or a business based on the natural-language-processed data, and a method of recommending a policy to an individual or a business using the tagged policy data, and a processor that executes the program. Here, the processor may perform various functions by executing the program stored in the memory.

Next, the policy agency server (200) may be a device that is operated by a body that implements policies, such as a government agency, public institution, or private organization, and that stores information on a plurality of policies currently in force and their policy announcements. The policy agency server (200) may be connected to the server (100) by wire or wirelessly through a communication network.

The user terminal (300) according to one embodiment of the present invention can communicate with the server (100) over a wired or wireless connection, and can be implemented in the form of a smartphone, tablet PC, PDA, desktop computer, or the like.

First, the procedure of the method for performing natural language processing on policy data according to one embodiment of the present invention is described below.

The server (100) can connect to the policy agency server (200) to receive policy announcements issued by the policy agency server (200), or can collect policy announcements by crawling a website operated by the policy agency server (200).

The server (100) extracts a plurality of nouns from each collected policy announcement using a morphological analyzer.

Referring to FIG. 3a, the server (100) according to one embodiment of the present invention can divide the text representing the content of a policy announcement into a reference column and a target column.

The reference column may include, from among the contents of the policy announcement, the policy name (service name) and the policy purpose (service purpose).

In addition, referring to FIG. 3b, the target column may include, from among the contents of the policy announcement, the support target, the support content, and the support cost (e.g., facility costs, operating costs, and personnel costs).

The server (100) can identify whether each part of the policy announcement belongs to the reference column or the target column, extract nouns within each column, and count the number of times each noun appears.

Here, the morphological analyzers that can be used to extract nouns in one embodiment of the present invention include the KOMORAN, KoNLPy, and Khaiii morphological analyzers, which may recognize the entire text of the policy announcement and extract a plurality of sentence components.

For example, when the KOMORAN morphological analyzer is used to extract nouns (sentence components) from a policy announcement for a project supporting the operation of child care rooms during the busy farming season, nouns such as "busy farming season", "child", "care", "operation", and "support" can be extracted from the service name, which belongs to the reference column, and nouns such as service purpose, support content, and support target can likewise be extracted from the target column.
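The column split and per-column noun counting described above can be sketched as follows. This is an illustration under stated assumptions: the field names, the `count_nouns_by_column` helper, and the sample announcement are hypothetical, and `toy_extract_nouns` is a whitespace splitter standing in for a real morphological analyzer such as KOMORAN.

```python
from collections import Counter

# Assumed field names; the specification only lists what each column contains.
REFERENCE_FIELDS = {"service_name", "service_purpose"}
TARGET_FIELDS = {"support_target", "support_content", "support_cost"}


def count_nouns_by_column(announcement, extract_nouns):
    """Assign each field to the reference or target column and count
    how often each extracted noun appears within that column."""
    counts = {"reference": Counter(), "target": Counter()}
    for field, text in announcement.items():
        if field in REFERENCE_FIELDS:
            column = "reference"
        elif field in TARGET_FIELDS:
            column = "target"
        else:
            continue  # ignore fields that belong to neither column
        counts[column].update(extract_nouns(text))
    return counts


def toy_extract_nouns(text):
    """Stand-in for a morphological analyzer: treats every token as a noun."""
    return text.split()


announcement = {
    "service_name": "busy-season child care operation support",
    "support_target": "child farmer child",
}
counts = count_nouns_by_column(announcement, toy_extract_nouns)
print(counts["reference"]["child"], counts["target"]["child"])
```

Passing the extractor in as a function keeps the sketch independent of any particular analyzer, so a real noun extractor could be substituted without changing the counting logic.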

Thereafter, the server (100) can use an embedding technique to generate a numerical vector for each of the nouns extracted through the process described above.

Here, according to a preset algorithm, the same vector value may be set for nouns, among the extracted nouns, that are determined to have the same meaning, and for other nouns determined to have a similar meaning, a neighboring vector value within a preset difference from that vector value may be set.

That is, the difference in vector values between nouns determined to have similar meanings is set to be small, and the difference between nouns with different meanings is set to be large, so that the nouns are easy to use and to distinguish from one another.

The embedding technique used in the present invention and the preset algorithm described above refer to the result of converting natural language used by people into vectors, a numerical form that a machine can process, or to the entire series of steps that produces that result. Because they correspond to prior art, they are not described in detail in this specification.

Based on the vector values set as described above, the server (100) can match each vector value with its extracted noun to generate a plurality of noun-vector pairs.

In this case, for nouns extracted from the target column of the policy announcement, each noun-vector pair may be generated so that it includes the sentence component immediately preceding the noun and the sentence component immediately following it. This makes the context of the sentence easier to capture than pairing a single sentence component with a vector.
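The pairing of a noun with its neighboring sentence components can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `context_pairs`, the token list, and the `embed` callback are all hypothetical, and the specification does not state how the neighbors are stored.

```python
def context_pairs(tokens, noun_indices, embed):
    """Pair each noun with its immediate neighbors and the noun's vector.

    tokens: sentence components in order (assumed already segmented).
    noun_indices: positions in `tokens` that hold extracted nouns.
    embed: hypothetical callback mapping a noun to its vector.
    """
    pairs = []
    for i in noun_indices:
        # Neighbors are None at the sentence boundaries.
        prev_tok = tokens[i - 1] if i > 0 else None
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
        pairs.append(((prev_tok, tokens[i], next_tok), embed(tokens[i])))
    return pairs


# Usage with a stand-in embedding (word length as a 1-d vector).
example = context_pairs(["women", "are", "excluded"], [0, 2], lambda t: (len(t),))
```

Keeping the neighbors next to the noun is what later lets a phrase such as "women are excluded" be told apart from a bare mention of "women".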

Next, the server (100) generates a noun-vector dictionary from the noun-vector pairs. The noun-vector dictionary according to one embodiment of the present invention is built by generating and storing, for a plurality of policy announcements, identifiers that represent the vector values of the nouns extracted from the announcements. A noun-vector pair contained in the noun-vector dictionary is defined as a keyword.

Keywords can be classified into a plurality of category items according to a preset algorithm, based on the natural-language-processed data described later.

Next, the server (100) can set a weight for each noun-vector pair based on the distance values between the noun-vector pairs in the noun-vector dictionary and on the number of times each noun appears in the policy announcement.

The weights are set from the frequency of the extracted nouns so that a noun with a high extraction frequency receives a greater weight than a noun with a low extraction frequency. The noun that appears most often in the policy announcement is assigned the maximum weight, and the remaining nouns are assigned weights in descending order of frequency at preset weight intervals.
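The frequency-based part of the weighting can be sketched as follows. This is an illustration under stated assumptions: `max_weight` and `step` are hypothetical values for the maximum weight and the preset weight interval, nouns with equal counts are assumed to share a weight, and the distance-based component mentioned above is left out.

```python
from collections import Counter


def frequency_weights(nouns, max_weight=1.0, step=0.1):
    """Assign the maximum weight to the most frequent noun and step down
    by a fixed interval for each lower frequency level."""
    counts = Counter(nouns)
    # Distinct frequencies, highest first; equal counts share one level.
    levels = sorted(set(counts.values()), reverse=True)
    level_weight = {
        count: max(max_weight - rank * step, 0.0)
        for rank, count in enumerate(levels)
    }
    return {noun: level_weight[count] for noun, count in counts.items()}


weights = frequency_weights(
    ["support", "support", "support", "care", "care", "busy farming season"]
)
```

Here "support" receives the maximum weight, "care" one interval less, and "busy farming season" two intervals less.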

That is, a noun with a high weight is one that is mentioned frequently in the policy announcement. It can be judged to reflect the core content of the announcement, and it can be used to build the natural language processing learning model, described later, that extracts keywords from policy announcements.

In addition, the weights may include a penalty weight. During tagging for recommending policies to individuals based on natural-language-processed data, the penalty weight can be used to evaluate the relevance between a keyword extracted from a policy announcement and the category item corresponding to that keyword.

Next, the server (100) can perform machine learning based on the weights and the noun-vector dictionary to generate a policy data natural language processing learning model.

A policy data natural language processing learning model according to another embodiment of the present invention may include any one of a KoBERT classifier, an LSTM classifier, and an SVC-based model, or may combine one or more of these models.

In this embodiment, each classifier included in the policy data natural language processing learning model extracts one value from among a plurality of option values, and may be a binary classifier or a multi-class classifier. Accordingly, the policy data natural language processing learning model may choose between a binary classifier and a multi-class classifier depending on the data to be extracted.

In addition, the policy data natural language processing learning model can be generated as follows. Among the nouns extracted according to a preset algorithm, nouns whose vector values match or lie within a preset difference, that is, nouns judged to have similar meanings, are classified into a single keyword. The noun-vector pairs and the classified keywords are then set as input values among the training values of the model, the keywords are set as output values, and training is performed.
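Grouping nouns whose vectors lie within a preset difference into one keyword can be sketched as follows. The greedy strategy, the Euclidean metric, and the names `group_into_keywords` and `max_distance` are assumptions made for illustration; the specification only says that a preset algorithm is used.

```python
import math


def group_into_keywords(noun_vectors, max_distance):
    """Greedily group nouns whose vectors lie within max_distance of a
    group's first member (the group's representative)."""
    groups = []
    for noun, vec in noun_vectors.items():
        for group in groups:
            representative = noun_vectors[group[0]]
            if math.dist(vec, representative) <= max_distance:
                group.append(noun)
                break
        else:
            # No existing group is close enough: start a new keyword.
            groups.append([noun])
    return groups


# Toy 2-d vectors; the values are made up.
keywords = group_into_keywords(
    {"care": (0.90, 0.10), "childcare": (0.88, 0.12), "export": (0.10, 0.90)},
    max_distance=0.1,
)
```

With these toy vectors, "care" and "childcare" fall into one keyword and "export" forms its own.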

The policy data natural language processing learning model generated as described above can be configured so that, when a new noun from a policy announcement is input, it automatically classifies the noun into a keyword and outputs the result based on what it has learned.

In other words, once the policy data natural language processing learning model has been generated, when a new policy announcement is input to the server (100), the model can be used to extract noun-vector pairs from the new announcement, automatically update the noun-vector dictionary (keywords), and output the result of tagging them with category items (the classification result).

Referring to FIG. 9a, a method for performing natural language processing on policy data according to one embodiment of the present invention can be performed in the following order.

First, the server (100) receives a policy announcement from the policy agency server (200) and extracts a plurality of nouns from the announcement using a morphological analyzer (S101).

A numerical vector is generated for each extracted noun using an embedding technique, and a plurality of noun-vector pairs are generated (S102).

Next, the server (100) generates a noun-vector dictionary containing the plurality of noun-vector pairs (S103).

Thereafter, a weight is set for each noun-vector pair based on the distance values between the noun-vector pairs in the noun-vector dictionary and the number of times each noun appears in the policy announcement (S104).

The tagging process for recommending policies to individuals based on data that has undergone natural language processing as described above, and the method for automating that tagging process, are as follows.

First, from the keywords extracted from the policy announcement by the preset policy data natural language processing learning model, the named entity recognition module selects, taking the weights into account, the keywords that correspond to preset category items, and generates an identifier indicating the category item corresponding to each selected keyword.

When the target of the policy recommendation is an individual, identifiers are generated for those extracted keywords that relate to at least one of the individual's residence, type of business, income, age, gender, and number of children.

When the target of the policy recommendation is a company, identifiers are instead generated for keywords that relate to at least one of the company's sales, location, type of business, number of employees, year of establishment, and business type.

Accordingly, the present invention extracts the sentence components of a policy announcement without regard to whether the policy targets an individual or a company, but when tagging the extracted policy data by category item, it can distinguish policies targeting individuals from policies targeting companies.

Tagging in the present invention means that an identifier is generated for each extracted keyword and that each keyword is classified into a category item according to its identifier.

If no keyword corresponding to a preset category item is selected, the named entity recognition module can be used to re-identify the context of the policy announcement and select another keyword corresponding to that category item.

For example, suppose there is a policy with the service name [Support for Discovering Everyday Inventions (생활발명 발굴지원)] and the service purpose [Laying the foundation for new innovation and a renewed leap forward in our economy by encouraging highly educated women whose careers have been interrupted to generate ideas and participate in economic activity]. From the service name alone it cannot be determined whether this policy is only for women. However, through the weight set for the service purpose and the contextual understanding of the policy data natural language processing learning model, the corresponding keyword can be selected into a new category item, "Gender-Excluded", in addition to the "Gender" category item.

Therefore, rather than classifying category items based on an identifier generated for a single noun, the present invention selects the sentence components immediately before and after the extracted noun together with it as the keyword. When a policy contains a context such as "women are excluded", this prevents the context from being missed and the keyword from being classified under the category item "women".

As described above, identifying the context of the policy announcement may consist of adding the sentence component immediately before and the sentence component immediately after the noun-vector pair corresponding to the keyword and re-performing the tagging against the category items. This may be repeated until a keyword corresponding to the preset category item is selected.
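The retry behavior can be sketched as a loop that adds neighboring sentence components and tags again until a category item is found. This is a sketch only: the `classify` callback stands in for the named entity recognition module, and both the widening window and the `max_window` bound are assumptions added so the loop terminates; the specification itself only says the process repeats until a keyword is selected.

```python
def select_with_context(tokens, index, classify, max_window=3):
    """Retry tagging with progressively more surrounding context.

    tokens: sentence components in order.
    index: position of the noun being tagged.
    classify: hypothetical callback returning a category item or None.
    """
    for window in range(max_window + 1):
        start = max(0, index - window)
        span = tokens[start:index + window + 1]
        category = classify(span)
        if category is not None:
            return category, span
    return None, []


def toy_classifier(span):
    # Stand-in rule: only recognizes the exclusion context.
    if "women" in span and "excluded" in span:
        return "gender-excluded"
    return None


result = select_with_context(["women", "are", "excluded"], 0, toy_classifier)
```

In this toy run the noun alone and the noun plus one neighbor yield nothing, and the third attempt, which includes "excluded", returns the "gender-excluded" category item.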

As shown in FIG. 4, the category items may consist of a plurality of items such as gender, education level, workplace, household members, marital status, number and presence of children, type of responsible agency, type of support, application procedure, collection type, and target characteristics. When the target of the policy recommendation is an individual, the category items include the individual's residence, type of business, income, age, gender, and number of children. When the target is a company, they may additionally include the company's sales, location, type of business, number of employees, year of establishment, and business type.

In addition, for each category item, the identifier whose keywords are to be entered and classified under that category item may be set in advance.

For example, assume that the identifier for a region is preset as LOC, the identifier for the number of workers as NOH, and the identifier for a time limit as DUR. As shown in FIG. 5, a tagging process can then be performed in which an identifier is generated and written after each noun extracted from the policy announcement, and these identifiers can be used to generate the classification result table described later.
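Appending an identifier to each extracted noun can be sketched as follows. The identifiers LOC, NOH, and DUR come from the example above; the lexicon contents, the tuple output format, and the function name `tag_nouns` are hypothetical, and a fixed lookup table stands in for the named entity recognition module.

```python
# Hypothetical lexicon mapping each identifier to nouns it covers.
LEXICON = {
    "LOC": {"Gyeongsangbuk-do", "Seongju-gun"},   # region
    "NOH": {"5 employees"},                       # number of workers
    "DUR": {"until December"},                    # time limit
}


def tag_nouns(nouns, lexicon):
    """Return (noun, identifier) pairs for nouns found in the lexicon."""
    noun_to_id = {
        noun: ident for ident, members in lexicon.items() for noun in members
    }
    # Nouns with no matching identifier are left untagged and dropped.
    return [(noun, noun_to_id[noun]) for noun in nouns if noun in noun_to_id]


tagged = tag_nouns(["Seongju-gun", "support", "until December"], LEXICON)
```

The noun "support" has no identifier in this toy lexicon, so only the region and the time limit come back tagged.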

Next, based on the identifiers, the server (100) enters each selected keyword into the category item corresponding to that keyword to generate a classification result table.

At this time, keywords are classified, according to a preset algorithm, into one of the responsible-agency category, the policy-name category, and the support-target category of the policy corresponding to the policy announcement, and the classification result table is generated. When the target of the policy recommendation is an individual, the classified keywords are those related to residence, type of business, income, age, gender, and number of children. When the target is a company, they are those related to the company's sales, location, type of business, number of employees, year of establishment, and business type.

Referring to FIG. 6, when several keywords share the same identifier among the generated identifiers, the classification result table can match multiple keywords to a single category item, as illustrated.
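Building a classification result table in which one category item holds several keywords can be sketched as follows. The input format (a list of keyword and identifier pairs) and the name `build_result_table` are assumptions; the actual table layout is the one shown in FIG. 6.

```python
from collections import defaultdict


def build_result_table(tagged_keywords):
    """Group keywords by identifier so one category item may hold several."""
    table = defaultdict(list)
    for keyword, identifier in tagged_keywords:
        # Skip a keyword already entered under the same identifier.
        if keyword not in table[identifier]:
            table[identifier].append(keyword)
    return dict(table)


table = build_result_table(
    [("Gyeongsangbuk-do", "LOC"), ("Seongju-gun", "LOC"), ("5 employees", "NOH")]
)
```

Both region keywords end up under the single LOC entry, which mirrors the many-to-one matching described above.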

For example, when the category item is responsible agency = Seongju-gun, Gyeongsangbuk-do, keywords from the policy announcement that carry identifiers judged to be similar to Gyeongsangbuk-do or Seongju-gun can be entered and classified under that category item.

According to a further embodiment of the present invention, the policy data natural language processing learning model produced by the machine learning described above can match each category item with its corresponding keywords, based on the identifiers and according to a preset algorithm, and a classification result table can be generated in which the matched keywords are entered under the corresponding category items. Alternatively, keywords can be matched to a single item that combines two or more category items, and a classification result table can be generated in which the matched keywords are merged and entered under that item.

Referring to FIG. 9b, a method of performing a tagging process for recommending a policy to an individual or a company based on natural-language-processed data, according to one embodiment of the present invention, is performed in the following order.

First, the server (100) receives keywords extracted from a policy announcement through a preset natural language processing machine learning model (S201).

Next, using the named entity recognition module, the keywords corresponding to the plurality of category items are selected based on the weights, and an identifier indicating the category item is generated for each selected keyword (S202).

Thereafter, based on the generated identifiers, each keyword for which an identifier was generated is entered into its corresponding category item to complete the classification result table (S203).

Based on the classification result table generated as described above, the server (100) can recommend policies to an individual or a company using the tagged policy data.

In addition to the policy data collected from the policy agency server (200), the server (100) can collect user data and behavioral data from the access records of the user terminal (300).

Here, the policy data may include information on the keywords generated through the natural language processing and tagging processes described above, together with the classification result table.

In other words, the policy data may include the keywords extracted by the preset policy data natural language processing learning model from policy announcements about policies implemented by policy agencies, and tagging information that links keywords to category items so that keywords for which identifiers were generated are classified by preset category item.

When the target of the policy recommendation is an individual, the user data may include at least one of the individual's residence, type of business, income, age, gender, and number of children, as input through the user terminal (300).

Likewise, when the target of the policy recommendation is a company, the user data may include at least one of the company's sales, location, type of business, number of employees, year of establishment, and business type, as input through the user terminal (300).

Referring to FIG. 7, the server (100) according to one embodiment of the present invention can provide an input UI to the user terminal (300) and collect the information required depending on whether the target of the policy recommendation is an individual or a company.

The behavioral data may include the access time and the number of access sessions determined from log data of the user terminal (300) accessing the server (100) and the policy agency server (200).

The server (100) compares and analyzes the policy data and the user data to generate a recommendation candidate group containing a plurality of policies to recommend to the user terminal (300). The candidate group is generated so that it includes the policies for which the relevance between the keywords and tagging information in the policy data and the information in the user data is at least a preset value.

At this time, based on the information the user entered through the user terminal (300), the server (100) can select the policies that fall in the intersection of the policy data and the user data to generate an initial recommendation candidate group.
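Generating the initial candidate group can be sketched as a match between each policy's tagged conditions and the user data. The specification does not define how relevance is measured, so the match ratio used here, the `threshold` value, and the field names are all assumptions made for illustration.

```python
def match_ratio(policy_tags, user):
    """Fraction of a policy's tagged conditions that the user data satisfies."""
    if not policy_tags:
        return 0.0
    hits = sum(
        1 for field, allowed in policy_tags.items() if user.get(field) in allowed
    )
    return hits / len(policy_tags)


def candidate_group(policies, user, threshold=0.5):
    """Keep policies whose match ratio meets the assumed preset threshold."""
    return [
        name for name, tags in policies.items()
        if match_ratio(tags, user) >= threshold
    ]


policies = {
    "policy_a": {"residence": {"Seongju-gun"}, "gender": {"female"}},
    "policy_b": {"residence": {"Seoul"}},
}
user = {"residence": "Seongju-gun", "gender": "male"}
selected = candidate_group(policies, user)
```

In this toy data the user meets one of the two conditions of `policy_a` (ratio 0.5) and none of `policy_b`, so only `policy_a` is kept at the 0.5 threshold.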

The server (100) additionally sets a ranking among the policies included in the initial recommendation candidate group based on the behavioral data.

Among the policies in the recommendation candidate group, the server (100) takes those for which the behavioral data shows an access time of at least a preset duration and a number of access sessions of at least a preset count, and re-sorts the ranking of the policies in the recommendation candidate group based on each such policy's summed access time and summed number of access sessions.

For example, for the policies in the recommendation candidate group, the server can look at how long and how often the user terminal (300) accessed the Internet session providing each policy, judge the policies accessed longest and most often to be the ones the user is paying attention to, and recommend those first.
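The re-sorting by behavioral data can be sketched as follows. The specification does not say how the summed access time and the summed session count are combined into one order, so this sketch assumes access time is compared first and session count breaks ties; the thresholds `min_seconds` and `min_sessions` and the data layout are likewise hypothetical.

```python
def rerank(candidates, behavior, min_seconds=60, min_sessions=2):
    """Sort candidate policies by (total access time, total sessions).

    behavior: maps policy name to (total seconds, total sessions).
    Policies below either threshold are scored as having no signal.
    """
    def score(policy):
        seconds, sessions = behavior.get(policy, (0, 0))
        if seconds >= min_seconds and sessions >= min_sessions:
            return (seconds, sessions)
        return (0, 0)

    # sorted() is stable, so policies with equal scores keep their order.
    return sorted(candidates, key=score, reverse=True)


ranked = rerank(
    ["policy_a", "policy_b", "policy_c"],
    {"policy_a": (30, 5), "policy_b": (300, 4), "policy_c": (120, 2)},
)
```

Here `policy_a` falls below the time threshold and drops to the end, while `policy_b` leads because of its longer total access time.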

If the user terminal (300) is accessing the server (100) for the first time, the server (100) can re-sort the ranking of the policies in the recommendation candidate group based on similar behavioral data, selected from a plurality of previously stored behavioral data, whose relevance to the behavioral data generated on that first access is at least a preset value. Re-sorting based on user data is also possible.

Referring to FIG. 8, the recommendation candidate group provided to the user terminal (300) may include a plurality of policies, as illustrated, and each policy may be displayed with the type and name of its implementing agency, the policy name, and the support period.

For example, when the user terminal (300) of Mr. K, whose user data describes a 30-year-old unmarried office worker, accesses the server (100) for the first time, the user data of the similar Mr. P, also a 30-year-old unmarried office worker, can be used for re-sorting the ranking. If the behavioral data of Mr. K's user terminal (300) shows long and frequent access to the Youth Tomorrow Filling Deduction (청년 내일 채움 공제) policy, that policy can be recommended to Mr. K's user terminal (300).

Likewise, when the user terminal (300) of Group S, whose user data describes a distribution business with annual sales of 10 billion KRW, accesses the server (100) for the first time, the user data of the similar Y Industry, a distribution business with annual sales of 9 billion KRW, can be used for re-sorting the ranking. If the behavioral data of Group S's user terminal (300) shows long and frequent access to the overseas export support project policy, that policy can be provided to Group S's user terminal (300).

Referring to FIG. 9c, a method of recommending a policy to an individual or a company using tagged policy data, according to one embodiment of the present invention, can be performed in the following order.

First, the server (100) collects policy data from the policy agency server (200) and collects user data and behavioral data from the access records of the user terminal (300) (S301).

Thereafter, the policy data for which natural language processing and tagging have been completed is matched against the user data to generate a recommendation candidate group containing a plurality of policies (S302).

Next, the ranking of the recommendation candidate group is reset based on the behavioral data (S303).

Then, the policies in the sorted recommendation candidate group are recommended to the user terminal (300) in order from highest to lowest rank (S304).

An embodiment of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media and removable and non-removable media. A computer-readable medium may also include computer storage media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

Although the methods and systems of the present invention have been described with respect to specific embodiments, some or all of their components or operations may be implemented using a computer system having a general-purpose hardware architecture.

The foregoing description of the present invention is illustrative, and those of ordinary skill in the art to which the present invention pertains will understand that it can readily be modified into other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The scope of the present invention is defined by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: Server 200: Policy agency server
300: User terminal

Claims

A method, performed by a server, of performing a tagging process for recommending a policy to an individual based on natural-language-processed policy data, the method comprising:
(a) receiving keywords extracted from a policy announcement through a preset policy data natural language processing learning model;
(b) selecting, through a named entity recognition module and taking weights into account, a keyword corresponding to a preset category item from among the keywords extracted according to step (a), and generating an identifier indicating the category item corresponding to the selected keyword; and
(c) generating a classification result table by entering the selected keyword into the category item corresponding to the selected keyword based on the identifier,
wherein, in step (b), the identifier is generated for a keyword, among the keywords extracted according to step (a), that relates to at least one of the residence, type of business, income, age, gender, and number of children of the individual who is the target of the policy recommendation,
wherein step (b) further comprises, when the selected keyword corresponding to the preset category item is not extracted, re-identifying the context of the policy announcement using the named entity recognition module and extracting another keyword corresponding to the preset category item, and
wherein the re-identifying and extracting are performed repeatedly until a keyword corresponding to the preset category item is extracted.

The method of claim 1,
wherein the policy data natural language processing learning model is trained to extract sentence components in noun form from the policy announcement using morphological analysis.

The method of claim 1,
wherein the weights are set based on a plurality of nouns extracted from the policy announcement by the policy data natural language processing learning model, and
wherein, based on the frequency of the nouns extracted from the policy announcement, the weight of a noun with a high extraction frequency is set to be greater than the weight of a noun with a low extraction frequency.
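
As a rough, non-authoritative illustration of frequency-based weighting, the sketch below uses relative frequency as the weight. That formula is an assumption made here; the claim only requires that a more frequently extracted noun receive a larger weight than a less frequently extracted one. The function name `frequency_weights` is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def frequency_weights(nouns: List[str]) -> Dict[str, float]:
    """Weight each noun by its share of all extracted nouns.

    Assumption: relative frequency is used as the weight. Any scheme that
    increases with extraction frequency would satisfy the same ordering.
    """
    counts = Counter(nouns)
    total = sum(counts.values())
    return {noun: count / total for noun, count in counts.items()}
```

For a noun list in which "youth" appears twice and "housing" once, "youth" receives the larger weight.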

The method of claim 1, wherein step (a) comprises:
(a-1) matching noun-vector pairs by setting the same vector value for a plurality of nouns judged, according to a preset algorithm, to have the same meaning, and setting, for another noun judged to have a meaning similar to that of the plurality of nouns, an adjacent vector value within a preset difference from that vector value;
(a-2) classifying, according to a preset algorithm, those nouns extracted from the policy announcement whose vector values match or fall within the preset difference into one keyword; and
(a-3) performing training by setting the plurality of noun-vector pairs generated in step (a-1) and the keywords classified in step (a-2) as input values among the training values of the policy data natural language processing learning model, and setting the keywords classified in step (a-2) as output values among the training values of the policy data natural language processing learning model.
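
The grouping in steps (a-1) and (a-2) can be pictured with the following minimal sketch. It assumes each noun has already been assigned a single scalar "vector value" (real embeddings would be multi-dimensional), and it groups nouns by chaining: a noun joins the current group when its value is within `max_diff` of the previous noun's value. The function name `group_nouns`, the parameter `max_diff`, and the sample values are hypothetical and are not taken from the patent.

```python
from typing import Dict, List


def group_nouns(noun_values: Dict[str, float], max_diff: float) -> List[List[str]]:
    """Group nouns whose values match or lie within max_diff of a neighbour."""
    groups: List[List[str]] = []
    previous_value = None
    # Sorting by value places same-meaning and similar-meaning nouns side by side.
    for noun, value in sorted(noun_values.items(), key=lambda item: item[1]):
        if previous_value is not None and value - previous_value <= max_diff:
            groups[-1].append(noun)  # same or adjacent value: same keyword
        else:
            groups.append([noun])  # gap too large: start a new keyword
        previous_value = value
    return groups


# Illustrative values only: two synonyms share 1.0, a similar noun sits at 1.2.
example = {"youth": 1.0, "young adult": 1.0, "adolescent": 1.2, "farmer": 5.0}
keyword_groups = group_nouns(example, max_diff=0.5)
```

Each resulting group corresponds to one keyword in the sense of step (a-2). Because the comparison is made against the neighbouring noun, a long chain of closely spaced values can merge into one group; comparing against a fixed group centre would be an equally plausible reading.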

The method of claim 1,
wherein the weights further include a penalty weight, and
wherein step (b) comprises evaluating, based on the penalty weight, the relevance between a keyword extracted in step (a) and the category item corresponding to that keyword.

The method of claim 1,
wherein step (c) comprises generating the classification result table by classifying, according to a preset algorithm, keywords related to the residence, type of business, income, age, gender, and number of children of the individual who is the target of the policy recommendation into one of the following categories corresponding to the policy announcement: the relevant organization, the policy name, and the support target.

(Deleted)

The method of claim 1,
wherein, if a plurality of keywords correspond to the same identifier among the generated identifiers, the plurality of keywords are matched to one category item in step (c).

The method of claim 6,
wherein step (c) comprises a step in which the policy data natural language processing learning model matches each category item with its corresponding keyword according to a preset algorithm based on the identifier and enters the matched keyword in each category item to generate the classification result table, and matches a keyword to one item combining at least two category items and enters the matched keyword in that combined item to generate the classification result table.

A device for performing a tagging process for recommending a policy to an individual based on natural language processed policy data, the device comprising:
a memory storing a program for performing the tagging process for recommending a policy to an individual based on natural language processed policy data; and
a processor for executing the program,
wherein the processor, by executing the program:
receives keywords extracted from a policy announcement through a preset policy data natural language processing learning model;
extracts, using a named entity recognition module and in consideration of weights, a selected keyword corresponding to a preset category item from among the extracted keywords;
generates an identifier indicating the category item corresponding to the selected keyword;
enters the selected keyword in the category item corresponding to the selected keyword, based on the identifier, to generate a classification result table;
generates the identifier for a keyword, from among the extracted keywords, that relates to at least one of the residence, type of business, income, age, gender, and number of children of the individual who is the target of the policy recommendation; and
if no selected keyword corresponding to the preset category item is extracted, re-identifies the context of the policy announcement using the named entity recognition module, extracts another keyword corresponding to the preset category item, and repeats the extraction of another keyword until a keyword corresponding to the preset category item is extracted.