KR20230091749A

KR20230091749A - Method and device for generating data map

Info

Publication number: KR20230091749A
Application number: KR1020220054565A
Authority: KR
Inventors: 우예린; 원희선; 민 차우 응웬; 손시운
Original assignee: 한국전자통신연구원 (Electronics and Telecommunications Research Institute, ETRI)
Priority date: 2021-12-16
Filing date: 2022-05-03
Publication date: 2023-06-23

Abstract

Disclosed are a method and device for generating a data map that describes various metadata about a dataset. According to the present invention, the method for generating a data map comprises: generating, by a data map generation module for a relational DB, a relational data map that describes, according to a preset specification, asset-related metadata of a catalog classified by a classification system and categories, and converting the relational data map into an RDF triple structure using a standard vocabulary; and transmitting, by a data map generation module for a graph DB, change data related to a change event in the relational data map to a message queue, and generating, using a standard vocabulary for the change data input to the message queue, a graph-type data map that maps nodes, node attributes, and relationships between nodes.

Description

Method and device for generating data map {MEHTOD AND DEVICE FOR GENERATING DATAMAP}

The present disclosure relates to a method and apparatus for generating a data map and, more specifically, to a method and apparatus for generating a data map that describes various metadata about datasets in order to support the mutual sharing and use of datasets generated in diverse domains such as the public and private sectors.

Data-driven convergence business models and new kinds of data are emerging at a rapidly increasing rate in many fields, both in Korea and abroad, and demand for data sharing between the public and private sectors is growing. Accordingly, the sharing of data maps that describe the datasets held by public and private organizations is spreading; however, because these data maps use different formats, other parties have difficulty interpreting and using them accurately.

Datasets have long been managed and shared in connection with business operations, but the recent spread of open data and expectations of new industries have further increased the need for mutual data sharing. There is therefore a need for a data map structure and generation method that can respond to changes in the data industry environment.

The technical object of the present disclosure is to provide a method and apparatus for generating a data map that describes various metadata about datasets in order to support the mutual sharing and use of datasets generated in diverse domains such as the public and private sectors.

The technical objects of the present disclosure are not limited to those mentioned above, and other technical objects not mentioned will be clearly understood by those skilled in the art from the following description.

According to one aspect of the present disclosure, a method for generating a data map is provided. The method includes: generating, by a data map generation module for a relational DB, a relational data map that describes, according to a preset specification, asset-related metadata of a catalog classified by a classification system and categories, and converting the relational data map into an RDF triple structure using a standard vocabulary; and transmitting, by a data map generation module for a graph DB, change data related to a change event in the relational data map to a message queue, and generating, using a standard vocabulary for the change data input to the message queue, a graph-type data map that maps nodes, node attributes, and relationships between nodes.

According to another embodiment of the present disclosure, the graph-type data map may be generated by assigning each asset to a node and, depending on the range of the term describing the asset-related metadata, assigning that metadata either to a node attribute or to a relationship between nodes.

In addition to the above embodiment, a node attribute may be generated when the asset-related metadata falls within the range of a term in the standard vocabulary, and a relationship between nodes may be generated when the asset-related metadata does not fall within that range.

In addition to the above embodiment, the graph-type data map may be generated such that attribute values related to a relationship between nodes are assigned to a new node.

According to yet another embodiment of the present disclosure, the asset-related metadata may cover at least one of a dataset, a data service, and an analysis model, together with at least one of a usage method, a standard vocabulary, and a quality indicator for each asset.

According to another aspect of the present disclosure, an apparatus for generating a data map is provided. The apparatus includes: a data map generation module for a relational DB, which generates a relational data map that describes, according to a preset specification, asset-related metadata of a catalog classified by a classification system and categories, and converts the relational data map into an RDF triple structure using a standard vocabulary; and a data map generation module for a graph DB, which transmits change data related to a change event in the relational data map to a message queue and generates, using a standard vocabulary for the change data input to the message queue, a graph-type data map that maps nodes, node attributes, and relationships between nodes.

The features briefly summarized above are merely exemplary aspects of the detailed description of the present disclosure that follows and do not limit its scope.

According to the present disclosure, a method and apparatus can be provided for generating a data map that describes various metadata about datasets in order to support the mutual sharing and use of datasets generated in diverse domains such as the public and private sectors.

According to the present disclosure, metadata, that is, information about data such as its location, format, and content, is distributed in a standard format, so that parties can search for and share data smoothly.

According to the present disclosure, data scattered across various fields can be shared efficiently, and search accuracy can be improved.

The effects obtainable from the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is a schematic block diagram of an apparatus for generating a data map according to an embodiment of the present disclosure.
FIG. 2 is a diagram showing the structure of a data map.
FIG. 3 is a diagram illustrating how a catalog whose assets are classified according to a classification system and categories is managed.
FIG. 4 is a diagram illustrating an RDF triple structure for describing the metadata of an asset.
FIG. 5 is a block diagram of the data map generation module for a relational DB.
FIG. 6 is a block diagram of the data map generation module for a graph DB.
FIG. 7 is a diagram illustrating the addition of nodes in a graph database.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily carry them out. However, the present disclosure may be implemented in many different forms and is not limited to the embodiments described herein.

In describing the embodiments of the present disclosure, detailed descriptions of known configurations or functions are omitted where they might obscure the gist of the present disclosure. In the drawings, parts irrelevant to the description are omitted, and similar parts are given similar reference numerals.

In the present disclosure, when a component is said to be "connected," "coupled," or "linked" to another component, this includes not only a direct connection but also an indirect connection in which another component is interposed. Likewise, when a component "includes" or "has" another component, this means that it may further include other components rather than excluding them, unless stated otherwise.

In the present disclosure, terms such as "first" and "second" are used only to distinguish one component from another and do not limit the order or importance of components unless otherwise specified. Accordingly, within the scope of the present disclosure, a first component in one embodiment may be called a second component in another embodiment, and likewise a second component in one embodiment may be called a first component in another embodiment.

In the present disclosure, components are distinguished from one another only to describe their respective features clearly; this does not mean that they must be separate. That is, a plurality of components may be integrated into a single hardware or software unit, or a single component may be distributed across a plurality of hardware or software units. Such integrated or distributed embodiments are therefore included in the scope of the present disclosure even if not separately mentioned.

In the present disclosure, the components described in the various embodiments are not necessarily essential, and some may be optional. Therefore, an embodiment consisting of a subset of the components described in one embodiment is also included in the scope of the present disclosure, as are embodiments that include other components in addition to those described in the various embodiments.

Hereinafter, embodiments according to the present disclosure will be described with reference to the drawings of this specification.

FIG. 1 is a schematic block diagram of an apparatus for generating a data map according to an embodiment of the present disclosure.

The data map generation system includes a data map generation apparatus 100, a data governance system 200, and a data portal system 300.

The data map generation apparatus 100 may include a data map generation module 110 for a relational DB and a data map generation module 120 for a graph DB.

The data map generation module 110 for a relational DB may generate a relational data map that describes, according to a preset specification, the metadata related to the assets of a catalog classified by a classification system and categories, and may convert the relational data map into an RDF (Resource Description Framework) triple structure using a standard vocabulary. An RDF file may then be generated by combining the query results expressed as RDF triples.
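The conversion step can be pictured with a minimal sketch. It assumes that each asset is stored as one row of a relational table and that a hypothetical column-to-term mapping (`COLUMN_TO_TERM`) pairs each column with a standard-vocabulary term; the table layout, the `ex:` prefix, and the function names are illustrative and are not taken from the disclosure.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping from relational column names to standard-vocabulary terms.
COLUMN_TO_TERM: Dict[str, str] = {
    "title": "dcterms:title",
    "description": "dcterms:description",
    "created": "dcterms:created",
    "creator": "dcterms:creator",
}


def row_to_triples(row: Dict[str, Optional[str]], base: str = "ex:") -> List[Triple]:
    """Convert one asset row into (subject, predicate, object) triples."""
    subject = base + str(row["id"])
    triples: List[Triple] = [(subject, "rdf:type", "dcat:Dataset")]
    for column, value in row.items():
        term = COLUMN_TO_TERM.get(column)
        # Columns without a standard term (e.g. internal flags) and NULLs are skipped.
        if term is not None and value is not None:
            triples.append((subject, term, value))
    return triples


def rows_to_triples(rows: Iterable[Dict[str, Optional[str]]]) -> List[Triple]:
    """Merge the triples of all rows, standing in for combining query results."""
    merged: List[Triple] = []
    for row in rows:
        merged.extend(row_to_triples(row))
    return merged
```

Keeping the mapping in a table rather than in code means a different standard vocabulary can be applied without touching the conversion logic.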

The data map generation module 120 for a graph DB transfers change data related to change events in the relational data map to a message queue and, using a standard vocabulary for the change data input to the message queue, may generate a graph-type data map that maps nodes, node attributes, and relationships between nodes.

The data map generation apparatus 100 uses a relational database and a graph database in parallel and may convert a relational data map stored in the relational database into a graph-type data map suited to the graph database.

Because metadata, including the data attributes of assets and the relationships between assets, is described in a data map according to a preset specification (i.e., a standard), user queries can be answered by semantic search over the various metadata of the data. In addition, moving from the relational database to the graph database can improve search performance during exploration.

The data governance system 200 manages the classification system and categories of the data map, as well as the standard vocabulary and quality indicators for each category.

The data portal system 300 communicates with the data map generation apparatus 100: it collects various datasets and passes them to the apparatus 100, and it provides search results for user search requests based on the RDF file, structured as RDF triples, that the apparatus 100 generates.

FIG. 2 is a diagram showing the structure of a data map. FIG. 2 relates to the relational data map generated by the data map generation module 110 for a relational DB.

A data map may be a data structure, expressed as various metadata, that supports the combination, use, and search of datasets. When heterogeneous platforms operated by different institutions exchange data maps described with standard metadata, interoperability between the platforms increases and the required data can be retrieved accurately.

A data map may be structured so that a catalog containing at least one asset is assigned to a classification system organized by field, and each asset in the catalog belongs to a category, which is a lower level of the classification system. The data map may describe metadata about catalogs and assets. A usage method, vocabulary, quality indicators, and the like for assets may be added to each category.

A catalog is a data catalog and may be configured to list assets. "Asset" may refer collectively to any data that exists on the web. As illustrated in FIG. 2, an asset may be at least one of a dataset, which is a collection of data; a data service, which is a collection of services or endpoints through which data can be accessed; and an analysis model. An analysis model may consist of, for example, a set of algorithms and solutions for analyzing data. Accordingly, the data map may be defined as a structure that lists datasets, data services, analysis models, and the like while recording asset-related metadata, such as metadata about each asset, relationships between assets, the categories to which assets belong, and the classification systems under which they are managed. A data map can thus be regarded as a kind of data catalog as defined by the W3C.
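The structure described in the two paragraphs above can be modelled roughly as follows. This is a minimal sketch under assumed names (`AssetKind`, `Category`, `Asset`, `Catalog`, and their fields are hypothetical); the disclosure does not prescribe a concrete schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class AssetKind(Enum):
    DATASET = "dataset"
    DATA_SERVICE = "data service"
    ANALYSIS_MODEL = "analysis model"


@dataclass
class Category:
    name: str
    parent: Optional["Category"] = None  # None for a top-level category
    usage: str = ""  # usage method for assets in this category
    vocabulary: List[str] = field(default_factory=list)
    quality_indicators: List[str] = field(default_factory=list)


@dataclass
class Asset:
    asset_id: str
    kind: AssetKind
    category: Category
    metadata: Dict[str, str] = field(default_factory=dict)
    related_ids: List[str] = field(default_factory=list)  # relationships to other assets


@dataclass
class Catalog:
    catalog_id: str
    classification_system: str
    assets: List[Asset] = field(default_factory=list)

    def assets_of_kind(self, kind: AssetKind) -> List[Asset]:
        """Return the catalog's assets of one kind, in insertion order."""
        return [a for a in self.assets if a.kind is kind]
```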

Data map information may be stored in a relational database, a graph database, or the like, and may be generated and distributed as data structures in common standard formats such as RDF and JSON-LD.

FIG. 3 is a diagram illustrating how a catalog whose assets are classified according to a classification system and categories is managed.

For example, if a healthcare catalog (e.g., catalog 001) includes a disease dataset and a prescription dataset (e.g., dataset 001 and dataset 002) as well as an API service through which the datasets can be accessed, the data map can describe metadata about the healthcare catalog. The healthcare catalog's metadata may include, for example, the creation date and creator of an asset, the assets contained in the catalog, and the classification system used to manage the assets. In addition, the data map may include information such as metadata for each dataset and metadata for the API (Application Programming Interface) that serves as the data service. Accordingly, when a user searches for data through the data map, the various metadata it contains and the interrelationships among assets enable a search that more accurately reflects the user's intent.

As illustrated in FIG. 3, a classification system may organize categories in a hierarchical structure. A catalog may select a classification system and manage the assets it contains according to that system's categories.
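A hierarchical classification system like the one in FIG. 3 can be sketched as a child-to-parent table. The category names below are hypothetical stand-ins for the healthcare example; the point is that an asset filed under a subcategory is also found when searching a higher-level category.

```python
from typing import Dict, List, Optional

# Hypothetical classification system: each category maps to its parent
# (None marks a top-level category).
PARENT: Dict[str, Optional[str]] = {
    "Healthcare": None,
    "Disease": "Healthcare",
    "Prescription": "Healthcare",
    "Chronic disease": "Disease",
}


def category_path(category: str) -> List[str]:
    """Return the path from the top-level category down to `category`."""
    path: List[str] = []
    current: Optional[str] = category
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return list(reversed(path))


def assets_under(category: str, asset_category: Dict[str, str]) -> List[str]:
    """List assets classified in `category` or in any of its subcategories."""
    return sorted(
        asset for asset, cat in asset_category.items()
        if category in category_path(cat)
    )
```

For instance, with `dataset 001` filed under "Chronic disease", a search on "Disease" still returns it because "Disease" lies on its category path.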

FIG. 4 is a diagram illustrating an RDF triple structure for describing the metadata of an asset.

First, RDF may be a standard specification for expressing information about resources on the web. It is a defined format for expressing resource metadata more clearly: by describing resources and their metadata in a common format, RDF makes resources (or data) easier to search and manage and allows data to be exchanged smoothly between heterogeneous platforms or applications. In RDF, a resource is described with a triple structure consisting of three elements, <subject>, <predicate>, and <object>, where <predicate> describes the relationship between <subject> and <object>. <subject> is the resource (data) being described, and <object> may be a resource associated with <subject>. <predicate> expresses the relationship or characteristic linking <subject> and <object>, and may have the same meaning as the term "property" used in the W3C's DCAT and in the data map according to the present disclosure. The data map according to the present disclosure uses the RDF triple structure to express the metadata of an asset: the asset is described as the subject, and its metadata, i.e., its attributes and relationships, is expressed as properties and objects.
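As a minimal sketch of the triple model (the `Triple` type, the `describe` helper, and the `ex:` identifiers are hypothetical), each statement is a (subject, predicate, object) record, and the object of one triple can be the subject of another, which is what lets assets and their related resources form a graph.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    subject: str    # the resource (asset) being described
    predicate: str  # the property relating subject and object
    obj: str        # a literal value or another resource


def describe(triples: List[Triple], subject: str) -> Dict[str, List[str]]:
    """Collect every predicate/object pair stated about one subject."""
    result: Dict[str, List[str]] = {}
    for t in triples:
        if t.subject == subject:
            result.setdefault(t.predicate, []).append(t.obj)
    return result


graph = [
    Triple("ex:dataset001", "dcterms:title", "Blood pressure collection data"),
    Triple("ex:dataset001", "dcterms:publisher", "ex:org01"),
    Triple("ex:org01", "foaf:name", "Example Hospital"),  # object reused as subject
]
```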

FIG. 4 shows an example of the RDF triple structure: if the title of dataset 001 is "Blood pressure collection data," the dataset can be described in RDF as illustrated in FIG. 4. Among heterogeneous platforms, however, one platform may describe this metadata as "title" while another describes it as "name," so the same metadata may be expressed differently. Because the metadata indicating the dataset's title then differs, difficulties may arise in sharing, distributing, or searching the data. To solve this problem, metadata can be described with a standard vocabulary. The vocabulary used in FIG. 4 is the Dublin Core vocabulary; to indicate that "title" is a term defined in Dublin Core, the prefix "dcterms" is added and the term is written "dcterms:title". A vocabulary is a set of terms, and each term is distinguished by indicating the vocabulary to which it belongs. Other vocabularies include SKOS (Simple Knowledge Organization System), FOAF (Friend Of A Friend), and DCAT (Data Catalog Vocabulary), and in the present disclosure these standard vocabularies may be used to describe metadata when expressing a data map.
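The effect of a shared vocabulary can be shown with a short sketch. The namespace IRIs below are the published ones for Dublin Core terms, DCAT, FOAF, and SKOS; the `LOCAL_TO_STANDARD` table of platform-specific labels is a hypothetical example of how "title" and "name" would both be normalized to `dcterms:title`.

```python
from typing import Dict

# Namespace IRIs of the standard vocabularies mentioned in the text.
PREFIXES: Dict[str, str] = {
    "dcterms": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# Hypothetical platform-specific labels mapped to one standard term.
LOCAL_TO_STANDARD: Dict[str, str] = {
    "title": "dcterms:title",
    "name": "dcterms:title",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dcterms:title' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def normalize(record: Dict[str, str]) -> Dict[str, str]:
    """Rewrite platform-specific keys as standard-vocabulary IRIs."""
    return {
        expand(LOCAL_TO_STANDARD[key]): value
        for key, value in record.items()
        if key in LOCAL_TO_STANDARD
    }
```

After normalization, records from both platforms carry the same key, `http://purl.org/dc/terms/title`, so a search on the title matches either one.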

When metadata is expressed, validation may be performed to check whether the standard vocabulary is used correctly and, where constraints are defined for particular metadata, whether the metadata and its triple structure satisfy those constraints. The indicators or criteria used to judge metadata quality during such validation may serve as quality indicators. The data governance system that manages the classification systems and categories of the data map (SODAS+ Data Governance System; see 200 in FIG. 5) may manage the quality indicators along with the standard vocabulary.
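A validation pass of the kind described above might look like the following minimal sketch. The constraint table (`CONSTRAINTS`), the accepted term set, and the error messages are all hypothetical; the disclosure does not define concrete rules, and a production system might instead use a constraint language such as W3C SHACL.

```python
import re
from typing import Dict, List

# Hypothetical constraints; the disclosure does not define concrete rules.
CONSTRAINTS: Dict[str, Dict[str, object]] = {
    "dcterms:title": {"required": True, "max_length": 200},
    "dcterms:issued": {"required": False, "pattern": r"\d{4}-\d{2}-\d{2}"},
}
# Terms accepted as belonging to the standard vocabulary (illustrative subset).
STANDARD_TERMS = set(CONSTRAINTS) | {"dcterms:description", "dcat:keyword"}


def validate(metadata: Dict[str, str]) -> List[str]:
    """Return a list of quality problems; an empty list means the record passed."""
    errors: List[str] = []
    for term in metadata:
        if term not in STANDARD_TERMS:
            errors.append(f"non-standard term: {term}")
    for term, rule in CONSTRAINTS.items():
        value = metadata.get(term)
        if value is None:
            if rule.get("required"):
                errors.append(f"missing required term: {term}")
            continue
        max_length = rule.get("max_length")
        if isinstance(max_length, int) and len(value) > max_length:
            errors.append(f"{term} exceeds {max_length} characters")
        pattern = rule.get("pattern")
        if isinstance(pattern, str) and re.fullmatch(pattern, value) is None:
            errors.append(f"{term} does not match {pattern}")
    return errors
```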

FIG. 5 is a block diagram of the data map generation module for a relational DB.

The data map generation module 110 for a relational DB may include a relational data map generator 111, which generates the relational data map stored in the relational DB, and a distribution unit 112.

The relational data map generator 111 may generate a relational data map of the metadata about assets, and the relational data map may be stored in the relational database of the relational data map generator 111. The distribution unit 112 may apply a standard vocabulary to convert the relational data map into an RDF triple structure and generate an RDF file that combines the query results.

As described with reference to FIG. 2, the relational data map may be structured so that a catalog containing at least one asset is assigned to a classification system organized by field, and each asset in the catalog belongs to a category, which is a lower level of the classification system. The data map may describe metadata about catalogs and assets, and a usage method, vocabulary, quality indicators, and the like for assets may be added to each category. These points have been described in detail with reference to FIG. 2, so further description is omitted.

FIG. 6 is a block diagram of the data map generation module for a graph DB.

The data map generation module 120 for a graph DB may include a relational DB 121, a graph DB processor 123, a graph-type data map generator 124, and a distribution unit 125.

The relational DB 121, which is linked to the data map generation module 110 for a relational DB, can recognize change events in the relational database such as data insertion, modification, and deletion. Using CDC (Change Data Capture) 122, the relational DB 121 may identify and track data that has changed in the relational database (or in the relational data map generator 111) and input the change data to a message queue. The message queue keeps the two database systems synchronized so that the data map generated in the relational database and the data map generated in the graph database do not differ in content or structure. The message queue is an element of the CDC 122 technique; CDC 122 may be a technique that records insert, update, and delete operations and other change events occurring in, or applied to, the tables of a source database so that the corresponding processing can be applied to a target database. A typical CDC tool is, for example, Debezium. Debezium records the changes occurring in the source database to a message queue stream so that the target database can apply the corresponding change events in order. In this way, change events occurring in the relational database are applied to the graph database through the message queue, keeping the two databases synchronized.
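The synchronization loop can be sketched with a standard-library `queue.Queue` standing in for the message queue (in a Debezium deployment this would typically be a Kafka topic). The event shape is a simplified version of a Debezium-style change event with `op` ("c" create, "u" update, "d" delete, "r" snapshot read), `before`, and `after`; real events carry more fields, and the dictionary used as the "graph store" is a hypothetical placeholder.

```python
import queue
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def publish(q: queue.Queue, op: str,
            before: Optional[Row] = None, after: Optional[Row] = None) -> None:
    """Put a simplified, Debezium-style change event on the queue."""
    q.put({"op": op, "before": before, "after": after})


def apply_changes(q: queue.Queue, graph_nodes: Dict[str, Row]) -> Dict[str, Row]:
    """Drain the queue in FIFO order and replay each change on the target store."""
    while True:
        try:
            event = q.get_nowait()
        except queue.Empty:
            break
        op = event["op"]
        if op in ("c", "r", "u"):  # create, snapshot read, update: upsert the row
            row = event["after"]
            graph_nodes[row["id"]] = dict(row)
        elif op == "d":  # delete: remove the node if present
            graph_nodes.pop(event["before"]["id"], None)
        q.task_done()
    return graph_nodes
```

Replaying events strictly in queue order is what keeps the target consistent: an update applied before its create, or a delete before an update, would leave the two databases out of step.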

The graph DB processor 123 may take the data waiting in the message queue and, using a standard vocabulary, map it to the elements of the graph database (or graph-type data map), namely nodes, node attributes, and relationships between nodes, thereby generating those elements. Each data asset is created as a single node, and each node's attributes may be mapped to the asset's metadata. Depending on the range defined for each metadata term in the standard vocabulary, the metadata may be mapped to an attribute of the graph node, or a new node may be created and the metadata mapped to a relationship between nodes.
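One possible reading of the range-based rule is sketched below: terms whose range is a literal become node attributes, while terms whose range is a class produce a new node and a relationship to it. This interpretation, the `TERM_RANGE` table, and the treatment of unknown terms as literals are assumptions for illustration; a real system would read the ranges from the vocabulary definitions.

```python
from typing import Dict, List, Tuple

LITERAL = "rdfs:Literal"

# Illustrative range table; a real system would read ranges from the vocabulary.
TERM_RANGE: Dict[str, str] = {
    "dcterms:title": LITERAL,
    "dcterms:issued": LITERAL,
    "dcterms:publisher": "dcterms:Agent",
    "dcat:theme": "skos:Concept",
}

Nodes = Dict[str, Dict[str, str]]
Edge = Tuple[str, str, str]


def map_asset(asset_id: str, metadata: Dict[str, str]) -> Tuple[Nodes, List[Edge]]:
    """Map one asset to a node, its attributes, and relationships to new nodes."""
    nodes: Nodes = {asset_id: {}}
    edges: List[Edge] = []
    for term, value in metadata.items():
        term_range = TERM_RANGE.get(term, LITERAL)  # assumption: unknown terms are literals
        if term_range == LITERAL:
            nodes[asset_id][term] = value                  # node attribute
        else:
            nodes.setdefault(value, {"type": term_range})  # new node for the value
            edges.append((asset_id, term, value))          # relationship between nodes
    return nodes, edges
```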

The graph data map generator 124 may store the assets matched to each element of the graph database, together with their metadata, in a graph database organized according to the structure of the data map.
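The disclosure does not name a graph database product. Assuming a Neo4j-style store queried with Cypher, the following hypothetical sketch shows how the generator 124 might turn one mapped asset into upsert statements: literal-valued metadata go onto an `Asset` node, and each relationship line becomes an edge to a `Resource` node. The function `to_cypher`, the labels `Asset` and `Resource`, and the sample values are invented for illustration; the statements are only built here, not executed.

```python
def to_cypher(asset_id, properties, relations):
    """Build (query, parameters) pairs that upsert one asset node and its relationships.

    properties: dict of vocabulary term -> literal value (stored on the node)
    relations:  list of (vocabulary term, target IRI) pairs (stored as edges)
    """
    statements = [(
        "MERGE (a:Asset {id: $id}) SET a += $props",
        {"id": asset_id, "props": dict(properties)},
    )]
    for term, target in relations:
        # Cypher cannot parameterize relationship types, so the term is inlined;
        # backticks permit the ':' of prefixed names, and stray backticks are dropped.
        rel_type = term.replace("`", "")
        statements.append((
            "MATCH (a:Asset {id: $id}) "
            "MERGE (r:Resource {uri: $uri}) "
            f"MERGE (a)-[:`{rel_type}`]->(r)",
            {"id": asset_id, "uri": target},
        ))
    return statements

stmts = to_cypher(
    "dataset001",
    {"dcterms:title": "Air quality", "dcterms:description": "Hourly PM10 readings"},
    [("dcterms:license", "https://creativecommons.org/licenses/by/4.0")],
)
for query, params in stmts:
    print(query, params)
```

Using `MERGE` rather than `CREATE` makes replaying the same change event idempotent, which suits a consumer that may see a message more than once.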

Like the data map generation module 110 for relational DB, the distribution unit 125 may export the data map built in graph form as an RDF file. The RDF file may be serialized, for example, as Turtle, N-Triples, JSON-LD (JSON for Linking Data), TriG, or RDF/XML.

FIG. 7 is a diagram illustrating the addition of nodes in a graph database.

As shown in FIG. 7, the graph database may consist of nodes, node properties, and relationship lines representing the relationships between nodes. Each node has node properties, and nodes are related to one another through relationship lines.

One node may be created per asset (for example, dataset 001). Depending on the kind of metadata, some of the asset's metadata may become node properties in the graph database (for example, the description inside the dataset 001 node), while the rest may be assigned to relationship lines. Whether a metadata item becomes a node property or a relationship line may be decided by the term of the standard vocabulary used to describe it: each term has a defined range, that is, the kind of value to which the term can be applied.

Taking FIG. 7 as an example, the range of the 'dcterms:title' property is 'rdfs:Literal'. Literal values, such as text strings or numbers, belong to 'rdfs:Literal'. Therefore, when the title of 'dataset 001' is described with the term 'dcterms:title', only a literal value such as a text string or number can be used as its value. As another example, 'dcat:mediaType' is a term of the DCAT vocabulary and is a metadata property indicating the media type of the asset's corresponding distribution file. The range of 'dcat:mediaType' is 'dcterms:MediaType', so only media types defined by IANA can be values of 'dcat:mediaType'.
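The range constraint can be sketched as a validation check. The following Python is a hedged illustration, not the patent's method: `RANGES` is a hypothetical two-entry excerpt, Python strings and numbers stand in for `rdfs:Literal`, and the media-type test checks only the RFC 6838 `type/subtype` syntax and a non-exhaustive set of top-level types instead of querying the IANA registry. Values written as IRIs under `https://www.iana.org/assignments/media-types/` are accepted as well as bare types such as `text/csv`.

```python
import re

# Hypothetical excerpt of a range table: vocabulary term -> declared rdfs:range.
RANGES = {
    "dcterms:title": "rdfs:Literal",
    "dcat:mediaType": "dcterms:MediaType",
}

IANA_PREFIX = "https://www.iana.org/assignments/media-types/"
# Shape of "type/subtype" using RFC 6838 restricted-name characters.
MEDIA_TYPE = re.compile(
    r"[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]{0,126}/[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]{0,126}"
)
# Common IANA top-level types (not exhaustive).
TOP_LEVEL = {"application", "audio", "font", "image", "message",
             "model", "multipart", "text", "video"}

def conforms(term, value):
    """Return True if value is acceptable for the declared range of term."""
    rng = RANGES[term]
    if rng == "rdfs:Literal":
        # Plain strings and numbers stand in for RDF literals here.
        return isinstance(value, (str, int, float))
    if rng == "dcterms:MediaType":
        if not isinstance(value, str):
            return False
        mt = value.removeprefix(IANA_PREFIX)  # accept IANA IRIs or bare type/subtype
        return bool(MEDIA_TYPE.fullmatch(mt)) and mt.split("/")[0].lower() in TOP_LEVEL
    raise KeyError(f"no check defined for range {rng!r}")

print(conforms("dcterms:title", "Air quality"), conforms("dcat:mediaType", "csv"))
```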

In the example of FIG. 7, the asset is created as one node, and when the range of the term describing an item of the asset's metadata (a property) is 'rdfs:Literal', 'xsd:duration', or 'xsd:decimal', that property may be set as a node property of the asset's node.

Properties whose term has a range other than these three, such as 'dcterms:license', may be created as relationship lines. In that case, the property value, 'https://creativecommons.org/licenses/by/4.0', is assigned to a new node that is connected to the asset's node by the relationship.
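The FIG. 7 rule can be summarized in a few lines. In this sketch, a term whose range is `rdfs:Literal`, `xsd:duration` or `xsd:decimal` is stored as a node property, and any other term produces a new node for its value plus an edge from the asset node. `TERM_RANGES` is an excerpt assumed for illustration (the ranges follow DCAT 2 as far as this sketch relies on them), and `map_asset` and the sample metadata are hypothetical.

```python
# Ranges that FIG. 7 stores as node properties; everything else becomes an edge.
LITERAL_RANGES = {"rdfs:Literal", "xsd:duration", "xsd:decimal"}

# Excerpt of term ranges assumed for this sketch.
TERM_RANGES = {
    "dcterms:title": "rdfs:Literal",
    "dcterms:description": "rdfs:Literal",
    "dcat:temporalResolution": "xsd:duration",
    "dcat:spatialResolutionInMeters": "xsd:decimal",
    "dcterms:license": "dcterms:LicenseDocument",
}

def map_asset(asset_id, metadata):
    """Split one asset's metadata into node properties, new nodes, and edges."""
    node = {"id": asset_id, "properties": {}}
    new_nodes, edges = [], []
    for term, value in metadata.items():
        if TERM_RANGES[term] in LITERAL_RANGES:
            node["properties"][term] = value       # literal-like: keep on the node
        else:
            new_nodes.append({"id": value})        # value becomes its own node
            edges.append((asset_id, term, value))  # linked by a relationship line
    return node, new_nodes, edges

node, new_nodes, edges = map_asset("dataset001", {
    "dcterms:title": "Air quality",
    "dcterms:description": "Hourly PM10 readings",
    "dcat:temporalResolution": "PT1H",
    "dcterms:license": "https://creativecommons.org/licenses/by/4.0",
})
print(node["properties"])
print(edges)
```

An unknown term raises `KeyError` here; a fuller implementation would need a policy for terms missing from the vocabulary table.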

According to the present disclosure, the relational data map describes the associations among datasets. Storing this information, including the associations, in a graph database can improve both the generation of triple-structured files such as RDF and the performance of association searches.
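To make "association search" concrete, here is a hedged sketch of one such query: finding datasets that share a node with a given dataset, for example the same license. It runs a breadth-first search over a hypothetical edge list in plain Python; in a graph database the same question would usually be a short pattern match over stored relationships rather than a chain of joins. The function `related_assets` and the sample edges are invented for illustration.

```python
from collections import defaultdict, deque

def related_assets(edges, start, assets, max_hops=2):
    """Find assets reachable from `start` within max_hops, ignoring edge direction.

    With max_hops=2, two datasets are related when they share a node,
    for example the same license.
    """
    neighbours = defaultdict(set)
    for subject, _term, obj in edges:
        neighbours[subject].add(obj)
        neighbours[obj].add(subject)
    depth = {start: 0}
    frontier = deque([start])
    while frontier:  # breadth-first search
        current = frontier.popleft()
        if depth[current] == max_hops:
            continue
        for nxt in neighbours[current]:
            if nxt not in depth:
                depth[nxt] = depth[current] + 1
                frontier.append(nxt)
    return {n for n in depth if n in assets and n != start}

CC_BY = "https://creativecommons.org/licenses/by/4.0"
edges = [
    ("dataset001", "dcterms:license", CC_BY),
    ("dataset002", "dcterms:license", CC_BY),
    ("dataset003", "dcterms:license", "http://example.org/other-license"),
]
print(related_assets(edges, "dataset001", {"dataset001", "dataset002", "dataset003"}))
```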

Specifically, a graph database offers better search performance than a relational database for searches that follow the relationships between assets, so converting the relational database that manages catalog and asset-related metadata to a graph database is advantageous. Accordingly, creating and managing the data map on a graph database can improve the accuracy of data exploration and search.

The exemplary methods of this disclosure are presented as a series of operations for clarity of explanation, but this is not intended to limit the order in which the steps are performed; where necessary, the steps may be performed simultaneously or in a different order. To implement a method according to the present disclosure, other steps may be added to the illustrated steps, some steps may be omitted, or some steps may be omitted and other steps added.

The various embodiments of the present disclosure do not list every possible combination but are intended to illustrate representative aspects of the disclosure; the matters described in the various embodiments may be applied independently or in combinations of two or more.

In addition, the various embodiments of the present disclosure may be implemented in hardware, firmware, software, or a combination thereof. For a hardware implementation, they may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, or the like.

The scope of the present disclosure includes software or machine-executable instructions (for example, operating systems, applications, firmware, programs, etc.) that cause the operations of the methods of the various embodiments to be executed on a device or computer, as well as non-transitory computer-readable media in which such software or instructions are stored and which are executable on a device or computer.

Claims

generating, by a data map generation module for a relational DB, a relational data map describing asset-related metadata of a catalog classified according to a classification system and categories of a preset standard, and converting the relational data map to an RDF triple structure using a standard vocabulary; and
transmitting, by a data map generation module for a graph-type DB, change data related to a change event in the relational data map to a message queue, and generating a graph-type data map by mapping the change data input to the message queue to nodes, node properties, and relationships between nodes using a standard vocabulary.