CN110222110A - Integrated method for resource description framework data conversion and storage based on an ETL tool - Google Patents

Integrated method for resource description framework data conversion and storage based on an ETL tool

Info

Publication number
CN110222110A
Authority
CN
China
Prior art keywords
data
rdf
storage
conversion
data conversion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910510063.4A
Other languages
Chinese (zh)
Inventor
孙坦
鲜国建
赵瑞雪
李娇
黄永文
寇远涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Information Institute of CAAS
Original Assignee
Agricultural Information Institute of CAAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Information Institute of CAAS
Priority to CN201910510063.4A
Publication of CN110222110A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an integrated method for resource description framework (RDF) data conversion and storage based on an ETL tool. The steps are as follows: 1. Preprocess the relational data by adding a unique key through the database, to serve as the resource name during data conversion; 2. Obtain the field information of the structured data in the relational database and determine the required fields of the data to be converted; 3. Add an RDF output format to the data conversion script to form a complete program; 4. Determine the namespace and namespace prefix according to the type of data to be converted, limiting the scope of variable definitions; 5. Set the subject and instance mapping rules, converting the obtained source data fields into HTTP URIs; 6. Set the attribute mapping rules; 7. Set the storage format and storage target location, then perform the conversion and storage operations. The invention can improve the efficiency and continuity of data processing, can improve resource scheduling efficiency, and supports zero-coding data conversion at low labor cost, with good extensibility.

Description

An integrated method for resource description framework data conversion and storage based on an ETL tool

Technical field

The invention relates to the field of computer technology, and in particular to an integrated method for resource description framework data conversion and storage based on an ETL tool.

Background

With the development of the Semantic Web, data published in the form of the Resource Description Framework (RDF) has grown rapidly and is gradually becoming a foundation of the Internet. The basic structure of RDF is the triple, consisting of a subject, a predicate, and an object, which correspond to a resource, a property, and a property value respectively. RDF data can be linked with other databases to build an interconnected, shared data space on the Semantic Web, making data more usable and operable.

At present, with the shift from a web of documents to a web of data, large amounts of data are stored in relational databases (RDB). RDBs have certain advantages in small-scale data processing, complex SQL operations, and visibility, but their lack of semantics limits their value in practical application scenarios. The conversion of relational data to RDF has therefore attracted the attention of institutions and scholars, and various general-purpose or domain-specific mapping tools and applications have been developed. For example, D2RQ publishes relational data as RDF graphs to a mapping platform according to linked data principles; Virtuoso RDF Views convert SQL SELECT result sets into RDF sets; and R2RML, specified by the W3C, maps RDB to RDF by defining a vocabulary, mapping rules and syntax, a mapping framework, and a working mechanism. However, existing tools and languages are significantly limited in mapping rule file preparation, extensibility, and the storage modes available for converted data, and the manual time they require leaves room for optimization.

As for RDB-to-RDF mapping approaches, Amazon Cloud has tried a static method that converts the source database into a graph representation through an Extract-Transform-Load (ETL) process, but it ran into difficulties storing very large data sets. Moreover, because professional ETL tools do not support RDF output, among other reasons, that method did not use such tools and was instead implemented in hand-written code, so it lacks flexibility and extensibility. ETL tools, by contrast, are widely used in data warehouse construction thanks to their user-friendliness, rich components, flexible customizable rules, and powerful management functions; they form a complete data management ecosystem and have a good application environment. However, their limitations in areas such as graph data format support have restricted their use in RDF conversion and generation.

Summary of the invention

The invention addresses limitations in the existing conversion of relational databases to RDF triple data and its storage, namely the high time cost and poor extensibility caused by reliance on manually coded mappings, and the single data storage format. It proposes an integrated method for resource description framework data conversion and storage based on an ETL tool, ultimately achieving highly adaptable RDF triple data conversion and storage. The invention is realized through the following technical solutions:

In one aspect, a method is provided for modifying the mapping rules and graph data storage mode of an existing ETL tool, the method comprising:

Step 1. In the code layer of the ETL tool, define conversion rules that map tables, columns, and rows in the RDB to classes, properties, and resources in RDF respectively, and encapsulate them as functions;

Step 2. Design a hybrid RDF data storage mode for the ETL tool, allowing flexible selection of local storage, graph database storage, or both at once, and encapsulate it as functions.

The conversion rules that map the RDB to RDF are defined as: RDB.table -> RDF.class, RDB.column -> RDF.property, RDB.row -> RDF.resource, where any single instance record may be assigned multiple classes.

In another aspect, an integrated method for resource description framework data conversion and storage based on an ETL tool is provided, the method comprising:

Step 1. Preprocess the relational data: add a unique key (UniqueKey) through the database to serve as the resource name during data conversion.

Step 2. Obtain the field information of the structured data in the relational database and determine the required fields of the data to be converted. Add a data conversion script in the ETL tool and connect its input to the source database, that is, the relational database storing the data table to be converted. An SQL query lists all fields in the source data table, and the required fields to be converted are selected and obtained according to the pre-built semantic model.

Step 3. Add an RDF output format to the data conversion script to form a complete program.

Step 4. Determine the namespace and namespace prefix according to the type of data to be converted, limiting the scope of variable definitions.

Step 5. Set the subject and instance mapping rules, converting the obtained source data fields into HTTP URIs.

Step 6. Set the attribute mapping rules, including: the obtained source field, object property, data property, object and predicate instance URI, multi-value separator, and data type. The obtained source field is the field selected in step 2. The object property takes its values from all classes in the pre-built semantic model and may be multi-valued. The data property takes literal values and may be multi-valued. The object and predicate instance URI is a unique identifier corresponding to an entity object of a class in the namespace. The multi-value separator is optional and is used for aggregation operations. The data type provides provenance information for the generated data and supports statistical analysis and lookup based on XML data types.

Step 7. Set the storage format and storage target location, then perform the conversion and storage operations.

Compared with the prior art, the invention has the following advantages:

The invention applies mainstream ETL tools to the RDF generation process for the first time, integrating data warehouse construction, RDF generation, and RDF storage configuration into one continuous workflow and improving the efficiency and continuity of data processing;

The invention modifies the ETL tool according to RDF conversion and storage requirements so that it supports an RDF triple output mode, and designs a hybrid "local + multiple graph databases" storage mode, which can improve resource scheduling efficiency;

The modified ETL tool encapsulates the mapping rules as a plug-in, which supports zero-coding data conversion at low labor cost; word segmentation and indexing tools can be placed upstream of it, giving good extensibility.

Description of the drawings

Figure 1 is a schematic diagram of the RDB-to-RDF mapping rules provided in the invention.

Figure 2 is a flow chart of the integrated resource description framework data conversion and storage based on an ETL tool according to the invention.

Detailed description

To make the purpose, technical solutions, and advantages of the invention clearer, embodiments of the invention are described in further detail below with reference to the drawings.

Embodiment 1

This embodiment of the invention provides a method for modifying the mapping rules and graph data storage mode of an existing ETL tool. As shown in Figure 2, the processing flow of the method includes the following steps:

Step 1. In the code layer of the ETL tool, define conversion rules that map tables, columns, and rows in the RDB to classes, properties, and resources in RDF respectively, as shown in Figure 1, and encapsulate them as functions. Specifically:

S1.1 Define the namespace and prefix of the class to which a resource belongs. Both common namespaces and user-defined namespaces are supported. The common namespaces added include:

The xmlns:rdf namespace, which specifies that elements with the prefix rdf come from "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

The xmlns:rdfs namespace, which specifies that elements with the prefix rdfs come from "http://www.w3.org/2000/01/rdf-schema#"

The xmlns:xsd namespace, which specifies that elements with the prefix xsd come from "http://www.w3.org/2001/XMLSchema#"

The xmlns:owl namespace, which specifies that elements with the prefix owl come from "http://www.w3.org/2002/07/owl#"

The xmlns:dc namespace, which specifies that elements with the prefix dc come from "http://purl.org/dc/elements/1.1/"

The xmlns:dcterms namespace, which specifies that elements with the prefix dcterms come from "http://purl.org/dc/terms/"
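
The prefix handling in S1.1 can be pictured as a lookup table that combines the common namespaces with user-defined ones. The following Python sketch is only an illustration under assumptions and is not the modified ETL tool's code: the names `COMMON_NAMESPACES`, `build_prefix_table`, and `expand`, as well as the `ex` prefix and its URI, are hypothetical.

```python
# Common namespaces from S1.1 (prefix -> namespace URI).
COMMON_NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def build_prefix_table(custom=None):
    """Merge the common namespaces with optional user-defined ones."""
    table = dict(COMMON_NAMESPACES)
    table.update(custom or {})
    return table


def expand(curie, table):
    """Expand a prefixed name such as 'dc:title' into a full URI."""
    prefix, _, local = curie.partition(":")
    if prefix not in table:
        raise KeyError(f"unknown namespace prefix: {prefix}")
    return table[prefix] + local


# Hypothetical user-defined namespace for journal data.
table = build_prefix_table({"ex": "http://example.org/journal/"})
print(expand("dc:title", table))
print(expand("ex:Journal", table))
```

Keeping the custom prefixes in the same table as the common ones means later mapping rules can refer to either kind by prefix without special cases.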

S1.2 Following the "Direct Mapping" specification proposed by the W3C, define simple RDB-to-RDF mapping rules in the code layer of the ETL tool.

In the subject-predicate-object representation of an RDF triple, the subject is the resource being described, which can be anything that can have a URI; the predicate is a property of the resource and is itself a named resource; the object is the property value, which can be another resource or a string. RDF classes and properties are generated directly from the names in the RDB schema, so the simple RDB-to-RDF mapping rules are defined as: RDB.table -> RDF.class, RDB.column -> RDF.property, RDB.row -> RDF.resource, where any single instance record may be assigned multiple classes.
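
To make the table/column/row rule concrete, here is a minimal Python sketch of how one row might become triples. It is a simplification inspired by the rule above, not an implementation of the W3C Direct Mapping or of the patented tool; `BASE`, `row_to_triples`, the URI layout, and the sample row are all assumptions made for illustration.

```python
BASE = "http://example.org/"  # hypothetical base URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def row_to_triples(table, key_column, row, extra_classes=()):
    """Map one relational row to (subject, predicate, object) triples.

    table -> class, column -> property, row -> resource.
    """
    subject = f"{BASE}{table}/{row[key_column]}"
    triples = [(subject, RDF_TYPE, f"{BASE}{table}")]
    # An instance record may be assigned additional classes.
    for cls in extra_classes:
        triples.append((subject, RDF_TYPE, cls))
    for column, value in row.items():
        # Simplification: the key only appears in the subject URI,
        # and NULL values produce no triple.
        if column == key_column or value is None:
            continue
        triples.append((subject, f"{BASE}{table}#{column}", str(value)))
    return triples


# Made-up sample row.
row = {"id": 7, "title": "Example Journal", "issue": "3"}
for triple in row_to_triples("journal", "id", row):
    print(triple)
```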

S1.3 Design a data record limit function: if the current record is not the last record, the output must be split into files by batch and saved, and the next file initialized.

The KETTLE tool, to which embodiments of the invention are applicable, is taken as an example; this example should not limit the function or scope of use of the embodiments. In the Java runtime environment, the data record limit function processRow() is defined, and getSplitEvery is used to determine whether the current record is the last one.
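
The batch-splitting behaviour described in S1.3 can be sketched independently of KETTLE. The Python example below is an assumption-laden illustration, not the Java processRow() from the source: `write_in_batches`, the `split_every` parameter, the `.nt` file naming, and the sample triples are hypothetical.

```python
import os
import tempfile


def write_in_batches(lines, out_dir, split_every, stem="output"):
    """Write lines to numbered files, starting a new file every split_every records."""
    if split_every <= 0:
        raise ValueError("split_every must be positive")
    paths = []
    handle = None
    for index, line in enumerate(lines):
        if index % split_every == 0:
            # Batch boundary: close the current file and initialise the next one.
            if handle is not None:
                handle.close()
            path = os.path.join(out_dir, f"{stem}_{len(paths):04d}.nt")
            handle = open(path, "w", encoding="utf-8")
            paths.append(path)
        handle.write(line + "\n")
    if handle is not None:
        handle.close()
    return paths


with tempfile.TemporaryDirectory() as tmp:
    sample = [f'<http://example.org/s{i}> <http://example.org/p> "{i}" .' for i in range(5)]
    files = write_in_batches(sample, tmp, split_every=2)
    print([os.path.basename(p) for p in files])
```

Splitting on a record count keeps each output file bounded in size, which matters when a large table is converted in one run.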

Step 2. Design a hybrid RDF data storage mode for the ETL tool, allowing flexible selection of local storage, graph database storage, or both at once, and encapsulate each functional interface using the ETL tool's components.
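
One way to picture the hybrid storage mode is as a list of interchangeable storage targets that all receive the same converted triples. The Python sketch below assumes this design; it is not taken from the source. `LocalFileSink`, `InMemoryGraphSink`, and `store_everywhere` are hypothetical names, and the in-memory sink merely stands in for a real graph database client.

```python
import os
import tempfile


class LocalFileSink:
    """Appends N-Triples lines to a local file."""

    def __init__(self, path):
        self.path = path

    def store(self, lines):
        with open(self.path, "a", encoding="utf-8") as handle:
            handle.writelines(line + "\n" for line in lines)


class InMemoryGraphSink:
    """Stand-in for a graph database client; a real one would send data over the network."""

    def __init__(self):
        self.lines = []

    def store(self, lines):
        self.lines.extend(lines)


def store_everywhere(lines, sinks):
    """Send the same converted triples to every selected storage target."""
    lines = list(lines)
    for sink in sinks:
        sink.store(lines)
    return len(lines)


with tempfile.TemporaryDirectory() as tmp:
    graph = InMemoryGraphSink()
    local = LocalFileSink(os.path.join(tmp, "out.nt"))
    triples = ['<http://example.org/s> <http://example.org/p> "o" .']
    # Choose local only, graph only, or both by changing this list.
    print(store_everywhere(triples, [local, graph]))
```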

Embodiment 2

Embodiment 1 prepares the tool used in this embodiment. This embodiment of the invention provides an integrated method for resource description framework data conversion and storage based on an ETL tool, including the following processing steps:

Step 1. Preprocess the relational data: add a unique key (UniqueKey) through the database to serve as the resource name during data conversion.

An auto-increment numeric value is added to the relational model of the relational database as the unique key; during data conversion it is automatically used to generate part of the unique resource identifier in the RDF triples.
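
The preprocessing step can be sketched as follows. The embodiment uses MySQL; SQLite is swapped in here only so the example runs without a database server, and because SQLite cannot add an auto-increment key to an existing table, the table is rebuilt instead. The table names, `BASE`, and `resource_uri` are hypothetical, and the rows are made-up sample data.

```python
import sqlite3

BASE = "http://example.org/journal/"  # hypothetical resource namespace

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journal_raw (title TEXT, issue TEXT)")
conn.executemany(
    "INSERT INTO journal_raw VALUES (?, ?)",
    [("Example Journal A", "1"), ("Example Journal B", "2")],
)

# Rebuild the table with an auto-incrementing unique key.
conn.execute(
    "CREATE TABLE journal (uid INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT, issue TEXT)"
)
conn.execute("INSERT INTO journal (title, issue) SELECT title, issue FROM journal_raw")


def resource_uri(uid):
    """Use the unique key as the local part of the resource identifier."""
    return f"{BASE}{uid}"


uris = [resource_uri(uid) for (uid,) in conn.execute("SELECT uid FROM journal ORDER BY uid")]
print(uris)
```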

Step 2. Add a data conversion script in the ETL tool modified as in Embodiment 1, connect its input to the relational database, and obtain the required fields of the data to be converted.

Add a data conversion script in the ETL tool, add a table input, and connect to the source database (that is, the relational database storing the data table to be converted). In this embodiment the relational database MySQL is used as an example. An SQL query lists all fields in the source data table, and the required fields to be converted are selected and obtained according to the pre-built semantic model; for journal data, for example, the extracted fields include "journal name" and "issue number".
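
A rough sketch of the field selection in step 2, again using SQLite in place of MySQL so that it runs standalone. The `SEMANTIC_MODEL` dictionary, the column names, and the sample row are assumptions for illustration; the source does not specify how the semantic model is represented.

```python
import sqlite3

# Hypothetical semantic model: source column -> RDF property (prefixed name).
SEMANTIC_MODEL = {"journal_name": "dc:title", "issue_no": "ex:issue"}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE journal (uid INTEGER PRIMARY KEY, journal_name TEXT, "
    "issue_no TEXT, editor_note TEXT)"
)
conn.execute("INSERT INTO journal VALUES (1, 'Example Journal', '3', 'internal')")

# List every field in the source table via a query.
cursor = conn.execute("SELECT * FROM journal")
all_fields = [column[0] for column in cursor.description]

# Keep only the fields the semantic model requires, plus the unique key.
required = ["uid"] + [field for field in all_fields if field in SEMANTIC_MODEL]
rows = conn.execute(f"SELECT {', '.join(required)} FROM journal").fetchall()
print(all_fields)
print(required, rows)
```

Fields that the semantic model does not mention, such as `editor_note` here, are simply never extracted.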

Step 3. Add an RDF triple output (RDF output) to the data conversion script to form a complete program.

Step 4. Add the namespace and namespace prefix according to the type of data to be converted, limiting the scope of element definitions.

A namespace is a way of resolving name conflicts when elements and attributes are gathered under a common heading; it allows elements with the same name in different namespaces to carry different meanings. In XML, namespaces ensure that all element and attribute names are unique throughout the system, avoiding confusion caused by identical element names. In this embodiment, xmlns:rdf, xmlns:rdfs, xmlns:xsd, xmlns:owl, xmlns:dc, xmlns:dcterms, and others are commonly used. Users can also define their own namespace, or specify one they have already defined, according to the requirements of the pre-built semantic model.

Step 5. Configure the data conversion script based on the mapping rules.

Converting structured data to RDF triples requires explicit mapping rules from source data fields to target triples, with the corresponding parameters configured so that each element, attribute, and instance is converted correctly.

The specific implementation steps are as follows:

(5.1) Add the source fields of the data to be converted, set the subject and instance mapping rules, and specify the resource type URI (Uniform Resource Identifier) and the class of the data to be converted; the class may be multi-valued.

(5.2) Map the fields to be converted into the namespace added in step 4, corresponding to an entity object of a class in that namespace.

(5.3) Set the predicate and property name, and the object instance URI.

(5.4) A multi-value separator can optionally be used when aggregating instances of the same kind.

(5.5) Set the data type, that is, the provenance information of the generated data, which is used for XML-datatype-based lookup in statistical analysis.

(5.6) Set the dataset metadata mapping rules, including predicate and property type, predicate and property name, property value, and data type.
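
The rule settings in (5.1) to (5.5) can be illustrated with a small data structure and a function that applies it. This Python sketch is one possible reading under assumptions, not the plug-in's actual configuration format: `PropertyRule`, `apply_rule`, the field names, and the sample values are hypothetical, and no IRI escaping is attempted for object URIs.

```python
from dataclasses import dataclass
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass
class PropertyRule:
    source_field: str                 # field selected in step 2
    predicate: str                    # predicate URI
    is_object_property: bool = False  # True: value is a resource; False: a literal
    object_base: str = ""             # base URI for object instances
    separator: Optional[str] = None   # multi-value separator, if any
    datatype: Optional[str] = None    # XSD datatype local name, e.g. "integer"


def apply_rule(subject, row, rule):
    """Return the N-Triples lines produced by one rule for one source row."""
    raw = row.get(rule.source_field)
    if raw is None:
        return []
    values = str(raw).split(rule.separator) if rule.separator else [str(raw)]
    lines = []
    for value in (v.strip() for v in values):
        if not value:
            continue
        if rule.is_object_property:
            obj = f"<{rule.object_base}{value}>"
        else:
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
            if rule.datatype:
                obj += f"^^<{XSD}{rule.datatype}>"
        lines.append(f"<{subject}> <{rule.predicate}> {obj} .")
    return lines


# Made-up sample: one multi-valued literal field split on ";".
rule = PropertyRule("keywords", "http://purl.org/dc/elements/1.1/subject", separator=";")
for line in apply_rule("http://example.org/journal/1", {"keywords": "rice; wheat"}, rule):
    print(line)
```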

Step 6. Edit the storage format and storage target location. Depending on requirements, local storage, graph database storage, or both at once can be selected.

Step 7. Execute the data conversion and store the converted RDF triples at the same time.

Content not described in detail in this specification belongs to the prior art known to those skilled in the art.

The above is only a preferred embodiment of the integrated method for resource description framework data conversion and storage based on an ETL tool according to the invention. It should be noted that those of ordinary skill in the art can make various improvements and refinements without departing from the principle of the method, and such improvements and refinements fall within the scope of protection of the appended claims.

Claims (4)

1. A method for modifying the mapping rules and graph data storage mode of an existing ETL tool, characterized in that the method comprises:
Step 1. In the code layer of the ETL tool, defining conversion rules that map tables, columns, and rows in the RDB to classes, properties, and resources in RDF respectively, and encapsulating them as functions;
Step 2. Designing a hybrid RDF data storage mode for the ETL tool, in which local storage, graph database storage, or both at once can be selected, and encapsulating it as functions.
2. The modification method according to claim 1, characterized in that the conversion rules in step 1 that map the RDB to RDF are defined as: RDB.table -> RDF.class, RDB.column -> RDF.property, RDB.row -> RDF.resource, where any single instance record may be assigned multiple classes.
3. An integrated method for resource description framework data conversion and storage based on an ETL tool, characterized in that the integrated method comprises:
Step 1. Preprocessing the relational data by adding a unique key through the database, to serve as the resource name during data conversion;
Step 2. Obtaining the field information of the structured data in the relational database and determining the required fields of the data to be converted;
Step 3. Adding an RDF output format to the data conversion script to form a complete program;
Step 4. Determining the namespace and namespace prefix according to the type of data to be converted, limiting the scope of variable definitions;
Step 5. Setting the subject and instance mapping rules, converting the obtained source data fields into HTTP URIs;
Step 6. Setting the attribute mapping rules, including: the obtained source field, object property, data property, object and predicate instance URI, multi-value separator, and data type;
Step 7. Setting the storage format and storage target location, then performing the conversion and storage operations.
4. The integrated method according to claim 3, characterized in that step 2 specifically comprises: adding a data conversion script in the ETL tool and connecting its input to the source database, that is, the relational database storing the data table to be converted; listing all fields in the source data table through an SQL query; and selecting and obtaining the required fields to be converted in the data table according to the pre-built semantic model.
CN201910510063.4A 2019-06-13 2019-06-13 Integrated method for resource description framework data conversion and storage based on an ETL tool Pending CN110222110A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910510063.4A CN110222110A (en) 2019-06-13 2019-06-13 Integrated method for resource description framework data conversion and storage based on an ETL tool

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910510063.4A CN110222110A (en) 2019-06-13 2019-06-13 Integrated method for resource description framework data conversion and storage based on an ETL tool

Publications (1)

Publication Number Publication Date
CN110222110A true CN110222110A (en) 2019-09-10

Family

ID=67816968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910510063.4A Pending CN110222110A (en) 2019-06-13 2019-06-13 Integrated method for resource description framework data conversion and storage based on an ETL tool

Country Status (1)

Country Link
CN (1) CN110222110A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598072A (en) * 2019-09-24 2019-12-20 恩亿科(北京)数据科技有限公司 Feature data aggregation method and device
CN110704635A (en) * 2019-09-16 2020-01-17 金色熊猫有限公司 Conversion method and device for ternary group data in knowledge graph
CN111291024A (en) * 2020-02-19 2020-06-16 京东方科技集团股份有限公司 Data processing method, device, electronic device and storage medium
CN112163248A (en) * 2020-10-12 2021-01-01 重庆大学 Rule-based process resource environmental load data normalization method
CN112163031A (en) * 2020-11-11 2021-01-01 西安四叶草信息技术有限公司 Method for extracting graph data based on mind map
CN112256927A (en) * 2020-10-21 2021-01-22 网易(杭州)网络有限公司 Method and device for processing knowledge graph data based on attribute graph
CN112925749A (en) * 2021-02-20 2021-06-08 北京同邦卓益科技有限公司 Data processing method and device, electronic equipment and storage medium
CN113535758A (en) * 2021-09-09 2021-10-22 浩鲸云计算科技股份有限公司 Big data system and method for converting traditional database scripts into cloud in batch
CN114996370A (en) * 2022-08-03 2022-09-02 杰为软件系统(深圳)有限公司 Data conversion and migration method from relational database to semantic triple

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101436192A (en) * 2007-11-16 2009-05-20 国际商业机器公司 Method and apparatus for optimizing inquiry aiming at vertical storage type database
US8037108B1 (en) * 2009-07-22 2011-10-11 Adobe Systems Incorporated Conversion of relational databases into triplestores
CN105446966A (en) * 2014-05-30 2016-03-30 国际商业机器公司 Relation data-to-RDF format data mapping rule generation method and device
CN106611046A (en) * 2016-12-16 2017-05-03 武汉中地数码科技有限公司 Big data technology-based space data storage processing middleware framework

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘振等: "RDB_to_RDF的技术方法和工具综述", 《现代图书情报技术》 *
温浩宇: "《Web网站设计与开发教程 HTML5、JSP版》", 30 April 2018 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704635A (en) * 2019-09-16 2020-01-17 金色熊猫有限公司 Conversion method and device for ternary group data in knowledge graph
CN110704635B (en) * 2019-09-16 2023-12-12 金色熊猫有限公司 Method and device for converting triplet data in knowledge graph
CN110598072B (en) * 2019-09-24 2022-03-01 恩亿科(北京)数据科技有限公司 Feature data aggregation method and device
CN110598072A (en) * 2019-09-24 2019-12-20 恩亿科(北京)数据科技有限公司 Feature data aggregation method and device
CN111291024A (en) * 2020-02-19 2020-06-16 京东方科技集团股份有限公司 Data processing method, device, electronic device and storage medium
CN111291024B (en) * 2020-02-19 2023-11-24 京东方科技集团股份有限公司 Data processing method, device, electronic equipment and storage medium
CN112163248A (en) * 2020-10-12 2021-01-01 重庆大学 Rule-based process resource environmental load data normalization method
CN112256927A (en) * 2020-10-21 2021-01-22 网易(杭州)网络有限公司 Method and device for processing knowledge graph data based on attribute graph
CN112256927B (en) * 2020-10-21 2024-06-04 网易(杭州)网络有限公司 Knowledge graph data processing method and device based on attribute graph
CN112163031B (en) * 2020-11-11 2023-06-16 西安四叶草信息技术有限公司 Graph data extraction method based on thought guide graph
CN112163031A (en) * 2020-11-11 2021-01-01 西安四叶草信息技术有限公司 Method for extracting graph data based on mind map
CN112925749A (en) * 2021-02-20 2021-06-08 北京同邦卓益科技有限公司 Data processing method and device, electronic equipment and storage medium
CN113535758A (en) * 2021-09-09 2021-10-22 浩鲸云计算科技股份有限公司 Big data system and method for converting traditional database scripts into cloud in batch
CN113535758B (en) * 2021-09-09 2021-12-24 浩鲸云计算科技股份有限公司 Big data system and method for converting traditional database scripts into cloud in batch
CN114996370A (en) * 2022-08-03 2022-09-02 杰为软件系统(深圳)有限公司 Data conversion and migration method from relational database to semantic triple

Similar Documents

Publication Publication Date Title
CN110222110A (en) Integrated method for resource description framework data conversion and storage based on an ETL tool
EP2652645B1 (en) Extensible rdf databases
US8140558B2 (en) Generating structured query language/extensible markup language (SQL/XML) statements
US7496599B2 (en) System and method for viewing relational data using a hierarchical schema
Jensen et al. Converting XML DTDs to UML diagrams for conceptual data integration
CA2421214C (en) Method and apparatus for xml data storage, query rewrites, visualization, mapping and referencing
CN106934062A (en) A kind of realization method and system of inquiry elasticsearch
JP2016207202A (en) Computer program for executing query mediator, multilingual data tier query method, multilingual data tier query method
US20060047648A1 (en) Comprehensive query processing and data access system and user interface
CN104573022A (en) Data query method and device for HBase
WO2020139079A1 (en) System and method for analyzing heterogeneous data by utilizing data virtualization components
CN116795859A (en) Data analysis method, device, computer equipment and storage medium
AU2007275507B2 (en) Semantic aware processing of XML documents
Ma et al. Modeling and querying temporal RDF knowledge graphs with relational databases
Graube et al. Integrating industrial middleware in linked data collaboration networks
CN101504660A (en) Query management method and system of pure extensible markup language database
CN114443656A (en) Customizable automated data model analysis tool and use method thereof
CN101719162A (en) Multi-version open geographic information service access method and system based on fragment pattern matching
Ren et al. Intelligent visualization system for big multi-source medical data based on data lake
Su-Cheng et al. Mapping of extensible markup language-to-ontology representation for effective data integration
Hauswirth et al. Linked data management
Yu et al. Research on knowledge storage and query technology based on general graph data processing framework
Chirathamjaree A data model for heterogeneous data sources
CN105574016A (en) Method for half-structured Web information extraction technology
Nenadić et al. Extending JSON-LD Framing Capabilities

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20190910