CN1965316A

CN1965316A - Index for accessing XML data

Info

Publication number: CN1965316A
Application number: CN 200580018627
Authority: CN
Inventors: Sivasankaran Chandrasekar; Ravi Murthy; Ashish Thusoo; Anh-Tuan Tran; Sreedhar Mukkamalla; Eric Sedlar; Nipun Agarwal
Original assignee: Oracle International Corp
Current assignee: Oracle International Corp
Priority date: 2004-04-09
Filing date: 2005-04-06
Publication date: 2007-05-16
Anticipated expiration: 2025-04-06
Also published as: CN100517318C

Abstract

Techniques are provided for indexing XML documents. According to one embodiment, a PATH table is created that stores one row for each indexed node of the XML documents. The PATH table row for a node includes (1) information for locating the XML document that contains the node, (2) information that identifies the path of the node, and (3) information that identifies the position of the node within the hierarchy of the XML document that contains the node. The PATH table row for a node may also include a value, if the node is associated with a value. Secondary indexes are built on the PATH table to make it efficient to answer XPath queries.

Description

Index for accessing XML data

Priority Statement

This application claims priority to U.S. Provisional Patent Application No. 60/560,927, entitled "XML INDEX FOR XML DATA STORED IN VARIOUS STORAGE FORMATS", filed April 9, 2004, and to U.S. Provisional Patent Application No. 60/580,445, entitled "XML INDEX FOR XML DATA STORED IN VARIOUS STORAGE FORMATS", filed June 16, 2004, the entire contents of which are hereby incorporated by reference.

Technical Field

The present invention relates to managing information and, more specifically, to accessing information contained in XML documents.

Background

In recent years, many database systems have come to allow storage and querying of Extensible Markup Language data ("XML data"). Although there are many evolving standards for querying XML, all of them include some variant of XPath. However, database systems are usually not optimized to process XPath queries, and the query performance of such systems leaves much to be desired. In the specific case where an XML schema definition is available, the structure and data types used in the XML instance documents may be known. However, where no XML schema definition is available and the documents to be searched do not conform to any schema, there are no efficient techniques for evaluating XPath queries.

Some database systems may employ ad-hoc mechanisms to satisfy XPath queries that are run against documents whose schema is not known. For example, a database system may satisfy an XPath query by performing a full scan of all documents. While a full scan of all documents can be used to satisfy any XPath query, such an implementation would be very slow because no index is used.

Another way to satisfy XPath queries involves the use of text keywords. Specifically, many database systems support text indexes, and these can be used to satisfy certain XPath queries. However, this technique can satisfy only a small subset of XPath queries. Consequently, existing database systems offer no efficient indexing technique for processing a wide range of XPath queries.

The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.

Brief Description of the Drawings

The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements, and in which:

Figure 1 is a flowchart showing the steps for updating an XML index based on a new XML document; and

Figure 2 is a block diagram of a system on which the techniques described herein may be implemented.

Detailed Description

In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.

Functional Overview

A mechanism is provided for indexing path, value, and order information in XML documents. The mechanism may be used regardless of the data structure (the "base structure") and format used to store the actual XML data. For example, the actual XML data can reside in structures inside or outside the database, in any form, such as CLOB (a character LOB storing the actual XML text), O-R (an object-relational structured form used in the presence of an XML schema), or BLOB (a binary LOB storing some binary form of the XML).

The techniques described herein involve using a set of structures that collectively constitute an index for accessing XML data. According to one embodiment, the index (referred to herein as the "XML index") includes three logical structures: a path index, an order index, and a value index. In one embodiment, all three logical structures reside in a single table, referred to herein as the PATH_TABLE.

The most commonly used parts of the XPath query language include value-based predicates and navigational (parent-child-descendant) access. As described in more detail below, an XML index can satisfy these access methods efficiently by tracking path, value, and order information. In addition, depending on how the XML index is implemented, its use may yield one or more of the following benefits: (1) improved search performance for XPath-based queries, including path matching and value predicates; (2) fragment extraction, where the fragment is identified by an XPath expression; (3) data type awareness for value predicates, where an appropriate XML schema definition exists; (4) the ability to support evolution of the XML schema and of the XML index by adding new definitions; (5) handling of a broad class of XPath expressions, including the child and descendant axes as well as equality and range predicates; (6) the ability for users to control the set of indexed paths by including or excluding specified sets of paths or namespaces, which is particularly useful in document-oriented scenarios, where formatting-related tags can be left out of the index; (7) customization of the actual text values stored in the index, for example whitespace stripping and case-insensitivity; and (8) good performance for bulk loading of the index, and support for parallel index creation.

Example XML Documents

For the purpose of explanation, examples are given below with reference to the following two XML documents:

po1.xml

<PurchaseOrder>
  <Reference>SBELL-2002100912333601PDT</Reference>
  <Actions>
    <Action>
      <User>SVOLLMAN</User>
    </Action>
  </Actions>
  ....
</PurchaseOrder>

po2.xml

<PurchaseOrder>
  <Reference>ABEL-20021127121040897PST</Reference>
  <Actions>
    <Action>
      <User>ZLOTKEY</User>
    </Action>
    <Action>
      <User>KING</User>
    </Action>
  </Actions>
  ....
</PurchaseOrder>

As indicated above, po1.xml and po2.xml are merely two examples of XML documents. The techniques described herein are not limited to XML documents having any particular type, structure, or content. Examples of how such documents are indexed and accessed, according to various embodiments of the invention, are given below.

XML Index

According to one embodiment, an XML index is a domain index that improves the performance of queries that include XPath-based predicates and/or XPath-based fragment extraction. For example, an XML index may be built over both schema-based and schema-less XMLType columns, whether they are stored as CLOB or in structured storage. In one embodiment, an XML index is a logical index that results from the cooperative use of a path index, a value index, and an order index.

The path index provides a mechanism for looking up fragments based on simple (navigational) path expressions. The value index provides lookups based on value equality or value ranges. There may be multiple secondary value indexes, one per data type. The order index associates hierarchical ordering information with the indexed nodes. The order index is used to determine parent-child, ancestor-descendant, and sibling relationships between XML nodes.

When a user submits a query that involves XPaths (as predicates or as fragment identifiers), the XPath statement is decomposed into a SQL query that accesses the XML index table. The generated query typically performs a set of lookups constrained by path, value, and order, and merges their results appropriately.

The PATH Table

According to one embodiment, a logical XML index includes a PATH table and a set of secondary indexes. As mentioned above, each indexed XML document may include many indexed nodes. The PATH table contains one row per indexed node. For each indexed node, the PATH table row for the node contains various pieces of information associated with the node.

According to one embodiment, the information contained in the PATH table includes: (1) a PATHID that indicates the path to the node; (2) "location data" for locating the fragment data for the node within the base structure; and (3) "hierarchy data" that indicates the position of the node within the structural hierarchy of the XML document that contains the node. Optionally, the PATH table may also contain value information for those nodes that are associated with values. Each of these types of information is described in greater detail below.

Paths

The structure of an XML document establishes parent-child relationships between the nodes within the XML document. The "path" for a node in an XML document reflects the series of parent-child links, starting from a "root" node, that lead to the particular node. For example, the path to the "User" node in po2.xml is /PurchaseOrder/Actions/Action/User, because the "User" node is a child of the "Action" node, the "Action" node is a child of the "Actions" node, and the "Actions" node is a child of the "PurchaseOrder" node.

The set of XML documents that an XML index indexes is referred to herein as the "indexed XML documents". According to one embodiment, an XML index may be built on all of the paths within all of the indexed XML documents, or on a subset of the paths within the indexed XML documents. Techniques for specifying which paths are indexed are described hereafter. The set of paths that are indexed by a particular XML index is referred to herein as the "indexed XML paths".

Path IDs

According to one embodiment, each of the indexed XML paths is assigned a unique path identifier (path ID). For example, the paths that exist in po1.xml and po2.xml may be assigned the path IDs shown in the following table:

| Path ID | Path |
| --- | --- |
| 1 | /PurchaseOrder |
| 2 | /PurchaseOrder/Reference |
| 3 | /PurchaseOrder/Actions |
| 4 | /PurchaseOrder/Actions/Action |
| 5 | /PurchaseOrder/Actions/Action/User |

Various techniques may be used to identify paths and to assign path IDs to paths. For example, a user may explicitly enumerate paths and specify the corresponding path ID for each path thus identified. Alternatively, the database server may parse each XML document as the document is added to the set of indexed XML documents. During the parsing operation, the database server identifies any paths that have not yet been assigned a path ID, and automatically assigns new path IDs to those paths. The path-ID-to-path mapping may be stored within the database in a variety of ways. According to one embodiment, the path-ID-to-path mapping is stored as metadata, separately from the XML index itself.
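The automatic assignment described above can be sketched in a few lines of Python. This is only a minimal illustration, not the database server's actual parser: the function name `assign_path_ids`, the use of sequential integers as path IDs, and the in-memory dictionary standing in for the stored metadata are all assumptions, and the sample documents are abridged versions of po1.xml and po2.xml with the elided content left out.

```python
import xml.etree.ElementTree as ET

# Abridged stand-ins for po1.xml and po2.xml (elided content omitted).
PO1 = ("<PurchaseOrder><Reference>SBELL-2002100912333601PDT</Reference>"
       "<Actions><Action><User>SVOLLMAN</User></Action></Actions>"
       "</PurchaseOrder>")
PO2 = ("<PurchaseOrder><Reference>ABEL-20021127121040897PST</Reference>"
       "<Actions><Action><User>ZLOTKEY</User></Action>"
       "<Action><User>KING</User></Action></Actions></PurchaseOrder>")


def assign_path_ids(xml_texts, path_ids=None):
    """Give every path that has no ID yet the next free integer ID.

    `path_ids` is a hypothetical stand-in for the path-ID-to-path
    metadata; passing an existing mapping keeps earlier assignments.
    """
    path_ids = {} if path_ids is None else path_ids

    def walk(elem, parent_path):
        path = parent_path + "/" + elem.tag
        if path not in path_ids:               # path seen for the first time
            path_ids[path] = len(path_ids) + 1
        for child in elem:                     # children in document order
            walk(child, path)

    for text in xml_texts:
        walk(ET.fromstring(text), "")
    return path_ids


PATH_IDS = assign_path_ids([PO1, PO2])
```

With these two sample documents, the paths are discovered in document order, so the resulting mapping agrees with the path ID table shown earlier; po2.xml contributes no new paths because every path in it was already seen in po1.xml.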

According to one embodiment, the same access structures are used for XML documents that conform to different schemas. Because the indexed XML documents may conform to different schemas, each XML document will typically contain only a subset of the paths to which path IDs have been assigned.

Location Data

The location data associated with a node indicates where the XML document that contains the node resides within the base structure. Thus, the nature of the location data will vary from implementation to implementation, based on the nature of the base structure. Depending on how the actual XML document is stored, the location data may also include a logical pointer, or locator, that points into the XML document. The logical pointer may be used to extract the fragment associated with a node identified by an XPath.

For the purpose of explanation, it shall be assumed that (1) the base structure is a table within a relational database, and (2) each indexed XML document is stored in a corresponding row of that base table. In such a context, the location data for a node may include, for example, (1) the rowid of the row, within the base table, in which the XML document containing the node is stored, and (2) a locator that provides fast access, within the XML document, to the fragment data that corresponds to the node.

Hierarchy Data

The PATH table row for a node also includes information that indicates where the node resides within the hierarchical structure of the XML document that contains the node. Such hierarchical information is referred to herein as the "OrderKey" of the node.

According to one embodiment, the hierarchical order information is represented using a Dewey-type value. Specifically, in one embodiment, the OrderKey of a node is created by appending a value to the OrderKey of the node's immediate parent, where the appended value indicates the position of that particular child among the children of the parent node.

For example, assume that a particular node D is the child of a node C, which itself is a child of a node B, which in turn is a child of a node A. Assume further that node D has the OrderKey 1.2.4.3. The final "3" in the OrderKey indicates that node D is the third child of its parent node C. Similarly, the "4" indicates that node C is the fourth child of node B. The "2" indicates that node B is the second child of node A. The leading "1" indicates that node A is the root node (i.e., it has no parent).

As described above, the OrderKey of a child can easily be created by appending, to the OrderKey of the parent, a value that corresponds to the position of the child among the parent's children. Similarly, the OrderKey of the parent can easily be derived from the OrderKey of the child by removing the last component of the child's OrderKey.
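The two derivations can be illustrated with a short Python sketch. It assumes that an OrderKey is held as a dotted string such as "1.2.4.3"; the helper names `child_order_key` and `parent_order_key` are hypothetical and are not taken from any particular database system.

```python
def child_order_key(parent_key, position):
    """Append the 1-based position of the child to the parent's OrderKey."""
    return f"{parent_key}.{position}"


def parent_order_key(key):
    """Drop the last component; return None for the root, which has no parent."""
    head, sep, _ = key.rpartition(".")
    return head if sep else None


# Node D is the third child of node C, whose OrderKey is 1.2.4.
ORDER_KEY_D = child_order_key("1.2.4", 3)
```

Applied to the example above, appending position 3 to the key of node C gives the key of node D, and stripping the last component of that key gives back the key of node C.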

According to one embodiment, the composite number represented by each OrderKey is converted into a byte-comparable value, so that a mathematical comparison between two OrderKeys indicates the relative position, within the structural hierarchy of the XML document, of the nodes to which the OrderKeys correspond.

For example, within the hierarchical structure of an XML document, the node associated with the OrderKey 1.2.7.7 precedes the node associated with the OrderKey 1.3.1. Thus, the database server uses a conversion mechanism that converts the OrderKey 1.2.7.7 to a first value and the OrderKey 1.3.1 to a second value, where the first value is less than the second value. By comparing the first value to the second value, the database server can easily determine that the node associated with the first value precedes the node associated with the second value. Various conversion techniques may be used to achieve this result, and the invention is not limited to any particular conversion technique.
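Since no particular conversion technique is prescribed, the following Python sketch shows just one possible encoding, chosen here purely for illustration: every component of the OrderKey is written as a fixed-width big-endian integer. The two-byte width (which limits a node to 65,535 siblings) and the function name `encode_order_key` are assumptions of this sketch.

```python
def encode_order_key(key, width=2):
    """Encode a dotted OrderKey as bytes that compare in document order.

    Each component becomes a fixed-width big-endian integer, so plain
    byte-wise comparison matches component-wise numeric comparison.
    int.to_bytes raises OverflowError if a component does not fit.
    """
    return b"".join(int(part).to_bytes(width, "big") for part in key.split("."))


FIRST = encode_order_key("1.2.7.7")
SECOND = encode_order_key("1.3.1")
PRECEDES = FIRST < SECOND   # byte-wise comparison of the two encodings
```

Two properties make this encoding suitable for the purpose described above. First, a fixed width avoids the pitfall of comparing the dotted strings directly, where "1.10" would sort before "1.9". Second, the encoding of a parent is a prefix of the encoding of each of its descendants, so a parent always compares as less than its descendants, which is consistent with document order.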

Value Information

Some nodes within an indexed document may be attribute nodes or nodes that correspond to simple elements. According to one embodiment, for attribute nodes and simple elements, the PATH table row also stores the actual value of the attributes and elements. Such values may be stored, for example, in a "value column" of the PATH table. The secondary "value indexes", which are built on the value column, are described in greater detail hereafter.

PATH Table Example

According to one embodiment, the PATH table includes the columns defined in the following table:

| Column name | Data type | Description |
| --- | --- | --- |
| PATHID | RAW(8) | ID of the path token. Each distinct path (e.g., /a/b/c) is assigned a unique ID by the system. |
| RID | UROWID/ROWID | Rowid of the row in the base table. |
| ORDER_KEY | RAW(100) | Dewey order key of the node; e.g., 3.21.5 denotes the 5th child of the 21st child of the 3rd child of the root. |
| LOCATOR | RAW(100) | Information corresponding to the starting position of the fragment. This information is used during fragment extraction. |
| VALUE | RAW(2000)/BLOB | Value of the node in the case of attributes and simple elements. The type (and the size of the RAW column) can be specified by the user. |

As explained above, the PATHID is a number assigned to the node, and it uniquely represents the fully expanded path to the node. The ORDER_KEY is a system representation of the Dewey ordering number associated with the node. According to one embodiment, the internal representation of the order key also preserves document order.

The VALUE column stores the effective text value for simple element nodes (i.e., elements with no child elements) and attribute nodes. According to one embodiment, adjacent text nodes are coalesced by concatenation. As described in greater detail hereafter, a mechanism is provided that allows a user to customize the effective text value stored in the VALUE column by specifying options during index creation; for example, the handling of mixed text, whitespace, and case sensitivity can be customized. The user can store the VALUE column in any of several formats, including a bounded RAW column or a BLOB. If the user chooses bounded storage, any overflow during index creation is flagged as an error.

The following table is an example of a PATH table that (1) has the columns described above, and (2) is populated with the entries for po1.xml and po2.xml. Specifically, each row of the PATH table corresponds to an indexed node of either po1.xml or po2.xml. In this example, it is assumed that po1.xml and po2.xml are stored in rows R1 and R2, respectively, of the base table.

Populated PATH Table

| rowid | pathid | rid | order key | locator | value |
| --- | --- | --- | --- | --- | --- |
| 1 | 1 | R1 | 1 | | |
| 2 | 2 | R1 | 1.1 | | SBELL-2002100912333601PDT |
| 3 | 3 | R1 | 1.2 | | |
| 4 | 4 | R1 | 1.2.1 | | |
| 5 | 5 | R1 | 1.2.1.1 | | SVOLLMAN |
| 6 | 1 | R2 | 1 | | |
| 7 | 2 | R2 | 1.1 | | ABEL-20021127121040897PST |
| 8 | 3 | R2 | 1.2 | | |
| 9 | 4 | R2 | 1.2.1 | | |
| 10 | 5 | R2 | 1.2.1.1 | | ZLOTKEY |
| 11 | 4 | R2 | 1.2.2 | | |
| 12 | 5 | R2 | 1.2.2.1 | | KING |
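As an illustration of how such a table could be populated, the following Python sketch walks each document once in document order and emits one row per element node. It is a simplified sketch under several assumptions: the function name `build_path_table` is hypothetical, path IDs are assigned on first sight, the locator column is left out because it depends on the storage format, attribute nodes are ignored because the sample documents have none, and the documents are abridged versions of po1.xml and po2.xml.

```python
import xml.etree.ElementTree as ET

PO1 = ("<PurchaseOrder><Reference>SBELL-2002100912333601PDT</Reference>"
       "<Actions><Action><User>SVOLLMAN</User></Action></Actions>"
       "</PurchaseOrder>")
PO2 = ("<PurchaseOrder><Reference>ABEL-20021127121040897PST</Reference>"
       "<Actions><Action><User>ZLOTKEY</User></Action>"
       "<Action><User>KING</User></Action></Actions></PurchaseOrder>")


def build_path_table(documents):
    """Return rows of the form (rowid, pathid, rid, order_key, value).

    `documents` is a list of (rid, xml_text) pairs, where rid names the
    base table row that holds the document.
    """
    path_ids = {}   # path string -> path ID
    rows = []

    def visit(elem, path, order_key, rid):
        pathid = path_ids.setdefault(path, len(path_ids) + 1)
        # Only simple elements (no child elements) carry a value.
        value = elem.text if len(elem) == 0 else None
        rows.append((len(rows) + 1, pathid, rid, order_key, value))
        for position, child in enumerate(elem, start=1):
            visit(child, f"{path}/{child.tag}", f"{order_key}.{position}", rid)

    for rid, text in documents:
        root = ET.fromstring(text)
        visit(root, "/" + root.tag, "1", rid)   # the root has order key 1
    return rows


PATH_TABLE = build_path_table([("R1", PO1), ("R2", PO2)])
```

For the two abridged documents, the sketch yields twelve rows whose pathid, rid, order key, and value columns agree with the populated PATH table above.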

In this example, the rowid column stores a unique identifier for each row of the PATH table. Depending on the database system in which the PATH table is created, the rowid column may be an implicit column. For example, the disk location of a row may be used as the unique identifier for the row. As described in greater detail below, the secondary order and value indexes use the rowid values of the PATH table to locate rows within the PATH table.

In the embodiment described above, the PATHID, ORDERKEY, and VALUE of a node are all contained in a single table. In alternative embodiments, separate tables may be used to map the PATHID, ORDERKEY, and VALUE information to the corresponding location data (e.g., the base table rid and the locator).

Secondary Indexes

The PATH table includes the information required to locate the XML documents, or XML fragments, that satisfy a wide range of queries. However, without secondary access structures, using the PATH table to satisfy such queries will often require a full scan of the PATH table. Therefore, according to one embodiment, a variety of secondary indexes are created by the database server to accelerate queries that (1) perform path lookups and/or (2) identify order-based relationships. According to one embodiment, the following secondary indexes are created on the PATH table:

● PATHID_INDEX on (pathid, rid)

● ORDERKEY_INDEX on (rid, order_key)

● VALUE INDEXES

● PARENT_ORDERKEY_INDEX on (rid, SYS_DEWEY_PARENT(order_key))
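To make the role of these structures concrete, the following Python sketch builds a small PATH table in an in-memory SQLite database, creates two of the listed secondary indexes, and answers a simple XPath-style question with a SQL lookup. SQLite is used only as a convenient stand-in for the database server; the table name `path_table`, the helper names, and the storage of order keys as plain text are assumptions of this sketch, and the value and parent order key indexes are omitted.

```python
import sqlite3

# (pathid, rid, order_key, value) for the nodes of po1.xml and po2.xml.
ROWS = [
    (1, "R1", "1", None),
    (2, "R1", "1.1", "SBELL-2002100912333601PDT"),
    (3, "R1", "1.2", None),
    (4, "R1", "1.2.1", None),
    (5, "R1", "1.2.1.1", "SVOLLMAN"),
    (1, "R2", "1", None),
    (2, "R2", "1.1", "ABEL-20021127121040897PST"),
    (3, "R2", "1.2", None),
    (4, "R2", "1.2.1", None),
    (5, "R2", "1.2.1.1", "ZLOTKEY"),
    (4, "R2", "1.2.2", None),
    (5, "R2", "1.2.2.1", "KING"),
]
USER_PATHID = 5   # path ID of /PurchaseOrder/Actions/Action/User


def build_index_db():
    """Create the PATH table and two secondary indexes in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE path_table ("
        "pathid INTEGER, rid TEXT, order_key TEXT, locator BLOB, value TEXT)"
    )
    conn.execute("CREATE INDEX pathid_index ON path_table (pathid, rid)")
    conn.execute("CREATE INDEX orderkey_index ON path_table (rid, order_key)")
    conn.executemany(
        "INSERT INTO path_table (pathid, rid, order_key, value) "
        "VALUES (?, ?, ?, ?)",
        ROWS,
    )
    return conn


def docs_with_user(conn, user):
    """Rough SQL equivalent of /PurchaseOrder/Actions/Action[User = user]."""
    cur = conn.execute(
        "SELECT DISTINCT rid FROM path_table "
        "WHERE pathid = ? AND value = ? ORDER BY rid",
        (USER_PATHID, user),
    )
    return [rid for (rid,) in cur]
```

For example, `docs_with_user(build_index_db(), "KING")` identifies the base table row that holds po2.xml without scanning the documents themselves. A real implementation would store the order key in a byte-comparable form rather than as text, so that index order matches document order.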

PATHID_INDEX

The PATHID_INDEX is built on the pathid and rid columns of the PATH table.

Thus, entries in the PATHID_INDEX have the form (keyvalue, rowid), where keyvalue is a composite value representing a particular pathid/rid combination, and rowid identifies a particular row of the PATH table.

When (1) the base table row and (2) the pathid of a node are known, the PATHID_INDEX can be used to quickly locate the row for the node within the PATH table. For example, based on the key value "3.R1", the PATHID_INDEX may be traversed to find the entry associated with that key value. Assuming that the PATH table is populated as illustrated above, the index entry has a rowid value of 3. The rowid value of 3 points to the third row of the PATH table, which is the row for the node associated with pathid 3 and rid R1.

ORDERKEY_INDEX

The ORDERKEY_INDEX is built on the rid and order key columns of the PATH table. Thus, entries in the ORDERKEY_INDEX have the form (keyvalue, rowid), where keyvalue is a composite value representing a particular rid/order key combination, and rowid identifies a particular row of the PATH table.

When (1) the base table row and (2) the order key of a node are known, the ORDERKEY_INDEX can be used to quickly locate the row for the node within the PATH table. For example, based on the key value "R1.'1.2'", the ORDERKEY_INDEX may be traversed to find the entry associated with that key value. Assuming that the PATH table is populated as illustrated above, the index entry has a rowid value of 3. The rowid value of 3 points to the third row of the PATH table, which is the row for the node associated with order key 1.2 and rid R1.
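The two lookups can be mimicked with ordinary Python dictionaries keyed by the composite values. This is only a conceptual sketch: a database server would typically use an ordered structure such as a B-tree, and the names `PATH_ROWS` and `build_lookup_indexes` are hypothetical. The sketch also makes one detail explicit: a pathid/rid combination may match several rows (for example, po2.xml contains two Action nodes), whereas a rid/order key combination identifies exactly one node.

```python
from collections import defaultdict

# rowid -> (pathid, rid, order_key), taken from the populated PATH table.
PATH_ROWS = {
    1: (1, "R1", "1"), 2: (2, "R1", "1.1"), 3: (3, "R1", "1.2"),
    4: (4, "R1", "1.2.1"), 5: (5, "R1", "1.2.1.1"),
    6: (1, "R2", "1"), 7: (2, "R2", "1.1"), 8: (3, "R2", "1.2"),
    9: (4, "R2", "1.2.1"), 10: (5, "R2", "1.2.1.1"),
    11: (4, "R2", "1.2.2"), 12: (5, "R2", "1.2.2.1"),
}


def build_lookup_indexes(path_rows):
    """Build dictionary stand-ins for PATHID_INDEX and ORDERKEY_INDEX."""
    pathid_index = defaultdict(list)   # (pathid, rid) -> list of rowids
    orderkey_index = {}                # (rid, order_key) -> rowid
    for rowid, (pathid, rid, order_key) in sorted(path_rows.items()):
        pathid_index[(pathid, rid)].append(rowid)
        orderkey_index[(rid, order_key)] = rowid
    return pathid_index, orderkey_index


PATHID_INDEX, ORDERKEY_INDEX = build_lookup_indexes(PATH_ROWS)
```

With the populated table, looking up the key (3, "R1") in `PATHID_INDEX` and the key ("R1", "1.2") in `ORDERKEY_INDEX` both lead to row 3 of the PATH table, matching the examples in the text.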

Value Indexes

Just as queries based on path lookups can be accelerated using the PATHID_INDEX, queries based on value lookups can be accelerated by indexes built on the value column of the PATH table. However, the value column of the PATH table can hold values of a variety of data types. Therefore, according to one embodiment, a separate value index is built for each data type stored in the value column. Thus, in an implementation in which the value column holds strings, numbers, and timestamps, the following (secondary) value indexes are also created:

● STRING_INDEX on SYS_XMLVALUE_TO_STRING(value)

● NUMBER_INDEX on SYS_XMLVALUE_TO_NUMBER(value)

● TIMESTAMP_INDEX on SYS_XMLVALUE_TO_TIMESTAMP(value)

These value indexes are used to perform data-type-based comparisons (equality and range). For example, the NUMBER value index is used to handle number-based comparisons within user XPaths. Entries in the NUMBER_INDEX may, for example, be of the form (number, rowid), where the rowid points to the PATH table row for a node associated with the value of "number". Similarly, entries in the STRING_INDEX may have the form (string, rowid), and entries in the TIMESTAMP_INDEX may have the form (timestamp, rowid).

The format of the values in the PATH table may not correspond to the native format of the data type. Therefore, when using the value indexes, the database server may call conversion functions to convert the value bytes from the stored format to the specified data type. In addition, the database server applies any necessary transformations, as described hereafter. According to one embodiment, the conversion functions operate on both RAW and BLOB values, and return NULL if the conversion is not possible.
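The behavior of such a conversion function, and of an index built on its result, can be sketched in Python as follows. The sketch is an illustration under stated assumptions rather than a description of SYS_XMLVALUE_TO_NUMBER itself: stored values are assumed to be UTF-8 encoded bytes, `None` plays the role of NULL, values that do not convert are simply left out of the index, and the function names are hypothetical.

```python
import bisect
from decimal import Decimal, InvalidOperation


def xmlvalue_to_number(raw):
    """Convert stored bytes to a number, or return None if not possible."""
    if raw is None:
        return None
    try:
        number = Decimal(raw.decode("utf-8").strip())
    except (UnicodeDecodeError, InvalidOperation):
        return None
    return number if number.is_finite() else None   # reject NaN and infinity


def build_number_index(values_by_rowid):
    """Return sorted (number, rowid) entries for the convertible values."""
    entries = []
    for rowid, raw in values_by_rowid.items():
        number = xmlvalue_to_number(raw)
        if number is not None:
            entries.append((number, rowid))
    entries.sort()
    return entries


def range_lookup(index, low, high):
    """Return the rowids whose number lies in the closed range [low, high]."""
    keys = [number for number, _ in index]
    start = bisect.bisect_left(keys, low)
    stop = bisect.bisect_right(keys, high)
    return [rowid for _, rowid in index[start:stop]]
```

Because the entries are kept sorted by the converted number, both equality and range predicates reduce to a binary search followed by a short scan, which is the kind of access that the NUMBER_INDEX is meant to support.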

By default, the value indexes are created when the XML index is created. However, users can suppress the creation of one or more of the value indexes based on their knowledge of the query workload. For example, if all XPath predicates involve only string comparisons, the NUMBER and TIMESTAMP value indexes can be avoided.

PARENT_ORDERKEY_INDEX

According to one embodiment, the set of secondary indexes built on the PATH table includes a PARENT_ORDERKEY_INDEX. Like the ORDER_KEY index, the PARENT_ORDERKEY_INDEX is built on the resource identifier and order key columns of the PATH table. Consequently, the index entries of the PARENT_ORDERKEY_INDEX have the form (key value, row ID), where the key value is a composite value corresponding to a particular resource identifier/order key combination. However, unlike the ORDER_KEY index, the row ID in a PARENT_ORDERKEY_INDEX entry does not point to the PATH table row that has that particular resource identifier/order key combination. Instead, the row ID of each PARENT_ORDERKEY_INDEX entry points to the PATH table row of the node that is the immediate parent of the node associated with the resource identifier/order key combination.

For example, in the PATH table populated as described above, the resource identifier/order key combination ("R1", "1.2") corresponds to the node in row 3 of the PATH table. The immediate parent of the node in row 3 is the node represented by row 1 of the PATH table. Consequently, the PARENT_ORDERKEY_INDEX entry associated with ("R1", "1.2") has a row ID that points to row 1 of the PATH table.
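The difference between the two indexes can be sketched as follows in Python. The sketch assumes Dewey-style order keys in which the parent of "1.2" is "1"; the function dewey_parent, the dictionary layout, and the sample rows are hypothetical and chosen only to mirror the example of row 3 and row 1.

```python
def dewey_parent(order_key):
    """Return the parent of a Dewey order key ('1.2' -> '1'); the root has no parent."""
    head, sep, _ = order_key.rpartition(".")
    return head if sep else None


# Hypothetical PATH table: row number -> (resource identifier, order key).
path_table = {1: ("R1", "1"), 2: ("R1", "1.1"), 3: ("R1", "1.2"), 4: ("R1", "1.2.1")}


def build_indexes(table):
    """Build ORDER_KEY-style and PARENT_ORDERKEY-style lookups from the same key."""
    # ORDER_KEY: the key points at the row of the node itself.
    order_key_index = {(rid, key): row for row, (rid, key) in table.items()}
    # PARENT_ORDERKEY: the same key points at the row of the node's immediate parent.
    parent_index = {}
    for (rid, key), _row in order_key_index.items():
        parent_key = dewey_parent(key)
        if parent_key is not None:
            # Assumes the parent row exists, since ancestors of indexed nodes are indexed.
            parent_index[(rid, key)] = order_key_index[(rid, parent_key)]
    return order_key_index, parent_index


order_key_index, parent_orderkey_index = build_indexes(path_table)
```

With these sample rows, the key ("R1", "1.2") resolves to row 3 through the first lookup and to row 1 through the second.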

Creating an XML index

According to one embodiment, an XML index is created within a database in response to an index creation command received by the database server. For the purpose of explanation, creation of the XML index is described in a context in which the XML documents to be indexed are stored in an XMLType column of a relational table.

For example, assume that the base structure is a table stylesheet_tab that stores stylesheets as XMLType, identified by an id column. Such a table may be created using the following command:

CREATE TABLE stylesheet_tab(id number, stylesheet XMLType);

An XML index can be created on the stylesheet column of stylesheet_tab to speed up queries that involve XPath-based fragment retrieval and XPath predicates. According to one embodiment, such an XML index may be created using the following command:

CREATE INDEX ss_tab_xmli ON stylesheet_tab(stylesheet)
INDEXTYPE IS XMLINDEX;

The following commands are an example of how an XML index can be created on a schema-based XMLType table:

CREATE TABLE purchaseorder OF XMLType
XMLSchema "http://xmlns.oracle.com/xdb/documentation/purchaseOrder.xsd"
ELEMENT "PurchaseOrder";

CREATE INDEX purchaseorder_xmli ON purchaseorder(object_value)
INDEXTYPE IS XMLINDEX;

The commands given above are merely examples of commands that can be submitted to a database server to cause it to create an XML index. The techniques described herein are not limited to any particular form or syntax for specifying the creation of an index.

According to one embodiment, the index creation command includes parameters that allow the user to specify various characteristics of the XML index, such as:

○ Paths to be included in, or excluded from, the set of indexed paths

○ Names for the PATH table and the secondary indexes

○ Storage options for the PATH table and the secondary indexes (for example, whether to store the PATH table as a partitioned table, an Index Organized Table, etc.)

○ Rules for handling values

○ The column type of the value column (for example, RAW or BLOB)

For example, the rules for handling values may specify whether values are treated as case sensitive, whether values are normalized (and if so, how the normalization is performed), and how to handle the values of mixed-content nodes (nodes that have both a value and child nodes). With respect to mixed-content nodes, for example, a rule may specify whether the values associated with such nodes should be ignored, concatenated, or handled in some special way. These are merely examples of value handling rules that a user may specify. The set of available rules may vary from implementation to implementation, and may further vary based on the type of value involved.

When a user creates an XML index, the underlying PATH table and secondary indexes are created automatically. By default, the names of the PATH table and secondary indexes are generated by the system, based on the name of the XML index. However, users can explicitly specify names for these objects.

By default, the storage options for the PATH table and secondary indexes are derived from the storage properties of the base table on which the XML index is created. However, users can also explicitly specify storage properties for these objects.

The following example shows how the number index can be created in a tablespace separate from that of the PATH table:

CREATE INDEX POIndex ON purchaseOrder
INDEXTYPE IS XMLINDEX
PARAMETERS 'PATHS (/PurchaseOrder/LineItems//*, /PurchaseOrder/LineItems/LineItem/@ItemNumber)
PATH TABLE POIndex_path_table tablespace tab_tbs
VALUE STORE AS RAW(50)
NUMBER INDEX POIndex_num_idx tablespace idx_tbs'

User selection of paths to index

According to one embodiment, a mechanism is provided by which a user can specify rules that determine which XML paths an XML index will index. Specifically, a user may register rules that expressly include certain XML paths, and/or rules that expressly exclude certain XML paths.

According to one embodiment, when a user creates an XML index, the default is to index all nodes in the base documents (that is, the PATH table contains a row for every node in the documents). However, the user can explicitly specify the set of nodes (subtrees) to be indexed, thereby omitting the remaining nodes from the PATH table. This is typically used to exclude fragments that are known to be useless from a query standpoint. Reducing the number of indexed nodes improves the space usage and management efficiency of the XML index.

According to one embodiment, the initial registration of rules may occur when the XML index is created. For example, assume that the documents to be indexed are stored in a purchaseOrder table. If the user wants to index all LineItem elements and their children, as well as the purchase order reference number and the requestor, the following Create Index DDL can be issued:

CREATE INDEX POIndex1 ON purchaseOrder
INDEXTYPE IS XMLINDEX
PARAMETERS 'PATHS (/PurchaseOrder/LineItems//*,
/PurchaseOrder/Reference,
/PurchaseOrder/Requestor)
PATH TABLE POIndex_path_table'

In this example, POIndex_path_table denotes the name of the table used by the domain index to store the index data. In the preceding example, the rules expressly include certain paths. All paths that are not expressly included by the rules are excluded from the index. The rule /PurchaseOrder/LineItems//* includes the wildcard character "*". Consequently, the rule expressly includes the path /PurchaseOrder/LineItems and the paths to all nodes that descend from /PurchaseOrder/LineItems. This is merely one example of how wildcards may be used in rules. According to one embodiment, the path selection rule mechanism supports wildcards in a variety of contexts. For example, the rule /nodex/*/nodey/nodez selects all paths that (1) descend from /nodex/ and (2) terminate in /nodey/nodez, regardless of the path between nodex and nodey/nodez.
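One possible way to match node paths against such rules is to translate each rule into a regular expression, as in the following Python sketch. The sketch rests on two assumptions that the description does not fix: "//" is taken to match zero or more intermediate steps, and "*" is taken to match exactly one step, as in standard XPath (the passage above could also be read as allowing several intermediate steps). Under this reading the path /PurchaseOrder/LineItems itself is not matched by the first rule; it would be indexed as an ancestor of matched nodes. The function names are hypothetical.

```python
import re


def rule_to_regex(rule):
    """Translate a path rule with '*' and '//' into a compiled regular expression."""
    pattern = ""
    # The capturing group keeps the separators, so '//' and '/' can be told apart.
    for token in re.split(r"(//|/)", rule):
        if token == "":
            continue
        if token == "//":
            pattern += r"/(?:[^/]+/)*"  # zero or more intermediate steps
        elif token == "/":
            pattern += "/"
        elif token == "*":
            pattern += r"[^/]+"         # exactly one step (assumption)
        else:
            pattern += re.escape(token)
    return re.compile(pattern)


def is_included(path, include_rules):
    """True if the node path matches at least one include rule."""
    return any(rule_to_regex(rule).fullmatch(path) for rule in include_rules)


rules = [
    "/PurchaseOrder/LineItems//*",
    "/PurchaseOrder/Reference",
    "/PurchaseOrder/Requestor",
]
```

For instance, is_included("/PurchaseOrder/LineItems/LineItem/Description", rules) holds, while is_included("/PurchaseOrder/Actions", rules) does not.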

Users can also specify rules that expressly exclude paths. For example, to index all paths of the documents except the LineItem descriptions and the purchase order actions, the index may be created using the following Create Index DDL:

CREATE INDEX POIndex2 ON purchaseOrder
INDEXTYPE IS XMLINDEX
PARAMETERS 'PATHS EXCLUDE (/PurchaseOrder/LineItems/LineItem/Description,
/PurchaseOrder/Actions)
PATH TABLE POIndex_path_table2'

Adding documents to the set of indexed documents

When a new XML document is to be indexed, path, order, and value information is gathered and stored in the XML index. According to one embodiment, when an XML document is added to a repository of indexed XML documents, the new XML document is parsed to identify the paths to the nodes it contains. Once the paths of the nodes within the new XML document have been identified, the database server determines which of the nodes in the new XML document are to be indexed. The database server then updates the XML index based on each of those nodes.

FIG. 1 is a flowchart showing how a new XML document is processed, according to one embodiment of the invention. In FIG. 1, steps 102 and 108 define a loop in which each node in the new XML document is processed. Specifically, at step 102, a previously unprocessed node is selected. During the first iteration, the first node selected for processing is the root node of the new XML document.

At step 104, the database server determines the path of the currently selected node. At step 106, the database server determines, based on that path, whether the currently selected node is to be indexed. In particular, when the user has specified a subset of paths to index, index entries are added only for those nodes that correspond to the specified subset of paths. According to one embodiment, step 106 involves matching the path associated with the current node against the path selection rules to check whether the current node should be indexed. If (1) the user has registered rules indicating which paths are to be included, and (2) the path associated with the current node does not match any of the user-specified paths, then the subtree (fragment) rooted at that node is omitted from the index. On the other hand, if (1) the rules specify which paths are to be excluded from the index, and (2) the path associated with the current node matches any of the user-specified paths, then the fragment rooted at that node is omitted from the index. The matching operation may be performed, for example, using a finite state machine.

If it is determined at step 106 that the selected node is not associated with a path that is to be indexed, control passes to step 108. At step 108, the database server determines whether the new XML document has any more nodes to be processed. If the new XML document has no more nodes to be processed, then the process of updating the XML index is done. Otherwise, control returns to step 102 and another node is selected for processing.

If it is determined at step 106 that the current node is to be indexed, then the fragment rooted at the node is added to the index. In addition, all of its ancestors (the element nodes up to the root) are also added to the index. Finally, any namespace attributes within the ancestor element nodes are also added to the index.

In FIG. 1, the operation of processing a node that is to be indexed is broken down in greater detail. At step 110, it is determined whether the path associated with the current node has been assigned a PATHID. A path may not yet have been assigned a PATHID if, for example, no previously indexed XML document contains that path. In that case, control passes to step 112, where a PATHID is assigned to the path. The new PATHID-to-path mapping is then stored within the database.

At step 114, a row containing information about the current node is added to the PATH table. At step 116, the PATHID, ORDERKEY, and PARENT_ORDERKEY indexes are updated with entries for the current node. As described above, the PATHID and ORDERKEY entries point to the new row for the current node, while the PARENT_ORDERKEY entry points to the PATH table row of the current node's parent.

At step 118, it is determined whether the current node is associated with a value. If the current node is not associated with a value, control returns to step 108. If the current node is associated with a value, and a value index has been created for the data type of that value, then at step 120 an index entry is added to the value index associated with that data type. Control then returns to step 108.

According to one embodiment, a node is indexed even though it is associated with a path that is not itself to be indexed, if the node is an ancestor of any node that is indexed. Thus, even if the user specifies that only paths matching /a/b/c/* should be included, the nodes associated with the paths /a, /a/b, and /a/b/c are still indexed, as long as they are ancestors of some node whose path matches the pattern /a/b/c/*.
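The node-by-node processing described for FIG. 1 can be sketched as follows in Python. This is an illustration under simplifying assumptions, not the flow of any actual server: only element nodes are handled (attributes and namespace attributes are skipped), order keys are Dewey-style strings starting at "1", include rules are passed in as precompiled regular expressions, and the secondary indexes of steps 116 to 120 are left out. The name index_document and the layout of state are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def index_document(rid, xml_text, include_patterns, state):
    """Add PATH table rows for one document.

    state holds 'pathids' (path -> PATHID) and 'path_table' (list of rows).
    An empty include_patterns list means that every node is indexed.
    """
    nodes = []  # (path, order key, value) in document order

    def walk(elem, parent_path, order_key):
        path = parent_path + "/" + elem.tag             # step 104: path of the node
        value = (elem.text or "").strip() or None
        nodes.append((path, order_key, value))
        for position, child in enumerate(elem, start=1):
            walk(child, path, f"{order_key}.{position}")

    walk(ET.fromstring(xml_text), "", "1")

    def selected(path):                                 # step 106: rule matching
        return not include_patterns or any(p.fullmatch(path) for p in include_patterns)

    selected_keys = [key for path, key, _ in nodes if selected(path)]

    def keep(key):
        # Keep selected nodes, nodes inside a selected fragment, and ancestors.
        return any(
            key == s or key.startswith(s + ".") or s.startswith(key + ".")
            for s in selected_keys
        )

    for path, key, value in nodes:
        if not keep(key):
            continue
        # Steps 110-112: assign a PATHID the first time a path is seen.
        pathid = state["pathids"].setdefault(path, len(state["pathids"]) + 1)
        # Step 114: add the row for this node.
        state["path_table"].append((rid, pathid, key, value))


state = {"pathids": {}, "path_table": []}
index_document("R1", "<a><b><c>1</c><d>2</d></b><e>3</e></a>", [re.compile(r"/a/b/c")], state)
```

In this run only /a/b/c is selected, yet rows for /a and /a/b are also added because they are ancestors of the selected node, while /a/b/d and /a/e are omitted.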

Altering an XML index

According to one embodiment, a mechanism is provided for altering the characteristics of an XML index after the index has been created. For example, a post-creation alteration of the XML index may be performed in response to an alter index statement.

An important use of the alter index statement for XML indexes is to add or remove indexed paths. According to one embodiment, new paths can be added to the index with the following Alter Index DDL:

ALTER INDEX POIndex
PARAMETERS 'PATHS (/PurchaseOrder/Reference,
/PurchaseOrder/Actions/Action//*)'

This DDL command indexes all purchase order references and all children of Action elements, if they are not already indexed. Similarly, the following DDL removes paths from the index if they are currently indexed:

ALTER INDEX POIndex
PARAMETERS 'PATHS EXCLUDE (/PurchaseOrder/Reference, /PurchaseOrder/Actions/Action//*)'

As shown in the following example, the Alter Index Rename DDL lets the user change the name of an index without explicitly dropping and re-creating it:

ALTER INDEX POIndex RENAME PONewIndex

Determining whether the XML index can be used

At query time, the XML index can be used to answer a query if the query XPath can be statically determined to be a subset of the user-specified XPaths (and is thereby guaranteed to be in the index). If the subset relationship cannot be determined when the query is compiled, the XML index cannot be used to satisfy the query.

For example, assume that the XML index POIndex1 is created with the following statement:

CREATE INDEX POIndex1 ON purchaseOrder
INDEXTYPE IS XMLINDEX
PARAMETERS 'PATHS (/PurchaseOrder/LineItems//*,
/PurchaseOrder/Reference,
/PurchaseOrder/Requestor)
PATH TABLE POIndex_path_table'

The XML index can be used to answer the query XPath /PurchaseOrder/LineItems/LineItem/Description. However, the XML index cannot be used to answer the query XPath //Description, because <Description> elements can occur under paths other than /PurchaseOrder/LineItems.
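A deliberately conservative version of this static check can be sketched in Python as follows. It is only one possible approximation of the subset test, built on assumptions: a query is accepted if it is textually identical to a registered rule or if it is a simple path (no "*" and no "//") that matches a rule; every other query is rejected because coverage cannot be proven this simply. The function names and the reading of "*" as a single step are hypothetical choices.

```python
import re


def rule_to_regex(rule):
    """Translate an include rule with '*' and '//' into a compiled regular expression."""
    pattern = ""
    for token in re.split(r"(//|/)", rule):
        if token == "":
            continue
        if token == "//":
            pattern += r"/(?:[^/]+/)*"  # zero or more intermediate steps
        elif token == "/":
            pattern += "/"
        elif token == "*":
            pattern += r"[^/]+"         # exactly one step (assumption)
        else:
            pattern += re.escape(token)
    return re.compile(pattern)


def index_can_answer(query_path, include_rules):
    """Return True only when the query is provably covered by the indexed paths."""
    if query_path in include_rules:
        return True   # identical to a registered rule
    if "//" in query_path or "*" in query_path:
        return False  # may reach nodes outside the indexed paths
    return any(rule_to_regex(rule).fullmatch(query_path) for rule in include_rules)


poindex1_rules = [
    "/PurchaseOrder/LineItems//*",
    "/PurchaseOrder/Reference",
    "/PurchaseOrder/Requestor",
]
```

With these rules, a simple path below /PurchaseOrder/LineItems is accepted, whereas //Description is rejected because it could select nodes that were never indexed.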

Using the XML index to answer XPath queries

The XML index can be used to satisfy XPath queries by decomposing them into simple paths and predicates on values. The resulting pieces are translated into SQL queries on the index PATH_TABLE. According to one embodiment, the input to the index access method is a compound expression that includes one or more of the following expressions:

● Simple (navigational) path expressions, for example, /a/b

● Simple value expressions, for example, /a/b/c > 50

● Structural conjunctions (that is, hierarchical relationships) between expressions; for example, the XPath expression /a/b[c > 50] is represented as (/a/b) PARENT-OF (/a/b/c > 50)

Consider a query with a simple predicate, for example, /a/b/c = foo. Such a query can be executed against the PATH table as follows:

SELECT DISTINCT rid FROM path_table
WHERE pathid = :1 AND xmlvalue_to_string(value) = 'foo';

The ID of the path /a/b/c is bound to variable 1. Several execution plans are possible for this query. According to one embodiment, the query optimizer picks the best execution plan based on cost. The database server can (1) use the secondary index on the path ID, (2) use the secondary index on xmlvalue_to_string(value), or (3) use both indexes and bitmap-AND the results.
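The simple-predicate query can be tried out against a small in-memory table, as in the following Python sketch that uses the standard sqlite3 module. SQLite stands in for the database server purely for illustration; the table layout, the sample rows, and the Python implementation of xmlvalue_to_string are assumptions, and "?" placeholders take the place of the bind variable :1.

```python
import sqlite3


def xmlvalue_to_string(value):
    """Hypothetical conversion function: stored bytes to text, NULL stays NULL."""
    return None if value is None else bytes(value).decode("utf-8")


conn = sqlite3.connect(":memory:")
conn.create_function("xmlvalue_to_string", 1, xmlvalue_to_string)
conn.execute("CREATE TABLE path_table (rid TEXT, pathid INTEGER, order_key TEXT, value BLOB)")
conn.executemany(
    "INSERT INTO path_table VALUES (?, ?, ?, ?)",
    [
        ("R1", 3, "1.1.1", b"foo"),
        ("R1", 3, "1.1.2", b"foo"),  # second match in the same document
        ("R2", 3, "1.1.1", b"bar"),
        ("R3", 4, "1.2", b"foo"),    # same value, different path
        ("R4", 3, "1.1.1", b"foo"),
    ],
)
# Secondary index on the path ID, one of the access paths mentioned above.
conn.execute("CREATE INDEX pathid_index ON path_table (pathid)")


def docs_matching(pathid, text):
    """Run the simple-predicate query and return the matching resource IDs."""
    cursor = conn.execute(
        "SELECT DISTINCT rid FROM path_table "
        "WHERE pathid = ? AND xmlvalue_to_string(value) = ?",
        (pathid, text),
    )
    return sorted(row[0] for row in cursor)
```

If path ID 3 stands for /a/b/c, docs_matching(3, "foo") returns each matching document once, even when a document contains several matching nodes.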

Consider a query that specifies a fragment lookup, for example, the XPath /a/b/c. Such a query can be executed against the PATH table using the following statement:

SELECT rid FROM path_table WHERE pathid = :1 ORDER BY order_key;

The ID of the path /a/b/c is bound to variable 1. The query returns the matching rows in document order. If required, all of the fragments that correspond to a single document are concatenated.

Consider a query that specifies a fragment lookup based on a simple predicate, for example, /a/b[c = foo]. The normalized representation of the input XPath is (/a/b) PARENT-OF (/a/b/c = foo). The following query can be used to find the matches for the path /a/b and the matches for the simple predicate (/a/b/c = foo):

SELECT p1.rid, p1.offset FROM path_table p1, path_table p2
WHERE p1.pathid = :1 AND p2.pathid = :2
AND xmlvalue_to_string(p2.value) = 'foo'
AND SYS_DEWEY_PARENT(p1.order_key) = p2.order_key
ORDER BY p1.rid, p1.offset;

The results are then merged using a structural join operator, which is expressed using the Dewey order keys. The IDs of the paths /a/b and /a/b/c are bound to variables 1 and 2. According to one embodiment, a cost-based optimizer picks the best execution plan from among the several possible execution plans.
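A structural join of this kind can be sketched with the standard sqlite3 module in Python. The sketch makes its own assumptions and is not the server's operator: dewey_parent is a hypothetical function that returns the order key of a node's parent, so it is applied to the child row (p2) and compared with the parent row (p1); an equality on rid is added so that nodes of different documents are never joined; values are stored as plain text; and order_key is returned in place of the offset column, which is not modeled here.

```python
import sqlite3


def dewey_parent(order_key):
    """Return the parent of a Dewey order key ('1.1.1' -> '1.1'); None for the root."""
    head, sep, _ = order_key.rpartition(".")
    return head if sep else None


conn = sqlite3.connect(":memory:")
conn.create_function("dewey_parent", 1, dewey_parent)
conn.execute("CREATE TABLE path_table (rid TEXT, pathid INTEGER, order_key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO path_table VALUES (?, ?, ?, ?)",
    [
        # Path IDs (assumed): 1 = /a, 2 = /a/b, 3 = /a/b/c
        ("R1", 1, "1", None), ("R1", 2, "1.1", None), ("R1", 3, "1.1.1", "foo"),
        ("R1", 2, "1.2", None), ("R1", 3, "1.2.1", "bar"),
        ("R2", 1, "1", None), ("R2", 2, "1.1", None), ("R2", 3, "1.1.1", "bar"),
        ("R3", 1, "1", None), ("R3", 2, "1.1", None), ("R3", 3, "1.1.1", "foo"),
    ],
)


def parents_with_matching_child(parent_pathid, child_pathid, text):
    """Find parent nodes that have a child on child_pathid whose value equals text."""
    return conn.execute(
        "SELECT p1.rid, p1.order_key FROM path_table p1, path_table p2 "
        "WHERE p1.pathid = ? AND p2.pathid = ? "
        "AND p2.value = ? "
        "AND p1.rid = p2.rid "
        "AND dewey_parent(p2.order_key) = p1.order_key "
        "ORDER BY p1.rid, p1.order_key",
        (parent_pathid, child_pathid, text),
    ).fetchall()
```

Here the /a/b node with order key 1.1 in R2 is not returned, although R1 and R3 contain a matching child with the same order key, because the join also requires both rows to belong to the same document.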

The XML index can also be used to perform datatype-aware operations. There are several mechanisms by which data type information can be attached to an XPath predicate. For example:

● Use of an XML schema. If the base table column has an associated XML schema, the user XPath is type-checked against the XML schema, so that the appropriate data types are associated with the expressions.

● Explicit type coercion. XPath provides operators for explicit type coercion.

● Implicit type coercion. XPath defines some implicit typing rules. For example, if the RHS of a comparison operator is a NUMBER, the LHS is implicitly coerced to NUMBER.

According to one embodiment, in all of these cases, the input to the XML index access method is an XPath expression with associated data type information. The data type information is used in the generated SQL query to ensure that the appropriate value index is chosen. For example, consider the following type-checked XPath:

SYS_XMLVALUE_TO_NUMBER(/a/b/c) > 10.567

This results in the following query against the PATH table, which uses the NUMBER value index:

SELECT DISTINCT rid FROM path_table
WHERE pathid = :1 AND sys_xmlvalue_to_number(value) > 10.567;
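How the data type of a comparison can drive the choice of conversion function, and hence of value index, can be sketched in Python as follows. The sketch models only the implicit rule that the type of the right-hand side literal determines the conversion applied to the stored value. The function build_query, the mapping table, and the use of a second bind variable :2 for the literal are illustrative assumptions.

```python
from datetime import datetime

# Conversion function applied to the VALUE column for each data type.
CONVERSIONS = {
    "NUMBER": "sys_xmlvalue_to_number",
    "TIMESTAMP": "sys_xmlvalue_to_timestamp",
    "STRING": "sys_xmlvalue_to_string",
}

ALLOWED_OPERATORS = ("=", "!=", "<", "<=", ">", ">=")


def literal_type(literal):
    """Infer the comparison data type from the right-hand side literal."""
    if isinstance(literal, bool):
        raise TypeError("boolean literals are not modeled in this sketch")
    if isinstance(literal, (int, float)):
        return "NUMBER"
    if isinstance(literal, datetime):
        return "TIMESTAMP"
    return "STRING"


def build_query(operator, literal):
    """Return (SQL text, literal); :1 is the path ID and :2 the literal to bind."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    function = CONVERSIONS[literal_type(literal)]
    sql = (
        "SELECT DISTINCT rid FROM path_table "
        f"WHERE pathid = :1 AND {function}(value) {operator} :2"
    )
    return sql, literal
```

For the comparison against 10.567, build_query(">", 10.567) produces a predicate on sys_xmlvalue_to_number(value), which is the expression on which the NUMBER value index is defined.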

Using an XML index as described herein yields several advantages: a wide range of XPath expressions can be evaluated efficiently; XPaths that involve datatype-aware comparisons can be satisfied; fragments can be extracted efficiently from the original XML documents; users can choose to index only a subset of paths, which avoids bloating the index; the indexed values can be customized based on application needs; and the ability to index even non-schema-based XML documents meets the query requirements of a large class of users, enabling them to store all of their XML documents in Oracle without worrying about query performance.

Syntax

Embodiments have been described in a context in which the database server creates and maintains the XML index in response to commands received by the database server. Such commands must conform to a language that the database server understands. According to one embodiment of the invention, the syntax for the various DDL commands that relate to the XML index is as follows:

Create index

Syntax

CREATE INDEX <index_name> ON [<schema>.]<table_name> (<column_name>)
INDEXTYPE IS [<schema>.]XMLINDEX
[LOCAL]
[PARALLEL]
[PARAMETERS '<parameter_clause>'];

Example

Create index xmldoc_idx on xmldoc_tab(xmldoc) indextype is XMLINDEX
Parameters 'PATHS (/a/b/c, //e),
PATH TABLE xmldoc_idx_pathtab';

According to one embodiment, the domain index is equi-partitioned with the parent XML table. If the XML table is not partitioned, the domain index is not partitioned either. The PATH TABLE and its secondary indexes are also equi-partitioned with the XML table.

Syntax of the PARAMETERS clause:

<parameters_clause> ::= <parameter_clause> [, <parameters_clause>]

<parameter_clause> ::= [<paths_clause> |
<path_table_clause> |
<pathid_index_clause> |
<orderkey_index_clause> |
<value_index_clause>]

<paths_clause> ::= PATHS ([<path_list_clause> | <namespaces_clause>]*)

<path_list_clause> ::= {<include_list_clause> | <exclude_list_clause>}

<include_list_clause> ::= <xpath_list_clause>

<exclude_list_clause> ::= EXCLUDE (<xpath_list_clause>)

<xpath_list_clause> ::= <xpath> [, <xpath_list_clause>]

<namespaces_clause> ::= NAMESPACES (<namespace_list_clause>)

<namespace_list_clause> ::= <namespace> [, <namespace_list_clause>]

<path_table_clause> ::= PATH TABLE [<identifier>] [(<segment attributes clause> <table_properties>)]

<pathid_index_clause> ::= PATH <index_parameters>

<orderkey_index_clause> ::= ORDER KEY <index_parameters> [PARENT <index_parameters>]

<value_index_clause> ::= VALUE STORE AS <value_type> [<value_idx_clause>]

<value_type> ::= {RAW [(<integral number>)] | BLOB}

<value_idx_clause> ::= <value_idx_1_clause> [, <value_idx_clause>]

<value_idx_1_clause> ::= [<string_parameters> | NUMBER | TIMESTAMP] <index_parameters>

<string_parameters> ::= STRING [<string_parameters1> [, <string_parameters>]]

<string_parameters1> ::= NORMALIZED | IGNORE_MIXED_TEXT | CASE_INSENSITIVE

<index_parameters> ::= [INDEX [<identifier>] [(<index_attributes>)]]

According to one embodiment, the PARAMETERS clause is used to specify the following:

● Names and physical parameters (tablespace, etc.) for the path table and the secondary indexes. All six indexes on the PATH TABLE are created even if they are not explicitly specified.

○ If the type of the VALUE column is explicitly specified as BLOB, values are stored out-of-line in a BLOB segment. Otherwise, values are stored inline as RAW data. The <size> for RAW is an integer.

○ The properties for STRING values are:

■ NORMALIZED, when all leading and trailing whitespace of the string is to be removed

■ IGNORE_MIXED_TEXT, to store the value as NULL in the case of mixed text

■ CASE_INSENSITIVE, when all strings are to be converted to lowercase

Note that the above operations are applied before the value is stored in the VALUE column (and before the secondary indexes on the PATH TABLE are created).
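The effect of the three STRING properties on a value before it is stored can be sketched in Python as follows. The function name, the representation of the options as a set of strings, and the order in which the options are applied are assumptions made for illustration; None plays the role of NULL.

```python
def prepare_string_value(text, is_mixed_content, options):
    """Return the string to store in the VALUE column, or None for NULL."""
    if text is None:
        return None
    if is_mixed_content and "IGNORE_MIXED_TEXT" in options:
        return None              # mixed text is stored as NULL
    if "NORMALIZED" in options:
        text = text.strip()      # remove leading and trailing whitespace
    if "CASE_INSENSITIVE" in options:
        text = text.lower()      # fold to lowercase
    return text
```

Because the prepared value is what gets stored, a lookup against an index built with CASE_INSENSITIVE would presumably also need to lowercase the search string before comparing.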

● The set of paths to index. Users can control the set of indexed paths by specifying:

○ An explicit list of paths to index. This can include wildcards and the // axis. For example: /a/b/c, /d/*, /x//y

○ An explicit list of paths not to index

Drop index

Syntax

DROP INDEX <index_name>;

Example

Drop index xmldoc_idx;

Usage

Drops the XML index along with its components: the underlying PATH TABLE and its secondary indexes.

Alter index

Syntax

ALTER INDEX <index_name>
PARAMETERS '<parameter_clause>' |
RENAME <new_index_name> |
REBUILD [ONLINE] [PARALLEL [DEGREE <degree>]] |
MODIFY PARTITION <partition_name>
PARAMETERS '<parameter_clause>' |
RENAME PARTITION <partition_name> TO <new_partition_name> |
REBUILD PARTITION <partition_name> [ONLINE] [PARALLEL [DEGREE <degree>]] |

Example

Alter index xmldoc_idx Parameters 'PATHS(/a/b/c, //e), PATH TABLE xmldoc_idx_pathtab';

Alter index xmldoc_idx RENAME TO new_xmldoc_idx;

Alter index xmldoc_idx REBUILD;

Alter index xmldoc_idx MODIFY PARTITION p1 Parameters 'PATHS(/a/b/c, //e)';

Alter index xmldoc_idx RENAME PARTITION xmldoc_idxpart TO new_xmldoc_idxpart;

Alter index xmldoc_idx REBUILD PARTITION xmldoc_idxpart;

Hardware Overview

FIG. 2 is a block diagram that illustrates a computer system 200 upon which an embodiment of the invention may be implemented. Computer system 200 includes a bus 202 or other communication mechanism for communicating information, and a processor 204 coupled with bus 202 for processing information. Computer system 200 also includes a main memory 206, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 202 for storing information and instructions to be executed by processor 204. Main memory 206 may also be used for storing temporary variables or other intermediate information during execution of instructions by processor 204. Computer system 200 further includes a read-only memory (ROM) 208 or other static storage device coupled to bus 202 for storing static information and instructions for processor 204. A storage device 210, such as a magnetic disk or optical disk, is provided and coupled to bus 202 for storing information and instructions.

Computer system 200 may be coupled via bus 202 to a display 212, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 214, including alphanumeric and other keys, is coupled to bus 202 for communicating information and command selections to processor 204. Another type of user input device is cursor control 216, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 204 and for controlling cursor movement on display 212. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allow the device to specify positions in a plane.

The invention relates to the use of computer system 200 for implementing the techniques described herein. According to one embodiment of the invention, those techniques are performed by computer system 200 in response to processor 204 executing one or more sequences of one or more instructions contained in main memory 206. Such instructions may be read into main memory 206 from another computer-readable medium, such as storage device 210. Execution of the sequences of instructions contained in main memory 206 causes processor 204 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of, or in combination with, software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.

The term "machine-readable medium" as used herein refers to any medium that participates in providing data that causes a machine to operate in a specific fashion. In an embodiment implemented using computer system 200, various machine-readable media are involved, for example, in providing instructions to processor 204 for execution. Such a medium may take many forms, including but not limited to non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as storage device 210. Volatile media include dynamic memory, such as main memory 206. Transmission media include coaxial cables, copper wire, and fiber optics, including the wires that make up bus 202. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infrared data communications.

Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape or any other magnetic medium, a CD-ROM or any other optical medium, punch cards, paper tape or any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM or any other memory chip or cartridge, a carrier wave as described below, or any other medium from which a computer can read.

Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to processor 204 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 200 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector can receive the data carried in the infrared signal, and appropriate circuitry can place the data on bus 202. Bus 202 carries the data to main memory 206, from which processor 204 retrieves and executes the instructions. The instructions received by main memory 206 may optionally be stored on storage device 210 either before or after execution by processor 204.

Computer system 200 also includes a communication interface 218 coupled to bus 202. Communication interface 218 provides two-way data communication and is coupled to a network link 220 that is connected to a local network 222. For example, communication interface 218 may be an integrated services digital network (ISDN) card or a modem that provides a data communication connection to a corresponding type of telephone line. As another example, communication interface 218 may be a local area network (LAN) card that provides a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 218 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information.

Network link 220 typically provides data communication through one or more networks to other data devices. For example, network link 220 may provide a connection through local network 222 to a host computer 224 or to data equipment operated by an Internet Service Provider (ISP) 226. ISP 226 in turn provides data communication services through the worldwide packet data communication network now commonly referred to as the "Internet" 228. Local network 222 and Internet 228 both use electrical, electromagnetic, or optical signals that carry digital data streams. The signals through the various networks, and the signals on network link 220 and through communication interface 218, which carry the digital data to and from computer system 200, are exemplary forms of carrier waves transporting the information.

Computer system 200 can send messages and receive data, including program code, through the network(s), network link 220, and communication interface 218. In the Internet example, a server 230 might transmit requested code for an application program through Internet 228, ISP 226, local network 222, and communication interface 218.

The received code may be executed by processor 204 as it is received, and/or stored in storage device 210 or other non-volatile storage for later execution. In this manner, computer system 200 may obtain application code in the form of a carrier wave.

The above descriptions are only preferred embodiments of the present invention and are not intended to limit it. To those skilled in the art, the present invention may be subject to various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for accessing information from XML documents, the method comprising:

identifying a set of nodes to be indexed in the XML documents;

for each node in the set of nodes to be indexed, storing an entry for the node in an index, wherein the entry for a given node includes locator data for locating XML content associated with the given node, and at least one of:

(a) hierarchy data that represents the hierarchical position of the given node within the XML document that contains the given node; and

(b) path data that corresponds to the path, through the structure of the XML document that contains the given node, to the given node; and

in response to a request for information from the XML documents, using the index to locate information within the XML documents.

2. The method of claim 1, wherein the index is implemented as a relational table, and the step of storing an entry for each node is performed by storing, in the relational table, a row for each node in the set of nodes.

3. The method of claim 1, wherein the index stores values associated with one or more nodes in the set of nodes.

4. The method of claim 3, wherein the values stored in the index include values of a plurality of data types, and the method further comprises the step of building a secondary index for each of two or more data types of the plurality of data types.

5. The method of claim 1, wherein the locator data for the given node includes first data for locating an XML document and second data for locating, within the XML document, information associated with the given node.

6. The method of claim 1, wherein:

the entry for the given node includes hierarchy data that represents the hierarchical position of the given node within the XML document that contains the given node; and

the method further comprises building, based on the hierarchy data, a secondary index on the entries stored in the index.

7. The method of claim 1, wherein:

the entry for the given node includes path data that corresponds to the path, through the structure of the XML document that contains the given node, to the given node; and

the method further comprises building, based on the path data, a secondary index on the entries stored in the index.

8. The method of claim 1, further comprising building, based on hierarchy information associated with child nodes of a parent node, a secondary index on the entries stored in the index for the parent node.

9. The method of claim 1, wherein the step of identifying the set of nodes to be indexed includes the steps of:

receiving rules for determining which paths should be indexed;

determining the paths associated with nodes in an XML document to be indexed; and

identifying which nodes to index based on (a) the rules for determining which paths to index, and (b) the paths associated with the nodes in the XML document to be indexed.

10. The method of claim 9, wherein the rules explicitly identify paths to be included in the index.

11. The method of claim 9, wherein the rules explicitly identify paths to be excluded from the index.

12. The method of claim 4, further comprising receiving user data that indicates the data types for which secondary value indexes are to be built.

13. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 1.

14. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 2.

15. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 3.

16. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 4.

17. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 5.

18. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 6.

19. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 7.

20. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 8.

21. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 9.

22. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 10.

23. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 11.

24. A computer-readable medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform the method recited in claim 12.