CN120035819A

CN120035819A - Framework and method for cross-layer data consistency verification

Info

Publication number: CN120035819A
Application number: CN202380071033.2A
Authority: CN
Inventors: T·拉希里; S·苏雷什; B·C·哈默施密特; A·D·波佩斯库; J·卡普; 刘振华
Original assignee: Oracle International Corp
Current assignee: Oracle International Corp
Priority date: 2022-10-06
Filing date: 2023-07-10
Publication date: 2025-05-23
Also published as: EP4599334A1; WO2024076405A1

Abstract

The computer analyzes the relational schema of the database to generate a data entry schema encoded as JSON. The data entry schema is sent to the database client so that the client can verify the entered data before it is sent for storage. The entered data that conforms to the data entry schema is received from the client because the client used the data entry schema to verify the entered data before sending the data. The entered data that conforms to the data entry schema is stored in the database. The data entry schema and the relational schema have corresponding constraints on the data to be stored, such as a range restriction of a database column or a set of clear different valid values. The constraints can specify a format mask or regular expression that the values in the column should conform to, or a correlation between the values of multiple columns.

Description

Framework and method for cross-layer data consistency verification

Priority claim

The present application is in accordance with 35U.S. C. ≡119 (e) claims the benefit of provisional application 63/413,835 filed on 6 th 10 th 2022, the entire contents of which are hereby incorporated by reference as if fully set forth herein.

Technical Field

The present invention relates to verification of entered data. More specifically, techniques are described that involve generating a client-independent schema (client-generic schema) using a relational schema of a database that describes rules for consistently verifying entered data on any client prior to storing the entered data in the database.

Background

Relational databases have been successful in large part because they centralize data models, supporting unified application semantics. This approach has been effective for over 40 years, but the traditional relational model is now inadequate to meet the needs of rich application semantics that are increasingly implemented at the application layer rather than in databases. This makes the database more like a passive container, which provides data storage and data access and manipulation, while application semantics migrate out of the database into tools and micro-services. This new division of responsibilities may lead to a divergence between modules, a non-standard implementation, an increase in overall architecture complexity and a decrease in efficiency due to the wide variety of ecosystems implemented by the clients. For example, complex rules for validating data stored in a particular table may be implemented and enforced by a particular client application. If a different client application or microservice accesses the same data, another client may make changes that result in data violating the validation rules. To prevent this, another client may implement and execute the same authentication logic, thereby repeating the work and increasing the likelihood of inconsistencies.

To avoid unnecessary repetition of client-implemented authentication rules, the authentication function may be centralized on the server side. Database servers provide extremely complex mechanisms (e.g., data types with length restrictions, inspection constraints, reference constraints, etc.) to ensure cleanliness and integrity of stored data. But the data integrity framework exists only within the database layer.

Unfortunately, performing the verification in the database layer also has drawbacks. In particular, when performing authentication on the server side, the client application may still generate input of a database refusal for a format error. For example, a user may use a client to populate a complex and detailed form. Only after the form is completely filled out and eventually submitted to the database server will the user be notified that a validation error has occurred. Thus, performing server-side verification may result in wasteful database access and annoyance to the user by entering data into the UI form, submitting the entered data, and then receiving a web page that alerts that all fields in the form have not passed the database-side verification. For this reason, the client application typically does implement its own independent authentication mechanism to perform a sanity check on the input data so that the input of a format error can be immediately rejected. The problem with this approach is that nothing prevents application-side rules and server-side rules from deviating or diverging across different application modules and micro-services.

This trend of moving validation rules into applications has a number of drawbacks:

Resulting in a bias in the validation semantics between the application and the database,

Requiring applications to re-implement authentication across modules and micro-services, and

Complicating development and creating islands.

Today, applications and databases have completely separate mechanisms to validate data. Databases use extremely complex constraints that may have infinite complexity and are therefore unsuitable for client-side evaluation. Applications have their own custom rules to validate client input data, and these rules may often deviate from the database and across modules and micro-services. There is currently no mechanism to ensure that user input is verified consistently within an application and in a database. For example, because applications and database rules may be prone to bias, if some applications have performed client-side checks on entered data, they may avoid adding database constraints altogether.

Drawings

In the drawings:

FIG. 1 is a block diagram depicting an example database system that uses a relational schema of a database to generate a data entry schema that is a client-independent schema for verifying entered data at a database client prior to storing the entered data in the database;

FIG. 2 is a flow chart depicting an example computer process for generating a data entry pattern using a relational pattern of a database, the data entry pattern being a client-independent pattern for verifying entered data at a database client prior to storing the entered data in the database;

FIG. 3 is a block diagram illustrating a computer system in which embodiments of the invention may be implemented;

FIG. 4 is a block diagram illustrating a basic software system that may be used to control the operation of a computing system.

Detailed Description

In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It may be evident, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.

General overview

A cross-layer consistency method for real input data within an application and a database kernel using the same set of rules is described herein. In this approach, validation rules are centrally declared within the database and shared by all application modules. Techniques are also described that communicate rules to applications in a client-independent format to enable client-side enforcement of constraints. Specific examples of client-independent formats using JavaScript object notation (JSON) as specified validation rules will be given herein, but the techniques described herein are not limited to any particular client-independent format. Thus, embodiments of a canonical language using JSON as validation rules and schematic constraints are described, which is fundamentally different from the conventional use of JSON as payload transport format.

Given the relational table definition and its associated constraints, a JSON representation is automatically generated that captures as many validation rules of the table as possible. The database client can obtain and use the generated JSON as a language neutral cross-layer input validation mechanism for entering data for storage into the table.

In an embodiment, a computer analyzes relational patterns of a database to generate data entry patterns and encodes the data entry patterns in a client-independent format (e.g., JSON). As used herein, a "data entry schema" is a structured data set that describes validation rules and maps the validation rules to specific locations within the database schema. The data entry pattern is sent to the database client so that the client can verify the entered data before sending the entered data for storage in the database. The entered data is received from the database client in compliance with the data entry pattern because the client uses the data entry pattern to verify the entered data prior to sending the data. Entered data conforming to the data entry schema is stored in a database.

In an embodiment, the data entry pattern and the relationship pattern have corresponding constraints on the data to be stored, such as range limitations of the database column, such as a minimum and/or maximum value or a set of distinct valid values. Constraints may specify a format mask or regular expression that values in a column should conform to, or correlations between values of multiple columns in the same row of a relational table.

1.0 Example database System

FIG. 1 is a block diagram depicting an example database system 100 in an embodiment. The database system 100 uses the relational schema 140 of the database 120 to generate a data entry schema 170, which data entry schema 170 is a client-independent schema for verifying the entered data 190 at the database client 130 prior to storing the entered data 190 in the database 120. Database system 100 may be hosted by at least one computer, such as a rack server (such as a blade), a personal computer, mainframe, virtual computer, or other computing device. Database system 100 may comprise a database management system (DBMS) that may comprise one or more database servers, each of which may be a middleware such as database server 110. Database server 110 may contain and operate one or more relational databases, such as relational database 120.

1.1 Relational database schema

The relational schema 140 may define the relational database 120. The relationship schema 140 is schematically illustrated as being external to the relationship database 120 and the database server 110 to indicate that the relationship schema 140 may or may not be external to the relationship database 120 and may or may not be external to the database server 110, as discussed later herein. In one example, the relationship schema 140 is stored in a database dictionary 125 in the database 120. The database dictionary is a database metadata container, as discussed later herein.

Relationship schema 140 may define one or more relationship tables, such as relationship table 150, that contain one or more table columns, such as column 160, that store values, such as numbers or text strings, for data types 161. If the particular data type is not the data type 161 of column 160, database server 110 denies any attempt to store the value of the particular data type in column 160. Column 160 may be optional or necessary, as indicated by an option 162, indicating whether column 160 may or may not store a null value indicating no value.

1.2 Checking constraints

In addition to or instead of column constraints 161-162 that limit which values may be stored in column 160, relationship table 150 may have one or more CHECK constraints, such as CHECK constraint 163 that verifies the contents of relationship table 150. The CHECK constraint 163 may specify one or more constraints, such as:

A range constraint, such as a minimum and/or maximum or a set of distinct valid values,

The values in column 160 must conform to a format mask or regular expression,

Correlation between values of multiple columns in the same row of relationship table 150, and

Custom conditions, such as only prime numbers allowed.

The CHECK constraint may be implemented using the same or similar predicate expression syntax as used in filtering queries to the database server 110, such as a filter in the WHERE clause of a SELECT query in Structured Query Language (SQL). Thus, CHECK constraints can be very expressive and complex, such as a composite expression that contains multiple expression operators, each of which anticipates multiple parameters, such as so-called binary operators, such as logical (i.e., boolean) operators, and relational operators, such as those used to apply relational algebra. Likewise, the CHECK constraint may call a User Defined Function (UDF) with a complex and computationally intensive implementation.

The implementation of UDF may be transparent or may be opaque, whereby it is difficult or impossible to analyze the implementation. For example, UDF may be implemented only as so-called object code consisting of instruction sequence(s) of an Instruction Set Architecture (ISA) of a Central Processing Unit (CPU). In this case, the database server 110 may execute the UDF, but may not be able to analyze the instructions to implement the UDF. For example, database server 110 may apply a CHECK constraint that invokes a UDF with opacity, which prevents database server 110 from discovering the logical nature of the CHECK constraint because the CHECK constraint depends on an opaque UDF. For example, in some cases, database server 110 may be more or less incapable of fully modeling the CHECK constraint or generating an approximation of the CHECK constraint, such as to send the approximation to database client 130, as discussed later herein.

1.3 Relationship patterns specified in DDL statement(s)

Relationship schema 140 and/or any of the illustrative details, such as relationship table 150, columns 160, and constraints 161-163, may be individually or collectively defined by one or more Data Definition Language (DDL) statements as discussed later herein. The following is an example DDL statement defining an example relationship table containing example columns with example column constraints.

1.4 Client and Server

If the relational database 120 contains a relational table 150, the database client 130 may send the entered data 190 to the database server 110 for storage in the relational table 150. Database client 130 is a software application that may or may not share memory address space with database server 110, and may or may not reside on the same computer as database server 110. For example, database server 110 may be embedded in database client 130, located together in an Operating System (OS) process, or may reside in a separate process that cooperates using inter-process communication (IPC). The collaboration of database client 130 and database server 110 may use a client-server database protocol, such as open database connectivity (ODBC) or Java ODBC (JDBC).

1.5 Interactive data entry

The entered data 190 may be any information that enters the database system 100 through the database client 130 and is not automatically generated within the database client 130 itself. For example, the entered data 190 may be interactively (i.e., manually) entered into the database client 130, or may be received by the database client 130 from an external automation source. For example, an external system may use database client 130 to store data into relational database 120. In either case, the entered data may be more or less unreliable (i.e., invalid). For example, the entered data may not conform to the constraints 161-163 required for storage in the relationship table 150.

The entered data 190 may be stored into the relational database 120 only if the entered data 190 meets the relational schema 140. One goal of database system 100 is for database client 130 to ensure that entered data 190 is valid before sending entered data 190 to database server 110 for storage. Because point of interest separation is a design principle, database client 130 does not expect to obtain, understand, and directly enforce relational schema 140. For example, the relationship schema 140 may not be available to the database client 130, or the relationship schema 140 may be expressed in a domain-specific language (DSL), such as DDL, that the database client 130 may not understand.

1.6 Data entry modes

Alternatively, database client 130 obtains a data entry schema 170 that represents some limited aspects of relationship schema 170, but is not itself a relationship schema, and is expressed in an open and standardized format that can be processed regardless of the subject matter of database client 130 and regardless of the implementation of database client 130. In other words, the data entry pattern 170 is application and platform independent.

The data entry pattern 170 is encoded (e.g., in text form) in a client-independent format. For purposes of illustration, an example will be given in which the data entry schema 170 is encoded as a JavaScript object notation (JSON), which is a data exchange format that can express structured and nested data. Most or all web browsers natively support JavaScript, which provides two benefits. First, the web browser can easily parse the data entry schema 170 because JSON conforms to the syntax and syntax of JavaScript, which means that any JavaScript application or browser application can handle the data entry schema 170. Second, the database client 130 can be easily implemented in a web browser by implementing the database client 130 in JavaScript.

JSON parsing is common in common programming languages such as Java, c#, c++, and Python. JSON syntax and syntax is internationally standardized and is incorporated herein by reference in its entirety as the standard itself "ECMA-404:JSON data interchange syntax", second edition, published by the European Computer Manufacturers Association (ECMA) at month 12 in 2017.

The data entry pattern 170 is automatically generated from the relationship pattern 140. For example, database server 110 may generate data entry pattern 170 and send data entry pattern 170 to database client 130. The data entry schema 170 is an open (i.e., implementation-independent) artifact that any database client can obtain, interpret, and apply to entered data for verification before sending it to the database server 110.

1.7 Server validates influence on client

For example, the entered data 190 contains a field value 195, and the database client 130 may attempt to store the field value 195 into the column 160 by sending the entered data 190 to the database server 110 in a Data Manipulation Language (DML) statement, such as an INSERT or UPDATE statement of SQL. If field value 195 does not meet constraints 161-163, database server 110 may reject entered data 190, which may be troublesome to the user of database client 130. For example, if field value 195 is invalid because it does not conform to all of the database constraints 161-163, database server 110 may reject the DML statement without database server 110 having to execute the query plan.

One technical challenge is that when database server 110 rejects entered data 190, the user interface screen for interactive entry of entered data 190 may be reset to an empty entry field or may no longer be displayed. The entered data 190 may contain many field values entered in a sequence of various screens and it may be difficult to return to a particular screen in the sequence to re-enter the particular field whose entered value is invalid. For example, database server 110 may indicate to database client 130 that entered data 190 was rejected, but may not indicate which of the many entered fields of entered data 190 has an invalid value.

1.8 JSON generation for client authentication

The data entry mode 170 avoids these interactivity problems by using the following two phases of operation. The first stage generates a data entry pattern 170 from a portion or all of the relationship pattern 140. For example, any of the schema components 150, 160, and 161-163 can be explicitly excluded from transitioning to the data entry schema 170. For example, column 160 may have a flag that explicitly indicates that column 160 is included or excluded for generating data entry pattern 170. Also, some schema components may be implicitly excluded from generating the data entry schema 170, such as if at least a portion of the CHECK constraint 163 is opaque or otherwise does not support transitioning to the data entry schema 170.

The generation of the data entry pattern 170 may require processing of the pattern components 150, 160, and 161-163 and converting some or all of these components into corresponding ones of the data entry pattern 170. For example, the field 180 may be generated from the column 160 and other fields within the data entry pattern 170 may be generated from other columns in the same column 160 or the same relationship table 150 or other relationship tables in the relationship pattern 140. Since the data entry pattern 170 is encoded in JSON that supports compound (i.e., multi-field) and nested data structures, the relevant fields may be logically arranged into groups in the data entry pattern 170. For example, the data entry schema 170 can contain a different JSON object for each relationship table whose columns have corresponding fields in the data entry schema 170.

For example, field 180 may itself be a JSON object that is nested in another JSON object corresponding to relationship table 150 along with other JSON objects representing other fields of other columns in relationship table 150 than column 160. But it is not required that modes 140 and 170 have the same illustrative normalization. For example, the relationship schema 140 may have a plurality of relationship tables arranged in a multi-dimensional master-slave (master-slave) schema, such as a star or snowflake shape, but the data entry schema 170 may instead contain a flat set of fields without imposing any nesting or grouping.

1.9 Transition constraint semantics

Some of the constraints in the relationship schema 140, such as column constraints 161-162, may be more or less directly translated into corresponding constraints in the data entry schema 170. For example, the data type 161 may specify that the column 160 only stores numbers and may generate corresponding number constraints for the field 180 in the data entry mode 170.

The semantics of field constraints 181-183 are as follows. If the data types 161 have values that may be naturally ordered, such as by number or lexical order, the value range 181 may specify upper and/or lower bounds (i.e., value limits) of the field values 195, and each limit may be explicitly an inclusive or exclusive limit. The value range 181 may alternatively specify a valid set of different individual values, and all other values of the data type 161 are prohibited for the field value 195. Format 183 may specify a pattern, mask, or regular expression that field value 195 should conform to, such as a format of a phone number or email address. The data entry schema 170 can contain constraints, each specifying any of the following:

The regular expression is used to define the expression,

The limit of the count of array elements,

The limit of the count of array elements matching the criteria,

The limit of the count of fields of JSON objects in data entry schema 170,

An indication that the elements of the array should be different, an

An indication of whether the limit value is inclusive or exclusive.

The data entry pattern 170 may contain a version identifier (e.g., from a monotonically increasing sequence of numbers or time stamps). In an embodiment, the version identifier of the data entry pattern 170 is based on at least one of the version identifier of the relationship pattern 140 and a unique identifier that is automatically generated when the data entry pattern 170 is generated.

When the relationship schema 140 is modified, the data entry schema 170 can be automatically regenerated such that the data entry schema 170 is based on the latest version of the relationship schema 140. Each regeneration of the data entry pattern 170 may contain a new different version identifier. In an embodiment, the database server 110 may optionally be configured to reject the entered data 190 if the entered data 190 does not contain the newly regenerated version identifier of the data entry pattern 170.

In JSON, a field may be an array containing multiple values. If field 180 represents an array whose values are provided as elements in field value 195, then uniqueness 182 requires that field value 195 not contain a duplicate item, as opposed to requiring column 160 not to contain a duplicate item. For example, column 160 may contain data for many users, and may require uniqueness only within the user's data, rather than across all users. In this case, the field value 195 may be an array of values for one user and should not contain duplicate items, but the column 160 may contain duplicate items as long as each duplicate item of the same value is stored for a different respective user.

Example field constraints for example fields are contained below. Each example field corresponds to a respective example column in the example DDL statement described previously herein. The example fields and example field constraints are encoded as JSON in the example data entry schema below.

In the example data entry schema described above, the field "Category" is necessary and nested as one of the "properties" of "Product". The valid class field value must be a string of up to ten characters and must be exactly one of the enumerated words "Home" or "apparatus". To generate the example data entry patterns described above, field names and types are inferred from the tabular definition. The string length limit (maxLength) for a field is inferred from the varchar or char length of the table. Field constraints (such as enumeration and min/max checking) are inferred from the CHECK constraints listed in the table. The list of required fields is inferred from the NOT NULL constraint of the list.

1.10JSON PRECHECK constraint

In an embodiment, the standard SQL DDL syntax is enhanced to express the following novel example constraints as CHECK constraints.

CONSTRAINT<name>CHECK WITH JSON PRECHECK(cond1 AND cond2 AND...condN)

For column 160, the example constraints described above have the following novel features. "CONSTRAINT < name >" means that the constraint itself is a first type of database object, which can be identified, referenced, and processed separately, such as in the manner set forth herein. Each of cond1-condN may reference the same or different one of the respective columns.

The above "JSON PRECHECK" explicitly indicates that the constraint should be used to generate the data entry pattern 170, but the constraint is not necessarily enforced by the database server 110. In other words, the content in column 160 need not meet this novel constraint. In this context PRECHECK designates that the database client 130 apply the data entry pattern 170 to entered data 190. Optionally, database server 110 may also enforce PRECHECK upon receipt of entered data 190 from database client 130.

The database server 110 may provide a built-in function or UDF and when the database server 110 receives the entered data 190, optionally invoking the function results in either a) applying the data entry pattern 170 to the entered data 190 or b) applying a specific field constraint (such as a range of values 181) to the field value 195. The application of patterns 140 or 170, respectively, to stored or entered data may be referred to herein as verifying the data with patterns.

1.11 Constraint handling by clients

As explained above, the first operational phase generates the data entry pattern 170, which may be immediately or eventually followed by the second operational phase, requiring the database client 130 to verify the entered data 190 using the data entry pattern 170, and then to send the entered data 190 to the database server 110. For example, database client 130 may have more or less hard coded mappings between particular fields in data entry pattern 170 and particular user interface widgets displayed in the screen. For example, field 180 may represent a time type, and database client 130 may map field 180 to a text entry widget into which a user may enter time as a text string. For example, a field in the data entry schema 170 may have an identifier and the data entry widget may have its own identifier. Database client 130 may have a mapping from the respective identifier of each field in data entry schema 170 to the identifier of the respective corresponding widget.

In another example, database client 130 lacks a predefined user interface screen and instead processes data entry pattern 170 to dynamically generate the corresponding screen(s). For example, database client 130 may detect that field 180 represents a time type and generate a corresponding widget for entering a time value. The dynamically generated gadgets may be generic to any data type 161, such as text entry gadgets, or may be specific to a particular data type. For example, a date may be entered into a calendar widget.

1.12 Data entry SCHEMA meets the JSON SCHEMA Standard

JSON itself can specify complex data structures, but lacks expression syntax to traverse these data structures. As a representation standard, basic data processing such as filtering and verification is lacking in JSON. The data entry pattern 170, while encoded entirely in JSON lacking expressions, may contain expressions such as regular expressions and/or compound expressions composed of expression operators having predefined semantics.

In an embodiment, the data entry Schema 170 is a JSON conforming to the JSON Schema of the Internet Engineering Task Force (IETF), although JSON was originally created for use without Schema, and the techniques herein do not require that Schema be applied to the data entry Schema 170. The example data entry Schema described above is encoded as JSON conforming to JSON Schema. The IETF publication "JSON Schema Validation: A Vocabulary for Structural Validation of JSON" at month 10 of 2022 is incorporated herein by reference in its entirety.

1.13 Normalized constraint semantics

JSON Schema can provide standardized semantics for field constraints 181-183. For example, uniqueness 182 may be encoded using the "uniqueItems" key of JSON Schema, and data entry Schema 170 may have constraints encoded with a rich vocabulary of so-called validation keys of JSON Schema, with standardized semantics that data entry Schema 170 and database client 130 may utilize. In this case, any application that understands JSON Schema can fully and automatically use the data entry Schema 170 to verify entered data 190.

The following is an example mapping of DDL CHECK conditions to JSON Schema validation (i.e., field constraints). This example mapping may be used to generate a data entry pattern 170 from the relationship pattern 140. In this example map, < value > is always literal and is by no means an expression or reference to a different column.

1.14 Data type transitions

The following is an example mapping of SQL column types to JSON field types. This example mapping may be used to generate a data entry pattern 170 from the relationship pattern 140.

1.15 Predefined Format and formatting function

The data entry schema 170 is a semi-structured document. But the data entry schema 170 does not contain extensible markup language (XML) other than as shown in the example mappings described above, and the relational database 120 does not store XML.

Without JSON Schema, the data entry Schema 170 can have any predefined validation semantics for which the database server 110 can generate a representation into the data entry Schema 170, so long as the database client 130 can enforce the semantics by interpreting the data entry Schema 170. JSON Schema provides predefined and combinable validation semantics that can be arranged and configured in a declarative manner in the data entry Schema 170. Clients and servers in database system 100 may employ JSON Schema as described herein, or may instead agree to other predefined semantics, which may be more or less difficult for multiple parties to agree on. Either way, the data entry pattern 170 may be generated and enforced as discussed herein.

The following is an example function that may be invoked under PRECHECK conditions. As previously explained herein, PRECHECK are performed by database client 130 and then optionally by database server 110. Thus, there may be two executions of the same PRECHECK for the same field value 195. But the two executions may use different implementations of the same PRECHECK. The following example functions may be implemented as JavaScript for use by database client 130, and may alternatively be implemented as a built-in function or UDF for use by database server 110. In an embodiment, the built-in function or UDF is the smallest wrapper delegated to JavaScript implementations as long as the database server 110 can execute JavaScript.

The example functions described above may be explicitly referenced in format 183 as long as database client 130 may execute JavaScript. The preferred embodiment of format 183 does not reference an example function and does not require JavaScript on the client. Alternatively, format 183 is generated as a JSON attribute of field 180. For example, as shown in the example functions described above, the generated attribute may be named "format" and may have a value that identifies a predefined format. The predefined format is a complex format that is referenced by name only and without parameters. In various embodiments, each predefined format is implemented by or not by a corresponding example function as shown above.

2.0 Example data entry Pattern Generation Process

FIG. 2 is a flow chart depicting an example computer process that the database system 100 may execute to generate a data entry pattern 170 using the relational pattern 140 of the database 120, the data entry pattern 170 being a client-independent pattern for verifying the entered data 190 at the database client 130 prior to storing the entered data 190 to the database 120. Fig. 2 is discussed with reference to fig. 1. The following three example embodiments each use a different corresponding component of fig. 1 to perform the process of fig. 2. As follows, each of these three embodiments uses a different manner to obtain the relationship pattern 140 to generate the data entry pattern 170.

In a first embodiment, database server 110 performs steps 201-203 and 205-206 by locally accessing relationship schema 140. In the second and third embodiments, corresponding components other than the database server 110 generate the data entry pattern 170, which requires steps 201-202.

In the second embodiment, steps 201-202 are instead performed by database client 130, database client 130 accessing relational schema 140 by connecting to database server 110, database server 110 performing the remaining steps 203-206.

In a so-called offline third embodiment, which uses neither relational database 120 nor database server 110, the software tool analyzes DDL statements (e.g., in a database management script). The third embodiment generates the data entry pattern 170 by performing steps 201-202 and does not perform steps 203-206. For example, the third embodiment works even if there is no relational database 120 and database server 110.

Step 201 analyzes the relationship schema 140 to find table, column, and database constraints, such as column constraints. Identifiers and configurations of all these database objects are found by examining the relational schema 140, which may be encoded as DDL statements or stored in a database dictionary. The database dictionary will be discussed later herein. Step 201 may iterate through the discovered table, column, and database constraints to ignore (i.e., not process) database objects that explicitly should not or implicitly cannot be transitioned to the portion of the data entry schema 170.

Step 202 generates a data entry pattern 170 from the relationship pattern 140. Step 202 may iterate through the discovered tables, columns, and database constraints to generate corresponding portions of the data entry pattern 170, as previously discussed herein.

Step 203 sends the data entry pattern 170 to the database client 130. For example, database client 130 may request that database server 110 perform step 203. For example, as long as database server 110 comprises a web server, database client 130 may send a representational state (REST) request or other hypertext transfer protocol (HTTP) request to database server 110 to request a copy of data entry schema 170.

In the first and second embodiments, database client 130 performs step 204. In step 204, the database client 130 verifies that the entered data 190 conforms to the data entry pattern 170 and then sends the verified entered data 190 to the database server 110. Step 204 may be caused when entered data 190 is interactively entered as input into database client 130 (which may involve a user interface screen, a web page in a web browser, or a command line).

In step 205, database server 110 receives entered data 190 from database client 130. For example, database server 110 receives a DML statement containing entered data 190. The DML statement may specify that field value 195 be written into column 160. Database client 130 may contain a database driver that generates DML statements, which are plain text statements that may be generated and/or sent by database client 130 with or without the database driver. Of course, prior to sending the entered data 190, the database client 130 should successfully verify the entered data 190 using the data entry schema 170.

Step 206 stores the entered data 190 in the relational database 120. For example, database server 110 may store field value 195 into column 160 when executing a DML statement. Before storing the entered data 190, step 206 verifies that the entered data 190 conforms to the relationship schema 140. If the verification fails, then storing the entered data 190 via step 206 does not occur. For example, a DML statement is rejected without executing it.

2.1 First example Activity

The following example activities A1-A3 illustrate acts that may or may not be implemented and performed by the database server 110. Activity A1 is an optional (e.g., redundant) validation as previously discussed herein, and may be skipped (i.e., not performed). Activity A1 occurs between steps 205-206 described above and decides whether to terminate the process of FIG. 2 without performing the last step 206.

In activity A1, database server 110 detects whether entered data 190 corresponds to data entry pattern 170. If activity A1 detects that logged data 190 is invalid, then the last step 206 is skipped and the DML statement is rejected without being executed, for example. For example, if the entered data 190 is instead from a database client lacking the data entry pattern 170, verification of the entered data 190 with the data entry pattern 170 by the activity A1 may fail, and even if the entered data 190 appears valid if instead compared to the relationship pattern 140, verification of the activity A1 may fail. If the entered data 190: a) contains invalid field values, b) lacks values of necessary fields, or c) contains values of invalid fields (such as unrecognizable fields, forbidden fields, or too many fields in total), then the entered data 190 is invalid relative to the data entry pattern 170.

2.2 Second example Activity

Unlike activity A1, activities A2-A3 require behavior that is somewhat independent of the processing of FIG. 2. Activity A2 illustrates a future development of database server 110, unfortunately, the prior art may require a reboot after reconstructing its code base (e.g., to include a new data entry format for data entry schema 170). For example, an Original Equipment Manufacturer (OEM) may include some data entry formats and accompanying implementation logic in database server 110, such as a timestamp format of a timestamp JSON field that may specify a date, time, and time zone. Implementing logic correctly adapts leap years that cannot be directly expressed with a form mask or regular expression. If the user desires a new data entry format that is not an OEM-provided stock data entry format, the user should provide a format mask, regular expression, or accompanying implementation logic for the new data entry format. The new accompanying implementation logic should be added to the code base of database server 110. Under the prior art, modifying the code base of the database server requires restarting the database server.

Without restarting the database server 110, activity A2 may add a new data entry format to the relational database 120. For example, the accompanying implementation logic of the new data entry format may be contained in an Oracle PL/SQL package, which database server 110 can dynamically add to relational database 120 without restarting database server 110.

2.3 Third example Activity

The example activity A3 effectively operates in reverse by generating a definition of a new relationship table from the data entry pattern. For example, data entry patterns may be pre-existing and widely used, and new databases may desire corresponding relationship patterns. Example activity A3 may a) analyze the data entry schema for its elements (e.g., fields and field constraints) and their configurations, and b) iterate through the elements to generate corresponding portions of DDL statement(s) defining corresponding relationship schema elements. Example A3 may generate a new relationship schema or insert/replace elements in an existing relationship schema. Example activity A3 may be a) performed by database server 110, which database server 110 may or may not execute DDL statements to actually create table(s) and columns, b) performed by a software tool that generates DDL statements and sends them to database server 110 for execution, or c) performed by an offline software tool that generates DDL statements even if database server 110 is not present.

3.0 Database overview

Embodiments of the present invention are used in the context of a database management system (DBMS). Thus, a description of an example DBMS is provided.

In general, a server, such as a database server, is a combination of integrated software components and allocations of computing resources, such as memory, nodes, and processes on the nodes for executing the integrated software components, where the combination of software and computing resources is dedicated to providing a particular type of functionality on behalf of clients of the server. The database server controls and facilitates access to a particular database, processing requests from clients to access the database.

A user interacts with the database server of the DBMS by submitting commands to the database server that cause the database server to perform operations on data stored in the database. The user may be one or more applications running on a client computer that interacts with a database server. Multiple users may also be referred to herein collectively as users.

The database includes data and a database dictionary stored on a persistent memory mechanism such as a set of hard disks. The database is defined by its own separate database dictionary. The database dictionary includes metadata defining database objects contained in the database. In practice, a database dictionary defines a large portion of the contents of a database. The database object includes a table, a table column, and a table space. A tablespace is a set of one or more files used to store data for various types of database objects, such as tables. If the data of a database object is stored in a tablespace, then the database dictionary maps the database object to one or more tablespaces that hold the data of the database object.

The DBMS references the database dictionary to determine how to execute the database commands submitted to the DBMS. Database commands may access database objects defined by a dictionary.

The database command may be in the form of a database statement. In order for a database server to process database statements, the database statements must conform to the database language supported by the database server. One non-limiting example of a Database language supported by many Database servers is SQL, including proprietary forms of SQL (e.g., oracle Database 11 g) supported by Database servers such as Oracle. SQL data definition language ("DDL") instructions are issued to database servers to create or configure database objects, such as tables, views, or complex types. Data manipulation language ("DML") instructions are issued to the DBMS to manage data stored within a database structure. For example, SELECT, INSERT, UPDATE and DELETE are common examples of DML instructions found in some SQL implementations. SQL/XML is a common extension of SQL that is enabled when manipulating XML data in an object-relational database.

A multi-node database management system consists of interconnected nodes that share access to the same database. Typically, nodes are interconnected via a network and share access to shared storage to varying degrees, such as shared access to a set of disk drives and data blocks stored thereon. The nodes in the multi-node database system may be in the form of a set of computers (e.g., workstations, personal computers) interconnected via a network. Alternatively, the nodes may be nodes of a grid consisting of nodes in the form of server blades interconnected with other server blades on the rack.

Each node in a multi-node database system hosts a database server. A server (such as a database server) is a combination of an integrated software component and an allocation of computing resources (such as memory, nodes, and processes on nodes for executing the integrated software component on a processor), a combination of software and computing resources dedicated to performing a particular function on behalf of one or more clients.

Resources from multiple nodes in a multi-node database system may be allocated to run software for a particular database server. Each combination of allocation of resources in the software and nodes is a server referred to herein as a "server instance" or "instance. The database server may include multiple database instances, some or all of which run on separate computers (including separate server blades).

3.1 Query processing

A query is an expression, command, or set of commands that, when executed, causes a server to perform one or more operations on a data set. The query may specify the source data object(s) from which the result set(s) are to be determined, such as table(s), column(s), view(s), or snapshot(s). For example, the source data object(s) may appear in a FROM clause of a structured query language ("SQL") query. SQL is a well-known example language for querying database objects. As used herein, the term "query" is used to refer to any form of representation of a query, including queries in the form of database statements and any data structures for internal query representations. The term "table" refers to any source object that is referenced or defined by a query and that represents a set of rows, such as a database table, a view, or an inline query block (such as an inline view or sub-query).

The query may perform operations on data from the source data object(s) row by row as the object(s) are loaded, or on the entire source data object(s) after the object(s) have been loaded. The result set generated by some operations may make other operation(s) available, and in this way, the result set may be filtered out or narrowed down based on certain criteria, and/or coupled or combined with other result set(s) and/or other source data object(s).

A sub-query is a portion or component of a query that is different from, and may be evaluated separately from, the other portion(s) or component(s) of the query (i.e., as a separate query). Other portion(s) or component(s) of the query may form an external query, which may or may not include other sub-queries. Sub-queries nested within an external query may be evaluated separately one or more times while calculating results for the external query.

Generally, a query parser receives a query statement and generates an internal query representation of the query statement. In general, an internal query representation is a set of interconnected data structures that represent the various components and structures of a query statement.

The internal query representation may be in the form of a node graph, with each interconnected data structure corresponding to a node and a component of the represented query statement. The internal representation is typically generated in memory for evaluation, manipulation, and transformation.

Hardware overview

According to one embodiment, the techniques described herein are implemented by one or more special purpose computing devices. The special purpose computing device may be hardwired to perform the techniques, or may include a digital electronic device, such as one or more Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs), that are permanently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques in accordance with program instructions in firmware, memory, other storage, or a combination. Such special purpose computing devices may also incorporate custom hard-wired logic, ASICs, or FPGAs in combination with custom programming to implement the techniques. The special purpose computing device may be a desktop computer system, portable computer system, handheld device, networking device, or any other device that incorporates hardwired and/or program logic to implement these techniques.

For example, FIG. 3 is a block diagram that illustrates a computer system 300 upon which an embodiment of the invention may be implemented. Computer system 300 includes a bus 302 or other communication mechanism for communicating information, and a hardware processor 304 coupled with bus 302 for processing information. The hardware processor 304 may be, for example, a general purpose microprocessor.

Computer system 300 also includes a main memory 306, such as a Random Access Memory (RAM) or other dynamic storage device, coupled to bus 302 for storing information and instructions to be executed by processor 304. Main memory 306 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 304. These instructions, when stored in a non-transitory storage medium accessible to processor 304, cause computer system 300 to be a special purpose machine that is customized to perform the operations specified in the instructions.

Computer system 300 also includes a Read Only Memory (ROM) 308 or other static storage device coupled to bus 302 for storing static information and instructions for processor 304. A storage device 310, such as a magnetic disk, optical disk, or solid state drive, is provided and storage device 310 is coupled to bus 302 for storing information and instructions.

Computer system 300 may be coupled via bus 302 to a display 312, such as a Cathode Ray Tube (CRT), for displaying information to a computer user. An input device 314, including alphanumeric and other keys, is coupled to bus 302 for communicating information and command selections to processor 304. Another type of user input device is cursor control 316, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 304 and for controlling cursor movement on display 312. Such input devices typically have two degrees of freedom in two axes, a first axis (e.g., x-axis) and a second axis (e.g., y-axis), which allows the device to specify positions in a plane.

Computer system 300 may implement the techniques described herein using custom hardwired logic, one or more ASICs or FPGAs, firmware, and/or program logic in conjunction with a computer system to make computer system 300a or to program computer system 300 into a special purpose machine. According to one embodiment, computer system 300 performs the techniques herein in response to processor 304 executing one or more sequences of one or more instructions contained in main memory 306. Such instructions may be read into main memory 306 from another storage medium, such as storage device 310. Execution of the sequences of instructions contained in main memory 306 causes processor 304 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.

The term "storage medium" as used herein refers to any non-transitory medium that stores data and/or instructions that cause a machine to operate in a specific manner. Such storage media may include non-volatile media and/or volatile media. Non-volatile media includes, for example, optical disks, magnetic disks, or solid state drives, such as storage device 310. Volatile media includes dynamic memory, such as main memory 306. Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, and EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge.

Storage media are different from, but may be used in conjunction with, transmission media. Transmission media participate in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 302. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.

Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 304 for execution. For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 300 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector can receive the data carried in the infrared signal and appropriate circuitry can place the data on bus 302. Bus 302 carries the data to main memory 306, from which main memory 306 processor 304 retrieves and executes the instructions. The instructions received by main memory 306 may optionally be stored on storage device 310 either before or after execution by processor 304.

Computer system 300 also includes a communication interface 318 coupled to bus 302. Communication interface 318 provides a two-way data communication coupling to a network link 320 that is connected to a local network 322. For example, communication interface 318 may be an Integrated Services Digital Network (ISDN) card, a cable modem, a satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 318 may be a Local Area Network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 318 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.

Network link 320 typically provides data communication through one or more networks to other data devices. For example, network link 320 may provide a connection through local network 322 to a host computer 324 or to data equipment operated by an Internet Service Provider (ISP) 326. ISP 326 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet" 328. Local network 322 and internet 328 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 320 and through communication interface 318, which carry the digital data to computer system 300 and from computer system 300, are exemplary forms of transmission media.

Computer system 300 can send messages and receive data, including program code, through the network(s), network link 320 and communication interface 318. In the Internet example, a server 330 might transmit a requested code for an application program through Internet 328, ISP 326, local network 322 and communication interface 318.

The received code may be executed by processor 304 as it is received, and/or stored in storage device 310, or other non-volatile storage for later execution.

Software overview

Fig. 4 is a block diagram of a basic software system 400 that may be used to control the operation of computing system 300. The software system 400 and its components (including their connections, relationships, and functions) are merely exemplary and are not meant to limit implementations of example embodiment(s). Other software systems suitable for implementing the example embodiment(s) may have different components, including components having different connections, relationships, and functions.

Software system 400 is provided for directing the operation of computing system 300. Software system 400, which may be stored on system memory (RAM) 306 and fixed storage (e.g., hard disk or flash memory) 310, includes a kernel or Operating System (OS) 410.

OS 410 manages low-level aspects of computer operation, including managing execution of processes, memory allocation, file input and output (I/O), and device I/O. One or more applications, represented as 402A, 402B, 402 c..402N, may be "loaded" (e.g., transferred from fixed storage 310 into memory 306) for execution by system 400. Applications or other software intended for use on computer system 300 may also be stored as a downloadable set of computer-executable instructions, for example, for downloading and installation from an internet location (e.g., a Web server, app store, or other online service).

Software system 400 includes a Graphical User Interface (GUI) 415 for receiving user commands and data in a graphical (e.g., a "click" or "touch gesture") manner. In turn, these inputs may be operated by system 400 in accordance with instructions from operating system 410 and/or application(s) 402. GUI 415 is also used to display the results of the operations from OS 410 and application(s) 402 so that the user can provide additional input or terminate a session (e.g., log off).

OS 410 may execute directly on bare hardware 420 (e.g., processor(s) 304) of computer system 300. Alternatively, a hypervisor or Virtual Machine Monitor (VMM) 430 may be interposed between bare hardware 420 and OS 410. In this configuration, VMM 430 acts as a software "buffer" or virtualization layer between OS 410 and bare hardware 420 of computer system 300.

VMM 430 instantiates and runs one or more virtual machine instances ("guest machines"). Each guest machine includes a "guest" operating system (such as OS 410) and one or more applications (such as application(s) 402) designed to execute on the guest operating system. VMM 430 presents a virtual operating platform to the guest operating system and manages execution of the guest operating system.

In some cases, VMM 430 may allow the guest operating system to run as if it were running directly on bare hardware 420 of computer system 400. In these examples, the same version of guest operating system configured to execute directly on bare hardware 420 may also execute on VMM 430 without modification or reconfiguration. In other words, VMM 430 may provide complete hardware and CPU virtualization to guest operating systems in some cases.

In other cases, guest operating systems may be specifically designed or configured to execute on VMM 430 to improve efficiency. In these instances, the guest operating system "realizes" that it is executing on the virtual machine monitor. In other words, VMM 430 may provide paravirtualization to guest operating systems in some cases.

The computer system process includes an allocation of hardware processor time, as well as an allocation of memory (physical and/or virtual), the allocation of memory to store instructions for execution by the hardware processor, to store data generated by execution of instructions by the hardware processor, and/or to store hardware processor state (e.g., contents of registers) between the allocation of hardware processor time when the computer system process is not running. The computer system processes run under the control of an operating system and may run under the control of other programs executing on the computer system.

Cloud computing

The term "cloud computing" is generally used herein to describe a computing model that enables on-demand access to a shared pool of computing resources, such as computer networks, servers, software applications, and services, and allows for rapid provisioning and release of resources with minimal administrative effort or service provider interaction.

Cloud computing environments (sometimes referred to as cloud environments or clouds) may be implemented in a variety of different ways to best suit different requirements. For example, in a public cloud environment, the underlying computing infrastructure is owned by an organization that makes its cloud services available to other organizations or the public. Instead, private cloud environments are generally intended for use by or within a single organization only. The community cloud is intended to be shared by several organizations within the community, while the hybrid cloud includes two or more types of cloud (e.g., private, community, or public) that are bound together by data and application portability.

In general, cloud computing models enable some of those responsibilities that may have been previously provided by an organization's own information technology department to be delivered instead as a service layer within the cloud environment for use by consumers (either inside or outside the organization, depending on the public/private nature of the cloud). The exact definition of components or features provided by or within each cloud service layer may vary depending on the particular implementation, but common examples include software as a service (SaaS) where a consumer uses software applications running on the cloud infrastructure while the SaaS provider manages or controls the underlying cloud infrastructure and applications. Platform as a service (PaaS), where consumers can use software programming languages and development tools supported by the PaaS's provider to develop, deploy, and otherwise control their own applications while the PaaS provider manages or controls other aspects of the cloud environment (i.e., everything in the runtime execution environment). Infrastructure as a service (IaaS), wherein a consumer can deploy and run any software application, and/or provide processes, storage, networking, and other underlying computing resources, while the IaaS provider manages or controls the underlying physical cloud infrastructure (i.e., everything below the operating system layer). A database as a service (DBaaS) in which consumers use a database management system or database server running on a cloud infrastructure while DbaaS providers manage or control underlying cloud infrastructure and applications.

The above-described basic computer hardware and software, and cloud computing environments are presented to illustrate the basic underlying computer components that may be used to implement the example embodiment(s). Example embodiment(s) are not necessarily limited to any particular computing environment or computing device configuration. Alternatively, in light of the present disclosure, example embodiment(s) may be implemented in any type of system architecture or processing environment that one of ordinary skill in the art would understand to be able to support the features and functions of example embodiment(s) presented herein.

In the foregoing specification, examples of the application have been described with reference to numerous specific details that may vary from implementation to implementation. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The sole and exclusive indicator of the scope of the application, and is intended by the applicants to be the scope of the application, the literal and equivalent scope of the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction.

Claims

1. A computer-implemented method comprising:

Generate a non-relational data entry schema from a database's relational schema;

wherein the data entry schema encodes one or more validation rules in a client-independent format, the one or more validation rules being applied to data to be stored at a particular location in the relational schema;

Send data entry mode to database client;

receiving, from the database client, entered data that the database client has verified as conforming to the data entry schema based on the data entry schema; and

The entered data that conforms to the data entry mode is stored in the database.

2 . The method of claim 1 , wherein the one or more validation rules comprise check constraints specified in a relational schema of a database.

3. The method of claim 2, wherein the check constraint comprises at least one item selected from the group consisting of:

Compound expressions,

expressions based on multiple columns in the database, and

An indication of whether check constraints should be used to generate the data entry schema.

4. The method of claim 1, wherein the data entry mode comprises at least one item selected from the group consisting of:

version identifier,

Regular expressions,

the limit of the count of object attributes,

The limit of the count of array elements,

limit on the count of array elements matching the criteria,

An indication that the elements of an array should be distinct,

an indication of whether the limit is inclusive or exclusive,

Specifications for time zone data entry,

JavaScript Object Notation (JSON), and

JSON that complies with the Internet Engineering Task Force (IETF) JSON Schema standard.

5. The method of claim 1, further comprising adding a new data entry format to the database without restarting a server of the database, wherein the generating the data entry mode is based on the new data entry format.

6. The method according to claim 1 further comprises the server of the database detecting whether the entered data conforms to the data entry mode.

7. The method of claim 6, wherein:

The entered data conforms to the relational schema of the database;

The detecting whether the entered data conforms to the data entry mode includes detecting that the entered data does not conform to the data entry mode.

8. The method of claim 1, wherein the sending data entry mode comprises using at least one selected from the group consisting of Representation State (REST) and Hypertext Transfer Protocol (HTTP).

9. The method of claim 1, wherein:

The sending data entry mode includes sending a semi-structured document from a server of a database;

Extensible Markup Language (XML) is not included in at least one item selected from the group consisting of a relational schema of a database and a data entry schema.

10. The method of claim 1, further comprising generating a definition of a relational table from a data entry schema in a non-relational schema.

11. A computer-implemented method comprising:

Generate a non-relational data entry model from the relational model of the database, where:

The generating of the data entry mode is not performed by the database server, and

The data entry schema encodes one or more validation rules in a client-independent format, the one or more validation rules being applied to data to be stored at a particular location in the relational schema;

12. The method of claim 11, wherein said generating a data entry pattern comprises checking at least one item selected from the group consisting of:

the database's relational schema in the database dictionary, and

A set of one or more data definition language (DDL) statements.

13. The method of claim 11, wherein the database does not exist during the generating data entry mode.

14. The method of claim 11, wherein the database client executes the generating data entry mode.

15. One or more non-transitory computer readable media storing instructions which, when executed by one or more processors, cause the steps of any of claims 1-14 to be performed.