[go: up one dir, main page]

WO1989003567A1 - A relational database using identifiers - Google Patents

A relational database using identifiers Download PDF

Info

Publication number
WO1989003567A1
WO1989003567A1 PCT/US1988/003477 US8803477W WO8903567A1 WO 1989003567 A1 WO1989003567 A1 WO 1989003567A1 US 8803477 W US8803477 W US 8803477W WO 8903567 A1 WO8903567 A1 WO 8903567A1
Authority
WO
WIPO (PCT)
Prior art keywords
row
relation
values
block
column
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US1988/003477
Other languages
French (fr)
Inventor
Douglas Wyche Caldwell
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nucleus International Corp
Original Assignee
Nucleus International Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nucleus International Corp filed Critical Nucleus International Corp
Publication of WO1989003567A1 publication Critical patent/WO1989003567A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2365Ensuring data consistency and integrity
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases

Definitions

  • the present application relates generally to a computerized database system and more particularly to a method and apparatus for utilizing identifiers in a relational database for preserving referential integrity.
  • null values The reason for permitting null values is that some relationships may exist, but the relevant data is unavailable or unknown.
  • a primary key serves to uniquely identify a particular "relational entity” within a set of such entities.
  • a “relational entity” is defined as all or part of a row of a table (i.e., relation).
  • the two referential integrity rules may be stated as:
  • No relational entity may include either: a. a potentially ambiguous reference to another relational entity, or b. a reference to a non-existent or unknown relational entity.
  • a database containing bank account information in one table and transaction information in another table may not be able to correlate all transactions (in which the account numbers of the debtor and of the creditor are foreign keys) with the corresponding information for individual accounts (in which the account number is a primary key which uniquely identifies the account) if an individual account number is directed to the wrong account.
  • the database system will not be able to generate accurate account statements for all the individuals who participated in the various transactions and will most likely have some transactions for which either the necessary debit or credit account information is unavailable.
  • Referential integrity can be maintained in a conventional relational database if, every time data is entered into the database, there is a check performed by the system to determine that:
  • the new- key value duplicates an existing key value; and (3) whenever a value is updated or deleted which includes a primary key, the old key value corresponds to that of any existing value of any associated foreign key and, if so, updating all foreign keys corresponding to the changed key or setting to "null" (no value) all foreign keys corresponding to the deleted key.
  • one aspect of the preferred embodiment of "the present invention includes an apparatus and method, utilizing a computer for representing a relational database.
  • the relational database contains a plurality of relations, each containing one or more columns and rows.
  • a column has one or more values which all have a common characteristic.
  • Each value in the column corresponds to one of the rows of the relation.
  • Each row contains one or more values, a d each value is from a different column.
  • the present invention overcomes the shortcomings in the prior art by replacing each occurrence of the same value (or a related set of values) in a column with an "identifier" or “token” (a symbol that represents and is unique to a particular value by reference to the value in an underlying value set) .
  • the values for all columns which share a common characteristic are contained in one value set.
  • Each identifier may then be set equal to the ordinal position of each value in the value set.
  • the individual rows of a relation may also be individually identified by an identifier, so that one or more related values may be represented with a single "identifier" which references corresponding values of another row in the same or a different relation.
  • the present invention greatly simplifies the database design, data entry, and data query processes.
  • constructing a database in accordance with the present invention there is no need for any a priori definition of whic columns are to serve as primary keys; moreover, since an identifier provides an explicit reference to the intended relational entity, there is no need to be concerned whether a change to any existing values associated with a particular entity will cause a reference to become ambiguous, even if the change results in a primary key to cease to uniquely identify that particular relational entity.
  • identifiers in place of values and groups of related values in rows, it is possible to implement a relational database which maintains referential integrity.
  • the system can automatically check whether a newly added entity is an exact duplicate of an existing entity, without requiring any further checking of any added or changed key data or any comparisons with other associated key data in the same or a different table in the database.
  • FIG. 1 depicts three related relations or tables of a relational database as perceived by the user;
  • FIG. 2 depicts the values in six domains of the database;
  • FIG. 3 depicts a way the data depicted in FIG. 1 may be stored after processing the data according to the aspect of the present invention
  • FIG. 4 depicts certain changes to the tables of FIG. 1 as perceived by the user
  • FIG. 5 depicts the effects of the changes to FIG. 4 on the identifiers of FIG. 3;
  • FIG. 6 (comprising FIGS. 6(a), 6(b) and 6(c), corresponding respectively to FIGS. 3(a), 3(b) and 3(c)) depicts an improved representation of the data of FIG. 3, whereby referential integrity of the data is preserved;
  • FIG. 7 depicts a computer system equipped with a Relational Database Management System (RDMS) for creating a relational database with identifiers in accordance with the present invention
  • RDMS Relational Database Management System
  • FIG. 8 is a flow diagram of the BUILD DATABASE routine
  • FIG. 9 is a flow diagram of the CREATE DATABASE routine
  • FIG. 10 is a flow diagram of the LOAD DATABASE routine.
  • FIG. 11 (comprising FIGS. 11(a), 11(b), 11(c) and 11(d)) is a results table depicting the construction of the Persons relation with identifiers as shown in FIG. 6(a).
  • FIG. 12 (comprising FIGS. 12(a) and 12(b)) is a results table depicting the construction of the Employee relation with identifiers as shown in FIG. 6(b).
  • FIG. 13 is a results table depicting the construction of the Manager relation with identifiers as shown in FIG. 6(c).
  • FIGS. 1(a), 1(b) and 1(c) it may be seen that a typical relational database comprises several relations, each of which being representable as a table 12, 14, 16.
  • PERSON table 12 it may be seen that the table contains a number of characteristics (such as names and dates) .
  • FIG. 1 shows the data as it might be logically stored in a prior art relational database system. More specifically, each row 18(a), 18(b), . . .
  • each column 20(a), 20(b), . . . 20(i) corresponds to a particular characteristic.
  • FIGS. 1(a), 1(b) and 1(c) a description of the relational database containing the PERSON table, EMPLOYEE table and MANAGER table, is discussed.
  • the PERSON table at row 18(b) the entry for Mary Smith having birth date 3/14/39 is indicated at 9.
  • Mary Smith is also shown in the MOTHER columns of the PERSON table at row 18(i), as indicated at 11.
  • the dotted line 19 depicts the relationship of Mary Smith at row 18(b) with Mary Smith at row 18(i).
  • Mary Smith is married to Sam Jones, born 9/9/30 and is the mother to John Jones, born 3/5/62.
  • a different Mary Smith is listed.
  • Mary Smith at row 18(k) has a birth date of 5/8/35 which distinguishes this Mary Smith from the Mary Smith at row 18(b) .
  • Mary Smith at row 18(k) in the PERSON table is also the same Mary Smith in the EMPLOYEE table (FIG. 1(b)) at row 62(c).
  • Mary Smith is an employee of a particular company in the security department and makes $43,000.00 per year.
  • the relational database is set up so that each table, FIGS. 1(a), 1(b) and 1(c), are interdependent.
  • the EMPLOYEE table, FIG. 1(b) is directly dependent upon the PERSON table, FIG. 1(a), for its information.
  • the MANAGER table is directly dependent upon the EMPLOYEE table of FIG. 1(b).
  • Mary Smith of row 41(a) of the MANAGER table can only relate back to Mary Smith at row 42(c) of the EMPLOYEE table.
  • Mary Smith of the EMPLOYEE table relates back to Mary Smith in the PERSON table.
  • Other tables can be formed, such as the softball team or volleyball team, which may be dependent upon the PERSON table for its entries.
  • the relational database administrator sets up the database so that information from one table can be unambiguously referenced in another table.
  • PERSON-LAST 20(a), FATHER-LAST 20(f) and MOTHER-LAST 20(g) are all surnames.
  • the set of all possible values that a column or a number of related columns can contain is called a domain. It is often advantageous to consider all last names as coming from a single, separately defined domain.
  • the values which occur in the relational database are a subset of the domain of values, and the subset is referred to as a value set.
  • At least those values which share such a common characteristic and which are already in existence are arbitrarily ordered (e.g. , in the order they were originally entered into the system) to form an ordered value set.
  • the ordered set of Last Names 28 (FIG. 2) (which is associated with columns 20(a), 20(d), 20(g), 38(a) and 40(a) of FIG. 1(a)) is depicted as ⁇ being associated with respective unique identifiers 30 (FIG. 2) .
  • a non-null identifier to each unique (possibly null) value in a particular domain may be used by a relation having a column defined on that domain. Note that Jones 26 of the Last Name domain 28 has the identifier "1" associated with it.
  • each value in the Last Name column in the database of FIG. 1 is replaced by a unique identifier in FIG. 3.
  • unique identifier For example, for each "JONES" 26 there is associated the unique identifier "1", as indicated at 32 (FIG. 3(a)).
  • a similar procedure has been used to form the First Name value set 34 and its associated set of identifiers 30 illustrated in FIG. 2 , which replaces the value in the First Name column in the database of FIG. 1 by the unique identifiers as shown in FIG. 3.
  • a more detailed discussion on the process for building the relational database with identifiers is given below in section G.
  • the value sets 28 and 34 are respectively the sets of all values of last and first names actually used by all the related columns (e.g., columns 20(a), 20(d), 20(g), 38(a) and 40(a) of FIGS. 1 (a), 1(b) and 1(c) are associated with the ordered set of Last Names 28) , they must constantly change to reflect the current contents of those columns.
  • One straightforward implementation is to keep on adding values without deleting values no longer in use.
  • the value set could be allowed to shrink as well as grow, in which case it is advantageous to introduce a second type of identifier (not shown in FIGS.) to track the ordering of identifiers, since it would be very time consuming to renumber all the identifiers in the entire system whenever a particular value ceases to be in use.
  • variable length alphanumeric entries may now be represented by fixed length integers (the only requirement is that the system be able to accommodate integer values as large as the maximum number of unique values in any one value set) .
  • the present invention readily solves the foregoing problem of referential integrity by extending the use of identifiers so that each relational entity of a relation may reference another relational entity (or row) of the same or a different relation.
  • FIG. 6, comprising FIGS. 6(a) through 6(c), corresponds to FIGS. 1, 3, .4 and 5, but is modified to include row (relational entity) identifiers, wherein three related columns (e.g., Father's First and Last Names and birth date) have been replaced with a single column "FATHER" 68 whose entries reference the relational entities of the PERSON relation 12"' containing the Father column data.
  • the ordinal row numbers serve as the identifiers.
  • FIGS. 1(a), 1(b), 1(c) and 6(a), 6(b) and 6(c) a second version of the relational database using identifiers is shown.
  • the reference Mary Smith having a birth of 3/14/39 is shown linked to Mary Smith having a birth of 3/14/39 by dotted line 19.
  • Mary Smith is John- Jones' mother at row 18(i) of the PERSON table.
  • the PERSON table, as represented in FIG. 6(a) is also shown as linking Mary Smith (02 02 02) to Mary Smith (02) in row 18(i) of the PERSON relation 12"' by the dotted line 19.
  • a standard programming language such as SQL, may be used to define the structure of a relational database.
  • SQL a standard programming language
  • a JOIN is a SELECT over the Cartesian product of at least two relations of the relational database.
  • the JOIN operation is used to combine relations based on equivalent values in columns having the same characteristic.
  • JOIN involves combining each of N rows of the first relation (e.g., the PERSON relation, FIG. 1(a)), with each of M rows of a second relation (e.g., the EMPLOYEE relation (FIG. 1(b)), to form an N x M resultant relation.
  • the JOIN operation describes all resultant rows of a JOIN relation which do not have a match in the JOIN column.
  • This type of JOIN relation is generally referred to as the "EQUIJOIN" operation.
  • the JOIN operation can be performed more efficiently as a SELECT operation.
  • a JOIN of the PERSON table and EMPLOYEE table where PERSON-NAME is equal to EMPLOYEE-NAME is performed by selecting all identifiers in the EMPLOYEE table in the Employee Name column which correspond to row identifiers in the PERSON table.
  • identifiers 9, 10, 11 and 12 correspond to the rows having identifiers 9, 10, 11 and 12 of the PERSON table.
  • the SELECT operation over the EMPLOYEE table of FIG. 6(b) will automatically join the equivalent entries of the PERSON table of FIG. 6(a). This is a direct result of separating the functions of binding and information representation by the use of identifiers. Because the binding is already explicit, less work needs to be performed.
  • FIG. 7 a computer system having a programmable computer and computer programs for creating a relational database with identifiers is shown.
  • the system includes program computer 103, display 101, keyboard 107 for the computer and external device 109 for storage of data.
  • Hardware and software for creating a relation having identifiers is stored in the Relational Database Management System (RDMS) 105 (shown in phantom lines) , which is connected within the computer 103.
  • the RDMS 105 coordinates the various activities related to adding and implementing identifiers to a relational database.
  • RDMS Relational Database Management System
  • External device 109 is a permanent or buffered storage, typically in the form of a hard disk, for storing relations which are stored in expanded form where each value is no smaller than a byte.
  • the contents of the external device are typically maintained in records which are divided into fields where the Nth field of each record corresponds to a specific type of data.
  • the contents of the storage device 109 are loaded to a relational processor unit (RPU) in the RDMS 105.
  • RPU relational processor unit
  • the operation of the bus system and the operation of the components of the RDMS 105 are controlled by routines implemented by the RPU. When the routines are loaded into the RPU, coordination and processing of the various components of the RDMS 105 is established.
  • the BUILD process for building a relational database having identifiers is shown. More particularly, the BUILD DATABASE routine (151, FIG. 8) coordinates the CREATE phase (FIG. 9) and the LOAD phase (FIG. 10) to be discussed. During block 153 of the BUILD DATABASE routine (FIG. 8) , the CREATE DATABASE routine (FIG. 9) is called for building a "framework" for the relational database. During this routine, the framework for maintaining the relational database having identifiers is created. During block 154, the first relation to be inserted into the relational framework is selected.
  • a row of the relation from the load file stored in external device 109 is brought over to the RPU of the RDMS 105.
  • the INSERT ROW routine (FIG. 10) is called to " coordinate the process of assigning, identifiers to the incoming row.
  • the RPU determines whether there are any more rows of the selected relation left in the load file. Assuming that there are more rows remaining in the relation in the load file, processing continues at blocks 155, 152 and 161 until all of the rows of the particular relation are inserted. Assuming that all of the rows of the relation in the load file have been processed and inserted into the database, block 165 is processed.
  • FIG. 9 is a flow diagram of the CREATE DATABASE routine for building a "framework" for the individual domains of ordered value sets and the frameworks for the relations specified by the user to be included in the relational database.
  • the relational database as depicted in FIG. 9 is a flow diagram of the CREATE DATABASE routine for building a "framework" for the individual domains of ordered value sets and the frameworks for the relations specified by the user to be included in the relational database.
  • FIG. 1 and the domains as shown in FIG. 2 are an example of a relational database whose framework may be created by the CREATE DATABASE routine (FIG. 9) .
  • a domain framework is constructed.
  • the framework includes user supplied specifications concerning the characteristics of the domain such as the name, data type, value, size and/or other user specified restrictions on the values it may contain.
  • the RPU determines whether there are other domains which need frameworks to be built. Assuming that there are other domain frameworks to be constructed, processing continues at blocks 102 and 104 until all of the domain frameworks have been built. When all of the domain frameworks have been built, processing continues at block 106.
  • the framework for a relation is built.
  • a framework for a relational column is constructed.
  • a check is made on whether the particular column whose framework was just built draws its values from a domain or another relation, as specified by the administrator or user. Assuming that the column draws it values from a domain, processing proceeds at block 116 during which a reference to the associated domain is included in the framework for the column and processing continues at block 120.
  • processing continues at block 118.
  • a reference to the particular relation is included in the column framework to indicate which relation the column draws its values from. The processing continues at block 120.
  • the RPU determines whether there are any other columns which need the framework to be built for the current relation. Assuming that there are still more columns which need a framework to be constructed, processing continues at blocks 108, 110, 116, 118 and 120 until all columns of the relation have frameworks constructed. Assuming that all of the column frameworks for a particular relation are built, processing continues at block 124. During block 124, the RPU determines whether there are other relations which require frameworks to be constructed. Assuming that there are other relations which require frameworks to be constructed, processing continues at blocks 106, 108, 110, 116, 118, 120 and 124, until all of the relations have the required frameworks constructed.
  • processing continues at block 127.
  • processing returns to the calling program.
  • This procedure of creating the framework for a database is the same as described in the above- identified application entitled "A RELATIONAL DATABASE REPRESENTATION WITH RELATIONAL DATABASE OPERATION CAPABILITY.”
  • FIG. 10 a flow diagram for the INSERT ROW routine is shown.
  • the purpose of the INSERT ROW routine is for converting each row of a load file associated with a relational database into one or more assigned identifiers depending on whether the row is "atomic" or "non-atomic.” If the row is "atomic,” it is treated as an entity having an associated identifier assigned. If the row is "non-atomic,” it must be treated as two or more entities each having an associated identifier assigned.
  • a "framework" for the relational database has been constructed by the CREATE DATABASE routine (FIG. 9) and that the identifiers will have a place to be stored within the framework.
  • the first of the rows from the load file is examined to determine if it is "atomic," i.e., whether more than one identifier is required to represent the row
  • the RPU parses the row into one or more smaller entities. Each smaller entity corresponds to a column of the relation. Processing continues at block 264
  • the RPU determines if the entity is referenced in a domain or a relation. Assuming that the entity is a domain reference, processing continues at block 256. During block 256, the RPU determines if the entity is already in a domain. Assuming that the entity is currently in a domain, processing continues at block 267. However, if the entity is not in a domain, then processing continues at block 258. During block 258, the RPU first inserts the selected entity ' (value) into the domain, and then processing continues at block 267.
  • a number for the entity is returned as an identifier. All previous values stored in the domain have identifiers associated with them. When a new entity is added into the domain, the next available identifier is provided. Typically, the entities are added to the domain in an order by entry method and the identifiers are determined by the order of entry. In this way, the returned identifiers will not change previous identifiers associated with values in the domain. Processing continues at block 262 during which the returned identifier is placed in the appropriate row of the relation.
  • processing goes directly to block 270.
  • the RPU selects an identifier or identifiers by finding a matching entity or entities from the column's domain.
  • the RPU determines whether more than one identifier was returned. If more than one identifier is returned, the processing continues at block 274, during which a null value is inserted in the relational row. The null value is a flag to the user that there is not exactly one referenceable value and thus, a connection cannot be established for that identifier. Processing continues at block 280.
  • processing continues at block 276.
  • the RPU places the returned identifier in the appropriate relational row of the relation.
  • processing continues at block 280.
  • the RPU determines whether there are any more entities left for processing. Assuming that there are still more entities left for processing, then processing continues at block 278 to obtain the next entity from the load file. Processing continues at blocks 268, 256, 258, 267, 262, 270, 272, 274, 276, 280 and 278 until all of the entities for the particular relation are processed. If there are no more entities for processing, then processing continues at block 284.
  • the RPU returns an identifier for the row in which an identifier was placed during blocks 262 and/or 276. During block 285, this row number identifier is also placed in the relational row, completing the INSERT ROW routine. Processing continues at block 286, during which the RPU returns the processing to the BUILD DATABASE routine, block 161 (FIG. 8) .
  • the purpose of this example is to take the prior art relational database as depicted in FIGS. 1(a), 1(b) and 1(c) and convert it into the relational database represented by identifiers as shown in FIGS. 6(a), 6(b) and 6(c).
  • the relational database as shown in FIGS. 1(a), 1(b) and 1(c) is stored in the external device 109 (FIG. 7) .
  • the relational database data is stored in a load file in the order as shown in FIG. 1.
  • the order for loading the relational database into the RDMS 105 (FIG. 7) is determined by the relational database administrator.
  • FIG. 8 is- the overall block diagram which represents control over building the relational " database with identifiers.
  • the CREATE DATABASE routine (FIG. 9) is called.
  • the CREATE DATABASE routine builds the framework for the domains, relations and columns of the relational database shown in FIG. 6.
  • the first step of the CREATE DATABASE routine occurs during block 102, during which the framework for the first domain of the database is created. Following the example, the framework for the Last Name domain is generated first. Also established is the characteristic of the domain including its name, data type, value size, etc.
  • RPU determines if there are other domains which need a framework created. Processing continues at block 102 and the First Name domain framework is generated. Blocks 102 and 104 are repeatedly processed until all of the frameworks for the various domains of the relational database are generated. More particularly, the Salary domain, the Department domain, and the Title domain, have frameworks built.
  • the framework for the first relation of the relational database is constructed.
  • an empty framework for the PERSON table is constructed.
  • the framework for the first relational column of the PERSON table is built, more particularly, the Last Name column has its framework constructed.
  • the RPU includes a reference to the LAST NAME domain in the LAST name framework.
  • the framework for the FIRST name column of the Persons relation is constructed. Processing continues at block 111, during which the RPU determines if the First name column has values which are directly drawn from a domain. The First name column of the PERSON table draws its values directly from the FIRST NAME domain, and thus processing continues at block 116. During block 116, the RPU places a reference in the First name column to the domain for first names. Processing continues at block 120 during which the RPU determines whether there are other columns which need frameworks constructed for the PERSON table. There are still columns which need construction, thus processing continues at block 108.
  • the framework for the Birth date column of the PERSON table is constructed. Processing continues at block 110, during which the RPU determines if ' the Birth column draws its values from a domain. birth column draws its values from the birth domain and thus, processing continues at block 116. During block 116, the RPU places a reference to the Birth domain in the Birth column. Processing continues at block 120, during which the RPU determines that there is still another column left which needs to have its framework constructed. Processing continues at block 108.
  • the framework for the Employee relation is constructed.
  • Processing continues at block 108, during which the framework for the Employee column is built.
  • the Employee column refers back to the PERSON table for the employee's First Name, Last Name and birth Date.
  • the RPU determines that the Employee column draws its values from a relation.
  • processing continues at block 118.
  • RPU includes a reference in the Employee column to the PERSON table from which the column draws its values.
  • processing continues at block 120, during which the RPU determines that there are still more columns in 1 the relation which require construction. Processing continues at block 108.
  • the framework for the Salary column is built. Note that the PERSON table does not have a 5 Salary column.
  • the RPU determines that the Salary column draws its values directly from the Salary domain. Thus, processing continues at block 116. During block 116, the RPU places a reference to the Salary domain in the Salary column. Processing continues at
  • the framework for the Department column is constructed.
  • the RPU 15 determines that the Department column draws its values from the Department domain.
  • the RPU includes a reference to the Department domain .in the Department column. Processing continues at block 120, during which the RPU determines that there are no more
  • Block 124 is entered and the RPU determines that there are still other relations to be built for the relational database.
  • the MANAGER table relation is constructed. Processing continues at blocks 106, 108,
  • the RPU at block 124 determines that there are no more relations to be constructed for this relational database.
  • processing returns to the calling program or the BUILD DATABASE routine (FIG. 8) at block 154.
  • the RPU selects the Persons relation to be loaded into the RDMS 105. Recall that the Persons relation was strategically positioned to be input into the RDMS first. In this manner, identifiers can later be referenced. Processing continues at block 155 during which the RPU obtains the first row from the load file containing the Persons relation. Once again, the Persons relation depicted in FIG. 1(a) represents the load file and the order in which the rows are to be loaded into the RDMS. Then, during block 152, the INSERT ROW routine (FIG. 10) is called to convert the row from the load file into the identifier-represented row, as depicted at 18(a) " of FIG. 6(a) . Referring to FIG.
  • the RPU determines if the row or entity currently loaded is atomic, i.e., can be identified as a single entity.
  • the row loaded is row 18(a) of FIG. 1(a); Jones, Sam, 9/9/30. This row is non- atomic and thus, during block 264, it is parsed into three smaller entities which are atomic.
  • the first atomic entity is for Jones, and during block 266 it is obtained.
  • the RPU determines if Jones is referenced by any domain. Currently, all of the domains are empty. However, the CREATE DATABASE routine (FIG. 9) has constructed their frameworks.
  • the RPU determines that the value for Jones is not located in the Last Name domain. Thus, processing continues at block 258, during which the value for Jones is inserted into the Last Name domain as shown in the RESULTS table (FIG. 11(b)) at line 412.
  • the RESULTS table depicted in FIG. 11 is divided into two portions.
  • the left hand side of the table depicting the construction of the Persons relation represented in the form of identifiers is broken up into six columns: Identifier (ID#) , Person's Last Name, Person's First Name, birth Date, Father and Mother.
  • Identifier (ID#) the domains related to the PERSON table are shown, in four columns: Identifier (ID#) , Last Name, First Name, and birth Date.
  • the Identifier "01" is shown associated with Jones at 412 in FIG. 11(a). This Identifier is returned to the Persons relation and at block 262 it is placed in the first row of the Last Name column of the Persons relation (352, FIG. 11(a)).
  • the RPU determines whether Sam references a domain or a relation.
  • the value Sam is in a domain and thus, during block 256 the RPU determines whether the value Sam is already contained in the First
  • the RPU determines that the date, 9/9/30, references a domain, namely, the birth domain. Processing continues at block 256, during which it is determined that the date 9/9/30 is not contained in the birth domain. During block 258, the date 9/9/30 is inserted into the Birth domain, as shown at 420 of FIG. 11(a) . This Birth date is entered in the first row of the Birth domain and thus, is associated with the identifier "01". During block 267, the identifier is returned, and during block 262, the identifier "01" is placed in the first row of the Birth column of the PERSON table (360, FIG. 11(a)).
  • the RPU determines that there are no more entities left for processing. Processing continues at blocks 284 and 285 during which the identifier associated with the inserted row, for "Jones, Sam, 9/9/30" is returned and placed in the Persons relation table (362, FIG. 11(a)). The newly inserted row for "Jones, Sam, 9/9/30” is the first inserted row of the Persons relation and, thus, the identifier for the row is "01" (362, FIG. 11(a)). Processing continues at block 286, in which processing returns to the BUILD DATABASE routine (FIG. 8) at block 161. During block 161 (FIG.
  • the RPU determines whether there are any more rows in the Persons relation left for processing. There are still currently more rows in the Persons relation as shown in FIG. 1 and, thus, processing continues at block 155.
  • the next row from the load file (FIG. 1(a)) is obtained and then during block 152 the INSERT ROW routine (FIG. 10) is called again. The above process is repeated for Mary Smith, and
  • the RPU determines if there are any more rows in the Persons relation which need to be processed. There are many more rows remaining in the load file for the PERSON table and, thus, processing continues at blocks 155 and 152.
  • the next five input rows of the Persons relation (18(d), 18(e), 18(f), 18(g), 18(h) of FIG. 1(a)) follow the strategy as discussed earlier for the first three input rows (18(a), 18(b), 18(c) of FIG. 1(a)) which was discussed above. Referring to entries 392 and 454 of FIG. 11(b), the. Persons relation having its identifiers in the various domains are shown after rows 18(a) through 18(h) are loaded into the RDMS.
  • rows 18(e) , 18(f), 18(g), 18(h) include at least one entity that is already in a domain. Accordingly, the identifiers in the corresponding rows in the PERSON table, as indicated at 292, are not all the same value.
  • row 18(i) of FIG. 1(a) The next row of the PERSON table to be loaded is row 18(i) of FIG. 1(a).
  • row 18(i) "Jones, John, 3/5/62; Jones, Sam, 9/9/30; Smith, Mary, 9/14/39" is obtained from the load file.
  • Processing continues at block 152, during which the INSERT ROW routine (FIG. 10) is again called.
  • the RPU determines if the newly obtained row is atomic.
  • the row is not atomic and, thus, during block 264, the row is parsed into five atomic entities, mainly, Jones, John, 3/5/62; Jones, Sam, 9/9/30; and Smith, Mary, 3/14/39. Jones, Sam, 9/9/30 and Smith, Mary, 3/14/39 are parsed as two atomic entities because only two columns are specified namely, Father and Mother of the PERSONS table, and, by definition are entities, already occurring in prior rows of the table.
  • the first entity, "Jones” is obtained during block 266. Then, during block 268, the RPU determines that "Jones" references a domain.
  • processing continues at block 267, during which the identifier "01" associated with Jones is returned and during block 262 the identifier is placed into the ninth row of the Last Name column of the PERSON table (394, FIG. 11(b)).
  • processing continues at block 280, during which the RPU determines whether there are any more entities left for processing. The next entity, "John” is obtained during block 278 and processing continues at block 268.
  • RPU determines that "John” references the First Name domain.
  • 256 it is determined that "John” is not in the First Name domain.
  • "John” is inserted into the next row of the First Name domain which is row 8, FIG. 11(c).
  • the identifier "08" is returned and during block 262, the identifier is placed into the ninth row of the First Name column of the PERSON table (398, FIG. 11(c)).
  • Processing continues at block 28.0, during which the RPU determines if there are any more entities left for processing.
  • date "3/5/62" is obtained and processing continues at block 268.
  • processing continues at block 268. 5 During block 268, the RPU determines whether the entity, "Jones, Sam, 9/9/30" references a domain or a relation. This entity references an entry in a prior row of the PERSON table, namely, Sam Jones at row 18(a) of FIG. 1(a). Thus, processing continues at block 270, 0 during which the identifier associated with the entity "Jones, Sam, 9/9/30" is obtained. The identifier associated with that entity is "01", as indicated in the ID# column. See 362 of FIG. 11(a).
  • the RPU determines that there is exactly one identifier 5 returned and, thus, processing continues at block 276 which places the identifier into the Father column of the PERSON table (404, FIG. 11(d)). Processing continues at block 280, during which the RPU determines that there is still one more entit " left in the input row, namely, 0 "Smith, Mary, 3/14/39.” This entity is obtained at block 278. Processing continues at block 268.
  • Processing continues at block 280 during which the RPU determines that there are no more entities for processing 5 and thus processing continues at blocks 284 and 285.
  • the row number associated with the newly inserted row is placed into the Identifier column of the Persons relation.
  • the newly inserted row is the ninth row of the PERSON table and, thus, the identifier "09" is added to the Persons relation (408, FIG. 11(d)).
  • Processing returns to the calling program, BUILD DATABASE routine (FIG. 8) , at block 161.
  • the RPU determines whether there are any more relations to load. There are still two more relations left for loading and they are the Employee relation and the Manager relation. Processing continues at block 154, during which the Employee relation is selected as the next relation for loading. Then, in block 155, the first row of the EMPLOYEE table as shown in FIG. 1(b) is obtained. The first row of the EMPLOYEE table (FIG. 1(b)) is "Jones, John, 3/5/62, $31,000.00, Marketing.” Then, in block 152, the INSERT ROW routine (FIG. 10) is called to process this row of the load file, as indicated at 474-484 and 514-522 of FIG. 12.
  • the INSERT ROW routine is repeated for the three additional employees of the Employee Table, as indicated at 486-512 and 524-550 of FIG. 12.
  • processing continues at block 280, during which RPU determines that there are no more entities left for processing in the last inserted row.
  • the row number identified with the newly inserted row is determined to be "04" and it is placed into the fourth row of the Identifier column of the Employee relation (512, FIG. 12) .
  • processing returns to the BUILD DATABASE routine (FIG. 8) at block 161.
  • the RPU determines that there are no more rows in the Employee relation to be inserted and, thus, processing continues at block 165.
  • the RPU determines that there is still one more relation, namely, the Manager relation, which needs to be loaded.
  • Processing continues at block 154 during which the Manager relation is obtained.
  • the first row of the Manager relation "Smith, Mary, VP, Security” is obtained from the load file.
  • the INSERT ROW routine (FIG. 10) is called.
  • the input row is examined and it is determined that it is not atomic.
  • the RPU determines that there are other entities which need to be processed in the input row and processing continues at block 278, during which the entity "VP" is obtained. Processing returns to block 268 during which it is determined that the entity references a domain. At block 256, it is determined that the entity is not located in the Title domain and, thus, during block 258 it is inserted into the first row of the Title domain (572, FIG. 13) . During block 267, the identifier associated with "VP" is determined to be "01" and during block 262 it is placed into the first row of the Title column in the Manager relation (556, FIG. 13) . Processing continues at block 280, during which the RPU determines that there is one more entity left for processing.
  • the entity "Security” is obtained and processing continues at block 268.
  • block 168 it is determined that the entity references a domain and in block 256 it is determined to be in the second row of the Department domain (574, FIG. 13) .
  • the identifier associated with "Security” is determined to be "02" and during block 262 the identifier is placed in the first row of the Department column of the Manager relation (555, FIG. 13) .
  • Processing continues at block 280 during which the RPU determines that there are no more entities left to process for this particular inserted row.
  • the row number "01" of the newly inserted row is placed in the Identifier (ID#) column of the Manager relation (560, FIG. 13) .
  • processing returns to the BUILD DATABASE routine (FIG. 8) at block 161.
  • the RPU determines that there is still one more row in the Manager relation to be added and, thus, processing continues at block 155 and 152 to insert the row "Doe, John, VP, Marketing" into . its identifier form in the Manager relation.
  • the row is obtained from the load file and in block 152 the INSERT ROW routine (FIG. 10) is called.
  • the row is not atomic.
  • the row is parsed up into the following entities, "Doe, John", “VP”, "Marketing”.
  • the entity "Doe, John” is obtained and during 268 it is determined that it references a relation.
  • the identifier associated with the entity is determined to be "02" and at block 272, it is confirmed that only one identifier is returned.
  • the identifier "02" is placed into the second row of the Manager column of the Manager relation (562, FIG. 13).
  • the RPU determines that there are still entities left for processing and, thus, in block 278, the entity "VP" is obtained.
  • Processing continues at block 268, during which the entity is determined to reference a domain.
  • the entity is determined to be in the first row of the Title domain (580, FIG. 13).
  • the entity "Marketing" at block 278 is obtained and during block 268 it is determined to reference a domain.
  • block 256 it is determined that the entity is in the first row of the Department domain and has an identifier of "01."
  • the identifier is returned. In block 262, it is placed into the second row of the Department column of the Manager relation (566, FIG. 13) .
  • the row number associated with the newly inserted row is determined to be "02." This number is placed into the second row of the Identifier column of the Manager relation (568, FIG. 13).
  • the entire relations shown in FIGS. 1(a), 1(b) and 1(c) have been loaded into the RDMS (105, FIG. 7) and has been converted into a relational database represented by identifiers as shown in FIG. 6 and at 410 of FIG. 11 (PERSON table) , 512 of FIG. 12 (EMPLOYEE table) and 568 of FIG. 13 (MANAGER table) .

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A relational database management system in which entities stored in the database are replaced in the relations with coded identifiers (270, 285). Problems are encountered in relational database management when changing data which is correct for some entities but incorrect for others. The present invention solves the problem of referential integrity by extending the use of identifiers so that each relational entity of a relation may reference another relational entity (or row) of the same or different relation. A group of related entities in one row of a relation may be replaced by an identifier in all other relations in which the same entities are grouped in a row. The identifiers replace the entities in all relational database operations.

Description

A -RELATIONAL DATABASE USING IDENTIFIERS Background of the invention
A. Field of the Invention
The present application relates generally to a computerized database system and more particularly to a method and apparatus for utilizing identifiers in a relational database for preserving referential integrity.
B. Prior Art
An Introduction to Database Systems by C. J. Date, Published by Addison- esley Publishing Company in 1986 (Vol. 1) and 1983 (Vol. 2), defines "relational database" as a "database that is perceived by users as a collection of tables and nothing but tables." Vol. 1 at p. 96. The relational database system enables the user to generate new tables from old tables and to extract subsets of information from a given table. At the core of every relational database system is a strategy for manipulating data. The emphasis on data leads to a variety of issues which prior art database designers group together under the general heading of normalization and primary key determination.
One of the criteria for a relational database architecture which is lacking from known prior art relational database computer systems is "referential integrity". See generally. Vol. 1, Chap. 12 and Vol. 2 , Chap. 2 of the above-cited treatise. In conventional relational databases, relational referencing is always from a particular value of a foreign key, to the same value of a designated primary key (or alternate key) , associated with that foreign key. In conventional relational database phraseology, referential integrity rules for conventional relational databases may be stated as:
The only valid values for a foreign key are either:
1. a value already used in the associated primary key, or
2. the null value.
The reason for permitting null values is that some relationships may exist, but the relevant data is unavailable or unknown.
In conventional database systems, a primary key serves to uniquely identify a particular "relational entity" within a set of such entities. A "relational entity" is defined as all or part of a row of a table (i.e., relation). The two referential integrity rules may be stated as:
No relational entity may include either: a. a potentially ambiguous reference to another relational entity, or b. a reference to a non-existent or unknown relational entity.
Any violation of these rules will cause the database system to provide incorrect or incomplete information. For example, a database containing bank account information in one table and transaction information in another table may not be able to correlate all transactions (in which the account numbers of the debtor and of the creditor are foreign keys) with the corresponding information for individual accounts (in which the account number is a primary key which uniquely identifies the account) if an individual account number is directed to the wrong account. As a result, the database system will not be able to generate accurate account statements for all the individuals who participated in the various transactions and will most likely have some transactions for which either the necessary debit or credit account information is unavailable.
Referential integrity can be maintained in a conventional relational database if, every time data is entered into the database, there is a check performed by the system to determine that:
(1) whenever a value is entered or updated which includes a foreign key, the new value of the foreign key corresponds to that of an existing value in the associated primary key;
(2) whenever a value is entered or updated which includes a primary key, the new- key value duplicates an existing key value; and (3) whenever a value is updated or deleted which includes a primary key, the old key value corresponds to that of any existing value of any associated foreign key and, if so, updating all foreign keys corresponding to the changed key or setting to "null" (no value) all foreign keys corresponding to the deleted key.
In a database system which stores a vast quantity of data that represents a large and/or complex relationship, it is very inefficient to perform an integrity check on all relevant portions of the database each time a single entry is entered, changed or deleted.. However, without such integrity checks, the database is subject to corruption. As a consequence, most prior art database systems lack a built-in mechanism to maintain referential integrity and rely on the user to institute whatever safeguards are deemed necessary to prevent the database from becoming corrupted.
Smmtiaτv of the Invention
Briefly, one aspect of the preferred embodiment of" the present invention includes an apparatus and method, utilizing a computer for representing a relational database. The relational database contains a plurality of relations, each containing one or more columns and rows. A column has one or more values which all have a common characteristic. Each value in the column corresponds to one of the rows of the relation. Each row contains one or more values, a d each value is from a different column.
The present invention overcomes the shortcomings in the prior art by replacing each occurrence of the same value (or a related set of values) in a column with an "identifier" or "token" (a symbol that represents and is unique to a particular value by reference to the value in an underlying value set) . The values for all columns which share a common characteristic (for example, names of cities) are contained in one value set. Each identifier may then be set equal to the ordinal position of each value in the value set. The individual rows of a relation may also be individually identified by an identifier, so that one or more related values may be represented with a single "identifier" which references corresponding values of another row in the same or a different relation.
The present invention greatly simplifies the database design, data entry, and data query processes. In constructing a database in accordance with the present invention, there is no need for any a priori definition of whic columns are to serve as primary keys; moreover, since an identifier provides an explicit reference to the intended relational entity, there is no need to be concerned whether a change to any existing values associated with a particular entity will cause a reference to become ambiguous, even if the change results in a primary key to cease to uniquely identify that particular relational entity.
In summary, by using identifiers in place of values and groups of related values in rows, it is possible to implement a relational database which maintains referential integrity. The system can automatically check whether a newly added entity is an exact duplicate of an existing entity, without requiring any further checking of any added or changed key data or any comparisons with other associated key data in the same or a different table in the database.
It should be appreciated that although the basic principles underlying the present invention are particularly adaptable to a vector implementation, as discussed in co-pending application entitled, "A RELATIONAL DATABASE REPRESENTATION WITH RELATIONAL DATABASE OPERATION CAPABILITY," to Glaser et al., filed October 9, 1987, the present invention is adaptable to other database implementations and will be of considerable assistance in maintaining the integrity of any database in which it is employed.
Brief Description of Drawings
FIG. 1 (comprising FIGS. 1(a), 1(b) and 1(c)) depicts three related relations or tables of a relational database as perceived by the user; FIG. 2 depicts the values in six domains of the database;
FIG. 3 (comprising FIGS. 3(a), 3(b) and 3(c)) depicts a way the data depicted in FIG. 1 may be stored after processing the data according to the aspect of the present invention;
FIG. 4 (comprising FIGS. 4(a), 4(b) and 4(c), corresponding respectively to FIGS. 1(a), 1(b) and 1(c)) depicts certain changes to the tables of FIG. 1 as perceived by the user; FIG. 5 (comprising FIGS. 5(a), 5(b) and' 5(c), corresponding respectively to FIGS. 1(a), 1(b) and 1(c)) depicts the effects of the changes to FIG. 4 on the identifiers of FIG. 3;
FIG. 6 (comprising FIGS. 6(a), 6(b) and 6(c), corresponding respectively to FIGS. 3(a), 3(b) and 3(c)) depicts an improved representation of the data of FIG. 3, whereby referential integrity of the data is preserved;
FIG. 7 depicts a computer system equipped with a Relational Database Management System (RDMS) for creating a relational database with identifiers in accordance with the present invention;
FIG. 8 is a flow diagram of the BUILD DATABASE routine;
FIG. 9 is a flow diagram of the CREATE DATABASE routine;
FIG. 10 is a flow diagram of the LOAD DATABASE routine.
FIG. 11 (comprising FIGS. 11(a), 11(b), 11(c) and 11(d)) is a results table depicting the construction of the Persons relation with identifiers as shown in FIG. 6(a).
FIG. 12 (comprising FIGS. 12(a) and 12(b)) is a results table depicting the construction of the Employee relation with identifiers as shown in FIG. 6(b).
FIG. 13 is a results table depicting the construction of the Manager relation with identifiers as shown in FIG. 6(c).
TABLE OF CONTENTS
Detailed Description of the Invention
A. Introduction
1. A TYPICAL RELATIONAL DATABASE
2. REPRESENTING THE RELATIONAL DATABASE WITH IDENTIFIERS
B. The Problem - Referential Integrity
C. The Solution
D. Join Operation
E. Hardware Level of the Preferred Embodiment
F. Building a Relational Database with Identifiers
G. Detailed Example for Building a Relational Database Having Identifiers
1. Introduction
2. Build Framework for the Relational Database
3. Insert Identifier into the Relational Database Detailed Description of the Invention A. Introduction
The above-referenced co-pending patent applications disclose a vector-oriented relational database which is believed to offer a significant performance breakthrough for relational database computer systems. To the extent that this technology may be advantageous to the best mode of the present invention, .said applications are hereby incorporated by reference. However, it should be noted that the teachings of the present invention have considerable utility in the context of more conventional computerized relational database systems. Accordingly the present invention will be described for use in relational database systems which do not use vector technology disclosed in the above-referenced co-pending applications.
1. A Typical Relational Database
Referring now to FIGS. 1(a), 1(b) and 1(c), it may be seen that a typical relational database comprises several relations, each of which being representable as a table 12, 14, 16. Referring to PERSON table 12, it may be seen that the table contains a number of characteristics (such as names and dates) . FIG. 1 shows the data as it might be logically stored in a prior art relational database system. More specifically, each row 18(a), 18(b), . . .
18(m) of table 12 corresponds to a particular person
(i.e., to a particular "relational entity") and each column 20(a), 20(b), . . . 20(i) corresponds to a particular characteristic. Referring to FIGS. 1(a), 1(b) and 1(c), a description of the relational database containing the PERSON table, EMPLOYEE table and MANAGER table, is discussed. Referring to the PERSON table at row 18(b) , the entry for Mary Smith having Birth date 3/14/39 is indicated at 9. Note that Mary Smith is also shown in the MOTHER columns of the PERSON table at row 18(i), as indicated at 11. The dotted line 19 depicts the relationship of Mary Smith at row 18(b) with Mary Smith at row 18(i). Mary Smith is married to Sam Jones, born 9/9/30 and is the mother to John Jones, born 3/5/62. At row 18(k) of FIG. 1(a), a different Mary Smith is listed. Mary Smith at row 18(k) has a Birth date of 5/8/35 which distinguishes this Mary Smith from the Mary Smith at row 18(b) . . Note that Mary Smith at row 18(k) in the PERSON table is also the same Mary Smith in the EMPLOYEE table (FIG. 1(b)) at row 62(c). Mary Smith is an employee of a particular company in the security department and makes $43,000.00 per year. Note dotted line 21 which depicts the relationship of Mary Smith at row 18(k) of the PERSON table with Mary Smith at row 62(c) of the EMPLOYEE table. Additionally, note that Mary Smith is listed in the MANAGER table at row 41(a) . Mary Smith in the MANAGER table is listed as being in the security department and also being a vice president of the corporation. Dotted line 23 depicts the relationship of Mary Smith at row 62(c) of EMPLOYEE table with Mary Smith at row 41(a) of the MANAGER table.
The relational database is set up so that each table, FIGS. 1(a), 1(b) and 1(c), are interdependent. For example, the EMPLOYEE table, FIG. 1(b), is directly dependent upon the PERSON table, FIG. 1(a), for its information. Likewise, the MANAGER table is directly dependent upon the EMPLOYEE table of FIG. 1(b). Mary Smith of row 41(a) of the MANAGER table can only relate back to Mary Smith at row 42(c) of the EMPLOYEE table. Likewise, Mary Smith of the EMPLOYEE table relates back to Mary Smith in the PERSON table. Other tables can be formed, such as the softball team or volleyball team, which may be dependent upon the PERSON table for its entries. The relational database administrator (or manager) sets up the database so that information from one table can be unambiguously referenced in another table.
Referring again to the PERSON table of FIG. 1(a), all values of any one column share a common characteristic. For example, PERSON-LAST 20(a), FATHER-LAST 20(f) and MOTHER-LAST 20(g) are all surnames. The set of all possible values that a column or a number of related columns can contain is called a domain. It is often advantageous to consider all last names as coming from a single, separately defined domain. The values which occur in the relational database are a subset of the domain of values, and the subset is referred to as a value set.
2. Representing the Relational Database with Identifiers
In accordance with the present invention, at least those values which share such a common characteristic and which are already in existence (i.e., they are already used in the database in at least one column of at least one relational entity) , are arbitrarily ordered (e.g. , in the order they were originally entered into the system) to form an ordered value set. The ordered set of Last Names 28 (FIG. 2) (which is associated with columns 20(a), 20(d), 20(g), 38(a) and 40(a) of FIG. 1(a)) is depicted as ^ being associated with respective unique identifiers 30 (FIG. 2) . A non-null identifier to each unique (possibly null) value in a particular domain may be used by a relation having a column defined on that domain. Note that Jones 26 of the Last Name domain 28 has the identifier "1" associated with it.
Referring to the example of FIGS. 1 and 3, each value in the Last Name column in the database of FIG. 1 is replaced by a unique identifier in FIG. 3. For example, for each "JONES" 26 there is associated the unique identifier "1", as indicated at 32 (FIG. 3(a)). A similar procedure has been used to form the First Name value set 34 and its associated set of identifiers 30 illustrated in FIG. 2 , which replaces the value in the First Name column in the database of FIG. 1 by the unique identifiers as shown in FIG. 3. A more detailed discussion on the process for building the relational database with identifiers is given below in section G.
Since the value sets 28 and 34 are respectively the sets of all values of last and first names actually used by all the related columns (e.g., columns 20(a), 20(d), 20(g), 38(a) and 40(a) of FIGS. 1 (a), 1(b) and 1(c) are associated with the ordered set of Last Names 28) , they must constantly change to reflect the current contents of those columns. One straightforward implementation is to keep on adding values without deleting values no longer in use. Alternatively, the value set could be allowed to shrink as well as grow, in which case it is advantageous to introduce a second type of identifier (not shown in FIGS.) to track the ordering of identifiers, since it would be very time consuming to renumber all the identifiers in the entire system whenever a particular value ceases to be in use.
Comparing the value-oriented tables 12, 14 and 16 of the relational database of FIG. 1 with the table version using identifiers in 12', 14', and 16' of FIG. 3, by using the above defined identifiers to represent values, storage of the database is simplified and storage requirements are reduced. The ordered value sets (28, 34, 44, 45, 47 and 49 of FIG. 2) and their associated identifiers will not take up much more storage than a conventional index. If the identifier relation is represented with the above- referenced compressed bit vector technology, it may actually require less storage than a conventional relational database with many key fields. On the other hand, the data storage requirements for the actual tables is considerably reduced, since all the variable length alphanumeric entries may now be represented by fixed length integers (the only requirement is that the system be able to accommodate integer values as large as the maximum number of unique values in any one value set) .
Referring to FIGS. 1, 2 and 3, all occurrences of "JONES" (reference numeral 26) in FIG. 1 have been replaced by the single integer "01" (reference numeral 32) in FIG. 3. Since one eight-bit byte is normally used to store only one alphanumeric character, five bytes are required to store "JONES" while only one byte can be used to represent any one of over 200 different identifiers. This potentially represents a storage reduction (at least in this rather simple example) of 80%.
B. The Problem - Referential Integrity
Consider now a simple example for changing data which is correct for some entries, but incorrect for other entries. Note in the PERSON table 12 (FIG. 1(a)), that there are three individuals (18(b), 18(k) and 18(1)) with the first name MARY, two (18(b) and 18(k) ) with the last name SMITH, the other (18(1)) with the last name JONES. Assume that Mary Jones (18(1)) changes her name to Mary Smith also, and the data in the database is updated accordingly. This is shown diagrammatically in FIGS. 4 and 5, which respectively depict the affected tables 12", 14" as the system displays them to the user and as they are stored and manipulated in the system in the identifier form. From FIG. 4(a), it is clear that we still have three individuals, since the two Mary Smiths (18(b) and 18(k) ) have Birth dates of 3/14/39 and 5/8/35, respectively, while Mary Smith (18(1)), the former Mary Jones, has a Birth date of 12/31/43. The change may be implemented with identifiers by replacing "01" (FIG. 5(a)) with "02" (FIG. 5(a)) for those entries associated with the new Mary Smith (18(1)). This has been indicated in FIG. 5 by enclosing the old identifier 32 in brackets and underlining the new identifier 56. Note, however, that it is still necessary to make several changes (for example, in both the PERSON table 12• and the EMPLOYEE table 14' ; see FIG, 3) to reflect only one change of last name. Referring to the modified PERSON table 12" (FIGS. 4(a) and 5(a)) in which the change from JONES to SMITH has been implemented as indicated, it is also clear that Mary Smith (18(k), FIG. 4(a)) who is the mother of John Jones is Mary Smith (18(b)). Because the Birth is considered part of the primary key for the PERSON relation 12", there is a corresponding entry for Mother's Birth date, and as a result, Mary Smith (Mother - Last 20(g) and Mother - First 20(h) names) is still a valid foreign key.
The same situation occurs for the similarly changed EMPLOYEE 14" (employee payroll records typically include more information than just a person's legal name to identify that person) of FIGS. 4(b) and 5(b). Since the changed table also includes a Birth date column 39(c) (FIG. 4(b)) as well as First Name column 30(b) (FIG. 4(b)) and Last Name column 39(a) (FIG. 4(b)), even after the change in name has been implemented, it is clear that Mary Smith 62(d), FIG. 4(b)) corresponds to Mary Smith (18(d), FIG. 4(a)) due to the fact that the Mary Smith's can be differentiated by the Birth date.
However, organizational charts such as MANAGER table (see FIGS. 1(c), 3(c) and 4(c)) usually do not include any more information than the name of the organization 40(d), the title of the position 40(c), and the name 40(a), 40(b) of the individual currently occupying the position. Thus, it is no longer possible to reference Mary Smith 62(c) or 62(d), FIG. 4(c) or 5(c)) in the modified PERSON 12" and EMPLOYEE 14" relations from the MANAGER 16" relation (FIG. 4(c) or 5(c)). C. The Solution
The present invention readily solves the foregoing problem of referential integrity by extending the use of identifiers so that each relational entity of a relation may reference another relational entity (or row) of the same or a different relation.
FIG. 6, comprising FIGS. 6(a) through 6(c), corresponds to FIGS. 1, 3, .4 and 5, but is modified to include row (relational entity) identifiers, wherein three related columns (e.g., Father's First and Last Names and Birth date) have been replaced with a single column "FATHER" 68 whose entries reference the relational entities of the PERSON relation 12"' containing the Father column data. The ordinal row numbers serve as the identifiers.
Referring to FIGS. 1(a), 1(b), 1(c) and 6(a), 6(b) and 6(c), a second version of the relational database using identifiers is shown. In FIG. 1(a) , the reference Mary Smith having a Birth of 3/14/39 is shown linked to Mary Smith having a Birth of 3/14/39 by dotted line 19. In other words, Mary Smith is John- Jones' mother at row 18(i) of the PERSON table. Likewise, the PERSON table, as represented in FIG. 6(a), is also shown as linking Mary Smith (02 02 02) to Mary Smith (02) in row 18(i) of the PERSON relation 12"' by the dotted line 19. Note that the entire entity for Mary Smith at row 18(b) of the PERSON relation is represented in the MOTHER column 69 of the PERSON relation by the identifier 02. Referring to FIGS. 1(a) and 1(b), note that Mary Smith having Birth date 5/8/35, indicated at 15, is linked to Mary Smith having Birth date 5/8/35 of the EMPLOYEE table, as indicated at 13, by the dotted line 21. Likewise, referring to FIGS. 6(a) and 6(b), the identifier for Mary Smith having Birth date 5/8/35 (02 02 10) is 11. Mary Smith is represented in the EMPLOYEE table by the identifier 11 at row 62(c) . The two entries shown are linked by dotted line 21. Referring to FIGS. 1(b) and 1(c), note that Mary Smith of the EMPLOYEE table is also represented in the MANAGER table at row 41(a), as indicated at 17. These entries are shown linked by dotted line 23. More particularly, the identifier 03 which corresponds to the third entity of the EMPLOYEE table represents the Mary Smith entry in the MANAGER table (FIG. 6(c)). Note that the base identifiers in FIG. 6(a) and the columns Last, First and Birth remain the same as the identifiers in FIG. 3(a). These identifiers refer back to the value sets in FIG. 2 (28, 34 and 44) . Likewise, the identifiers in columns Salary and Department of FIG. 6(b) and the identifiers in the columns Title and Department of FIG. 6(c) refer to the value sets at 45, 47 and 49 Of FIG. 2.
A standard programming language, such as SQL, may be used to define the structure of a relational database. Referring to a specific example, it is convenient to define the "FATHER" column (FIG. 1(a)) as specifically referencing the LAST and FIRST columns, 20(a) and 20(b), by means of the following set of programming language statements:
1 CREATE DOMAIN LNAME CHAR
2 CREATE DOMAIN FNAME CHAR
10 CREATE TABLE PERSON
11 (
12 LAST LNAME,
13 FIRST FNAME,
• • • 20 FATHER PERSON,
30 )
40 CREATE ALIAS FLNAME FOR PERSON.FATHER.LAST
41 CREATE ALIAS FFNAME FOR PERSON.FATHER.FIRST The foregoing statements specify that the values of the first and last names are from separate alphanumeric domains (lines 1 and 2, above), that the PERSON table 12' includes the columns FIRST 20b, LAST 20a and FATHER (lines 10 through 20, above) ; that the column FIRST and LAST have values from the domains FNAME and LNAME respectively (lines 12 and 13, above); and that the column FATHER has as its domain (line 20, above) in the PERSON table 12'. Thus, the Father column is actually a recursive reference to entities in the same table and the father•s first (FFNAME) and last (FLNAME) names are the respective values for FIRST (20e) and LAST (20d) columns in the referenced relational entity.
By using row identifiers which reference other relational entities (rows) , the referential integrity problem has been solved, since the identifier is a direct reference to a relational entity and only indirectly references some or all of the columns of that relational entity. Furthermore, it is impossible for the user to directly affect the association or "binding" between different relations in the database. In the past, the functions of information presentation and binding were provided by the same data. Therefore, changing the data to change one of these dual functions could affect the other function. By using identifiers, the binding function is separated form the information representation function, ensuring referential integrity even though the information is later changed. Accordingly, changes to individual columns cannot affect the integrity of any previously existing reference. Moreover, in accordance with the present invention, significant performance improvements can be realized by using identifiers as references to other entities which, in many situations, obviates the need for the JOIN operation. D. Join Operation
The ability to "JOIN" two or more relations is considered to be the most powerful feature of a relational system, Date, Vol. 1, 4th Ed. (1986) . Essentially, a JOIN is a SELECT over the Cartesian product of at least two relations of the relational database. The JOIN operation is used to combine relations based on equivalent values in columns having the same characteristic. Typically, JOIN involves combining each of N rows of the first relation (e.g., the PERSON relation, FIG. 1(a)), with each of M rows of a second relation (e.g., the EMPLOYEE relation (FIG. 1(b)), to form an N x M resultant relation. Then, the JOIN operation describes all resultant rows of a JOIN relation which do not have a match in the JOIN column. This type of JOIN relation is generally referred to as the "EQUIJOIN" operation. By using identifiers as references to other entities, the JOIN operation can be performed more efficiently as a SELECT operation.
For example, a JOIN of the PERSON table and EMPLOYEE table where PERSON-NAME is equal to EMPLOYEE-NAME is performed by selecting all identifiers in the EMPLOYEE table in the Employee Name column which correspond to row identifiers in the PERSON table. Referring- to the EMPLOYEE column of the EMPLOYEE table of FIG. 6(b), identifiers 9, 10, 11 and 12 correspond to the rows having identifiers 9, 10, 11 and 12 of the PERSON table. In this way, the SELECT operation over the EMPLOYEE table of FIG. 6(b) will automatically join the equivalent entries of the PERSON table of FIG. 6(a). This is a direct result of separating the functions of binding and information representation by the use of identifiers. Because the binding is already explicit, less work needs to be performed. E. Hardware Level of the Preferred Embodiment
Referring to FIG. 7, a computer system having a programmable computer and computer programs for creating a relational database with identifiers is shown. The system includes program computer 103, display 101, keyboard 107 for the computer and external device 109 for storage of data. Hardware and software for creating a relation having identifiers is stored in the Relational Database Management System (RDMS) 105 (shown in phantom lines) , which is connected within the computer 103. The RDMS 105 coordinates the various activities related to adding and implementing identifiers to a relational database.
A detailed discussion of the specific components of the RDMS 105 is now presented. External device 109 is a permanent or buffered storage, typically in the form of a hard disk, for storing relations which are stored in expanded form where each value is no smaller than a byte. The contents of the external device are typically maintained in records which are divided into fields where the Nth field of each record corresponds to a specific type of data. The contents of the storage device 109 are loaded to a relational processor unit (RPU) in the RDMS 105. The operation of the bus system and the operation of the components of the RDMS 105 are controlled by routines implemented by the RPU. When the routines are loaded into the RPU, coordination and processing of the various components of the RDMS 105 is established. F. Building a Relational Database with Identifiers
Referring to FIG. 8, the BUILD process for building a relational database having identifiers is shown. More particularly, the BUILD DATABASE routine (151, FIG. 8) coordinates the CREATE phase (FIG. 9) and the LOAD phase (FIG. 10) to be discussed. During block 153 of the BUILD DATABASE routine (FIG. 8) , the CREATE DATABASE routine (FIG. 9) is called for building a "framework" for the relational database. During this routine, the framework for maintaining the relational database having identifiers is created. During block 154, the first relation to be inserted into the relational framework is selected. During block 155, a row of the relation from the load file stored in external device 109 is brought over to the RPU of the RDMS 105. Next, during block 152, the INSERT ROW routine (FIG. 10) is called to" coordinate the process of assigning, identifiers to the incoming row. Then, during block 161, the RPU determines whether there are any more rows of the selected relation left in the load file. Assuming that there are more rows remaining in the relation in the load file, processing continues at blocks 155, 152 and 161 until all of the rows of the particular relation are inserted. Assuming that all of the rows of the relation in the load file have been processed and inserted into the database, block 165 is processed. During block 165, the RPU determines whether there are any more relations left in the load file for processing. Assuming that there are more relations in the load file left remaining for processing, blocks 154, 155, 152, 161 and 165 are performed until all of the relations of the relational database still in the load file are inserted into the RDMS 105. After the relational database has been processed and identifiers have been assigned to all values and rows of the relational database, processing returns at block 167 to the calling program. FIG. 9 is a flow diagram of the CREATE DATABASE routine for building a "framework" for the individual domains of ordered value sets and the frameworks for the relations specified by the user to be included in the relational database. The relational database as depicted in FIG. 1 and the domains as shown in FIG. 2 are an example of a relational database whose framework may be created by the CREATE DATABASE routine (FIG. 9) . Referring now to block 102 of the CREATE DATABASE routine (FIG. 9) , a domain framework is constructed. The framework includes user supplied specifications concerning the characteristics of the domain such as the name, data type, value, size and/or other user specified restrictions on the values it may contain. During block 104, the RPU determines whether there are other domains which need frameworks to be built. Assuming that there are other domain frameworks to be constructed, processing continues at blocks 102 and 104 until all of the domain frameworks have been built. When all of the domain frameworks have been built, processing continues at block 106.
During block 106, the framework for a relation is built. During block 108, a framework for a relational column is constructed. During block 110, a check is made on whether the particular column whose framework was just built draws its values from a domain or another relation, as specified by the administrator or user. Assuming that the column draws it values from a domain, processing proceeds at block 116 during which a reference to the associated domain is included in the framework for the column and processing continues at block 120. Returning to block 110, if the column does not draw its values from a domain but instead from a relation, processing continues at block 118. During block 118, a reference to the particular relation is included in the column framework to indicate which relation the column draws its values from. The processing continues at block 120. During block 120, the RPU determines whether there are any other columns which need the framework to be built for the current relation. Assuming that there are still more columns which need a framework to be constructed, processing continues at blocks 108, 110, 116, 118 and 120 until all columns of the relation have frameworks constructed. Assuming that all of the column frameworks for a particular relation are built, processing continues at block 124. During block 124, the RPU determines whether there are other relations which require frameworks to be constructed. Assuming that there are other relations which require frameworks to be constructed, processing continues at blocks 106, 108, 110, 116, 118, 120 and 124, until all of the relations have the required frameworks constructed. Returning to block 124, if there are no more required frameworks to be constructed for the relations of the database, processing continues at block 127. During block 127, processing returns to the calling program. This procedure of creating the framework for a database is the same as described in the above- identified application entitled "A RELATIONAL DATABASE REPRESENTATION WITH RELATIONAL DATABASE OPERATION CAPABILITY." Referring to FIG. 10, a flow diagram for the INSERT ROW routine is shown. The purpose of the INSERT ROW routine is for converting each row of a load file associated with a relational database into one or more assigned identifiers depending on whether the row is "atomic" or "non-atomic." If the row is "atomic," it is treated as an entity having an associated identifier assigned. If the row is "non-atomic," it must be treated as two or more entities each having an associated identifier assigned. During the processing of the INSERT ROW routine (FIG. 10) , it is assumed that a "framework" for the relational database has been constructed by the CREATE DATABASE routine (FIG. 9) and that the identifiers will have a place to be stored within the framework.
Referring to block 254 of the INSERT ROW routine (FIG. 10) , the first of the rows from the load file is examined to determine if it is "atomic," i.e., whether more than one identifier is required to represent the row
(See Date, Vol. 1, p. 243).. Assuming that the row is atomic, processing continues at block 268. However, if the row is non-atomic, then processing continues at block
264. During block 264, the RPU parses the row into one or more smaller entities. Each smaller entity corresponds to a column of the relation. Processing continues at block
266, during which an entity from the parsed entity is selected.
Referring to block 268, the RPU determines if the entity is referenced in a domain or a relation. Assuming that the entity is a domain reference, processing continues at block 256. During block 256, the RPU determines if the entity is already in a domain. Assuming that the entity is currently in a domain, processing continues at block 267. However, if the entity is not in a domain, then processing continues at block 258. During block 258, the RPU first inserts the selected entity ' (value) into the domain, and then processing continues at block 267.
During block 267, a number for the entity is returned as an identifier. All previous values stored in the domain have identifiers associated with them. When a new entity is added into the domain, the next available identifier is provided. Typically, the entities are added to the domain in an order by entry method and the identifiers are determined by the order of entry. In this way, the returned identifiers will not change previous identifiers associated with values in the domain. Processing continues at block 262 during which the returned identifier is placed in the appropriate row of the relation.
Returning to block 268, if the RPU determines that the entity is referenced in the a relation rather than a domain, processing goes directly to block 270. During block 270, the RPU selects an identifier or identifiers by finding a matching entity or entities from the column's domain. During block 272, the RPU determines whether more than one identifier was returned. If more than one identifier is returned, the processing continues at block 274, during which a null value is inserted in the relational row. The null value is a flag to the user that there is not exactly one referenceable value and thus, a connection cannot be established for that identifier. Processing continues at block 280.
Returning to block 272, if exactly one identifier is returned, then processing continues at block 276. During block 276, the RPU places the returned identifier in the appropriate relational row of the relation. Processing continues at block 280. During block 280, the RPU determines whether there are any more entities left for processing. Assuming that there are still more entities left for processing, then processing continues at block 278 to obtain the next entity from the load file. Processing continues at blocks 268, 256, 258, 267, 262, 270, 272, 274, 276, 280 and 278 until all of the entities for the particular relation are processed. If there are no more entities for processing, then processing continues at block 284. During block 284, the RPU returns an identifier for the row in which an identifier was placed during blocks 262 and/or 276. During block 285, this row number identifier is also placed in the relational row, completing the INSERT ROW routine. Processing continues at block 286, during which the RPU returns the processing to the BUILD DATABASE routine, block 161 (FIG. 8) .
G. Detailed Example for Building a Relational Database having Identifiers 1. Introduction
The purpose of this example is to take the prior art relational database as depicted in FIGS. 1(a), 1(b) and 1(c) and convert it into the relational database represented by identifiers as shown in FIGS. 6(a), 6(b) and 6(c). The relational database as shown in FIGS. 1(a), 1(b) and 1(c) is stored in the external device 109 (FIG. 7) . The relational database data is stored in a load file in the order as shown in FIG. 1. The order for loading the relational database into the RDMS 105 (FIG. 7) is determined by the relational database administrator. Note, it is important that the relational database administrator order the entry of the data so that the identifiers generated by RDMS 105 are not ambiguously defined. Stated differently, data must be loaded into the referenced columns before the data is loaded into the referencing columns. For example, Sam Jones at row 18(a) of the PERSON table must be loaded first into the database because it is later referenced by the Father column at row 18(i) (FIG. 1(a)). Likewise, Mary Smith at row 18(b) of the PERSON table is later referenced in the same table in the Mother column at row 18(i) . Thus, the entry of Mary Smith must be entered prior to the entry of Mary Smith in the Mother column. Although this process is arbitrary if the database is in its value form, the identifiers for Sam Jones and Mary Smith cannot be used before first entering them into the RDMS 105.
Additionally, note that the EMPLOYEE table (FIG.
1(b)) and the MANAGER table (FIG. 1(c)) are both dependent on the PERSON table (FIG. 1(a)). Thus, both tables must be loaded into the RDMS 105 after the PERSON table is loaded into the system. In this way, identifiers generated for the PERSON table may be later used to represent employee names in the EMPLOYEE table (FIG. 1(b)) and the identifiers generated in the EMPLOYEE table may be used as manager names in the MANAGER table (FIG. 1(c)). Referring to FIGS. 8, 9, 10 and 11, the detailed example for generating the relational database as shown in FIG. 1 into the relational database as shown in FIG. 6 is now presented. FIG. 8 is- the overall block diagram which represents control over building the relational" database with identifiers. At block 153, the CREATE DATABASE routine (FIG. 9) is called.
2. Build Framework for the Relational Database
In FIG. 9, the CREATE DATABASE routine builds the framework for the domains, relations and columns of the relational database shown in FIG. 6. The first step of the CREATE DATABASE routine occurs during block 102, during which the framework for the first domain of the database is created. Following the example, the framework for the Last Name domain is generated first. Also established is the characteristic of the domain including its name, data type, value size, etc. During block 104, RPU determines if there are other domains which need a framework created. Processing continues at block 102 and the First Name domain framework is generated. Blocks 102 and 104 are repeatedly processed until all of the frameworks for the various domains of the relational database are generated. More particularly, the Salary domain, the Department domain, and the Title domain, have frameworks built.
Once the RPU determines that all the domains of the relational database have had their frameworks constructed, processing continues at block 106. During block 106, the framework for the first relation of the relational database is constructed. Thus, an empty framework for the PERSON table is constructed. During block 108, the framework for the first relational column of the PERSON table is built, more particularly, the Last Name column has its framework constructed. Processing continues at block 110, during which the RPU determines whether the column constructed draws its values from a domain or a relation. Because the LAST NAME column draws its values directly from the LAST NAME domain and not from another relation, processing continues at block 116. During block 116, the RPU includes a reference to the LAST NAME domain in the LAST name framework. Processing continues at block 120, during which the RPU determines whether there are any more columns in the relation which need a framework to be constructed. There are more columns associated with the PERSON table relation, and thus processing repeats at block 108.
During block 108, the framework for the FIRST name column of the Persons relation is constructed. Processing continues at block 111, during which the RPU determines if the First name column has values which are directly drawn from a domain. The First name column of the PERSON table draws its values directly from the FIRST NAME domain, and thus processing continues at block 116. During block 116, the RPU places a reference in the First name column to the domain for first names. Processing continues at block 120 during which the RPU determines whether there are other columns which need frameworks constructed for the PERSON table. There are still columns which need construction, thus processing continues at block 108.
During block 108, the framework for the Birth date column of the PERSON table is constructed. Processing continues at block 110, during which the RPU determines if ' the Birth column draws its values from a domain. Birth column draws its values from the Birth domain and thus, processing continues at block 116. During block 116, the RPU places a reference to the Birth domain in the Birth column. Processing continues at block 120, during which the RPU determines that there is still another column left which needs to have its framework constructed. Processing continues at block 108.
The process of constructing the framework for the columns which are related to the PERSON table (blocks 108, 110, 116 and 120) continues until all of the frameworks for the columns are constructed. More particularly, the FATHER column, and the MOTHER column all have their frameworks constructed. As shown by FIG. 6(a), a single column for FATHER and single column for MOTHER are the only columns required since first name, last name, and birth date information are already stored in other rows of the Persons relation. Assuming that all of the columns for the PERSON table have had their frameworks built, processing continues at block 124. During block 124, the RPU determines if there are any other relations which need construction. Since there are two more relations, the EMPLOYEE table and the MANAGER table, which need to be built, processing returns to block 106.
During block 106, the framework for the Employee relation is constructed. Processing continues at block 108, during which the framework for the Employee column is built. Note that in FIG. 6(b) the Employee column refers back to the PERSON table for the employee's First Name, Last Name and Birth Date. During block 110, the RPU determines that the Employee column draws its values from a relation. Thus, processing continues at block 118. During block 118, RPU includes a reference in the Employee column to the PERSON table from which the column draws its values. Processing continues at block 120, during which the RPU determines that there are still more columns in 1 the relation which require construction. Processing continues at block 108.
During block 108, the framework for the Salary column is built. Note that the PERSON table does not have a 5 Salary column. During block 110, the RPU determines that the Salary column draws its values directly from the Salary domain. Thus, processing continues at block 116. During block 116, the RPU places a reference to the Salary domain in the Salary column. Processing continues at
10 block 120, during which RPU determines that there is still another column which needs its framework constructed. Processing continues at block 108.
During block 108, the framework for the Department column is constructed. During block 110, the RPU
15 determines that the Department column draws its values from the Department domain. In block 116, the RPU includes a reference to the Department domain .in the Department column. Processing continues at block 120, during which the RPU determines that there are no more
20 columns in the Employee relation which need to be constructed. Block 124 is entered and the RPU determines that there are still other relations to be built for the relational database. The MANAGER table relation is constructed. Processing continues at blocks 106, 108,
25 110, 116, 118, 120 and 124 until the frameworks for the MANAGER table and its related columns are constructed. Assuming that the MANAGER table is constructed, the RPU at block 124 determines that there are no more relations to be constructed for this relational database. During block
30 127, processing returns to the calling program or the BUILD DATABASE routine (FIG. 8) at block 154.
3.5 3. Insert Identifier into the Relational Database
During block 154, the RPU selects the Persons relation to be loaded into the RDMS 105. Recall that the Persons relation was strategically positioned to be input into the RDMS first. In this manner, identifiers can later be referenced. Processing continues at block 155 during which the RPU obtains the first row from the load file containing the Persons relation. Once again, the Persons relation depicted in FIG. 1(a) represents the load file and the order in which the rows are to be loaded into the RDMS. Then, during block 152, the INSERT ROW routine (FIG. 10) is called to convert the row from the load file into the identifier-represented row, as depicted at 18(a) " of FIG. 6(a) . Referring to FIG. 10 at block 254, the RPU determines if the row or entity currently loaded is atomic, i.e., can be identified as a single entity. The row loaded is row 18(a) of FIG. 1(a); Jones, Sam, 9/9/30. This row is non- atomic and thus, during block 264, it is parsed into three smaller entities which are atomic. The first atomic entity is for Jones, and during block 266 it is obtained. Then during block 268, the RPU determines if Jones is referenced by any domain. Currently, all of the domains are empty. However, the CREATE DATABASE routine (FIG. 9) has constructed their frameworks. During block 256, the RPU determines that the value for Jones is not located in the Last Name domain. Thus, processing continues at block 258, during which the value for Jones is inserted into the Last Name domain as shown in the RESULTS table (FIG. 11(b)) at line 412.
The RESULTS table depicted in FIG. 11 is divided into two portions. The first portion, on the left hand side of the table, depicts the construction of the relation represented with identifiers. The second portion, on the right hand side, represents the construction of the domains or value sets used in the relational database. The left hand side of the table depicting the construction of the Persons relation represented in the form of identifiers is broken up into six columns: Identifier (ID#) , Person's Last Name, Person's First Name, Birth Date, Father and Mother. Referring to the right hand side of the RESULTS table (FIG. 11(a)), the domains related to the PERSON table are shown, in four columns: Identifier (ID#) , Last Name, First Name, and Birth Date. Processing continues at block 267, during which the RPU returns an Identifier for the value Jones to represent Jones in the Persons relation. The Identifier "01" is shown associated with Jones at 412 in FIG. 11(a). This Identifier is returned to the Persons relation and at block 262 it is placed in the first row of the Last Name column of the Persons relation (352, FIG. 11(a)). Processing continues at block 280 during which the RPU determines whether there are any more entities which need to be processed. The values Sam and his Birth Date still need to be processed. Thus, during block 278, the next entity, Sam, is obtained. Processing continues at block 268.
During block 268, the RPU determines whether Sam references a domain or a relation. The value Sam is in a domain and thus, during block 256 the RPU determines whether the value Sam is already contained in the First
Name domain. Sam is not contained in the First Name domain and, thus, processing continues at block 258, during which Sam is inserted into the First Name domain (416, FIG. 11(a)). During block 267 the Identifier "01" is returned and during block 262 the Identifier is placed in the first row of the First Name column of the PERSON table (356, FIG. 11(a)). Processing continues at block
280, during which the RPU determines that there is still another entity which needs to be processed, namely, the date 9/9/30. The date is obtained at block 278 and processing continues at block 268.
During block 268, the RPU determines that the date, 9/9/30, references a domain, namely, the Birth domain. Processing continues at block 256, during which it is determined that the date 9/9/30 is not contained in the Birth domain. During block 258, the date 9/9/30 is inserted into the Birth domain, as shown at 420 of FIG. 11(a) . This Birth date is entered in the first row of the Birth domain and thus, is associated with the identifier "01". During block 267, the identifier is returned, and during block 262, the identifier "01" is placed in the first row of the Birth column of the PERSON table (360, FIG. 11(a)). Then, during block 280, the RPU determines that there are no more entities left for processing. Processing continues at blocks 284 and 285 during which the identifier associated with the inserted row, for "Jones, Sam, 9/9/30" is returned and placed in the Persons relation table (362, FIG. 11(a)). The newly inserted row for "Jones, Sam, 9/9/30" is the first inserted row of the Persons relation and, thus, the identifier for the row is "01" (362, FIG. 11(a)). Processing continues at block 286, in which processing returns to the BUILD DATABASE routine (FIG. 8) at block 161. During block 161 (FIG. 8) , the RPU determines whether there are any more rows in the Persons relation left for processing. There are still currently more rows in the Persons relation as shown in FIG. 1 and, thus, processing continues at block 155. During block 155, the next row from the load file (FIG. 1(a)) is obtained and then during block 152 the INSERT ROW routine (FIG. 10) is called again. The above process is repeated for Mary Smith, and
Jane Doe, as indicated at 364-390 and 426-452 of FIG. 11.
During block 161, the RPU determines if there are any more rows in the Persons relation which need to be processed. There are many more rows remaining in the load file for the PERSON table and, thus, processing continues at blocks 155 and 152. The next five input rows of the Persons relation (18(d), 18(e), 18(f), 18(g), 18(h) of FIG. 1(a)) follow the strategy as discussed earlier for the first three input rows (18(a), 18(b), 18(c) of FIG. 1(a)) which was discussed above. Referring to entries 392 and 454 of FIG. 11(b), the. Persons relation having its identifiers in the various domains are shown after rows 18(a) through 18(h) are loaded into the RDMS. It should be noted that when the row "Smith, Bill, 4/12/18" is processed, since the last name Smith is already in the LAST NAME domain, it is not inserted at block 258, but the identifier "02" for Smith is inserted in the corresponding fourth row, LAST .name column, of the PERSON table, as indicated at 292 (FIG. 11(b)). Similarly, rows 18(e) , 18(f), 18(g), 18(h) include at least one entity that is already in a domain. Accordingly, the identifiers in the corresponding rows in the PERSON table, as indicated at 292, are not all the same value.
The next row of the PERSON table to be loaded is row 18(i) of FIG. 1(a). During block 155, row 18(i), "Jones, John, 3/5/62; Jones, Sam, 9/9/30; Smith, Mary, 9/14/39" is obtained from the load file. Processing continues at block 152, during which the INSERT ROW routine (FIG. 10) is again called.
Referring to FIG. 10, at block 254, the RPU determines if the newly obtained row is atomic. The row is not atomic and, thus, during block 264, the row is parsed into five atomic entities, mainly, Jones, John, 3/5/62; Jones, Sam, 9/9/30; and Smith, Mary, 3/14/39. Jones, Sam, 9/9/30 and Smith, Mary, 3/14/39 are parsed as two atomic entities because only two columns are specified namely, Father and Mother of the PERSONS table, and, by definition are entities, already occurring in prior rows of the table. The first entity, "Jones", is obtained during block 266. Then, during block 268, the RPU determines that "Jones" references a domain. During block 256, the value is determined to already exist in the LAST NAME domain as indicated at 456. Thus, processing continues at block 267, during which the identifier "01" associated with Jones is returned and during block 262 the identifier is placed into the ninth row of the Last Name column of the PERSON table (394, FIG. 11(b)). Processing continues at block 280, during which the RPU determines whether there are any more entities left for processing. The next entity, "John" is obtained during block 278 and processing continues at block 268.
During block 268, RPU determines that "John" references the First Name domain. During block 256, it is determined that "John" is not in the First Name domain. Thus, during block 258, "John" is inserted into the next row of the First Name domain which is row 8, FIG. 11(c). During block 267, the identifier "08" is returned and during block 262, the identifier is placed into the ninth row of the First Name column of the PERSON table (398, FIG. 11(c)). Processing continues at block 28.0, during which the RPU determines if there are any more entities left for processing. During block 278, date "3/5/62" is obtained and processing continues at block 268.
During block 268, it is determined that date 3/5/62 references a domain and during block 256 the value 3/5/62, is determined not to be in the Birth domain. During block 258, 3/5/62 is inserted into the last row of the Birth domain (462, FIG. 11(c)). Then, during block 267, the identifier associated with the newly inserted date is returned, and during block 262 the identifier "08" is placed into the last row of the Birth column of the PERSON table (402, FIG. 12) . Processing continues at block 280, during which the RPU determines if there are any more 1 entities left for processing. There are, so processing continues at block 278, during which the next atomic entity "Jones, Sam, 9/9/30" is obtained. Processing continues at block 268. 5 During block 268, the RPU determines whether the entity, "Jones, Sam, 9/9/30" references a domain or a relation. This entity references an entry in a prior row of the PERSON table, namely, Sam Jones at row 18(a) of FIG. 1(a). Thus, processing continues at block 270, 0 during which the identifier associated with the entity "Jones, Sam, 9/9/30" is obtained. The identifier associated with that entity is "01", as indicated in the ID# column. See 362 of FIG. 11(a). During block 272, the RPU determines that there is exactly one identifier 5 returned and, thus, processing continues at block 276 which places the identifier into the Father column of the PERSON table (404, FIG. 11(d)). Processing continues at block 280, during which the RPU determines that there is still one more entit " left in the input row, namely, 0 "Smith, Mary, 3/14/39." This entity is obtained at block 278. Processing continues at block 268.
During block 268, the entity "Smith, Mary, "3/14/39" is determined to reference a row of a relation and not a domain and, thus, processing continues at block 270. '5"' During block 270, an identifier for the entity for Mary
Smith is obtained. Recall at 376 of FIG. 11(a) that this entity was loaded into the Persons relation and the identifier "02" was associated with it. Processing continues at block 272 during which it is determined that 0 only one identifier was returned. Processing continues at block 276, during which the identifier "02" is placed into the Mother column of the Persons relation (406, FIG. 12).
Processing continues at block 280 during which the RPU determines that there are no more entities for processing 5 and thus processing continues at blocks 284 and 285. The row number associated with the newly inserted row is placed into the Identifier column of the Persons relation. The newly inserted row is the ninth row of the PERSON table and, thus, the identifier "09" is added to the Persons relation (408, FIG. 11(d)). Processing returns to the calling program, BUILD DATABASE routine (FIG. 8) , at block 161.
During block 161, it is determined that there are still more rows in the Persons relation that need to be loaded. Thus, processing continues at blocks 155 and 152. The next four rows of the PERSON table (18(j), 18(k) , 18(1) and 18(m) of FIG. 1(a)) are parsed up and identified in the same manner as row 18(i) (FIG. 1(a)) of the PERSON table which was described above. Assuming that blocks 155, 152 and 161 of the BUILD DATABASE routine have been fully executed, ' the output relation represented by identifiers is shown at 410 of FIG. 11(d) . The resultant domain to this point is shown at 472 of FIG. 11 as well. Once all the rows in the Persons relation have been input, then processing continues at block 165 of the BUILD DATABASE routine.
During block 165, the RPU determines whether there are any more relations to load. There are still two more relations left for loading and they are the Employee relation and the Manager relation. Processing continues at block 154, during which the Employee relation is selected as the next relation for loading. Then, in block 155, the first row of the EMPLOYEE table as shown in FIG. 1(b) is obtained. The first row of the EMPLOYEE table (FIG. 1(b)) is "Jones, John, 3/5/62, $31,000.00, Marketing." Then, in block 152, the INSERT ROW routine (FIG. 10) is called to process this row of the load file, as indicated at 474-484 and 514-522 of FIG. 12.
The INSERT ROW routine is repeated for the three additional employees of the Employee Table, as indicated at 486-512 and 524-550 of FIG. 12.
With the last Employee added to the Employee Table, processing continues at block 280, during which RPU determines that there are no more entities left for processing in the last inserted row. During block 284, the row number identified with the newly inserted row is determined to be "04" and it is placed into the fourth row of the Identifier column of the Employee relation (512, FIG. 12) . During block 286, processing returns to the BUILD DATABASE routine (FIG. 8) at block 161.
During block 161, the RPU determines that there are no more rows in the Employee relation to be inserted and, thus, processing continues at block 165. During block 165, the RPU determines that there is still one more relation, namely, the Manager relation, which needs to be loaded. Processing continues at block 154 during which the Manager relation is obtained. During block 155, the first row of the Manager relation "Smith, Mary, VP, Security" is obtained from the load file. Then, in block 152 the INSERT ROW routine (FIG. 10) is called.
Referring to FIG. 10, during block 254, the input row is examined and it is determined that it is not atomic.
During block 264, the row is parsed into three entities, namely, "Smith, Mary", "VP", and "Security." The entity
"Smith, Mary" is obtained during block 266. In block 268, it is determined that the entity references a relation and, thus, processing continues at block 270. During block 270, the identifier for Mary Smith is determined to be "03." During block 272, it is confirmed that only one
- identifier is returned, and during block 274, the identifier is placed into the first row of the Manager column of the Manager relation (552, FIG. 13). At block
280, the RPU determines that there are other entities which need to be processed in the input row and processing continues at block 278, during which the entity "VP" is obtained. Processing returns to block 268 during which it is determined that the entity references a domain. At block 256, it is determined that the entity is not located in the Title domain and, thus, during block 258 it is inserted into the first row of the Title domain (572, FIG. 13) . During block 267, the identifier associated with "VP" is determined to be "01" and during block 262 it is placed into the first row of the Title column in the Manager relation (556, FIG. 13) . Processing continues at block 280, during which the RPU determines that there is one more entity left for processing. The entity "Security" is obtained and processing continues at block 268. During block 168 it is determined that the entity references a domain and in block 256 it is determined to be in the second row of the Department domain (574, FIG. 13) . During block 267, the identifier associated with "Security" is determined to be "02" and during block 262 the identifier is placed in the first row of the Department column of the Manager relation (555, FIG. 13) . Processing continues at block 280 during which the RPU determines that there are no more entities left to process for this particular inserted row. During block 284, the row number "01" of the newly inserted row is placed in the Identifier (ID#) column of the Manager relation (560, FIG. 13) . During block 286, processing returns to the BUILD DATABASE routine (FIG. 8) at block 161.
During block 161, the RPU determines that there is still one more row in the Manager relation to be added and, thus, processing continues at block 155 and 152 to insert the row "Doe, John, VP, Marketing" into . its identifier form in the Manager relation. During block 155, the row is obtained from the load file and in block 152 the INSERT ROW routine (FIG. 10) is called. Referring to FIG. 10, during block 254 it is determined that the row is not atomic. The row is parsed up into the following entities, "Doe, John", "VP", "Marketing". During block 266 the entity "Doe, John" is obtained and during 268 it is determined that it references a relation. In block 270, the identifier associated with the entity is determined to be "02" and at block 272, it is confirmed that only one identifier is returned. During block 276, the identifier "02" is placed into the second row of the Manager column of the Manager relation (562, FIG. 13). During block 280, the RPU determines that there are still entities left for processing and, thus, in block 278, the entity "VP" is obtained. Processing continues at block 268, during which the entity is determined to reference a domain. During block 256 the entity is determined to be in the first row of the Title domain (580, FIG. 13). Processing continues at block 267, during which the identifier "01" is returned and in block 262 the identifier is placed into the second row of the Title column in the Manager relation (564, FIG. 13) . Processing continues at block 280, during which the RPU determines that there is one more entity left for processing. The entity "Marketing" at block 278 is obtained and during block 268 it is determined to reference a domain. During block 256, it is determined that the entity is in the first row of the Department domain and has an identifier of "01." During block 267, the identifier is returned. In block 262, it is placed into the second row of the Department column of the Manager relation (566, FIG. 13) . Processing continues at block 280, during which it is determined that no more entities exist for the inserted row. During block 284, the row number associated with the newly inserted row is determined to be "02." This number is placed into the second row of the Identifier column of the Manager relation (568, FIG. 13). At this point, the entire relations shown in FIGS. 1(a), 1(b) and 1(c) have been loaded into the RDMS (105, FIG. 7) and has been converted into a relational database represented by identifiers as shown in FIG. 6 and at 410 of FIG. 11 (PERSON table) , 512 of FIG. 12 (EMPLOYEE table) and 568 of FIG. 13 (MANAGER table) .
From the above description of the substitution of identifiers in the relations of a relational database, it will be appreciated that the DOMAIN columns created from the values in the database, as shown in the righthand columns of FIGS. 11, 12 and 13, are in effect columns of a new relation created in the process. The identifiers are generated in effect by assigned ordinal numbers to the rows of the this new relation. This arrangement results in the same identifier representing more than one value, namely, all the values in one row of the newly created relation. It also provides a single way of linking identifiers to their associated values.
The invention has been described in an exemplary and preferred embodiment, but is not limited thereto. Those skilled in the art will recognize a number of additional modifications and improvements which can be made to the invention without departure from the central sphere and scope. For example, a number of different software and hardware embodiments and a number of different software languages and hardware configurations will be suitable for implementing the disclosed invention.

Claims

WHAT IS CLAIMED IS:
1. A method, utilizing a computer, for representing a relational database using identifiers, said relational database comprising a plurality of relations, each said relation comprising one or more columns and rows, each said column having one or more values, each said value of each said column having a common characteristic, each said value of each said column corresponding to one of said rows, each said row comprising one or more values, and each said value of each said row being from a different column, said method comprising the steps of: for each said characteristic of said relational database, forming a set of unique values having the same characteristic, said step including the step of forming said unique values in a predetermined order of occurrence, for each said column of each said relation of said relational database, replacing each said value of each said column with an identifier, said identifier identifying the position of said value within said predetermined order of unique values in the associated set, for each said row having one or more values of a relation of said relational database, forming an identifier for identifying said row of values; and replacing said one or more values of said row with the associated identifier when said one or more values is contained in a different row of the same or a different relation.
2. The method of claim 1 wherein said step of forming said unique values in a predetermined order of occurrence includes the step of ordering said values in the order in which said values are entered into said set.
3. The method of claim 2. wherein the values in different sets having the same ordinal position within the respective sets have the same identifier.
4. The method of claim 1 wherein the identifiers of rows of values identify the order in which the rows occur within a relation.
5. A method, utilizing a computer, for building a relation of a relational database with identifiers, said relation comprising at least one row and at least one column, each row and each column having at least one value associated therewith, each value being associated with at least one row and one column, all values associated with any one column having a common characteristic, said method comprising the steps of: for each said common characteristic, forming a set which contains all set values of all said columns in said relation having said common characteristic; for each one of said values in said formed sets, associating a corresponding identifier; for each said row of said relation associating a -corresponding identifier with said row, and storing each said identifier corresponding to each said associated unique value or said row in said associated column.
6. A method, utilizing a computer, for representing a relational database with identifiers for each of the values and for predetermined groups of values, said relational database comprising one or more relations, each relation having values arranged in rows and columns, with the values in any one column having a common characteristic, said method comprising the steps of: forming a new relation of columns and rows from all the values in the database, each column of the new relation comprising a unique set of all the values in the database having a common characteristic; assigning a different identifier to each row of the new relation; replacing each value in the database with the identifier associated with the row containing the value in said new relation; assigning a different identifier to each row of each relation in the database, and replacing each subsequent occurrence of the same group of values in another row of the same or a different relation with said last-named identifier.
7. The method of claim 6 wherein the step of assigning said identifiers includes assigning identifiers that correspond to the ordinal positions of the associated rows within the associated relations.
PCT/US1988/003477 1987-10-09 1988-10-06 A relational database using identifiers Ceased WO1989003567A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US10743587A 1987-10-09 1987-10-09
US107,435 1987-10-09
US23770288A 1988-08-26 1988-08-26
US237,702 1988-08-26

Publications (1)

Publication Number Publication Date
WO1989003567A1 true WO1989003567A1 (en) 1989-04-20

Family

ID=26804779

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1988/003477 Ceased WO1989003567A1 (en) 1987-10-09 1988-10-06 A relational database using identifiers

Country Status (2)

Country Link
AU (1) AU2722388A (en)
WO (1) WO1989003567A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997015015A3 (en) * 1995-10-13 1997-05-15 Annette Brueckner Information system and process for storing data therein

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE68926422T2 (en) * 1988-09-23 1996-11-07 Ibm Database management system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4631673A (en) * 1985-01-22 1986-12-23 International Business Machines Corporation Method for refreshing multicolumn tables in a relational data base using minimal information

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4631673A (en) * 1985-01-22 1986-12-23 International Business Machines Corporation Method for refreshing multicolumn tables in a relational data base using minimal information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
C.J. DATE, "An Introduction to Database Systems", published 1981 by Addison-Wesley Publishing Co., see pages 3-143, 159-169, and 498-503, especially pages 114, 115, 140 and 141. *
G. EVEREST, "Database Management: Objectives, System Functions, and Administration", published 1986 by McGraw-Hill Book Co., (New York), see pages 120-149, especially pages 124, 128, 129, 131, 136 and 141. *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997015015A3 (en) * 1995-10-13 1997-05-15 Annette Brueckner Information system and process for storing data therein

Also Published As

Publication number Publication date
AU2722388A (en) 1989-05-02

Similar Documents

Publication Publication Date Title
US4933848A (en) Method for enforcing referential constraints in a database management system
US6374252B1 (en) Modeling of object-oriented database structures, translation to relational database structures, and dynamic searches thereon
EP0722591B1 (en) Method and apparatus for data storage and retrieval
US6564212B2 (en) Method of processing queries in a database system, and database system and software product for implementing such method
US6212524B1 (en) Method and apparatus for creating and populating a datamart
US6711563B1 (en) Methods of organizing data and processing queries in a database system, and database system and software product for implementing such methods
US6014670A (en) Apparatus and method for performing data transformations in data warehousing
EP0633538A2 (en) Relational database management system
US8150888B2 (en) Automatic elimination of functional dependencies between columns
US7739224B1 (en) Method and system for creating a well-formed database using semantic definitions
US20020095421A1 (en) Methods of organizing data and processing queries in a database system, and database system and software product for implementing such methods
US20020194196A1 (en) Method and apparatus for transforming data
US20020093522A1 (en) Methods of encoding and combining integer lists in a computer system, and computer software product for implementing such methods
US20050216497A1 (en) Uniform financial reporting system interface utilizing staging tables having a standardized structure
WO1998009237A1 (en) Corporate disclosure and repository system
WO2002047463A2 (en) A method and apparatus for transforming data
EP1672540A1 (en) Complex data access
US7617206B1 (en) Method for analyzing status of specialized tank files which store and handle large objects
Plew et al. Sams teach yourself SQL in 24 hours
WO1989003567A1 (en) A relational database using identifiers
US20030055838A1 (en) Data storing method and data storing structure
US20060074982A1 (en) Method for comparing tabular data
Maheshwari et al. Introduction to SQL and PL/SQL
Smith A Survey of Relational Database Management Systems of Microcomputers
Stephens et al. MySQL Column and Table Types

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE FR GB IT LU NL SE