CN100421107C

CN100421107C - Data structure and management system for supersets of relational databases

Info

Publication number: CN100421107C
Application number: CNB2003801108259A
Authority: CN
Inventors: 蒂莫西·C·欧文斯; 布鲁斯·E·哈里森
Original assignee: United Parcel Service of America Inc
Current assignee: United Parcel Service of America Inc
Priority date: 2003-10-21
Filing date: 2003-10-21
Publication date: 2008-09-24
Anticipated expiration: 2023-10-21
Also published as: JP2007535009A; WO2005050481A1; CA2543159A1; EP1687741A1; AU2003284305A1; CN1879104A; CA2543159C; MXPA06004481A

Abstract

A data structure, database management system, and method for validating data are disclosed. A data structure is described that includes a superset of an interconnected relational database containing a plurality of tables having a common data structure. The tables may be stored as sparse matrix linked lists. A method is disclosed for sorting records in a hierarchical order along a range of levels from general to specific. Examples are described for use with an address database, including methods for converting input addresses having subjective representations into output addresses having preferred representations. Preferred artifacts may be marked with tokens. Alias tables may be included. This abstract is provided to comply with regulations requiring that the abstract promptly inform searchers and others reading the subject matter of the invention. This abstract is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims.

Description

A data structure and management system for a superset of relational databases

Technical Field

The following disclosure relates generally to relational database management systems and, more particularly, to methods and apparatus for processing grouped data across multiple relational databases using sparse matrix linked lists in a computer network environment.

Background

Databases have been a subject of computing since the beginning of the digital age. A database generally refers to one or more large, structured collections of persistent data, usually associated with a software system for creating, updating, and querying the data. In a database, each data value is stored in a field; a set of fields together forms a record; and a group of records may be stored together in a file.

The earliest databases were flat, meaning that all of the data was stored in a single line of text called a delimited file. In a delimited file, each field is separated by a special character, such as a comma. Each record is separated by a different character, such as a caret (^) or a tab character. A simple delimited file might look like this:

Last, First, Age^Doe, John, 26^Smith, Jane, 43^Jones, David, 34

Each field may be assigned a name or category called an attribute. In the example file above, the attributes are Last, First, and Age. An attribute indicates the type of data to be stored in each field. For large amounts of data, the text of a delimited file can grow very long. Accessing a particular piece of data generally requires a sequential search of the entire list. As the capacity of computers and databases increased, the need for more efficient access and faster search techniques led to the development of new data structures.
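As an illustration of the flat-file layout described above, the following minimal sketch parses the example delimited file and performs the sequential search that such a file requires. The helper names parse_delimited and find_sequential are assumptions made for this illustration and do not appear in the disclosure.

```python
# Illustrative sketch only. The caret and comma delimiters follow the example
# file above; parse_delimited and find_sequential are hypothetical helper names.

def parse_delimited(text, record_sep="^", field_sep=","):
    """Split a flat delimited file into a list of attribute -> value records."""
    rows = [[field.strip() for field in record.split(field_sep)]
            for record in text.split(record_sep)]
    attributes, records = rows[0], rows[1:]  # the first record holds the attribute names
    return [dict(zip(attributes, record)) for record in records]


def find_sequential(records, attribute, value):
    """A flat file has no index, so each record is examined in turn."""
    for record in records:
        if record[attribute] == value:
            return record
    return None


flat_file = "Last, First, Age^Doe, John, 26^Smith, Jane, 43^Jones, David, 34"
people = parse_delimited(flat_file)
match = find_sequential(people, "Last", "Smith")
```

In the worst case the search visits every record, which is the cost that motivated the later data structures.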

The relational database model was described in the early 1970s. In a relational database, data is stored in tables. A table organizes data into rows and columns, giving each field a specific location (for example, row x, column y). Each row contains a single record. The columns are arranged in order by attribute, so that all of the fields in a column contain the same type of data. The delimited file above can be expressed in table format as follows:

Last Name    First Name    Age
Doe          John          26
Smith        Jane          43
Jones        David         34

The set of attributes or column headings is sometimes called the schema of the table. For example, the table above may be described as a table having the schema (last name, first name, age).

The tabular format of a database file makes searching and accessing data faster and more efficient. Records (rows) may also be sorted into a new order based on any one or more of the columns (fields). Sorting is often used to order the records so that the most frequently needed data appears earlier in the file, which makes searching faster.
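Sorting on one or more columns can be sketched as follows, using the example table above. The variable names are chosen for this illustration only.

```python
from operator import itemgetter

# Records from the example table; each dict is one row.
records = [
    {"Last": "Doe", "First": "John", "Age": 26},
    {"Last": "Smith", "First": "Jane", "Age": 43},
    {"Last": "Jones", "First": "David", "Age": 34},
]

# Sort on two columns: last name first, then first name as a tie-breaker.
by_name = sorted(records, key=itemgetter("Last", "First"))

# Sort on a single column in descending order.
by_age_desc = sorted(records, key=itemgetter("Age"), reverse=True)
```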

As computing speed and capacity increased, database tables became able to store larger amounts of data. Additional records (rows) may be added to describe additional instances. Additional attributes (columns) may be added to accommodate more types of data about each instance. As the number of fields grows, the task of altering the structure of a table (adding or deleting rows and columns) becomes more complex, and the likelihood of error increases. Moreover, for large tables, the task of sorting data based on one or more columns becomes more complex and time-consuming. Adding different types of data to a single large two-dimensional table eventually produces problems such as redundancy, inconsistency, increased storage requirements, and slower sorting and computation.

Relational databases with multiple tables. To accommodate different types of fields containing related data, the relational database model may include multiple tables. Multiple tables containing related data may be linked together using key fields. A key field contains a unique identifier for each record (or row of data). A key field may contain actual data, such as a part number or a social security number, as long as it is unique to the record; this is sometimes called a logical key. A key field may also be a surrogate key, such as a record number, which is a unique identifier unrelated to the actual data. In addition, a key may be defined by a single field or by a set of fields: a simple key is based on a single field, whereas a composite key is based on multiple fields.

In a relational database, related data may be stored in multiple tables. A key field called the "primary key" serves as the unique reference point for locating a particular record in a table. For example, the attributes (or column headings) in an example "Table A" might be (name, age, social security number, employee number). The primary key of Table A is the social security number field.

In a relational database in which data is stored in multiple tables, another key field, called a "foreign key", serves as the reference point that connects the tables. For example, consider another example table, "Table B", having the schema (employee number, department number, hire date, salary). The primary key of Table B is the unique employee number field. Referring back to the attributes in Table A, the foreign key of Table A is the employee number field, because it links the records in Table A to the records in Table B. This relationship between tables may be illustrated using an Entity Relationship Diagram, in which each table contains the data for a unique entity or category, such as "Age" or "Department".

Relational Database

Table A (Age)      Table B (Department)
+Name              +EmployeeNr
+Age               +DepartmentName
+SSN               +HireDate
+EmployeeNr        +Salary

The shared "EmployeeNr" field is common to both tables and provides the link between the data in the two tables. The "EmployeeNr" field is a foreign key in Table A, but it is the primary key in Table B.
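The link between Table A and Table B can be sketched with Python's built-in sqlite3 module. The table and column names below mirror the example tables; the inserted rows are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table B: EmployeeNr is the primary key.
conn.execute("""CREATE TABLE table_b (
    employee_nr INTEGER PRIMARY KEY,
    department_name TEXT,
    hire_date TEXT,
    salary INTEGER)""")

# Table A: SSN is the primary key; EmployeeNr is a foreign key into Table B.
conn.execute("""CREATE TABLE table_a (
    name TEXT,
    age INTEGER,
    ssn TEXT PRIMARY KEY,
    employee_nr INTEGER REFERENCES table_b(employee_nr))""")

# Invented sample rows.
conn.execute("INSERT INTO table_b VALUES (1001, 'Shipping', '2001-05-14', 52000)")
conn.execute("INSERT INTO table_a VALUES ('John Doe', 26, '000-00-0001', 1001)")

# The shared EmployeeNr field joins the two tables.
joined = conn.execute("""
    SELECT a.name, b.department_name
    FROM table_a AS a
    JOIN table_b AS b ON a.employee_nr = b.employee_nr
""").fetchall()
```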

Table A and Table B need not include the same number of records. For example, the records in Table A may include the name, age, social security number, and employee number of every person in an organization, while the records in Table B may be limited to those in a particular department or division.

By keeping discrete sets of data in separate tables, a relational database can access selected tables for a variety of purposes. A single relational database may include any number of tables, from a few to several thousand.

A query language allows a user to interact with a database and analyze the data in its tables. A query is a collection of instructions used to extract a set of data from the database. Queries do not change the information in the tables; they merely display it to the user. The result of a query is sometimes called a view.

The best-known query language is Structured Query Language (SQL), pronounced "sequel." SQL is the standard language for database interoperability. Queries are probably the most frequently used aspect of SQL, but SQL commands may also be used as a programming tool to create and maintain a database.

Database management systems. A database management system (sometimes abbreviated DBMS) generally refers to an interface and one or more computer software programs specifically designed to manage and manipulate the information in a database. A DBMS may include a complex suite of software programs that control the organization, storage, and retrieval of data, as well as the security and integrity of the database. A DBMS may also include an interface for accepting requests for data from external applications.

An interface is a computer program designed to provide an operative connection between a user and an application such as a DBMS. The interface to a DBMS may provide a series of commands that allow a user to create, read, update, and delete the data values stored in the database tables. These functions (create, read, update, delete) are sometimes referred to by the acronym CRUD, so an interface having these commands may be called a CRUD interface. A database interface that also includes a query function may be called a CRUDQ interface.

A COM-based interface refers to software based on the Component Object Model. The Component Object Model is an open software architecture, developed by Digital Equipment Corporation and Microsoft, that allows interoperability between the various components of a database system.

In a relational database that includes multiple tables, the database management system (DBMS) is generally responsible for maintaining all of the links between the key fields in the various tables. This is referred to as maintaining the "referential integrity" of the database.

Maintaining referential integrity is often a challenge in relational databases that include a very large number of tables. The linked nature of relational database tables has many advantages, but it can also allow errors to propagate between tables and throughout the database, especially when records or key fields are changed or deleted. In systems where a variety of users can access the database through a CRUD interface, the potential for error is compounded.
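The following minimal sketch shows one way a DBMS can protect referential integrity, using Python's built-in sqlite3 module. SQLite enforces foreign keys only after the foreign_keys pragma is enabled; the table layout and sample rows are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("""CREATE TABLE table_b (
    employee_nr INTEGER PRIMARY KEY,
    department_name TEXT)""")
conn.execute("""CREATE TABLE table_a (
    ssn TEXT PRIMARY KEY,
    name TEXT,
    employee_nr INTEGER REFERENCES table_b(employee_nr))""")

# Invented sample rows: one Table A record points at one Table B record.
conn.execute("INSERT INTO table_b VALUES (1001, 'Shipping')")
conn.execute("INSERT INTO table_a VALUES ('000-00-0001', 'John Doe', 1001)")

# Deleting the Table B record would leave the Table A record pointing at
# nothing, so the DBMS rejects the delete instead of letting the error spread.
try:
    conn.execute("DELETE FROM table_b WHERE employee_nr = 1001")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True
```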

In a computer network environment, a large database may be hosted on a central server, with many users or subscribers accessing the data from remote locations over communication links. Access speed is usually limited by the type and capacity of the communication link. Distributing copies of the entire database to remote locations is generally impractical, especially for applications in which the data must be current. Furthermore, a large database stored locally would place a considerable burden on local users, because remote systems are generally smaller than the central server. Storing a large database on a local system without sufficient capacity often results in an unacceptable increase in computing time. The cost of upgrading all of the hardware at each remote location may be prohibitive, especially for a very large network of users.

Updating the data in a large relational database can be technically challenging and time-consuming, especially in a network environment where the data must be updated frequently. Sending an updated copy of the entire database is usually impractical and costly. In addition, the cost and delay of distribution may become an obstacle to frequent updates.

Thus, there is a need in the art for an improved database management system capable of maintaining and protecting large amounts of data, distributing frequent updates in a cost-effective manner, and processing requests or data quickly and efficiently at all locations within a network.

Address databases. The United States includes more than 145,000,000 deliverable addresses. A database containing information about all of these street addresses is one example of a very large database. Address databases may be obtained from private sources or from government sources, such as the United States Postal Service (USPS).

The USPS offers a variety of address databases to the public, including the City-State file, the Five-Digit ZIP file, and the ZIP+4 file. The City-State file is a comprehensive list of ZIP codes with the corresponding city and county names. The Five-Digit ZIP file, when used in conjunction with the City-State file, allows users to validate existing five-digit ZIP code assignments. The ZIP+4 file provides a comprehensive list of ZIP+4 codes.

The Delivery Sequence File (DSF) is a computerized database developed by the USPS that includes the complete, standardized address of every delivery point served by the USPS, stored in discrete records. Each individual record contains the street address, ZIP+4 code, carrier route code, delivery sequence number (walk sequence number), delivery type code, and seasonal delivery indicator. The DSF includes sufficient data to accomplish address validation and standardization. The DSF is provided to licensees who develop certified address hygiene software. The USPS recently developed a new Delivery Point Validation (DPV) database to replace the DSF. The DPV database comes in a basic format or an enhanced format; the enhanced format, called DSF², includes additional address attributes.

Address standardization. The need for standardized mailing addresses is a relatively modern development. In the early 1960s, a tremendous increase in the volume of mail, most of it business mail, led to a serious crisis for the Postal Service. The computer was the single greatest force behind the dramatic growth in mail volume. Computers allowed businesses to automate many mailing functions, but the Postal Service was not prepared for the surge in mail volume. In response to this crisis, the Zone Improvement Plan (ZIP) was developed. By July 1963, five-digit ZIP codes had been assigned to all deliverable addresses in the United States. The ZIP code marked the beginning of the modern era of address standardization.

Twenty years later, the ZIP+4 code was introduced, adding a hyphen and four additional digits to the ZIP code. Today, mail is typically sorted by a multi-line optical character reader that scans the entire address, prints an 11-digit Delivery Point Bar Code (DPBC) on the envelope, and sorts the mail into trays in the established walk sequence for each delivery route.

Address standardization transforms a given address into the best format that meets government guidelines, such as those established by the USPS. Standardization affects all of the components of a delivery address, including format, font, spacing, typeface, punctuation, and the ZIP code or DPBC. For example, a non-standard address such as:

John Doe
123 East Main Street, N.W.
Oakland Center, Suite A-4
Atlanta, Georgia 30030

might look quite different after standardization:

JOHN DOE
123 E MAIN ST NW STE A4
DECATUR GA 30030-1549

An address may be subdivided or parsed into its components, which are sometimes called artifacts. For example, the individual artifacts in the address above include the occupant or recipient (John Doe), the number (123), the pre-directional (E), the primary name (Main), the type (St), the post-directional (NW), the secondary name (STE), the secondary number (A4), and the city, state, and ZIP+4 code (Decatur GA 30030-1549). Dividing an address into its individual artifacts is useful in many contexts, including postal sorting and address validation.
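As a rough illustration of parsing, the sketch below splits a standardized street line into artifacts. The function name parse_street_line, the artifact keys, and the small lookup sets are assumptions made for this illustration; they are not taken from the disclosure, and a practical parser would rely on complete reference data.

```python
# Hypothetical, deliberately tiny lookup sets used only for this sketch.
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
STREET_TYPES = {"ST", "AVE", "RD", "BLVD", "DR", "LN"}
UNIT_TYPES = {"STE", "APT", "UNIT"}


def parse_street_line(line):
    """Split a street line such as '123 E MAIN ST NW STE A4' into artifacts."""
    tokens = line.upper().split()
    artifacts = {"number": tokens.pop(0)}
    if tokens and tokens[0] in DIRECTIONS:
        artifacts["pre_directional"] = tokens.pop(0)
    # Peel off the secondary name and number (for example STE A4) if present.
    for i, token in enumerate(tokens):
        if token in UNIT_TYPES:
            artifacts["secondary_name"] = token
            artifacts["secondary_number"] = " ".join(tokens[i + 1:])
            tokens = tokens[:i]
            break
    if tokens and tokens[-1] in DIRECTIONS:
        artifacts["post_directional"] = tokens.pop()
    if tokens and tokens[-1] in STREET_TYPES:
        artifacts["type"] = tokens.pop()
    artifacts["primary_name"] = " ".join(tokens)  # whatever remains is the street name
    return artifacts


parsed = parse_street_line("123 E MAIN ST NW STE A4")
```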

Address validation. Whereas standardization refers to the way an address is formatted, the process of address validation confirms whether a given address is valid and current. Address databases from private or government sources are commonly used to validate addresses. For example, the USPS databases described above may be used for comparison purposes to validate an address.

In addition to government postal services, private enterprises such as commercial package carriers often develop and maintain address databases that store unique and valuable customer information. Private databases developed independently of government postal service data may represent the next generation in addressing accuracy and data storage. In the future, a wider variety of government and private address databases will be available.

The USPS address databases are regularly updated with new data. In addition to regular, periodic updates, the USPS has developed several correction databases, including NCOA and LACS. The National Change of Address (NCOA) database contains change-of-address records. The Locatable Address Conversion System (LACS) contains new addresses for areas undergoing conversion from rural routes to city-style addresses.

Because of population growth and change, address databases generally require frequent updates. As with any other large database, updating the data in a very large address database is usually technically challenging and time-consuming. Thus, in the context of address databases, there is a need in the art for an improved database management system capable of maintaining and protecting large amounts of address data, distributing frequent updates to users or subscribers in a cost-effective manner, and processing requests for address data quickly and efficiently.

Summary of the Invention

The following summary is a broad overview and is not intended to identify key or critical elements of the apparatus, methods, systems, processes, and the like, or to delineate the scope of such elements. This summary presents concepts in a simplified form as a prelude to the more detailed description that follows.

Certain illustrative example apparatus, methods, systems, processes, and the like are described in connection with the following description and the accompanying drawings. These examples represent only a few of the various ways in which the principles supporting the apparatus, methods, systems, processes, and the like may be employed, and are therefore intended to include equivalents. Other advantageous and novel features will become apparent from the following detailed description when considered in conjunction with the drawings.

In view of the broad teachings of the present invention, a data structure, a database management system, a processing apparatus, and related methods having advantageous configurations are provided. The example apparatus, methods, and systems described herein facilitate the prompt and efficient validation of input data given in a subjective representation and produce output data having a preferred representation.

In one aspect of the invention, a data structure may include a superset comprising a primary database operatively connected to one or more secondary databases, wherein each of the primary database and the one or more secondary databases includes a first table operatively linked to one or more other tables, and each of the first table and the one or more other tables shares a common data structure. The databases may be relational databases. The common data structure may include a sparse matrix linked list. The common data structure may include data records arranged in a hierarchical order based on the data, along a series of levels from general to specific.

In the data structure, the primary database may include a source table, a first secondary database may include an alias table, a second secondary database may include a standardization table, and a third secondary database may be configured to accept and store input data. The source table may include data records obtained from public or private sources, the alias table may include one or more equivalent representations of a record, and the standardization table may include one or more standardized representations of a record. In another aspect of the data structure, the source table may include address records obtained from government postal services and commercial sources.

Within the data structure, the first table may include preferred records, a first other table may include primary alias records, and a second other table may include secondary alias records. The preferred records may include one or more preferred representations, the primary alias records may include one or more equivalent representations of a primary artifact, and the secondary alias records may include one or more equivalent representations of a secondary artifact. In a related aspect, the preferred records may include one or more preferred representations of an address.

In another aspect of the invention, a method is provided for preparing data for optimal searching, the data being stored in one or more databases comprising a plurality of linked tables of records. The method may include arranging the records in each of the tables in a hierarchical order based on the data, along a series of levels from general to specific, and transforming each of the tables into one or more sparse matrix linked list tables. Where the databases reside in a server-client network environment, the method may further include distributing copies of the one or more sparse matrix linked list tables from the server to one or more clients. The databases may be relational databases interconnected to form a superset of data. In one aspect, the data may include address artifacts.
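The two preparation steps can be sketched as follows. This summary does not define the layout of a sparse matrix linked list, so the sketch assumes one common reading: a linked list of nodes that stores only the non-empty cells of a table. The level names, the Node class, and the helper functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed levels, ordered from general to specific.
LEVELS = ("state", "city", "street", "number")


@dataclass
class Node:
    row: int
    col: int
    value: str
    next: Optional["Node"] = None


def order_hierarchically(records):
    """Sort records level by level, so general records precede specific ones."""
    return sorted(records, key=lambda r: tuple(r.get(level) or "" for level in LEVELS))


def to_sparse_linked_list(records):
    """Chain together only the non-empty cells; empty cells are never stored."""
    head = tail = None
    for row, record in enumerate(records):
        for col, level in enumerate(LEVELS):
            value = record.get(level)
            if not value:
                continue
            node = Node(row, col, value)
            if head is None:
                head = tail = node
            else:
                tail.next = node
                tail = node
    return head


def iterate(head):
    """Walk the list from the head node to the end."""
    while head is not None:
        yield head
        head = head.next


table = [
    {"state": "GA", "city": "DECATUR", "street": "MAIN ST", "number": "123"},
    {"state": "GA", "city": "ATLANTA"},
    {"state": "GA"},
]
ordered = order_hierarchically(table)
nodes = list(iterate(to_sparse_linked_list(ordered)))
```

In this example the 3 x 4 table has 12 cells, but only the 7 populated ones become nodes, which is the kind of saving that would matter when copies are distributed to clients.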

In another aspect of the invention, an apparatus is provided for preparing data for optimal searching, the data being stored in one or more databases comprising a plurality of linked tables of records. The apparatus may include a central processing unit, a memory, a basic input/output system, and a program storage device containing program modules executable by the central processing unit. The program modules may include means for arranging the records in each of the tables in a hierarchical order based on the data, along a series of levels from general to specific, and means for transforming each of the tables into one or more sparse matrix linked list tables. The apparatus may also include one or more clients remote from the central processing unit. The program modules may further include means for distributing copies of the one or more sparse matrix linked list tables from the server to the one or more clients.

In another aspect of the invention, a method is provided for using a database of linked tables to convert a subjective representation into a preferred representation. The method may include: capturing the subjective representation and storing it in a first one of the linked tables; storing source data in a second one of the linked tables; locating one or more candidate representations in the source data by comparing the subjective representation to the source data; selecting a preferred representation from the one or more candidate representations, the preferred representation being the one most similar to the subjective representation; and publishing the preferred representation.
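A minimal sketch of this conversion appears below. The summary does not specify how similarity is measured, so the sketch substitutes the string-similarity ratio from Python's standard difflib module; the table names, the function name to_preferred, and the sample source data are assumptions made for this illustration.

```python
import difflib

# Hypothetical linked tables: one for captured input, one for source data.
input_table = []
source_table = ["123 E MAIN ST NW", "123 W MAIN ST NW", "456 OAK AVE"]


def to_preferred(subjective):
    """Return the source representation most similar to the input, or None."""
    captured = subjective.strip().upper()
    input_table.append(captured)  # capture and store the subjective representation
    # Locate candidate representations by comparing the input to the source data.
    candidates = difflib.get_close_matches(captured, source_table, n=3, cutoff=0.6)
    if not candidates:
        return None
    return candidates[0]  # get_close_matches lists the most similar candidate first


preferred = to_preferred("123 East Main Street NW")
```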

The method may further include: reviewing the source data to identify one or more select records containing preferred data; and adding a preferred token to the one or more select records.

The step of selecting a preferred representation may include identifying a preferred token associated with one of the one or more candidate representations.

The step of locating one or more candidate representations may further include:
(a) parsing the subjective representation into one or more discrete artifacts;
(b) selecting one of the one or more discrete artifacts and:
(1) locating one or more candidate artifacts in the source data by comparing the one discrete artifact to the source data;
(2) selecting a preferred artifact from the one or more candidate artifacts, the preferred artifact being the one most similar to the one discrete artifact;
(3) storing the preferred artifact;
(c) repeating step (b) for each of the one or more discrete artifacts; and
(d) combining the preferred artifacts to form the preferred representation.
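Steps (a) through (d) can be sketched as a loop over artifacts. As before, difflib stands in for the unspecified similarity measure, the input is assumed to have been parsed already, and SOURCE_ARTIFACTS, preferred_artifact, and build_preferred are hypothetical names with invented data.

```python
import difflib

# Hypothetical source data, grouped by artifact kind.
SOURCE_ARTIFACTS = {
    "pre_directional": ["N", "S", "E", "W"],
    "primary_name": ["MAIN", "MAPLE", "MARKET"],
    "type": ["ST", "AVE", "RD"],
}


def preferred_artifact(kind, artifact):
    """Steps (b)(1) and (b)(2): locate candidates, then keep the most similar."""
    candidates = difflib.get_close_matches(
        artifact.upper(), SOURCE_ARTIFACTS[kind], n=3, cutoff=0.5)
    return candidates[0] if candidates else artifact.upper()


def build_preferred(parsed):
    """Steps (b) to (d) for an input already parsed into (kind, artifact) pairs."""
    stored = []
    for kind, artifact in parsed:                          # step (c): repeat per artifact
        stored.append(preferred_artifact(kind, artifact))  # step (b)(3): store
    return " ".join(stored)                                # step (d): combine


# Step (a) is assumed done; note the misspelled street name in the input.
result = build_preferred([("pre_directional", "e"),
                          ("primary_name", "Mian"),
                          ("type", "st")])
```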

The step of locating one or more candidate representations may further include: storing alias data in a third one of the linked tables; reviewing the alias data to identify one or more select alias records containing a preferred alias representation; adding a preferred alias token to the one or more select alias records; locating one or more candidate aliases in the alias data by comparing the subjective representation to the alias data; selecting a preferred alias from the one or more candidate aliases, the preferred alias being the one most closely associated with the preferred alias token; and publishing the preferred alias as a candidate representation.

The step of locating one or more candidate aliases may further include:
(a) parsing the subjective representation into one or more discrete artifacts;
(b) selecting one of the one or more discrete artifacts and:
(1) locating one or more candidate alias artifacts from the source data by comparing the one discrete artifact to the alias data;
(2) selecting a preferred alias artifact from the one or more candidate alias artifacts, the preferred alias artifact being the one most closely associated with the preferred alias token;
(3) storing the preferred alias artifact;
(c) repeating step (b) for each of the one or more discrete artifacts; and
(d) adding the preferred alias artifacts to the preferred alias.
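One way to picture an alias table with preferred alias tokens is sketched below. The grouping column, the boolean token flag, and the function names are assumptions made for this illustration; the disclosure does not state how tokens are stored.

```python
# Hypothetical alias table: (alias group, representation, has preferred token).
ALIAS_TABLE = [
    ("street_type:1", "STREET", False),
    ("street_type:1", "STR", False),
    ("street_type:1", "ST", True),
    ("direction:1", "EAST", False),
    ("direction:1", "E", True),
]


def preferred_alias(artifact):
    """Return the token-marked equivalent of an artifact, if one is known."""
    artifact = artifact.upper()
    # Candidate alias records: every group in which the artifact appears.
    groups = {group for group, representation, _ in ALIAS_TABLE
              if representation == artifact}
    for group, representation, has_token in ALIAS_TABLE:
        if group in groups and has_token:
            return representation
    return artifact  # no alias known: keep the artifact as given


def standardize(artifacts):
    """Replace each artifact with its preferred alias and recombine them."""
    return " ".join(preferred_alias(artifact) for artifact in artifacts)


line = standardize(["123", "East", "Main", "Street"])
```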

In another aspect of the invention, an apparatus is provided for performing the steps of the method just described. The apparatus may include: a central processing unit; a memory; a basic input/output system; and a program storage device containing a program module executable by the central processing unit, wherein the program module may include means for performing each step of the method described above.

In another aspect of the invention, a method is provided for controlling access to a database by one or more external applications. The method may include: establishing and storing a plurality of rule sets, each associated with one of the one or more external applications; receiving a request from a first application; retrieving a first rule set associated with the first application; and applying the first rule set to control the interaction between the first application and the database.

In another aspect of the invention, a method is provided for controlling the depth of data capture within a database in response to one or more external applications. The method may include: establishing and storing a plurality of rule sets, each associated with one of the one or more external applications and each including a list of the data to be captured from the database; receiving a request from a first application; retrieving a first rule set associated with the first application; and applying the first rule set to limit the data that the first application can obtain from the database.
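A per-application rule set of this kind can be sketched as a mapping from an application identifier to the list of fields that application may capture. The application names, field names, and function below are hypothetical and only illustrate the idea.

```python
# Hypothetical rule sets: one per external application, each listing the
# fields that application is allowed to capture from the database.
RULE_SETS = {
    "billing_app": {"fields": ["street", "city", "postal_code"]},
    "routing_app": {"fields": ["street", "city", "postal_code", "consignee"]},
}


def handle_request(app_id: str, record: dict, rule_sets: dict) -> dict:
    """Return only the fields that the requesting application may see."""
    rules = rule_sets.get(app_id)            # retrieve the application's rule set
    if rules is None:
        raise PermissionError(f"no rule set stored for {app_id!r}")
    # Apply the rule set to limit the depth of data capture.
    return {name: record[name] for name in rules["fields"] if name in record}


record = {
    "street": "123 East Main Street",
    "city": "Atlanta",
    "postal_code": "30030",
    "consignee": "Doe",
}
billing_view = handle_request("billing_app", record, RULE_SETS)
```

Keeping the rules in data rather than in code means a new external application can be admitted by storing one more rule set.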

In another aspect of the invention, a data structure is provided, which may include a database linking a primary table and one or more secondary tables, each of the tables sharing a common data structure. The database is controlled by a database management system configured to transform one or more of the primary table and the one or more secondary tables into a sparse matrix linked list. The database may include one or more interconnected relational databases. The database management system may include an interface and a validation module. The interface may control access to the database by one or more external applications. The database management system may be configured to convert data from a subjective representation into a preferred representation.

These and other objects are achieved by the disclosed apparatus, method, and system, and will become apparent from the following detailed description of the preferred embodiments, taken in conjunction with the accompanying drawings, in which like numerals indicate like elements.

Brief Description of the Drawings

The invention may be more readily understood by reading the following description in conjunction with the accompanying drawings, in which:

Figure 1 is a block diagram of an address superset according to one embodiment of the present invention.

Figure 2 is a block diagram of a generic data superset according to one embodiment of the present invention.

Figure 3 is a diagram of a system architecture according to one embodiment of the present invention.

Figure 4 is a block diagram of a standalone service mode according to one embodiment of the present invention.

Figure 5 is an illustration of a data table according to one embodiment of the present invention.

Figure 6 is an illustration of values in a table according to one embodiment of the present invention.

Figure 7 is a block diagram of a link according to one embodiment of the present invention.

Figure 8 is a block diagram of a linked list according to one embodiment of the present invention.

Figure 9 is a table of address data according to one embodiment of the present invention.

Figure 10 is a diagram including levels and nodes according to one embodiment of the present invention.

Figure 11 is a table of address data with tokens according to one embodiment of the present invention.

Figure 12 is a flowchart of a matching module according to one embodiment of the present invention.

Figure 13 is a table of alias data according to one embodiment of the present invention.

Detailed Description

Reference is now made to the drawings, in which like numerals refer to like elements throughout the several views.

1. Introduction

The term "computer component" as used in this application refers to a computer-related entity, whether hardware, firmware, software, a combination thereof, or software in execution. For example, a computer component can be, but is not limited to being, a process running on a processor, the processor itself, an object, an executable, a thread of execution, a program, a server, or a computer. For example, both an application running on a server and the server itself can be referred to as computer components. One or more computer components can reside within a process and/or thread of execution, and a computer component can be localized on a single computer and/or distributed between two or more computers.

"Computer communication" as used herein refers to communication between two or more computer components, and may be, for example, a network transfer, a file transfer, an applet transfer, an e-mail, a Hypertext Transfer Protocol (HTTP) message, a datagram, an object transfer, a binary large object (BLOB) transfer, and so on. Computer communication can occur, for example, across a wireless system (such as IEEE 802.11), an Ethernet system (such as IEEE 802.3), a token ring system (such as IEEE 802.5), a local area network (LAN), a wide area network (WAN), a point-to-point system, a circuit-switched system, a packet-switched system, and so on.

"Logic" as used herein includes, but is not limited to, hardware, firmware, software, and/or combinations of each to perform one or more functions or actions. For example, depending on the desired application or requirements, logic may include a software-controlled microprocessor, discrete logic such as an application-specific integrated circuit (ASIC), or another programmed logic device. Logic may also be implemented entirely as software.

A "signal" as used herein includes, but is not limited to, one or more electrical or optical signals, analog or digital signals, one or more computer instructions, a bit or bit stream, and the like.

"Software" as used herein includes, but is not limited to, one or more computer-readable and/or executable instructions that cause a computer, computer component, and/or other electronic device to perform functions, actions, and/or behave in a desired manner. The instructions can be embodied in various forms, such as routines, algorithms, stored procedures, modules, methods, threads, and/or programs. Software can also be implemented in a variety of executable and/or loadable forms, including but not limited to a stand-alone program, a function call (local and/or remote), a servlet, an applet, instructions stored in memory, part of an operating system or browser, and so on. It should be appreciated that the computer-readable and/or executable instructions can be located on one computer component and/or distributed between two or more communicating, cooperating, and/or parallel-processing computer components, and can therefore be loaded and/or executed in serial, parallel, massively parallel, and other manners. Those of ordinary skill in the art will appreciate that the form of the software may depend, for example, on the requirements of the desired application, the environment in which it runs, and/or the wishes of the designer or programmer.

An "operable connection" (or a connection by which entities are "operably connected") is one in which signals, physical communication flows, and/or logical communication flows can be sent and/or received. An operable connection usually includes a physical interface, an electrical interface, and/or a data interface, but it should be noted that an operable connection may consist of differing combinations of these or other types of connections sufficient to allow operable control.

A "database" as used herein refers to a physical and/or logical entity that can store data. A database can be, for example, one or more of the following: a data store, a relational database, a table, a field, a list, a queue, a heap, and so on. A database can reside in one logical and/or physical entity and/or can be distributed between two or more logical and/or physical entities.

The terms "fuzzy" and "blurry" refer to a superset of Boolean logic that handles partial truth; in other words, truth values between "completely true" and "completely false". Any particular theory or system can be generalized from a discrete, or crisp, form to a continuous, or fuzzy, form. A system based on fuzzy logic or fuzzy matching can use degrees of truth that resemble probabilities, except that the degrees of truth need not sum to one. When fuzzy matching is applied to alphanumeric strings, the truth value can be expressed, for example, as the number of matching characters in the strings.
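As a minimal illustration of a truth value expressed as a count of matching characters, the sketch below compares two strings position by position. The position-wise comparison is an assumption; the text does not say how matching characters are counted, and the function names are hypothetical.

```python
def matching_characters(a: str, b: str) -> int:
    """Count positions at which the two strings hold the same character."""
    return sum(1 for x, y in zip(a.lower(), b.lower()) if x == y)


def best_match(query: str, choices: list[str]) -> str:
    """Return the choice with the highest degree of truth for the query."""
    return max(choices, key=lambda c: matching_characters(query, c))


exact = matching_characters("MAIN", "MAIN")      # every position matches
partial = matching_characters("MAIN", "MIAN")    # only 'M' and 'N' line up
choice = best_match("MIAN", ["MAIN", "EAST", "PINE"])
```

A crisp (Boolean) comparison would only report that "MIAN" and "MAIN" are different; the count preserves how close they are.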

The systems, methods, and objects described herein can be stored, for example, on a computer-readable medium. The medium may include, but is not limited to, an ASIC, a CD, a DVD, RAM, ROM, PROM, a disk, a carrier wave, a memory stick, and so on. Thus, an exemplary computer-readable medium may store computer-executable instructions for a method of managing transport resources. The method includes calculating a route for a transport resource based on analytical data retrieved from an empirically based dissemination database. The method also includes receiving real-time data from the transport resource and updating the route of the transport resource based on a combination of the real-time data and the analytical data.

It will be appreciated that some or all of the processes and methods of the system involve electronic and/or software applications that may be dynamic and flexible processes, so that they can be performed in sequences other than those described herein. Those of ordinary skill in the art will also appreciate that elements implemented as software may be implemented using various programming approaches, such as machine language, procedural techniques, object-oriented techniques, and/or artificial intelligence techniques.

The processing, analysis, and/or other functions described herein may also be implemented by functionally equivalent circuits, such as a digital signal processor circuit, a software-controlled microprocessor, or an application-specific integrated circuit. Components implemented as software are not limited to any particular programming language. Rather, the description herein provides the information that one skilled in the art may use to fabricate circuits or to generate computer software to perform the processing of the invention. It will be appreciated that some or all of the functions and/or behaviors of the present system and method may be implemented as logic, as defined above.

Furthermore, to the extent that the term "includes" is used in the detailed description or the claims, it is intended to be inclusive in a manner similar to the term "comprising" as that term is interpreted when employed as a transitional word in a claim. Likewise, to the extent that the term "or" is used in the claims (for example, A or B), it is intended to mean "A or B or both". When the author intends to indicate "only A or B but not both", the author will use the phrase "A or B but not both". Thus, the use of the term "or" herein is the inclusive use, not the exclusive use. See Bryan A. Garner, A Dictionary of Modern Legal Usage 624 (2d ed. 1995).

2. Exemplary Embodiments

The system of the present invention is generally described herein, by way of example, in the context of its use as an address management system. Although the address-related examples may be described in considerable detail, it is not the applicants' intention to restrict or in any way limit the scope of the invention to such detail. Additional uses, applications, advantages, and modifications of the inventive system will be readily apparent to those skilled in the art. The invention in its broader aspects is therefore not limited to the specific details, the representative apparatus, and the illustrative examples shown and described. Accordingly, departures may be made from such details without departing from the spirit or scope of the general inventive concept.

Exemplary apparatuses, methods, systems, processes, and so on are now described with reference to the drawings, in which like numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth to facilitate a thorough understanding of the apparatuses, methods, systems, and processes. It will be evident, however, that they can be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to simplify the description.

3. Data Structure: Superset

3.1 Data Superset

In one embodiment, as shown in Figure 2, the system of the present invention may include a data superset 30. The data superset 30 may include four or more discrete relational databases 31-35 (databases 1, 2, 3, 4, ..., N, as shown). The databases 31-35 may be connected to other databases in a network of database links 36. In one embodiment, one of the databases 31-35 may be designated as the primary database and the others as secondary databases. The several relational databases 31-35 may be controlled together by a database management system to create a single data superset 30 capable of storing large amounts of data in an ordered fashion and of executing complex queries across multiple relational database tables.

The relational databases 31-35 may contain a collection of tables 40 (tables A, B, C, ..., N, as shown). A table 40 may contain a collection of data fields 44 (field 1, field 2, field 3, ..., field n, as shown). The tables 40 may be linked together using one or more keys 48 in a manner known in the relational database art.

In one embodiment, each database 31-35 may have a common data structure. In this aspect, each relational database 31-35 may include the same number of tables 40, and each table may include the same number of fields 44. The common data structure among the various tables 40 in the data superset 30 may provide a degree of flexibility that allows any type of data to be stored and processed.

In one embodiment, the common data structure may include arranging the records in one or more of the tables 40 in a hierarchical order, along a series of levels from general to specific, based on the values of the stored data, as described in more detail below. The common data structure may also include storing the tables 40 as sparse matrix linked lists.
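One generic way to hold a sparsely populated table row as a linked list is to chain together only the fields that contain a value, each node recording its column position. The sketch below illustrates that general technique; it is an assumption for illustration and is not taken from the layout shown in the figures. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    column: int                     # position of the field within the row
    value: str                      # stored value (empty fields get no node)
    next: Optional["Node"] = None   # link to the next populated field


def row_to_linked_list(row: list) -> Optional[Node]:
    """Chain together only the populated fields of a row."""
    head = tail = None
    for column, value in enumerate(row):
        if value is None:
            continue                # sparse: skip empty fields entirely
        node = Node(column, value)
        if head is None:
            head = node
        else:
            tail.next = node
        tail = node
    return head


def linked_list_to_row(head: Optional[Node], width: int) -> list:
    """Rebuild the full-width row from its sparse linked list."""
    row = [None] * width
    node = head
    while node is not None:
        row[node.column] = node.value
        node = node.next
    return row


# Hypothetical row with general-to-specific fields, some of them empty.
row = ["US", None, "Atlanta", None, "Main"]
head = row_to_linked_list(row)
```

When most fields in a wide, uniform table are empty, storing only the populated fields keeps the common structure without paying for the gaps.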

3.2 Address Superset

An exemplary embodiment of a data superset is shown in Figure 1. The address superset 130 may include several discrete relational databases, which in one embodiment include a postal database 131, a carrier database 132, a standard database 133, and a planning database 134. As shown, the databases 131-134 may be connected to other databases in a network of database links 36 to form the address superset 130. The relational databases 131-134 may be controlled by an address database management system.

The databases 131-134 may contain a collection of data tables 140, which in one embodiment include a preferred table 141, a street alias table 142, and a consignee alias table 143, as described in more detail below. The preferred table 141 may also include one or more fields for storing a token that serves as a unique identifier for a particular record. The tables 141, 142, 143 may contain a collection of data fields 44 (field 1, field 2, field 3, ..., field n, as shown). The tables 141, 142, 143 may be linked together using one or more keys 48 in a manner known in the relational database art.

In one embodiment, each database 131-134 may have a common data structure. In this aspect, each relational database 131-134 may include the same number of tables 141-143, and each table may include the same number of fields 44. The common data structure among the various tables in the address superset 130 may provide a degree of flexibility that allows any type of data to be stored and processed. In one embodiment, the common data structure may include arranging the records in one or more of the tables in a hierarchical order, along a series of levels from general to specific, based on the values of the stored address data, as described in more detail below. The common data structure may also include storing the tables as, or reformatting them into, sparse matrix linked lists.

4. System Architecture

Figure 3 is a representation of a system 10 according to one embodiment of the present invention. The system 10 may include an infrastructure server 25, one or more computer networks, an application server 200, and one or more clients 655, distributed in a multi-tiered server-client relationship. The one or more computer networks facilitate communication between the infrastructure server 25, the application server 200, and the one or more clients 655. The one or more computer networks may include various types of computer networks, such as the Internet, a private intranet, a private extranet, a public switched telephone network (PSTN), a wide area network (WAN), a local area network (LAN), or any other type of network known in the art.

As shown in Figure 3, the primary AMS server 510 may reside on the infrastructure server 25. A graphical user interface, such as the AMS GUI 324, may communicate with the primary AMS server 510, as shown.

In one embodiment, the next tier in the system 10 may include several AMS clients 655 and a secondary AMS server 520. Some of the AMS clients 655 may include a data capture workstation 155 and a GUI 26 for one or more users 28. In one embodiment, the application server 200 may reside on an AMS client 655.

In one embodiment, the next tier down from the secondary AMS server 520 may include several AMS clients 655, each of which includes a data capture workstation 155 and a GUI 26 for one or more users 28.

In an exemplary embodiment, the infrastructure server 25 may include a central processor that communicates with other elements within the infrastructure server 25 via a system interface or bus. The infrastructure server 25 may also include input and display devices for receiving and displaying data. The input and display devices may be, for example, a keyboard or pointing device used in combination with a monitor. The infrastructure server 25 may further include memory, which may include read-only memory (ROM) and random access memory (RAM). The ROM may be used to store a basic input/output system (BIOS) containing the basic routines that help transfer information between elements of the infrastructure server 25.

In addition, the infrastructure server 25 may include at least one storage device, such as a hard disk drive, a floppy disk drive, a CD-ROM drive, or an optical disk drive, for storing information on various computer-readable media, such as a hard disk, a removable magnetic disk, or a CD-ROM disk. Each of these types of storage devices may be connected to the system bus by an appropriate interface. The storage devices and their associated computer-readable media may provide non-volatile storage. It is important to note that the computer-readable media described above could be replaced by any other type of computer-readable media known in the art. Such media may include, for example, magnetic tape, flash memory cards, digital video disks, and Bernoulli cartridges.

A number of program modules may be stored by the various storage devices and within RAM. Such program modules include an operating system and one or more applications. Also located within the infrastructure server 25 may be a network interface for interfacing and communicating with other elements of a computer network. One or more components of the infrastructure server 25 may be located geographically remotely from other processing components. Furthermore, one or more of the components may be combined. The infrastructure server 25 may include additional components for performing the functions described herein.

4.1 Database Management System (DBMS)

Referring again to Figure 3, according to one embodiment of the present invention, a database management system (DBMS) may reside on the primary AMS server 510 (infrastructure server 25), the application server 200, or the secondary AMS server 520. The DBMS may include an interface 600 and a program suite 500, similar to the AMS 110 shown in Figure 4.

For example, the database management system (DBMS) of the present invention may be described in the context of its use as an address management system (AMS) 110. Like the DBMS, the AMS 110 may reside on the primary AMS server 510 (infrastructure server 25), the application server 200, or the secondary AMS server 520. In one embodiment, the AMS 110 may include an interface 600 and a program suite 500, as shown in Figure 4.

Figure 4 is a block diagram of the system 10 according to one embodiment of the present invention, showing the AMS 110 operating in a standalone service mode 640. The system 10 as shown includes a computer 15, which provides access to one or more users 28 through the AMS GUI 324.

4.2 Address Management System (AMS)

The address management system (AMS) 110 may be specifically designed to control the organization, storage, and retrieval of the data in the address superset 130, and to control the security and integrity of the address superset 130 and its component databases. The interface 600 may be configured to receive and process requests for data from external applications (not shown). In one embodiment, the interface 600 may be a COM-based interface with the ability to create, read, update, and delete records. The interface 600 may also include query functions for performing operations on the data stored in the address superset 130.

5. Finding the Preferred Representation

In one embodiment, the system 10 of the present invention may include a database management system (DBMS) for the data superset 30. The DBMS may serve as a database management system for any type of data, including address data. In the context of address data, the DBMS may be referred to as an address management system (AMS) 110. In either case, the management system 110 may include an interface 600 and a program suite 500.

In one embodiment, the program suite 500 may include one or more computer software programs for receiving raw data in a "subjective representation", analyzing the values stored in the database by executing one or more queries through the interface 600, and producing output data in a "preferred representation".

The term "subjective representation" as used herein refers to raw data entered or submitted by a person who may have a personal understanding of the data. A subjective representation is often ambiguous or incomplete, which can be a problem when the raw data is needed to perform a computational step. For example, a person may enter a birthday using the subjective representation "12-4-63". In the United States this date may mean December 4, whereas in Europe it may mean April 12. A computer component may interpret the year as 1963 or as 63. These ambiguities seriously affect the accuracy of the raw data. To remove ambiguity and incompleteness, the program suite 500 may be designed to convert a subjective representation into a "preferred representation". Such a program suite 500 may include, for example, a system or query for determining whether the user is entering dates in the U.S. format or the European format. The program suite 500 may also include a rule or logic routine that always uses 0 as the default century for every year entered, unless the user enters a four-digit year. Designing and building the program suite 500 requires foresight and planning about the types and formats of raw data expected in a particular system.
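The date example can be made concrete with a short sketch. It assumes that the format (U.S. or European) has already been determined and is passed in as a flag, and that the default century is supplied by the caller; the function name, the parameters, and the value 1900 used in the example are illustrative assumptions rather than rules stated in the text.

```python
from datetime import date


def to_preferred_date(text: str, day_first: bool, default_century: int) -> date:
    """Convert a subjective 'a-b-year' date into an unambiguous date."""
    first, second, year_text = text.split("-")
    # European format puts the day first; U.S. format puts the month first.
    day, month = (first, second) if day_first else (second, first)
    year = int(year_text)
    if len(year_text) != 4:
        year += default_century     # apply the default century to short years
    return date(year, int(month), int(day))


# The same subjective input yields two different preferred representations.
us_date = to_preferred_date("12-4-63", day_first=False, default_century=1900)
eu_date = to_preferred_date("12-4-63", day_first=True, default_century=1900)
```

The ISO form produced by `date.isoformat()` is one possible preferred representation, because it reads the same way regardless of locale.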

A subjective representation may be processed by the program suite 500 into a preferred representation that generally bears no direct relation to the raw data. For example, a customer may order a printer cartridge using the subjective representation "Acme LX-709 Color", where Acme is the printer manufacturer, LX-709 is the printer model, and color ink is wanted. In a system for processing printer cartridge orders, the cartridges may, for example, be cataloged and stored by a ten-digit cartridge serial number. The serial number is not directly related to the text and numbers in the raw data; it is, however, the "preferred representation" to be printed on the order form so that the seller can locate and ship the desired cartridge. To match the subjective raw data to the correct serial number, the program suite 500 may be written to interpret any of the many possible indicators a customer might submit. Suppose that the first four digits of each cartridge serial number correspond to a list of the printer manufacturers that build machines capable of using that type of cartridge. The program suite 500 may include a stored procedure for comparing the printer manufacturer name entered with the names on the list and finding the corresponding first four digits of the cartridge serial number. This represents the first step in finding the ten-digit serial number to be printed on the order form.
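The first step of the cartridge example, finding the four-digit prefix from the manufacturer name, might look like the following. The prefix table and its values are invented for illustration, and the sketch assumes the manufacturer name is the first word of the order text.

```python
from typing import Optional

# Hypothetical list of manufacturers and serial-number prefixes.
MANUFACTURER_PREFIXES = {
    "acme": "4071",
    "globex": "5520",
}


def serial_prefix(order_text: str, prefixes: dict) -> Optional[str]:
    """Return the first four digits of the serial number, if the maker is known."""
    words = order_text.split()
    if not words:
        return None
    manufacturer = words[0].lower()          # assumption: the maker comes first
    return prefixes.get(manufacturer)


prefix = serial_prefix("Acme LX-709 Color", MANUFACTURER_PREFIXES)
```

The remaining six digits would come from further lookups on the model and ink type, which the text does not detail.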

Another example of a subjective representation is an ordinary street address. On a mailing label, a person might write the subjective representation "Doe, 123 East Main Street N.W. Suite A-4, Atl 30030". Several parts of this address are ambiguous or incomplete, including the addressee "Doe," the abbreviation "Atl," and the missing state name. If the data is to be processed by a computer or by sorting equipment, these ambiguities can cause the mail piece to be lost, delayed, or delivered incorrectly. To remove ambiguity and incompleteness, the program suite 500 may be designed to convert the subjective representation into a preferred representation. Such a program suite 500 may include, for example, a program or stored procedure for comparing the written address against a commercially available computer database of street addresses and ZIP codes.

The examples above concern attributes or parameters: a date, a part number, an address. A parameter may be characterized in a variety of formats, including the subjective representations shown above and other representations that depend on the context of use. In one embodiment, the system of the present invention uses tabular data to manipulate and modify the way a parameter is characterized, as described in more detail below.

In one embodiment, the database management system (DBMS) of the present invention may include a program suite 500, which may include one or more of the following general procedures: (1) an enhancement module; (2) a publish and subscribe module; and (3) a matching module. The program suite 500 may, of course, also include additional components and procedures for performing other functions described in this application.

5.1 Enhancement Module

In one embodiment, the program suite 500 of the present invention may include an enhancement module that optimizes the structure and order of the data stored in the databases 31-35 of the data superset 30. The databases 31-35 in the data superset 30 may include millions of records. In one embodiment, the tasks of reading, updating, and searching all or most of the records in each database 31-35 may be improved and accelerated by optimizing the data structure.

Database tables that include a large number of records occupy a large amount of memory and require long computation times for sorting, searching, and other analytical operations. A simple example of enhancing or optimizing data is sorting the records on one or more attributes (columns) to place the records in ascending or descending order. For large tables with many attributes, however, simple sorting of records does not produce significant time savings or search efficiency.

In one embodiment, an enhancement module in the program suite 500 includes a procedure for transforming a database into a sparse matrix linked list. A linked list includes links designed to direct a query from one field to the next, where the links are sometimes used to bypass or skip irrelevant fields. A sparse matrix does not include repeated field values in subsequent records. Rather than repeating the first value, the subsequent fields are left blank, and the subsequent values are assumed to equal the first value unless and until a different value appears.

For example, in FIG. 9, the ZIP code field includes a repeated entry (ZIP code 20001) in each of the thirteen records. In one aspect, the system 10 of the present invention uses the concept of a sparse matrix to eliminate repeated entries, thereby saving memory and shortening computation time. For example, in FIG. 9, the ZIP code of node 1 may be populated with the five-digit ZIP code 20001. In a system 10 of the present invention in which a table can be transformed into a sparse matrix, the subsequent ZIP code fields may be empty or zero. In FIG. 9, the ZIP code fields of nodes 2 through 13 may be empty or zero; the value in those fields may be assumed to be 20001.

In a sparse matrix, a value encountered in a sequence of records is assumed to remain the same until a different value appears. Because many repeated values can be eliminated in this way, the table or matrix is described as sparse. Any attribute in a table can be made sparse by applying the rules for creating a sparse matrix.
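The sparse-column rule can be sketched in Python as a pair of helper functions. This is an illustrative sketch only: the names `to_sparse` and `from_sparse` are hypothetical, and `None` is assumed to stand for an empty field (so the original column is assumed to contain no `None` values).

```python
def to_sparse(column):
    """Blank out each value that repeats the value in the preceding record."""
    sparse = []
    previous = object()  # sentinel that equals no real field value
    for value in column:
        sparse.append(None if value == previous else value)
        previous = value
    return sparse


def from_sparse(sparse):
    """Restore the full column: an empty field inherits the last value seen."""
    dense = []
    current = None
    for value in sparse:
        if value is not None:
            current = value
        dense.append(current)
    return dense
```

For a ZIP code column holding 20001 in thirteen consecutive records, `to_sparse` keeps the value in the first record and leaves the other twelve fields empty, and `from_sparse` recovers the original column.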

A small portion of a model database table 40 is shown in FIG. 5. Each row contains a single record 42. Each field 44 can be located by reference to its row number and column number. For example, the field located in column 2, row 3 may be described as field (3, 2), or simply (3, 2). This field naming convention is valuable in many database operations that need to point to a specific field.

The table 40 of FIG. 6 is an example of a sparse matrix. For example, column 2 begins with the value "Smith" in row 1, followed by zero values in the subsequent records (rows). The value of column 2 is therefore taken to be "Smith" in the subsequent rows 2, 3, and 4.

The row-and-column naming convention for fields is helpful when a table is organized as a linked list. In one type of linked list, a link 340 may include a field 44, a value 46, and one or more pointers, as shown in FIGS. 7 and 8. The type of link 340 shown in FIG. 7 includes a next-in-column pointer 344 and a next-in-row pointer 342. The pointers 344, 342 include a direction to the next field containing a non-zero value. Because they point to the next field (rather than the previous field), these pointers 344, 342 are called forward pointers. Certain types of linked lists also include backward pointers, which include a direction to the previous non-zero field value. In one aspect, the system 10 of the present invention may include only forward pointers.

FIG. 8 is a representation of the links 340 between the sparse matrix values shown in FIG. 6. For example, the direction in the link 340 at column 1, row 4 quickly leads the analysis to the next non-zero value, located in column 3, row 4. The directions contained in the links 340 allow an analytical process such as a search query to bypass or skip the empty fields in the sparse matrix. Skipping empty fields greatly reduces search time, so query results are produced faster.
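A link with forward pointers, as described for the link 340, might be modeled in Python as below. This is a sketch under stated assumptions: the class name `Link`, the function `build_links`, and the sample table are hypothetical (the sample is not the content of FIG. 6), and `None` is assumed to mark an empty field. The pointer search is written for clarity rather than speed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)
class Link:
    row: int
    col: int
    value: str
    next_in_row: Optional["Link"] = None     # next non-empty field in the same row
    next_in_column: Optional["Link"] = None  # next non-empty field in the same column


def build_links(table: List[List[Optional[str]]]) -> Dict[Tuple[int, int], Link]:
    """Create one link per non-empty field, keyed by (row, column), 1-indexed."""
    links: Dict[Tuple[int, int], Link] = {}
    for r, record in enumerate(table, start=1):
        for c, value in enumerate(record, start=1):
            if value is not None:
                links[(r, c)] = Link(r, c, value)
    for (r, c), link in links.items():
        later_in_row = [key for key in links if key[0] == r and key[1] > c]
        later_in_col = [key for key in links if key[1] == c and key[0] > r]
        if later_in_row:
            link.next_in_row = links[min(later_in_row)]
        if later_in_col:
            link.next_in_column = links[min(later_in_col)]
    return links


# Hypothetical sparse table: columns are record id, last name, city.
sample = [
    ["1", "Smith", "Atlanta"],
    ["2", None, None],
    ["3", None, None],
    ["4", None, "Decatur"],
]
sample_links = build_links(sample)
```

In this sample, the link at row 4, column 1 points directly to row 4, column 3, so a query walking row 4 never visits the empty field in column 2.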

In one embodiment, the program suite 500 including the enhancement module may be used to transform any table in the data superset 30 into a sparse matrix linked list. A data superset 30 stored as sparse matrix linked lists may consume far less memory and may therefore be better suited for distribution to subscriber clients 255 as a replicated superset 330. Once a data table has been transformed into a sparse matrix linked list (SMLL) table, the enhancement module may finalize or otherwise "wrap" the SMLL table to prepare it for distribution and for use elsewhere by other system components.

As shown in FIGS. 5-8, the replicated superset 330 may reside on one or more clients 255 in the system 10. Transmission or "publication" of the replicated superset 330 throughout the system 10 may be accomplished using the publish and subscribe module, as described below.

In one embodiment, the enhancement module may also monitor the state of a table as new data is added, maintain the table in an optimal state by repeating the transformation procedure when necessary, and communicate with other system components about the state of the table and its availability to be shared with or distributed to subscriber clients 255. In this aspect, the enhancement module of the program suite 500 may be configured to interact and communicate with other system components to maintain the data tables in an optimal state for rapid and efficient searching.

5.2 Publish and Subscribe Module

In one embodiment, the program suite 500 of the present invention may include publish and subscribe programs or procedures to control and facilitate the transfer of data between the components of the system 10 of the present invention. As shown in FIG. 3, the system 10 may include an infrastructure server 25, one or more computer networks 230, an application server 200, and one or more clients 255, distributed in a server-client relationship.

In a server-client network environment, such as the environment shown in FIGS. 5-9, the replicated superset 330 may reside on one or more subscriber clients 255 of the system 10. The publish and subscribe module may be configured to monitor and control the publication of the replicated superset 330 to the clients 255 that are subscribers throughout the system 10.
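The general publish and subscribe pattern can be illustrated with a small in-process Python sketch. It does not model the networked system 10; the class name `SupersetPublisher`, its methods, and the use of a deep copy to stand in for the replicated superset 330 are all assumptions made for illustration.

```python
import copy
from typing import Any, Callable, Dict, List

Superset = Dict[str, Any]


class SupersetPublisher:
    """Hands every subscriber its own replica of the master superset."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Superset], None]] = []

    def subscribe(self, on_publish: Callable[[Superset], None]) -> None:
        """Register a callback that receives each published replica."""
        self._subscribers.append(on_publish)

    def publish(self, superset: Superset) -> int:
        """Deliver an independent copy to each subscriber; return the count."""
        for deliver in self._subscribers:
            deliver(copy.deepcopy(superset))
        return len(self._subscribers)
```

Because each subscriber receives a deep copy, later changes to the master do not alter a replica until the next publication, which mirrors the idea of a replicated superset residing on each subscriber client.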

5.3 Matching Module

In one embodiment, the program suite 500 of the present invention may include a matching module 85 configured to receive raw data in a subjective representation 80, to analyze the values stored in the data superset 30 using the interface 600 in order to execute one or more queries, and to produce output data in a preferred representation 90. The general steps in an exemplary matching module 85 are shown in the flowchart of FIG. 12.

In one embodiment, the steps of finding and displaying data in its preferred representation 90 based on a subjective representation 80 may include the following general functions: capturing 300, parsing 305, standardizing 310, validating 320, updating 380, combining 390, and publishing 395. Those skilled in the art will appreciate that, depending on the particular algorithm or algorithms, these general steps need not occur in this order, and certain steps may be repeated as necessary.

5.3.1. Capturing. In one embodiment, the step referred to as capturing 300 may include capturing or otherwise receiving the subjective representation 80 (the input data).

5.3.2. Parsing. In one embodiment, the step referred to as parsing 305 may include parsing the subjective representation 80 into its component parts. The task of parsing generally involves dividing a sentence or a field string into its component parts. For example, in the context of street addresses, an address written on an envelope represents a subjective representation 80 that can be divided, through a parsing process, into a number of different components or artifacts. A parsing algorithm or program generally receives a sequence or string of characters as input and then applies a set of rules to accomplish the division into categories.

One example of a subjective representation 80 is a street address. For example, a U.S. street address such as "123 East Main Street N.W., Suite A-4" may include a number of discrete artifacts, including a number (123), a pre-directional (East), a primary name (Main), a type (Street), a post-directional (NW), a secondary name (Suite), and a secondary number (A-4). A street address may also be parsed into components based on administrative subdivisions such as city, county, and state, or it may be parsed to a finer level of detail or granularity based on, for example, the ZIP+4 code.
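A rule-based parser for the example address might be sketched in Python as follows. The vocabularies (`DIRECTIONS`, `STREET_TYPES`, `SECONDARY_NAMES`), the artifact key names, and the function `parse_street_address` are assumptions made for illustration; a production parser would need far larger rule sets and would handle cases (such as a street whose primary name is itself a direction) that this sketch does not.

```python
from typing import Dict

# Small, hypothetical vocabularies used to classify tokens.
DIRECTIONS = {"NORTH", "SOUTH", "EAST", "WEST", "N", "S", "E", "W", "NE", "NW", "SE", "SW"}
STREET_TYPES = {"STREET", "ST", "AVENUE", "AVE", "ROAD", "RD", "DRIVE", "DR"}
SECONDARY_NAMES = {"SUITE", "STE", "APT", "UNIT"}


def _key(token: str) -> str:
    """Normalize a token for vocabulary lookup: drop periods, upper-case."""
    return token.replace(".", "").upper()


def parse_street_address(raw: str) -> Dict[str, str]:
    """Split a U.S. street address line into labeled artifacts."""
    tokens = raw.replace(",", " ").split()
    artifacts: Dict[str, str] = {}
    i = 0
    if i < len(tokens) and tokens[i].isdigit():
        artifacts["number"] = tokens[i]
        i += 1
    if i < len(tokens) and _key(tokens[i]) in DIRECTIONS:
        artifacts["pre_directional"] = tokens[i]
        i += 1
    name = []
    while i < len(tokens) and _key(tokens[i]) not in STREET_TYPES:
        name.append(tokens[i])
        i += 1
    if name:
        artifacts["primary_name"] = " ".join(name)
    if i < len(tokens):
        artifacts["type"] = tokens[i]
        i += 1
    if i < len(tokens) and _key(tokens[i]) in DIRECTIONS:
        artifacts["post_directional"] = tokens[i]
        i += 1
    if i < len(tokens) and _key(tokens[i]) in SECONDARY_NAMES:
        artifacts["secondary_name"] = tokens[i]
        i += 1
    if i < len(tokens):
        artifacts["secondary_number"] = " ".join(tokens[i:])
    return artifacts
```

Applied to "123 East Main Street N.W., Suite A-4", the sketch yields the seven artifacts named in the text, each stored under its own key so that it could later be written to a separate field.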

For example, by parsing the subjective representation 80 and storing its component parts in separate fields of a table, the matching module 85 of the present invention may allow users to access and summarize (or "abstract") the data in a variety of ways, depending on the need and the application. For example, a user may request a summary or abstract of the address data for a particular state based on the five-digit ZIP code. If the address data has been parsed and the ZIP code is stored in a discrete field, the step of abstracting the data by ZIP code involves a relatively simple search and retrieval. Storing artifacts in separate fields may allow users to search and retrieve data at any level of abstraction. In this aspect, the present invention offers great flexibility to a variety of users with a variety of needs.
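Once the ZIP code and state are held in discrete fields, abstracting by five-digit ZIP code reduces to a simple grouping. The Python sketch below assumes hypothetical record dictionaries with `state` and `zip` keys; these field names are not taken from the source.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize_by_zip(records: Iterable[Dict[str, str]], state: str) -> Counter:
    """Count the address records in one state, grouped by five-digit ZIP code."""
    return Counter(
        record["zip"][:5]            # ignore any +4 extension
        for record in records
        if record["state"] == state
    )
```

A caller could pass parsed address records and receive, for example, the number of records in each Georgia ZIP code, without re-parsing any address text.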

5.3.3. Standardizing. In one embodiment, the step referred to as standardizing 310 may generally include reformatting the subjective representation 80 according to a set of standardization rules. Standardization may generally involve many characteristics of the subjective representation 80, including letter case, spacing, typeface, punctuation, whether a field may include alphabetic characters, numeric characters, or both, field length, field size or capacity, and other aspects.

For example, in the context of a street address, the subjective representation 80 might be written:

John Doe

123 East Main Street, N.W.

Oakland Center, Suite A-4

Atlanta, Georgia 30030

The step referred to as standardizing 310 may change the letter case, spacing, punctuation, and other aspects of the subjective representation 80 above, so that after standardization it appears as follows:

JOHN DOE

123 E MAIN ST NW STE A4

DECATUR GA 30030-1549
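The purely textual part of the standardizing step (upper-casing, stripping punctuation, and abbreviating common words) might be sketched in Python as below. The abbreviation table and the function name `standardize_line` are assumptions for illustration. The sketch deliberately does not attempt the city correction or the ZIP+4 extension seen in the standardized example, since those would require comparison against stored address data.

```python
# Hypothetical abbreviation rules; a real rule set would be much larger.
ABBREVIATIONS = {
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
    "STREET": "ST", "AVENUE": "AVE", "SUITE": "STE",
}


def standardize_line(line: str) -> str:
    """Upper-case a line, strip punctuation, and apply abbreviations."""
    words = []
    for token in line.replace(",", " ").split():
        word = "".join(ch for ch in token.upper() if ch.isalnum())
        if word:
            words.append(ABBREVIATIONS.get(word, word))
    return " ".join(words)
```

Under these rules, "123 East Main Street, N.W. Suite A-4" becomes "123 E MAIN ST NW STE A4", and "John Doe" becomes "JOHN DOE".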

In one embodiment, the standardizing step 310 may include a variable set of rules, depending on the address type and the region or country. For example, foreign addresses may have very different rules governing the standard representation of the various address artifacts. For example, the following subjective representations 80 may be standardized:

Subjective representation 80:      After standardization:

Prielle Kelia U.19-15              BUDAPEST XI
Budapest H-2100                    PRIELLE KELIA U.19-35
                                   1117
Hungary                            HUNGARY

V.Delle Terme                      LARGO DELLE TERME
Rome 0010                          000153-ROMA RM
Italy                              ITALY

103New Oxford PG                   103 NEW OXFORD ST
London WC1A 1PG                    LONDON
Great Britain                      WC1A 1PG
                                   UNITED KINGDOM

The standardizing step 310 may be performed in conjunction with the parsing step 305, so that the parsed artifacts are stored in the table in their standardized format. In one embodiment, the standardizing step 310 may be performed on each separate artifact after parsing, while in another embodiment, the parsing step 305 may occur first. As with the other general steps in the matching module 85, the standardizing 310 and parsing 305 steps may occur in any order and may be repeated.

5.3.4. Validating. In one embodiment, the step referred to as validating 320 may include a complex series of steps taken to validate the subjective representation 80, as described in more detail below. Validating 320 generally includes checking the subjective representation 80 for accuracy and recency. Validating 320 may also include comparing the subjective representation 80 with the values stored in the tables of the superset 30 in order to search for the preferred representation 90.

5.3.5. Updating. In one embodiment, the step referred to as updating 380 may include adding newly acquired data to one of the relational databases in the superset 30. In this aspect, the superset 30 may be continuously updated with new data by and through the operation of the program suite 500. The updating step 380 may occur at any time during the procedures performed by the matching module 85.

In one embodiment, the updating step 380 may add the new data to one of the tables in the superset. The data may be placed in a record located near the end of the table. In one aspect of the present invention, the table may or may not be recompiled before the tasks of the enhancement module are next performed. It is expected that the tables will not require frequent compilation.

5.3.6. Combining. In one embodiment, the step referred to as combining 390 may include a reversal of the parsing step 305, in that the separate artifacts of the subjective representation 80 are reassembled. In one embodiment, the combining step 390 is performed after the validating step 320 has produced the artifacts of the preferred representation 90.

5.3.7. Publishing and Displaying. In one embodiment, the step referred to as publishing 395 may include transmitting or sending the preferred representation 90 (or a preferred token) to one or more components of the system 10 of the present invention. In this aspect, the publishing step 395 may be described as returning or publishing the results of a search query. The publishing step 395 may also include or be followed by a displaying step, in which the preferred representation 90 may be displayed on a monitor or other type of user display. The publishing step 395 may also include or be followed by a printing step, in which the preferred representation 90 may be printed on a label, printed in a list, printed as part of a report, or otherwise sent in a readable text format, as directed by the system.

5.4 Validation Module

In one embodiment, the validating step 320 may generally include comparing the subjective representation 80 with the values stored in the tables of the superset 30 in order to search for the preferred representation 90. In the context of an address management system 110, address validation 320 generally includes comparing the subjective representation 80 of an input address with the values stored in the address databases 131, 132, 133 of the address superset 130 (shown in FIG. 1) and identifying the preferred representation 90 of the address.

As shown in FIG. 1, in one embodiment, the address superset 130 may include a postal database 131, a carrier database 132, a standard database 133, and a schedule database 134. In one embodiment, each of the related databases 131-134 may include a preferred table 141, a street alias table 142, and a consignee alias table 143. The preferred table 141 may also include one or more fields for storing a token that serves as a unique identifier for a particular record.

The postal database 131 may, in one embodiment, include address data from a postal service such as the United States Postal Service (USPS). The United States contains more than 145,000,000 deliverable addresses. The USPS makes available to the public a variety of regularly updated address databases, including the Delivery Sequence File (DSF). The DSF is a computerized database developed by the USPS that includes a complete, standardized address for every delivery point served by the USPS, with the addresses stored in discrete records. Each separate record contains the street address, the ZIP+4 code, the carrier route code, the delivery sequence number (walk sequence number), the delivery type code, and a seasonal delivery indicator. The USPS recently developed a new Delivery Point Validation (DPV) database to replace the DSF. The DPV database is available in a basic format or in an enhanced format, called DSF², which includes additional address attributes. Many foreign countries and territories offer similar databases of postal address records, which include data standardized according to the particular needs and rules of the country. The postal database 131 of the present invention may be configured to receive and store any of a variety of databases containing postal addresses.

Within the postal database 131, the preferred table 141.1 may be configured to accept and store the preferred representations of the delivery points served by the postal authority. A preferred representation may be stored as a whole, as separate artifacts, or both. The postal preferred table 141.1 may be one of the primary sources of the preferred representation 90 of an address.

The postal authority may also provide street alias data, which may be accepted and stored in the street alias table 142.1. An alias, as the name implies, refers to a situation in which several different identifiers refer to the same object. A common example of a street alias occurs when a road has multiple names: a local street name, a state route number, and a federal highway number. For example, U.S. Highway 1 may be called State Route 16 in a particular state, and it may also be called Maple Street where it passes through a particular town. In the area where all three names apply, the street names Maple Street, State Route 16, and U.S. Highway 1 are street aliases. The street aliases may further include, for example, S.R. 16, Route 16, U.S. 1, Route 1, or Maple Drive, if those names are in use. USPS databases typically include street alias data. The street alias table 142.1 may be configured to accept and store the street alias data provided by the postal authority.
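A street alias table can be thought of as a mapping from each alias to one preferred street name. The Python sketch below uses the names from the example; treating "MAPLE STREET" as the preferred representation, and the names `STREET_ALIASES` and `preferred_street`, are assumptions made only for illustration.

```python
# Hypothetical alias table: every alias maps to one preferred street name.
STREET_ALIASES = {
    "STATE ROUTE 16": "MAPLE STREET",
    "S.R. 16": "MAPLE STREET",
    "ROUTE 16": "MAPLE STREET",
    "U.S. HIGHWAY 1": "MAPLE STREET",
    "U.S. 1": "MAPLE STREET",
    "ROUTE 1": "MAPLE STREET",
    "MAPLE DRIVE": "MAPLE STREET",
}


def preferred_street(name: str) -> str:
    """Return the preferred street name for an alias; a name that is not a
    known alias is returned upper-cased with single spaces."""
    key = " ".join(name.upper().split())
    return STREET_ALIASES.get(key, key)
```

In a real table the mapping would be scoped to the area where the aliases apply (for example, by ZIP code), since "Route 1" can refer to different roads in different places.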

Other features and artifacts may also be subject to aliasing. For example, a formal company name may include terms that the public generally omits. Acme Shoe Corporation, for instance, may be called Acme Shoes, or simply Acme, in everyday usage. The problem created by different names or aliases for a value to be stored in a database arises when a user of the database wishes to retrieve that particular value. For example, a search for Acme Shoe Corporation might not find a record that indicates only Acme Shoes.

The consignee alias table 143.1 may be configured to accept and store consignee alias data provided by the postal authority, when it is available. The postal authority may or may not provide consignee alias data. In some jurisdictions, such as the United States, the postal service may not release data that reveals the identity of the occupant (consignee) together with the street address. The data fields shown for the consignee alias table 143.1 (Field 1, Field 2, Field 3, ..., Field n) are preceded by a hyphen rather than a + sign to indicate that these fields may be empty.

The tables 141.1, 142.1, 143.1 of the postal database 131 may be linked or otherwise interconnected by one or more key fields, in a manner known in the relational database art.

The carrier database 132 may, in one embodiment, include address data from a private source, such as a freight company, a package service, or a private database provider. Certain delivery companies and other service providers develop and maintain address data, some of which may be available. The carrier database 132 of the present invention may be configured to receive and store any of a variety of private databases containing address information.

Within the carrier database 132, the preferred table 141.2 may be configured to accept and store the preferred representations of the delivery points contained in the private source database. A preferred representation may be stored as a whole, as separate artifacts, or both.

The private source may also provide street alias data, which may be accepted and stored in the street alias table 142.2. Certain delivery companies and other service providers develop and maintain lists of street aliases for the areas they serve. The street alias table 142.2 may be configured to accept and store street alias data provided by any private source.

The consignee alias table 143.2 may be configured to accept and store consignee alias data provided by a private source. In addition to street aliases, many delivery companies and other service providers develop and maintain lists of users or customers (consignees), which may include aliases. The consignee alias table 143.2 may be configured to accept and store consignee alias data provided by any private source.

The tables 141.2, 142.2, 143.2 of the carrier database 132 may be linked or otherwise interconnected by one or more key fields, in a manner known in the relational database art. Similarly, the carrier database 132 may also be linked or otherwise interconnected with the postal database 131.

The standard database 133 may, in one embodiment, generally include alias data. During the uploading and installation of the postal database 131 and the carrier database 132, the system 10 of the present invention may include tools for collecting street alias and consignee alias information and storing it in the standard database 133. The standard street alias table 142.3 may be configured to accept and store street alias data. The standard consignee alias table 143.3 may be configured to accept and store consignee alias data. In this aspect, in one embodiment, the standard database 133 may act as a repository for alias data.

Because the standard database 133 is generally intended for alias data, it may or may not include any preferred data in the table 141.3. The data fields of the standard preferred table 141.3 (Field 1, Field 2, Field 3, ..., Field n) may be preceded by a hyphen rather than a + sign to indicate that these fields may be empty.

The tables 141.3, 142.3, 143.3 of the standard database 133 may be linked or otherwise interconnected by one or more key fields, in a manner known in the relational database art. Similarly, the standard database 133 may also be linked or otherwise interconnected with the carrier database 132 and the postal database 131.

The data stored in the standard database 133 may be used in a process known as fuzzy matching. Literal matching requires an exact match, such as Acme and Acme. Fuzzy matching reveals partial matches, such as Acme, ACM, Acmed, and Ch2Acme. Alias data may generally be used in systems that allow or require fuzzy matching, because aliases by their nature contain slight differences yet represent the same object. For example, the consignee aliases described above (Acme Shoe Corporation, Acme Shoes, Acme) also represent fuzzy matches for one another.
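One way to experiment with fuzzy matching is Python's standard `difflib` module, whose `get_close_matches` function ranks candidates by a similarity ratio. This is a swapped-in general-purpose technique, not the matching method of the source; the function name `fuzzy_match` and the 0.6 cutoff (the library default) are illustrative assumptions.

```python
import difflib
from typing import Iterable, Optional


def fuzzy_match(candidate: str, known_names: Iterable[str],
                cutoff: float = 0.6) -> Optional[str]:
    """Return the known name most similar to the candidate (ignoring case),
    or None when no name reaches the similarity cutoff."""
    by_upper = {name.upper(): name for name in known_names}
    hits = difflib.get_close_matches(
        candidate.upper(), list(by_upper), n=1, cutoff=cutoff
    )
    return by_upper[hits[0]] if hits else None
```

With the known names "Acme" and "Apex", each of the variants Acme, ACM, Acmed, and Ch2Acme resolves to "Acme" under this cutoff, while an unrelated string such as "Zenith" returns no match.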

Fuzzy matching may be useful in the context of address standardization because the subjective representation 80 of an address may include one or more ambiguous or incorrect address artifacts. For example, the subjective representation 80 "Doe, 123 East Main Street N.W., Suite A-4, Atl 30030" is incomplete and includes several ambiguities. The consignee "Doe" may be matched to the preferred consignee "John W. Doe" via a fuzzy matching process, using the data stored in the consignee alias table 143.3 of the standard database 133. This example illustrates how the databases 131-134 of the address superset 130 work together, because the standard database 133 may not include any preferred data in table 141.3. Accordingly, to accomplish the address validation 320, the address management system 110 may be configured to access related data in the tables stored in the other databases 131, 132, 134 in order to find the preferred representation 90 of the address. Because the tables 141, 142, 143 are linked, the search for a match may use the ZIP code "30030", alone or together with the primary street name "Main", to find records similar to the subjective representation 80. In this aspect, the address management system 110 of the present invention may be configured, in one embodiment, to include a program or structured query language for finding matches among any of the data stored in the address superset 130.

Another tool that may be useful in the context of address standardization and validation is known as Soundex. Soundex provides a way to find words that sound alike. Soundex began as a filing system that used a simple phonetic algorithm to reduce proper names and other words to a four-character alphanumeric code. In one type of Soundex algorithm, the first letter of the code may correspond to the first letter of the word or proper name, and the remainder of the code may consist of three digits derived from the sounds of the remaining syllables. In this way, the sound of a word or name is quantified. The Soundex function is useful because computers are generally better at comparing numbers than at comparing letters. In one embodiment, the validation step 320 of the present invention may include a Soundex algorithm.
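As an illustration of the four-character code described above, the following sketch implements the widely used American Soundex variant. The patent does not specify which Soundex variant it contemplates, so the letter-to-digit table and the handling of H, W, and vowels are assumptions taken from that common variant.

```python
# Letter-to-digit table of the common American Soundex variant (assumed).
CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(word: str) -> str:
    """Return the first letter plus three digits, e.g. 'Robert' -> 'R163'."""
    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (first + "".join(digits) + "000")[:4]
```

Under this variant, two spellings that sound alike, such as "Main" and "Mane", reduce to the same code, so a comparison of codes can surface candidates that a literal comparison of the text would miss.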

The planning database 134 may, in one embodiment, include input data, including one or more subjective representations 80. In this aspect, the process of adding subjective representation data to the planning tables 141.4, 142.4, 143.4 may include the capture, parse, and standardize steps described herein, so that the input data is properly divided and standardized in preparation for validation.

In one embodiment, the input data may be stored primarily in the planning preferred table 141.4. Because the planning database 134 is generally intended for input data, it may or may not include any data in the street alias and consignee alias tables 142.4, 143.4. The data fields of these tables are preceded by a hyphen instead of a plus sign to indicate that these fields may be empty.

5.4.1. Arranging data hierarchically. In one aspect, the address management system 110 of the present invention may take advantage of the hierarchical nature of address data in order to locate records similar to the subjective representation 80 quickly and efficiently. In this aspect, the address management system 110 may include a method of preparing or arranging the stored data according to its inherent hierarchy. The data may be arranged in a series of levels from general to specific (as described below), or in any order particularly suited to the application. In use, the address management system 110 may be configured to include a program or stored query procedure capable of finding matches among the data stored in the address superset 130.

In general, a query may be used to extract desired data from a database without changing or altering the data itself. Because a query generally finds the desired data and displays it to the user, the result of a query is sometimes called a view. A query may also be used to create a result (a view) without displaying it to the user. In this aspect, a query may be used to arrange data, usually temporarily, into a new structure that differs from the table structure. A query may be used to create a new data structure having particular advantages, such as improved ordering logic, faster sorting and searching, or the movement of particular data fields to a more prominent position. In one embodiment, the validation step 320 of the present invention may include one or more queries to arrange the data in the superset. One such arrangement involves a process referred to as tokenization.
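A query that rearranges data without altering the stored table can be sketched with Python's built-in `sqlite3` module. The table name, column names, sample rows, and the `specificity` score are all hypothetical and chosen only for illustration; the patent does not prescribe a particular database engine or schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE preferred "
    "(street TEXT, low INTEGER, high INTEGER, suite TEXT, consignee TEXT)"
)
conn.executemany(
    "INSERT INTO preferred VALUES (?, ?, ?, ?, ?)",
    [
        ("First Street", 440, 440, "600", "APC"),
        ("First Street", 440, 498, None, None),
        ("First Street", 440, 440, "100-400", None),
        ("First Street", 440, 440, None, None),
    ],
)

# The view is a temporary arrangement; the rows in `preferred` are untouched.
# specificity counts how many narrowing details a record carries (assumed rule).
conn.execute(
    """
    CREATE TEMP VIEW general_to_specific AS
    SELECT street, low, high, suite, consignee,
           (low = high) + (suite IS NOT NULL) + (consignee IS NOT NULL)
               AS specificity
    FROM preferred
    """
)

ordered = conn.execute(
    "SELECT specificity, suite FROM general_to_specific ORDER BY specificity"
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM preferred").fetchone()[0]
```

Here `ordered` lists the records from the most general (a street range) to the most specific (a suite with a named consignee), while `row_count` confirms that the underlying table still holds the same four rows.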

5.4.2. Tokenization. An example of a postal preferred table 141.1 is shown in FIG. 9. Each row represents a single record and includes multiple fields. Each separate field is stored in a separate column containing similar attributes. The attributes of the table are shown at the top as column names. The preferred table 141.1 shown in FIG. 9 may be described by the schema (ZIP, Token, Street, Type, Low, High, Odd/Even, Consignee, Part, Low, High, +4).

The Token column, as shown, includes a postal token 71 that serves as a unique identifier for each unique address. Note that the two records containing the address "440 First Street, Suite 600" are both assigned the postal token T6. The other street address records in the other rows of the table represent different addresses and therefore have different tokens.

Address data is hierarchical by its nature. The various artifacts of an address range from general to specific. For example, a five-digit ZIP code by itself gives a general idea of the location of an address, whereas a complete address, usually understood to include the occupant or consignee, all of the street data, and the ZIP code or ZIP+4, gives a very specific location.

In one embodiment, the validation step 320 of the present invention may include a query or algorithm for placing the City-State-ZIP combination at the top of the hierarchy of address data. A City-State combination may, of course, include multiple ZIP codes. At the next level of specificity are the street artifacts, including the pre-directional, the street name, the street type, and the post-directional. Such a street address may look like 100 East Main Street, SW. The street artifacts may be further subdivided by one or more street address ranges, which may be purely numeric, such as the range 240-298, or may be alphabetic, depending on the range field. Beyond the regular street aliases are the secondary artifacts, which include secondary numbers such as Suite 100 or Apartment 1C. The additional four digits in the ZIP+4 code may provide a further level of specificity. Some databases may also include an additional two-digit delivery sequence number.

In one embodiment, the validation step 320 of the present invention may include a method of sorting the records in the tables of the superset into a hierarchy from general to specific. Within the validation step 320, the resulting relationships and groupings of records may be defined in terms of concepts referred to as containment and inclusion. A node number has been assigned to each record of table 141.1, as shown in FIG. 9. The node numbers help to demonstrate the concepts of containment and inclusion among address records.

5.4.3. Containment levels. After the validation step 320 reorders the records of table 141.1, the new hierarchical arrangement of the records may appear as shown in FIG. 10. The node numbers in FIG. 10 are distributed according to the level of specificity shown in the data. For example, level 1 in FIG. 10 includes node 1, which represents the record containing the address range "440-498 First Street". Of all the records shown in FIG. 9, the record at node 1 is the most general and is therefore placed in level 1. The next level of specificity, level 2, includes node 2. The record at node 2 includes a single street address (440 First Street) but no secondary artifact (no suite number).

Level 3 in FIG. 10 includes addresses that have a suite number or range but no consignee name. These records include nodes 3, 11, 4, 12, 5, and 13. The nodes in level 3 are arranged from left to right in order of increasing suite number. In this aspect, the system 10 may be configured to sort the address data from left to right in addition to placing the data in different levels of specificity.

Level 4 includes the records that have a name in the consignee field.

The concepts of containment and inclusion are demonstrated by the connections between the various nodes in FIG. 10. Node 10 is connected to node 3 because "Suite 310" is a subset of the range "Suite 100-400". Similarly, nodes 6, 7, and 8 are connected to node 5 because their suite numbers (500 and 600) are a subset of the range in node 5 (Suite 500-600). Finally, node 9 is a subset of node 13 because the addresses are identical and node 9 includes a consignee name.

The nodes shown in FIG. 10 illustrate the concepts of containment and inclusion that may be implemented in one embodiment of the validation step 320 of the present invention. Node 1 on level 1 "contains" all of the nodes beneath it because all of the other addresses fall within the range stated for node 1. Conversely, all of the nodes below level 1 are "included" within (or contained by) node 1. Similarly, node 2 on level 2 contains all of the nodes beneath it, and node 3 contains node 10. Node 5 contains nodes 8, 6, and 7 because they are subsets of the range stated in node 5. Node 13 contains node 9.
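The containment relationship among nodes can be sketched as a range test. This is only an illustrative model: the record layout, the `contains` and `level` functions, and the consignee value on node 10 are assumptions; the suite and street ranges are those quoted in the text for FIG. 10.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Range = Tuple[int, int]  # inclusive (low, high)


@dataclass(frozen=True)
class AddressRecord:
    node: int
    street: str
    house: Range                      # street number or range
    suite: Optional[Range] = None     # None when no secondary artifact
    consignee: Optional[str] = None   # None when no consignee name


def within(inner: Range, outer: Range) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def contains(general: AddressRecord, specific: AddressRecord) -> bool:
    """True when `specific` falls inside everything `general` states."""
    if general == specific or general.street != specific.street:
        return False
    if not within(specific.house, general.house):
        return False
    if general.suite is not None:
        if specific.suite is None or not within(specific.suite, general.suite):
            return False
    if general.consignee is not None and general.consignee != specific.consignee:
        return False
    return True


def level(record: AddressRecord) -> int:
    """Containment level: 1 is most general, 4 is most specific."""
    if record.consignee is not None:
        return 4
    if record.suite is not None:
        return 3
    return 2 if record.house[0] == record.house[1] else 1


node1 = AddressRecord(1, "First Street", (440, 498))
node2 = AddressRecord(2, "First Street", (440, 440))
node3 = AddressRecord(3, "First Street", (440, 440), suite=(100, 400))
node5 = AddressRecord(5, "First Street", (440, 440), suite=(500, 600))
node6 = AddressRecord(6, "First Street", (440, 440), suite=(600, 600), consignee="APC")
# The consignee on node 10 is a placeholder; the text gives only its suite.
node10 = AddressRecord(10, "First Street", (440, 440), suite=(310, 310), consignee="EXAMPLE CO")
```

In this model `contains(node3, node10)` holds because Suite 310 lies within Suite 100-400, while `contains(node3, node6)` does not, which mirrors the connections described for FIG. 10.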

In one embodiment, the validation step 320 of the present invention may assign a token to each unique record. The tokens also demonstrate the concepts of containment and inclusion. FIG. 11 is a tabular representation of the hierarchical arrangement shown in FIG. 10. The table in FIG. 11 shows the nodes and tokens on each level, beginning with level 1. Token T1 may be described as containing all of the other tokens in the hierarchical table. Note, however, that a token number may differ from the node number. Token T3 contains token T9. Token T5 contains tokens T6 and T7. Note that token T6 is used for both nodes 6 and 7 because the addresses are equivalent.

The concepts of inclusion and containment are readily apparent in FIG. 11. For example, comparing the data at node 3 and node 10, the reader will note that "Suite 310" in node 10 falls within the range of suite numbers (100-400) stored in node 3. This relationship demonstrates the inclusion and containment concepts that are also shown in FIG. 10.

In one embodiment, there is no limit on the number of containment levels applied during the validation step 320 of the present invention. An address record may contain a large number of artifacts, and a table may include a large number of records. Given the number of records a table may include, the hierarchical organization of records may be used to greatly increase the speed of accessing and analyzing the data. The containment levels and token numbers described for the thirteen nodes shown in FIGS. 14, 15 and 16 may be applied to the millions of address records and ranges in any one of the tables in the address superset 130. The other tables 141, 142, 143 in the address superset 130 may also be organized using nodes and containment levels, in the same way that the preferred table 141.1 in FIG. 9 is sorted according to the hierarchy.

In addition to rearranging the data using containment levels, each table may be transformed into a sparse matrix linked list, as described herein, to further increase processing speed.

5.4.4. Preferred tokens. Referring again to table 141.1 in FIG. 9, nodes 6 and 7 are both given the same token T6 because they represent the same physical location. Note that the consignee names in nodes 6 and 7 are "APC" and "AM POLLING CMTE", respectively. These alternative names for the consignee are consignee aliases. In other words, APC is an alias for AM POLLING CMTE. As discussed herein, such consignee aliases may be stored in one or more consignee alias tables 143 in the address superset 130.

Similarly, street alias data may be stored in one or more street alias tables 142 of the address superset 130. For example, the fields in a street alias table 142 may be arranged in the manner shown in FIG. 13. The exemplary street alias table 142 in FIG. 13 includes several street aliases for Sixth Avenue in New York City, a street also known as Avenue of the Americas. The street alias table 142 may take the form of such a list, in a format that is easy to access when comparing street address records.

In one aspect of the present invention, the address database management system 10 may be instructed to mark one of the alias representations as the "preferred representation". Where the various street aliases and consignee aliases have been applied to the data stored in the address data superset 130, one of the records bearing token T4081 (for example) may be marked as the preferred representation. Accordingly, the preferred token 70 may include a flag, such as a "p" to indicate preferred, so that the preferred token 70 appears as T4081p. The system 10 of the present invention can recognize that all of the address records having token T4081 are equivalent. In one embodiment, identifying the preferred token 70 and flagging it (for example, T4081p) may help ensure that the preferred artifact for a particular street address (flagged as T4081p) will always be returned in response to a query.
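The preferred-token flag can be sketched as follows. The record layout and function names are hypothetical, and the choice of "AM POLLING CMTE" as the preferred consignee for token T6 is an assumption made for illustration; the text does not say which of the two aliases is preferred.

```python
from typing import List, Optional

# Two equivalent records share token T6; exactly one is flagged preferred.
records = [
    {"token": "T6", "consignee": "APC", "preferred": False},
    {"token": "T6", "consignee": "AM POLLING CMTE", "preferred": True},
    {"token": "T9", "consignee": None, "preferred": True},
]


def display_token(record: dict) -> str:
    """Append the 'p' flag to the token of a preferred record, e.g. 'T6p'."""
    return record["token"] + ("p" if record["preferred"] else "")


def preferred_record(token: str, table: List[dict]) -> Optional[dict]:
    """Return the preferred record among all records sharing `token`."""
    equivalents = [r for r in table if r["token"] == token]
    for record in equivalents:
        if record["preferred"]:
            return record
    return equivalents[0] if equivalents else None
```

Whichever alias a query arrives with, looking up its token and then calling `preferred_record` yields the same flagged record, which is the behavior the "p" flag is meant to guarantee.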

In this aspect of the present invention, the validation step 320 may be configured, in one embodiment, to use queries to arrange the stored data into a new hierarchical data structure. In one embodiment, one or more tokens may be flagged or otherwise identified as a preferred token 70 in order to identify the preferred representation of an address or of a particular artifact.

In a related aspect, the management system of the present invention may be configured to pass tokens (rather than text) between the various components of the system 10 of the present invention. Exchanging tokens is more efficient and less error-prone than exchanging long strings of address text. In this aspect, the use of tokens as unique identifiers further speeds the processing of queries, reports, and other types of analysis of the data stored in the superset.

In one embodiment, the validation step 320 may be executed as part of the program suite 500 of the address management system 110 (see, for example, FIG. 7). The validation step 320 may be executed against the replicated superset 330, with the results published to the AMS client 655. In the address management system 110, with one or more of the techniques described herein applied, the time elapsed from the capture step 300 to the publish step 395 may be in the range of one hundred to two hundred milliseconds.

5.4.5. Comparison. In one embodiment, the validation step 320 generally includes comparing the subjective representation 80 to the values stored in the tables of the superset 30 in search of the preferred representation 90. In the context of the address management system 110, address validation 320 generally involves comparing the subjective representation 80 of an input address to the values stored in the address databases 131, 132, 133 in the address superset 130 (as shown in FIG. 1) and identifying the preferred representation 90 of the address.

In the block diagram shown in FIG. 2, the validation step 320 occupies a single block. As described herein, however, the validation step 320 may involve a large number of steps and procedures for validating an address. The preceding sections outlined a number of data manipulation routines and search methods and described in general terms the process of comparing input data to stored data. More specifically, in one embodiment, the comparison process of the validation step 320 may include the numbered steps listed below.

(1) Store the input data (80) in the preferred table 141.4 in the planning database 134 (see FIG. 1).

(2) Compare the input data stored in the preferred table 141.4 to the data values stored in the other preferred tables 141.1, 141.2, and 141.3 (if any). Recall that, in one embodiment, each table in the superset may already have been transformed into a sparse matrix linked list as described above, rearranged using nodes and hierarchical containment levels, and/or tokenized, in order to facilitate a fast and efficient search of each table. The comparison process may include locating one or more candidate representations from among the data values stored in the other preferred tables 141.1, 141.2, 141.3. Finding a match may generally include selecting the candidate representation that is most similar to the subjective representation 80 being searched.

(a) If a match is found between the input data and the preferred table data, locate the corresponding preferred token 70 and proceed to the update 380, combine 390, and publish 395 steps shown in FIG. 12.

(b) If no match is found, proceed to step (3) below.

(3) Compare the street name input data stored in the preferred table 141.4 to the street alias data values stored in the street alias tables 142.1, 142.2, and 142.3. The comparison process may include locating one or more candidate street aliases from among the data values stored in the street alias tables 142.1, 142.2, 142.3. Finding a match may generally include selecting the candidate street alias most closely associated with the preferred token.

(a) If a match is found between the street name input data and the street alias table data, locate the preferred token 70 identifying the preferred street alias, replace the street name in the preferred table 141.4 with the corresponding street alias for that street name, and repeat step (1) above using the street alias.

(b) If no match is found, proceed to step (4) below.

(4) Compare the consignee name input data stored in the preferred table 141.4 to the consignee alias data values stored in the consignee alias tables 143.1 (if any), 143.2, and 143.3. The comparison process may include locating one or more candidate consignee aliases from among the data values stored in the consignee alias tables 143.1, 143.2, 143.3. Finding a match may generally include selecting the candidate consignee alias most closely associated with the preferred token.

(a) If a match is found between the consignee name input data and the consignee alias table data, locate the preferred token 70 identifying the preferred consignee alias, replace the consignee name in the preferred table 141.4 with the corresponding consignee alias for that consignee name, and repeat step (1) above using the consignee alias.

(b) If no match is found, proceed to step (5) below.

(5) Return an exception code 400 to the user 28 or to the application.

(6) In one embodiment, the validation step 320 may include the step of displaying a list of possible matches (addresses, street aliases, consignee aliases) and allowing the user 28 to perform a visual comparison and, if appropriate, manually select one of the possible matches as the preferred representation.

(a) If a manual selection is made, the comparison process proceeds to the update 380, combine 390, and publish 395 steps shown in FIG. 12.

(b) If no manual selection is made, the input data and the exception code 400 are passed out of the validation system for further processing.
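Steps (2) through (5) above form a cascade: try the preferred data, then the street aliases, then the consignee aliases, and return the exception code if nothing matches. The sketch below shows that control flow under simplifying assumptions: dictionaries stand in for the tables, exact lookups stand in for the similarity search, and the ZIP code, names, and token are hypothetical sample data. The manual selection of step (6) is omitted.

```python
EXCEPTION_CODE = 400  # returned when no preferred representation is found


def validate(record, preferred, street_aliases, consignee_aliases):
    """Return a preferred token, or EXCEPTION_CODE if no match is found.

    record            -- dict with 'zip', 'street', and 'consignee' keys
    preferred         -- maps (zip, street, consignee) to a preferred token
    street_aliases    -- maps a street alias to its preferred street name
    consignee_aliases -- maps a consignee alias to its preferred name
    """
    seen = set()
    while True:
        key = (record["zip"], record["street"], record["consignee"])
        if key in preferred:                      # step (2)(a)
            return preferred[key]
        if key in seen:                           # guard against alias cycles
            return EXCEPTION_CODE
        seen.add(key)
        if record["street"] in street_aliases:    # step (3)(a): swap and retry
            record = {**record, "street": street_aliases[record["street"]]}
            continue
        if record["consignee"] in consignee_aliases:  # step (4)(a)
            record = {**record,
                      "consignee": consignee_aliases[record["consignee"]]}
            continue
        return EXCEPTION_CODE                     # step (5)


preferred = {("10001", "SIXTH AVENUE", "ACME SHOE CORPORATION"): "T4081p"}
street_aliases = {"AVENUE OF THE AMERICAS": "SIXTH AVENUE"}
consignee_aliases = {"ACME": "ACME SHOE CORPORATION",
                     "ACME SHOES": "ACME SHOE CORPORATION"}
```

An input of "ACME" on "AVENUE OF THE AMERICAS" resolves in two alias substitutions to the same token as the preferred record, while an address on an unknown street falls through to the exception code.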

The method described in step (2) above for finding the preferred address representation may include the following additional steps:

(a) parsing the subjective representation into one or more discrete artifacts;

(b) selecting one of the one or more discrete artifacts, and:

(1) locating one or more candidate artifacts in the source data by comparing the one discrete artifact to the source data;

(2) selecting a preferred artifact from the one or more candidate artifacts, the preferred artifact being most similar to the one discrete artifact;

(3) storing the preferred artifact;

(c) repeating step (b) for each of the one or more discrete artifacts; and

(d) combining the preferred artifacts to form the preferred representation.
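Steps (a) through (d) above can be sketched as a loop over parsed artifacts. The example assumes the parsing of step (a) has already produced a dictionary of artifacts, uses Python's `difflib.get_close_matches` as a stand-in for the unspecified similarity measure, and works on hypothetical sample data.

```python
from difflib import get_close_matches
from typing import Dict, List


def preferred_representation(artifacts: Dict[str, str],
                             source: Dict[str, List[str]]) -> Dict[str, str]:
    """Pick, for each discrete artifact, the most similar source value."""
    result = {}
    for name, value in artifacts.items():            # steps (b) and (c)
        candidates = get_close_matches(               # step (b)(1) and (b)(2)
            value, source.get(name, []), n=1, cutoff=0.0
        )
        # Step (b)(3): store the preferred artifact; keep the input value
        # unchanged when the source data offers no candidate at all.
        result[name] = candidates[0] if candidates else value
    return result                                     # step (d)


parsed = {"street_name": "Mian", "street_type": "Stret", "city": "Atl"}
source_data = {
    "street_name": ["Main", "Maple"],
    "street_type": ["Street", "Avenue"],
    "city": ["Atlanta", "Athens"],
}
```

Calling `preferred_representation(parsed, source_data)` replaces each misspelled or abbreviated artifact with its closest source value and returns the combined result.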

Similarly, the method described in steps (3) and (4) above for finding the preferred alias representation may include the following additional steps:

(1) locating one or more candidate alias artifacts in the source data by comparing the one discrete artifact to the alias data;

(2) selecting a preferred alias artifact from the one or more candidate alias artifacts, the preferred alias artifact being most closely associated with a preferred alias token;

(3) storing the preferred alias artifact;

(d) adding the preferred alias artifact to the preferred alias.

In one embodiment, the term "match" as used in the comparison steps above may involve analyzing one or more artifacts of the address to determine whether the similarity between the data is sufficient to constitute a "match". For example, the following criteria may apply:

1. An exact match is required for the primary address, including the street number and street name.

2. An exact match is required for the secondary address (for example, a suite number) only if the secondary address exists in the carrier database 132 and is associated with the primary address.

3. An exact match is required for the consignee name only when a consignee exists in the planning database 134 (the input data).

It should be understood that other matching criteria may be established, depending on the application and the processing goals.
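The three example criteria can be expressed as a single predicate. This sketch takes one reading of criterion 2, namely that the secondary address must match exactly whenever the carrier data lists secondary addresses for that primary address; the field names and the `carrier_secondaries` lookup are hypothetical.

```python
from typing import Dict, Set, Tuple

Primary = Tuple[str, str]  # (street number, street name)


def is_match(inp: dict, cand: dict,
             carrier_secondaries: Dict[Primary, Set[str]]) -> bool:
    """Apply the three example matching criteria to an input/candidate pair."""
    primary = (cand["number"], cand["street"])
    # Criterion 1: the primary address must match exactly.
    if (inp["number"], inp["street"]) != primary:
        return False
    # Criterion 2: the secondary address must match exactly, but only when
    # the carrier data associates secondary addresses with this primary.
    if carrier_secondaries.get(primary):
        if inp.get("secondary") != cand.get("secondary"):
            return False
    # Criterion 3: the consignee must match exactly, but only when the
    # input data actually names a consignee.
    if inp.get("consignee") and inp["consignee"] != cand.get("consignee"):
        return False
    return True


carrier = {("440", "First Street"): {"Suite 500", "Suite 600"}}
candidate = {"number": "440", "street": "First Street",
             "secondary": "Suite 600", "consignee": "APC"}
```

An input that omits the consignee can still match the candidate, whereas an input naming a different suite cannot, because the carrier data lists suites for 440 First Street.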

5.5. Interface

In one embodiment, the database management system 110 of the present invention may include an interface 600 and a program suite 500, as shown in FIG. 3 and FIGS. 5-9. In one embodiment, the interface 600 may include a computer program designed to provide an operative connection, or interface, between an application (such as the program suite 500) and a user (or another application). The interface 600 may provide a series of commands that allow a user to create, read, update, and delete the data values stored in the database tables. These functions (Create, Read, Update, Delete) are sometimes referred to by the acronym CRUD, so an interface providing these commands may be called a CRUD interface. A database interface that includes a query function may be called a CRUDQ interface.
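The five CRUDQ operations can be illustrated with a small in-memory class. The class and method names are hypothetical and the dictionary storage is a stand-in for a database table; this is a sketch of the command set, not of the interface 600 itself.

```python
from typing import Callable, Dict, List, Optional


class CrudqInterface:
    """Hypothetical in-memory stand-in for a table behind a CRUDQ interface."""

    def __init__(self) -> None:
        self._rows: Dict[int, dict] = {}
        self._next_id = 1

    def create(self, values: dict) -> int:
        """Store a new row and return its identifier."""
        row_id = self._next_id
        self._rows[row_id] = dict(values)
        self._next_id += 1
        return row_id

    def read(self, row_id: int) -> Optional[dict]:
        """Return a copy of the row, or None if it does not exist."""
        row = self._rows.get(row_id)
        return dict(row) if row is not None else None

    def update(self, row_id: int, changes: dict) -> bool:
        """Apply changes to an existing row; report whether it existed."""
        if row_id not in self._rows:
            return False
        self._rows[row_id].update(changes)
        return True

    def delete(self, row_id: int) -> bool:
        """Remove a row; report whether it existed."""
        return self._rows.pop(row_id, None) is not None

    def query(self, predicate: Callable[[dict], bool]) -> List[dict]:
        """Return copies of all rows satisfying the predicate (the Q in CRUDQ)."""
        return [dict(row) for row in self._rows.values() if predicate(row)]
```

A caller might create an address row, update its ZIP code, and then use `query` with a predicate such as `lambda row: row["zip"] == "30030"` to retrieve matching rows without touching the stored data.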

In one embodiment, the interface 600 may be configured as a COM-based interface, meaning that it is based on the Component Object Model. The Component Object Model is an open software architecture that may facilitate interoperability between the interface 600 and the various other components of the system 10 of the present invention. Although a COM-based interface 600 may be provided, other software models may also be used to accomplish the desired functions.

According to one embodiment of the present invention, a query function may be included in the interface 600. A query is a command or instruction used to extract a desired set of data from a database. The best-known query language is Structured Query Language (SQL, pronounced "sequel"), although other query languages may also be used. A query may include a single command or a complex series of commands. SQL includes many kinds of query commands. A set of query commands that can be reused may be saved in SQL as a stored procedure. Much like running a program, calling a stored procedure in SQL is more efficient than sending query commands one at a time. In addition, stored procedures are generally precompiled and may also be cached by the database management system. In this respect, query commands may be used as a powerful programming tool.

5.5.1. Application identifiers. In one embodiment, the interface 600 may be configured to operate and interact with a variety of different programs and applications, both inside and outside the database management system 110 in use. The interface 600 may be configured to operate with each component of the internal program suite 500. The interface 600 may also be configured to operate with one or more external programs or applications outside the database management system, such as related database applications, ancillary reporting applications, stand-alone business applications, or any of a variety of other programs that want, or have a business need, to interact with the data stored in the superset 30, 130.

In one embodiment, the interface 600 of the present invention may include one or more application identifiers, each having a corresponding rule set. An application identifier may be used to identify an application requesting access to the database management system of the present invention. An application identifier may be a single command or a complex algorithm. In general, an application identifier operates to identify the application requesting interaction with the database.

Each application identifier may include a corresponding rule set that may be used to constrain the interaction between a particular application 270 and the database management system. Such interaction may include query requests, subscription updates, data transfers or other communications, output format instructions, or any other activity. The application identifiers and rule sets may be stored in a database or otherwise saved in an accessible format.

For example, in the context of the address management system 110, a particular application 270 may request access to the address superset 130 by sending a query. In response, the interface 600 may be configured to recognize the application 270, retrieve the appropriate application identifier, and in turn retrieve the corresponding rule set. The interface 600 may then pass the rule set to the address management system 110 for use in processing the query or other interaction with the application 270. The address management system 110 may process the query, or take other action related to the application 270, producing output data. The output data may be returned to the interface 600, where the rule set is used to confirm that the output data is in a format accessible to the application 270. In this aspect, the address management system 110 and its interface 600 may work together to process requests from the application 270 by using the rule set.
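The request flow described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `RuleSet`, `APP_IDENTIFIERS`, `RULE_SETS`, `address_management_system`, and `interface_handle`, as well as the sample record, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleSet:
    """Hypothetical rule set: permitted actions and the expected output format."""
    allowed_actions: frozenset
    output_format: str  # "csv" or "dict" in this sketch


# Hypothetical lookup tables kept separate from the interface logic.
APP_IDENTIFIERS = {"billing_app": "APP-001"}
RULE_SETS = {"APP-001": RuleSet(frozenset({"query"}), "csv")}


def address_management_system(query, rules):
    """Stand-in for system 110: process a query under the supplied rule set."""
    if "query" not in rules.allowed_actions:
        raise PermissionError("queries are not permitted for this application")
    record = {"zip": "12345", "city": "ANYTOWN"}  # made-up illustrative data
    if rules.output_format == "csv":
        return ",".join(record.values())
    return record


def interface_handle(app_name, query):
    """Stand-in for interface 600: identify the application, fetch its rules,
    delegate processing, then confirm the output format."""
    app_id = APP_IDENTIFIERS[app_name]   # recognize the requesting application
    rules = RULE_SETS[app_id]            # retrieve the corresponding rule set
    output = address_management_system(query, rules)
    expected_type = str if rules.output_format == "csv" else dict
    if not isinstance(output, expected_type):
        raise ValueError("output is not in the format the application expects")
    return output


print(interface_handle("billing_app", "lookup"))
```

In this sketch the interface holds no application-specific logic of its own; everything that varies by application lives in the rule set it retrieves.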

In this aspect, the interface 600 of the present invention is generic, meaning that the interface 600 may be configured to operate and interact with any application 270. Because the rule sets are maintained separately from the interface itself, the programming in the interface 600 need not include the rules for all of the various applications 270. Instead, by using application identifiers, the interface 600 may include only the relatively simple commands needed to look up and retrieve the corresponding rule set.

When the management system 110 is required to interact with a new application 270, the interface 600 may not need to be modified. The only action required is to add an application identifier and a corresponding rule set for the new application 270. The interface 600 may provide a system for entering such new information.
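As a hedged illustration of adding a new application without changing the interface, the sketch below keeps rule sets in a database table, as the description permits. The table name `app_rules`, the JSON encoding of the rules, and the functions `open_rule_store`, `register_application`, and `lookup_rules` are assumptions made for this example only; an in-memory SQLite database stands in for whatever storage an actual system would use.

```python
import json
import sqlite3


def open_rule_store():
    """Create an in-memory store for application identifiers and rule sets."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE app_rules (app_id TEXT PRIMARY KEY, rules TEXT NOT NULL)"
    )
    return conn


def register_application(conn, app_id, rules):
    """Add an application identifier and its rule set; no interface code changes."""
    conn.execute(
        "INSERT INTO app_rules (app_id, rules) VALUES (?, ?)",
        (app_id, json.dumps(rules)),
    )
    conn.commit()


def lookup_rules(conn, app_id):
    """The only lookup the interface needs: find and return the rule set."""
    row = conn.execute(
        "SELECT rules FROM app_rules WHERE app_id = ?", (app_id,)
    ).fetchone()
    if row is None:
        raise KeyError(app_id)
    return json.loads(row[0])


store = open_rule_store()
register_application(store, "reporting", {"actions": ["query"], "format": "csv"})

# Later, a new application is supported simply by adding one more row.
register_application(
    store, "dispatch", {"actions": ["query", "subscribe"], "format": "json"}
)
print(lookup_rules(store, "dispatch"))
```

The lookup function is identical before and after the second registration, which is the property the paragraph above describes.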

5.5.2 Data Capture Depth. In one embodiment, the rule set for a particular application 270 may be configured to control which specific artifacts are captured from the data superset 30. In use, for example, a first application may require only ZIP code data, whereas a second application may require the ZIP+4, city, and state. The rule set of the present invention may include stored information about the data requirements of the particular application 270 in use. By controlling the degree or depth of data capture, the rule set may increase the efficiency and speed with which the interface 600 accesses the data within the system 10.
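The ZIP versus ZIP+4 example above might look like the following sketch, in which each rule set lists the fields its application needs and only those fields are captured. The field names, application names, `CAPTURE_RULES`, `capture`, and the sample record are all hypothetical.

```python
# Made-up illustrative record standing in for data held in the superset.
RECORD = {
    "zip": "12345",
    "zip4": "12345-6789",
    "city": "ANYTOWN",
    "state": "NY",
    "street": "MAIN ST",
}

# Hypothetical per-application capture rules: which artifacts each one requires.
CAPTURE_RULES = {
    "first_app": ("zip",),
    "second_app": ("zip4", "city", "state"),
}


def capture(record, app_id):
    """Return only the artifacts that the application's rule set asks for."""
    fields = CAPTURE_RULES[app_id]
    return {name: record[name] for name in fields}


print(capture(RECORD, "first_app"))
print(capture(RECORD, "second_app"))
```

Limiting the projection to the listed fields is one simple way to read "depth of data capture"; an actual system could apply the same list earlier, when the query itself is built.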

6. Conclusion

The described embodiments of the invention are intended to be examples only. Many variations and modifications will be apparent to those skilled in the art. All such variations and modifications are intended to fall within the scope of the invention as defined in the appended claims.

What has been described above includes several examples. It is, of course, not possible to describe every conceivable combination of components or methods for purposes of describing the systems, methods, computer-readable media, and so on employed in a database management system. However, one of ordinary skill in the art will recognize that other combinations and permutations are possible. Accordingly, the present invention is intended to embrace alterations, modifications, and variations that fall within the scope of the appended claims. Furthermore, the above description is not intended to limit the scope of the invention. Rather, the scope of the invention is to be determined only by the appended claims and their equivalents.

While the systems, methods, and apparatus herein have been illustrated by describing examples, and while these examples have been described in considerable detail, the applicants do not intend to restrict or in any way limit the scope of the appended claims to such detail. Additional advantages and modifications will be readily apparent to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details, representative systems and methods, or illustrative examples shown and described. Accordingly, departures may be made from such details without departing from the spirit or scope of the applicants' general inventive concept.

Claims

1. An address management system (110), comprising:

a plurality of relational databases (31-35) operatively connected to form a superset of data (30), wherein each of said relational databases (31-35) includes one or more tables, and wherein each of the one or more tables shares a common data structure; and

one or more computer program modules (500) configured to:

capture and store a subjective representation (80) in a first one of said relational databases (31-35);

store source data in a second one of said relational databases (31-35), said source data comprising a generic representation of a type similar to said subjective representation (80);

locate one or more candidate representations from said source data by comparing said subjective representation (80) to said source data; and

select a preferred representation (90) from the one or more candidate representations, the preferred representation (90) being most similar to the subjective representation (80).

2. The address management system (110) of claim 1, wherein the one or more computer program modules (500) are further configured to store the one or more tables as a sparse matrix linked list.

3. The address management system (110) of claim 1, wherein the one or more computer program modules (500) are further configured to transform the one or more tables into a sparse matrix linked list.

4. The address management system (110) of claim 1, wherein the subjective representation (80) describes an address, and the source data includes address records.

5. The address management system (110) of claim 1, wherein, as part of said selecting step, said one or more computer program modules (500) are further configured to:

assign a preferred token (70) to said preferred representation (90); and

identify the preferred token (70) if the preferred token is present in any of the one or more candidate representations.

6. The address management system (110) of claim 1, wherein each of said one or more tables includes a plurality of records, and wherein said one or more computer program modules (500) are further configured to:

arrange said records in a hierarchical order on a range of levels from general to specific based on the values of said data stored in said records; and

transform one or more of the tables into a sparse matrix linked list.

7. The address management system (110) of claim 6, in a system comprising a local server computer (510, 200) and one or more remote client computers (655, 255), wherein said one or more computer program modules (500) are further configured to:

distribute a copy of the sparse matrix linked list from the local server computer (510, 200) to the one or more client computers (655, 255).

8. The address management system (110) of claim 1, wherein the one or more computer program modules (500) are further configured to:

(a) parse said subjective representation (80) into one or more discrete artifacts;

(b) select one of said one or more discrete artifacts:

(1) locate one or more candidate artifacts from the source data by comparing the one discrete artifact to the source data;

(2) select a preferred artifact from the one or more candidate artifacts, the preferred artifact being most similar to the one discrete artifact;

(3) store the preferred artifact;

(c) repeat step (b) for each of the one or more discrete artifacts;

(d) combine said preferred artifacts to form a preferred representation.

9. The address management system (110) of claim 8, wherein as part of said selecting a preferred representation, said one or more computer program modules (500) are further configured to:

store normalized data in a third one of said relational databases (31-35), said normalized data comprising one or more normalized representations of said one or more discrete artifacts.

10. The address management system (110) as claimed in claim 1, wherein said one or more computer program modules (500) are further configured to:

store alias data in a fourth one of said relational databases (31-35);

review said alias data to identify one or more selected alias records containing a preferred alias representation;

add a preferred alias token to said one or more selected alias records;

locate one or more candidate aliases from said alias data by comparing said subjective representation (80) to said alias data;

select a preferred alias from the one or more candidate aliases, the preferred alias being most closely associated with the preferred alias token.

11. The address management system according to claim 10, wherein said one or more computer program modules (500) are further configured to:

(b) select one of said one or more discrete artifacts:

(1) locate one or more candidate alias artifacts from said source data by comparing said one discrete artifact to said alias data;

(2) select a preferred alias artifact from the one or more candidate alias artifacts, the preferred alias artifact being most closely associated with the preferred alias token;

(3) store the preferred alias artifact;

(c) repeat step (b) for each of the one or more discrete artifacts;

(d) add the preferred alias artifact to the preferred alias.

12. The address management system (10) as claimed in claim 1, wherein said one or more computer program modules (500) are further configured to:

parse the subjective representation (80) into one or more discrete artifacts;

store normalized data in a third one of said relational databases (31-35), said normalized data comprising one or more normalized representations of said one or more discrete artifacts; and

store alias data in a fourth one of said relational databases (31-35), said alias data comprising a plurality of equivalent representations of said one or more discrete artifacts.

13. The address management system (110) as claimed in claim 1, in a system that also includes one or more external applications, wherein said address management system (110) also includes:

an application interface (600) configured to constrain interaction between said one or more computer program modules (500) and said one or more external applications.

14. The address management system (110) according to claim 1, in a system further comprising one or more external applications, wherein said one or more computer program modules (500) are also configured to:

store a plurality of rule sets, each associated with one of the one or more external applications;

receive a request from a first external application;

retrieve a first set of rules related to the first external application; and

apply the first set of rules to constrain interactions between the first external application and the one or more computer program modules (500).

15. The address management system (110) of claim 14, wherein said first set of rules includes a list of data capturable from the superset (30) of said relational databases (31-35) for use by said first external application.

16. In an address management system, a method of processing subjective representations (80) using one or more databases, the method characterized by the steps of:

providing a plurality of relational databases (31-35) operatively connected to form a superset of data (30), wherein each of said relational databases (31-35) includes one or more tables, and wherein each of the one or more tables shares a common data structure;

capturing a subjective representation (80) and storing it in a first one of said relational databases (31-35);

17. The method of claim 16, further comprising storing the one or more tables as a sparse matrix linked list.

18. The method of claim 16, further comprising transforming the one or more tables into a sparse matrix linked list.

19. The method of claim 16, wherein the subjective representation (80) describes an address, and the source data includes address records.

20. The method of claim 16, as part of said selecting step, further comprising the step of:

assigning a preferred token (70) to said preferred representation (90); and

21. The method of claim 6, wherein each of the one or more tables includes a plurality of records, the method further comprising:

transforming one or more of the tables into a sparse matrix linked list.

22. The method of claim 16, in a system comprising a local server computer (510, 200) and one or more remote client computers (655, 255), the method further comprising:

23. The method of claim 16, further comprising:

(a) parsing (305) said subjective representation (80) into one or more discrete artifacts;

(b) selecting one of said one or more discrete artifacts:

(3) storing the preferred artifact;

(c) repeating step (b) for each of the one or more discrete artifacts;

(d) combining said preferred artifacts to form a preferred representation.

24. The method of claim 23, as part of said step of selecting a preferred representation, further comprising the step of:

25. The method of claim 16, further comprising:

adding a preferred alias token to said one or more selected alias records;

26. The method of claim 25, further comprising:

(b) selecting one of said one or more discrete artifacts:

(3) storing the preferred alias artifact;

(c) repeating step (b) for each of the one or more discrete artifacts;

(d) adding the preferred alias artifact to the preferred alias.

27. The method of claim 16, further comprising:

parsing (305) the subjective representation (80) into one or more discrete artifacts;

28. The method of claim 16, in a system further comprising one or more external applications, the method further comprising:

providing an application interface (600) configured to constrain interaction between said one or more computer program modules (500) and said one or more external applications.

29. The method of claim 16, in a system further comprising one or more external applications, the method further comprising:

receiving a request from a first external application;

retrieving a first set of rules related to the first external application; and

30. The method of claim 29, wherein said first set of rules comprises a list of data capturable from the superset (30) of said relational databases (31-35) for use by said first external application.