[go: up one dir, main page]

WO2000020992A1 - Database retrieval system - Google Patents

Database retrieval system Download PDF

Info

Publication number
WO2000020992A1
WO2000020992A1 PCT/US1998/021075 US9821075W WO0020992A1 WO 2000020992 A1 WO2000020992 A1 WO 2000020992A1 US 9821075 W US9821075 W US 9821075W WO 0020992 A1 WO0020992 A1 WO 0020992A1
Authority
WO
WIPO (PCT)
Prior art keywords
alphanumeric characters
data
sets
search query
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US1998/021075
Other languages
French (fr)
Inventor
Arthur S. Gilbrech
Elizabeth M. Gilbrech
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to AU10689/99A priority Critical patent/AU1068999A/en
Priority to PCT/US1998/021075 priority patent/WO2000020992A1/en
Publication of WO2000020992A1 publication Critical patent/WO2000020992A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation

Definitions

  • the present invention relates to data retrieval systems and more particularly to an optimized remote data retrieval system.
  • a first type is a string search system which searches data or groups of data for a particular string.
  • a second type preindexes the data, and matches a search string with an index entry, which points to the file or location of occurrence.
  • Known systems provide a number of search capabilities, including Boolean operators, truncation symbols, synonyms and fuzzy match searching, natural language input, and the like. These capabilities and systems are primarily intended to increase the power and precision of the search engine.
  • These text string search algorithms include: the Brute force algorithm; Karp-Rabin algorithm; Shift Or algorithm; Morris-Pratt algorithm; Knuth-Morris-Pratt algorithm; Simon algorithm; Colussi algorithm; Galil-Giancarlo algorithm;ENSico-Crochemore algorithm; Not So Naive algorithm; Boyer-Moore algorithm; Turbo-BM algorithm; tendico-Giancarlo algorithm; Reverse Colussi algorithm; Horspool algorithm; Quick Search algorithm; Zhu- Takaoka algorithm; Smith algorithm; Raita algorithm; Reverse Factor algorithm; Turbo Reverse Factor algorithm; Galil-Seiferas algorithm; Two Way algorithm; String Matching on Ordered Alphabets; Optimal Mismatch algorithm; and the Maximal Shift algorithm.
  • String matching consists in finding one, or more generally, all the occurrences of a string (more generally called a pattern) in a data field.
  • data may be memorized in various ways, text remains the main form to exchange information. This is particularly evident in literature or linguistics where data are composed of huge corpus and dictionaries. This applies as well to computer science where a large amount of data are stored in linear files. This is also the case, for instance, in molecular biology because biological molecules are often represented as sequences of nucleotides or amino acids.
  • String-matching algorithms for text typically operate as follows: they first align the left ends of the pattern and the text, then compare the characters of the text aligned with the characters of the pattern ⁇ this specific work is called an attempt — and after a whole match of the pattern or after a mismatch they shift the pattern to the right. They repeat the same procedure again until the right end of the pattern goes beyond the right end of the text.
  • a commonly used string searching tool is called Generalized Regular Expression Parser ("Grep”), and variants, which is a widely used tool on Unix systems.
  • the present invention provides a new and unique method of operation of a data search and retrieval system which, in one preferred embodiment, preferably comprises entering one or more sets of alphanumeric characters into a user device; preferably transmitting said alphanumeric characters to a receiving device; preferably inputting said alphanumeric characters from said receiving device into a computer database search engine; preferably searching said distinct databases for data records containing said alphanumeric characters using said computer database search engine; preferably identifying data records containing said alphanumeric characters; and preferably transmitting identified data records to said transmission device.
  • This improved and novel method provides significant enhancements and unexpected results in efficiency as compared to the default behavior of known systems, and corresponds to an efficient "memory model" employed by the user.
  • the present invention therefore allows a user to preferably input, for example, the first few letters of consecutive words known to be present in a desired data file.
  • the user being experienced and knowledgeable about the contents and structure of the data files, preferably applies this knowledge to minimize the input tasks necessary to retrieve selected data files for analysis.
  • the present invention is particularly useful where there is a significant benefit in minimizing user input tasks.
  • a small, difficult to use keyboard is employed, or the user input may be accomplished by use of a touchscreen with a virtual keyboard or through handwriting recognition.
  • Newer models may also employ voice recognition of letters and numbers. In any case, the entry of even single characters may be difficult or cumbersome.
  • this input is for defining a user query, typically the user is familiar with the database or databases which are being searched. The user may, for example, seek an appointment file with a specified individual. The user may recall the name of the individual, but not the time and date of the appointment. Accordingly, in order to retrieve this information, the user seeks to input the name of the person with whom the appointment is made.
  • the query may be defined using only a few characters. For example, the user may enter the first two letters of the person's first and last names, separated by a space. Thus, with five keystrokes or character entries, the search query may be defined.
  • a further advantage of the present invention is apparent when the user input terminal is connected over a wireless network to the database containing the files.
  • the communication bandwidth may be limited, or communication costs may be related to amount of data transmitted.
  • efficiency of query definition in terms of data transmission requirements may be advantageous.
  • the user When a user defines a search, typically the user has a data "domain" in mind. This domain consists of data having common characteristics, which may be in one or more data files. Therefore, the user advantageously selects the data domain prior to defining the particular query. In fact, in one preferred embodiment of the invention, the rules interpreting the query are preferably dependent on the data domain selected.
  • the system may also be adaptive to the user, for example through explicit modification of the interface characteristics and behavior, or through an automated or "learning” process.
  • the data retrieval application of the present invention need not be a simple text search engine, and may also be a so-called “intelligent agent” which autonomously interacts, in a complex way, with other automated systems, to perform an action or obtain information, with relatively little immediate instruction from the user.
  • One preferred embodiment of the invention preferably provides a user interface for a remote device through a wireless network, such as a cellular digital packet data (CDPD), personal communication service (PCS), or digital cellular service, on a PDA platform.
  • a wireless network such as a cellular digital packet data (CDPD), personal communication service (PCS), or digital cellular service, on a PDA platform.
  • the wireless communication may, for example, include GSM-900, GSM- 1800, GSM-1900, CDMA IS-95, TDMA IS-136, 3G systems - IMT-2000, UMTS, W-CDMA, wideband IS-95, or other wireless communications standards.
  • the system is compliant with the so-called Wireless Application Protocol (WAP, see http://www.xwap.com, WAP Architecture Draft version 0.9 (1997-09), which encompasses the Wireless Markup Language (WML, from the Wireless Application Development Committee), and subsumes the Handheld Device Markup Language (HDML, from Unwired Planet Inc.) and Tagged Text Markup Language (TTML, from Nokia Mobile Phones), and is extensible for future languages, extensions and protocols which may not be fully defined at present.
  • WAP systems may also preferably interface with translation systems for converting HTML documents to HDML or its current implementation in WAP.
  • a user preferably inputs a query according to the present invention on a PDA, which communicates through a wireless service provider to a network, such as the Internet or an intranet.
  • the service provider may also preferably host a closed network.
  • the query is forwarded to a server, which translates the query into a defined search.
  • the two simple search strings are preferably interpreted as follows: each set of alphanumeric characters is treated as the first characters of a truncated string, present in the order of the sets in the query.
  • the sets may be separated by an optional arbitrary string, which may encompass one or more words (a word being defined as a sequence of characters separated by white space).
  • the interface text string input as processed and interpreted may be preferably further refined, as follows.
  • the sets of characters may be constrained as the beginning of words (character strings after a space).
  • the intervening string is constrained to be a terminal portion of a word, i.e., the intervening string includes a single, terminal space.
  • the system provides an WAP interface on a wireless communication device.
  • the device preferably communicates through the service provider, using http protocol, with at least one server.
  • the system may preferably have access to the Internet, a global network of computers employing TCP/IP communication protocol which supports the World Wide Web, a communication network on the Internet which supports hypertext communication using Hypertext Markup Language (HTML).
  • HTML Hypertext Markup Language
  • WWW World Wide Web
  • devices or “alternate platforms” include voice- and fax -based user agents, low-cost Network Computers, and handheld devices such as mobile phones and PDAs.
  • HTML While the WWW's infrastructure and protocols fully support these alternate platforms, HTML itself does not. In particular, the navigation and display models inherent to HTML are unsatisfactory when applied to a typical handheld device. However, combining the use of standard web protocols and infrastructure (for example, URLs, HTTP, SSL plus CGI, Perl, commercial web servers) with an alternate but complementary hypertext language, such as WML or other languages which are adapted for use with typical mobile wireless devices, such as HDML and TTML, provides display and navigation models designed to provide intuitive data access using a modified transaction paradigm rather than the document- based paradigm of HTML.
  • standard web protocols and infrastructure for example, URLs, HTTP, SSL plus CGI, Perl, commercial web servers
  • Handheld devices While there are many types, styles, and classes of handheld devices, they generally have many characteristics in common: • Small Display Size. Handheld devices are characterized primarily by a limited display size. A typical display is capable of displaying 4-10 lines of text 12-20 characters wide and may be graphical (bitmapped) or text-only. PDA-style displays may comprise 64x128 pixels to 640x480 pixels, in monochrome or color • Limited Input Capabilities. Handheld devices may or may not have a full keyboard and may or may not have a pointing/selection device.
  • the input keys on a data-ready mobile phone include only: The keys normally found on a telephone, usually numeric keys 0-9, *, #, with alphabet letters marked on numeric keys 2-9; Cursor/arrow keys (often just up and down or left and right); A number of dedicated function keys (SEND, END, etc.); and One or more "soft keys" with programmable labels.
  • Item selection is accomplished through numbered lists or using cursor keys to highlight a choice from a range of options and then directing a function to be performed on that item.
  • Full 2-D cursor control through pointing devices such as touch pads, touch screens, roller balls, are uncommon.
  • a full "QWERTY" keyboard is also uncommon.
  • HTTP Hypertext Transfer Protocol
  • a feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.
  • HDML in contrast to HTML, uses a "card” as the basic element, with a number of cards — a “deck” ⁇ required to make up an application.
  • Cards are individual units of data that are often, on some level, "atomic,” that is they cannot be subdivided into smaller pieces without losing context.
  • a deck of cards is linked internally and can be referenced by other decks and cards. Because the card is designed to capture discrete units of data, there are multiple types of cards: cards for displaying data, cards for entering data, and cards for listing indexes or lists of choices. Putting multiple types of data on a single card would imply the user has enough visual context in order to understand the differences. With the small display, the context would be lost.
  • WML as does HTML, communicates using HTTP.
  • a deck of cards may be designed to walk the user through a number of choices and data entry to result in a request which will return the specific data desired.
  • the end result could be a stock quote, a weather report, or a phone number, with varying levels of input and selection required to result in a call for the desired information from the WWW.
  • a deck of cards may be designed to walk the user through a number of choices and data entry to result in a request which will return the specific information requested. See, e.g., Berners-Lee, T., et.
  • the method for interpreting a user query is optimized to minimize the time the user must expend to input the query, while allowing focused requests for data.
  • the method also preferably follows an intuitive model of data structure, retained by the user, so that it is not difficult to learn the command syntax.
  • a user who is somewhat skilled in using the system and knowledgeable about the data to be retrieved will be able to efficiently use the system.
  • Naive users, i.e., those who are not skilled in using the system will preferably be presented with an intuitive, but shorthand style query input interface, which is easily learned.
  • the system will preferably offer a number of options.
  • One option is to preferably make available the text string contained in the intermediate (preprocessed) data format, formatted for the wireless device, such as an HDML card or deck, for downloading to the user device.
  • a second option is for the text search server to preferably communicate with the parent application host, translating user commands and formatting outputs in a form suitable for display on the wireless device, thus providing a single point of access to multiple servers. This, for example, preferably allows navigation of a single deck for the selected items, instead of one deck for each application.
  • the text search server may also preferably select a set of "hits" or records corresponding to the user query, and with each card or element on a card, provide sufficient information for the mobile user device to communicate directly with each of the host servers.
  • a hybrid approach may also be adopted, incorporating elements of each of the three options, or allowing the user to select from a host of other options which will be appreciated by those skilled in the art.
  • That server must also support WAP.
  • One advantage of employing the present invention is integration of multiple data sources as well as an optimized interface for the efficient retrieval of information.
  • the selection information may preferably also return an "offset" or location in the file where the user query criteria are met.
  • large files will be preprocessed according to a predetermined protocol into smaller, manageable records, which are the subject of the search selection process.
  • the preprocessing may be defined as a template, or customized by the user in advance.
  • the preprocessing is user-defined, the user's familiarity with the format and content of the data source is particularly advantageous for greater efficiency in subsequent use of the system.
  • the preprocessing may also preferably encompass an index -based scheme.
  • the source data is may be preferably processed to determine the word content of each file. While the order of terms may also be encoded, typically after each of the search terms is identified in a file or record, the content is searched to determine if it meets all criteria.
  • a preferred preprocessing of the raw data files removes extraneous spaces, normalizes the text to this upper case, and substitutes non-alphanumeric characters and punctuation with internal codes. Also, intra-record formatting is removed.
  • the user may preferably include within the search domain a local database, which may be searched by a local processor. Further, while mass transmission of large data files may be impractical, limited amounts of data may be transmitted by the present invention to the user device for processing.
  • the text search application need not reside on a central server or be physically separated from the user, the text search application is preferably separate from the user interface application.
  • the present invention preferably is capable of servicing the lowest common denominator, for example, a WAP mobile device with limited processing power and display capability.
  • the system may also preferably provide enhanced features to more capable user devices. These features may be defined in a user profile or setup preferences, or may be determined adaptively by identifying the type and capabilities of the user device. For example, a user may have various systems, such as a desktop, laptop and PDA, through which the same data retrieval system is accessed.
  • the computer system preferably communicates with the user device to determine its type and inherent capabilities.
  • the database may be remote from the user and accessed, for example through a network, through a packet data switched network (public or private), such as the Internet or an intranet, a telecommunication link, wireless link, cellular wireless link, or the like.
  • the preferred content communication protocol is the Wireless Application Protocol (WAP).
  • WAP Wireless Application Protocol
  • the user query may preferably be entered through by operation of a keyboard, a limited number of buttons, touchscreen, voice recognition or other means.
  • the universe of search databases is preferably divided into data search domains, which are selected by the user for application of the search query.
  • the data is preferably preprocessed to a normalized format, which may occur in advance or during processing of the user query.
  • Source data may, for example, have differing data formats, which are converted to a common format during preprocessing.
  • each of the ordered sets of alphanumeric characters is preferably translated as a beginning of a word in an alphanumeric string sequence.
  • the intervening alphanumeric string sequence may have a single space located at a terminal end of the alphanumeric string sequence.
  • the selected records preferably may be retrieved, and/or one or more selected records may be communicated to the user.
  • the system does not transmit a large volume of information for display to the user at one time, but rather segments the selected records according to a ranking or priority and transmits information relating to a limited number of records, allowing the user to prompt for further records.
  • Fig. 1 is a block diagram depicting the operation of a preferred embodiment of the present invention
  • Fig. 2 is a flow chart detailing a preferred embodiment of the present invention.
  • a user preferably inputs a search query consisting of the data matching parameters for the database search and an identification of the database(s) comprising the desired search domain into the user device 10 which transmits the search query 12 through a network 14 to the computer system 16.
  • a protocol interface 18 may preferably be employed to translate the data received from the network to a format which is compatible with the computer system.
  • the search query is preferably initially processed by the search engine controller 20, which separates the search query into its components of search criteria 22 and the source database identifier 24.
  • the engine controller preferably translates the search criteria, which provides the data matching parameters for the database search, to a formal search query which can be recognized by the database search engine 26, by the addition of truncator characters and order operators.
  • the source database identifier 24, which preferably defines the database(s) comprising the search domain, is preferably communicated to the database search engine to direct selection of the appropriate databases to be accesses in the search operation.
  • the databases 28 which can be accessed as required to accomplish the search include one or more Web servers, local databases, intranets, local area networks (LANs), wide area networks (WANs) or other searchable databases.
  • One preferred embodiment of this invention may also include a data preprocessor 30 which reformats large data files using a predefined protocol into smaller, more manageable records which are retained in a preprocessed database 32, for expedient access by the database search engine.
  • a data preprocessor 30 which reformats large data files using a predefined protocol into smaller, more manageable records which are retained in a preprocessed database 32, for expedient access by the database search engine.
  • the selected records or the addresses of their storage locations are preferably stored in the search records repository 34.
  • the identifiers and/or predetermined portions of the contents of one or more selected records are preferably tabulated by the records index function 36, and communicated to the user in a selected display format on the user device 10.
  • a protocol interface 18 may preferably be employed to translate the data received from the computer system to a format which is compatible with the network.
  • the user may request further data to be displayed from the selected records by responding to prompts or by other commands to display specified records which are executed by the records access function 38 and displayed on the user device.
  • the user may continue to request additional data from the selected records to be displayed on the user device, or may elect to perform additional database searches.
  • Fig. 2 shows a flow diagram of a method according to the present invention.
  • the method commences with the user optionally defining preferences and characteristics of system operation 50, such as user interface input and display formats, parameters used in search logic, database selection code definition and preprocessing format and methodology.
  • the user will preferably initially define a database search domain 52, which may be, for example, the user's personal information manager database.
  • the user then preferably enters a search query 54, consisting of one or more alphanumeric strings.
  • This user query is then preferably communicated to the computer system 56, and translated to a formal search query 58, by the addition of truncator characters and order operators.
  • the system preferably executes the formal search and selects records 60.
  • Information identifying one or more selected records is preferably communicated to the user 62, from which the user may determine whether to request further information 64. If the user requests such information, the content or an identifier of the content is preferably communicated to the user 66. The user may preferably elect to terminate the data retrieval process 68 or may interact further with the system 66, or alternatively 52 or 54.
  • the user would first initiate communication with the search engine using a wireless handheld portable device providing support for WAP, e.g., WML.
  • WAP wireless handheld portable device providing support for WAP, e.g., WML.
  • the user After identifying himself (a step which may be automatic and not require user intervention), the user would first define the data domain of interest, in this case schedules and/or meetings, which may be included in a full contact and personal information manager interface.
  • This data domain is predefined, and the data is either processed on demand or in advance to convert from a native format of the associated data management program to a text-only format of one record per string, omitting control characters, extended formatting, and the like, and presented using standard alphanumeric characters, for example in all capital letters.
  • the preprocessing software may provide translation from a wide variety of text source formats to a common intermediate format, preferably an intermediate format which may be converted to WAP (WML) for transmission to the user, avoiding the need to invoke the parent application.
  • WML WAP
  • the formatted text may also include a reference, such as a uniform resource locator (URL), to the primary data or parent application and primary data, allowing the user to then communicate directly with the parent application, for example to update data or obtain further related data.
  • URL uniform resource locator
  • CGI common gateway interface
  • a retrieved record or record identification may be accompanied by a complete search definition for a parent application, allowing the user to bypass screens of the parent application and avoiding inefficiency of passing communications through the text search application after selection.
  • the system according to the present invention may fully service the mobile client by formatting and transmitting all data requested.
  • the user knows that, in the data structure, name precedes telephone number, and types "BOB", space, "201", followed by an enter or confirm.
  • the query "BOB 201" is transmitted through a wireless service provider to a private switched data network, which interfaces with the Internet.
  • the WAP application and service provider pass through a Uniform Resource Locator (URL), with the user query appended as a parameter.
  • URL Uniform Resource Locator
  • the user query is directed to the user's data server.
  • the user query is then parsed as a text string search command by a text search application.
  • the text search application interprets the user query as a request for two sequential portions of the string, separated by an intervening string of arbitrary or zero size.
  • the limited search string is likely to retrieve only a small number of records. If the number retrieved exceeds the user's expectations, then, before examining the content of the records, the user may refine the query by adding a further restriction, such as the remainder of the telephone number, date of meeting, or other pertinent known information, or a narrower data domain selection, until a small number of records is retrieved.
  • WAP WML
  • a handheld device preferably communicates through a wireless network.
  • voice services are preferably supported through a telephone value added services server.
  • the handheld device may also preferably communicate using WML with a WAP proxy which receives input from a filter or a combined WAP proxy/filter.
  • the communicated data is derived from a Web server, which communicates in HTML.
  • This preferred embodiment provides a native WML WAP search server, which preferably receives and translates a user query from the handheld device (or any other device from the wireless network or Internet), through the existing infrastructure, including the Web Server, and preferably provides a response to the handheld device.
  • the WAP search server preferably searches the string database for matching records.
  • the string database contains records preferably derived from a preprocessor, which may be physically the same system as the WAP search server.
  • the preprocessor preferably obtains the raw data from one or more Web servers, local databases, intranets, and the like. Data domains may be defined by the source of the data, by content, or by other criteria. These criteria are defined by the user.
  • *norm++ islower(*s) ? toupper(*s) : *s;

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A method for retrieving information from one or more predefined databases (28). The user enters one or more sets of alphanumeric characters into a user device (10). The alphanumeric characters are transmitted to a receiving device. The alphanumeric characters from the receiving device are input into a computer database search engine (26). The search engine (26) is used to search the distinct databases (28) for data records that contain the alphanumeric characters. Data records containing said alphanumeric characters are identified and input to the transmission device (10) for display at the user device (10).

Description

DATABASE RETRIEVAL SYSTEM
INVENTORS: Arthur S. Gilbrech, Elizabeth M. Gilbrech
FIELD OF THE INVENTION
The present invention relates to data retrieval systems and more particularly to an optimized remote data retrieval system.
BACKGROUND OF THE INVENTION Numerous data retrieval applications are well known to those skilled in the art.
There are at least two fundamentally different types of systems known and available. A first type is a string search system which searches data or groups of data for a particular string. A second type preindexes the data, and matches a search string with an index entry, which points to the file or location of occurrence. Known systems provide a number of search capabilities, including Boolean operators, truncation symbols, synonyms and fuzzy match searching, natural language input, and the like. These capabilities and systems are primarily intended to increase the power and precision of the search engine.
When a user is familiar with a data set, he or she may use abbreviations or truncation symbols to limit the number of characters which must be input in order to obtain the desired data retrieval. However, presently known and available search engines require particular control codes or commands in order to interpret these input string sequences. Often, the default input format is AND or OR, resulting in additional required input to define the desired query. On the other hand, an experienced user may know the data set quite well, and have a number of different mental keys to the data record. These include word or string sequences or partial sequences. Thus, known data search tools do not provide an optimized search model corresponding to an experienced user's capabilities.
A number of data string search algorithms are known in the art. See, e.g. "Exact String Matching Algorithms" , Christian Charras & Thierry Lecroq, Laboratoire d'lnformatique de Rouen, Universite de Rouen, Faculte des Sciences et des Techniques, 76821 Mont-Saint-Aignan Cedex, France, http://www.dir.univ-rouen.fx/~charras/srring/. These text string search algorithms include: the Brute force algorithm; Karp-Rabin algorithm; Shift Or algorithm; Morris-Pratt algorithm; Knuth-Morris-Pratt algorithm; Simon algorithm; Colussi algorithm; Galil-Giancarlo algorithm; Apostolico-Crochemore algorithm; Not So Naive algorithm; Boyer-Moore algorithm; Turbo-BM algorithm; Apostolico-Giancarlo algorithm; Reverse Colussi algorithm; Horspool algorithm; Quick Search algorithm; Zhu- Takaoka algorithm; Smith algorithm; Raita algorithm; Reverse Factor algorithm; Turbo Reverse Factor algorithm; Galil-Seiferas algorithm; Two Way algorithm; String Matching on Ordered Alphabets; Optimal Mismatch algorithm; and the Maximal Shift algorithm.
String matching consists in finding one, or more generally, all the occurrences of a string (more generally called a pattern) in a data field. Although data may be memorized in various ways, text remains the main form to exchange information. This is particularly evident in literature or linguistics where data are composed of huge corpus and dictionaries. This applies as well to computer science where a large amount of data are stored in linear files. This is also the case, for instance, in molecular biology because biological molecules are often represented as sequences of nucleotides or amino acids.
String-matching algorithms for text typically operate as follows: they first align the left ends of the pattern and the text, then compare the characters of the text aligned with the characters of the pattern ~ this specific work is called an attempt — and after a whole match of the pattern or after a mismatch they shift the pattern to the right. They repeat the same procedure again until the right end of the pattern goes beyond the right end of the text. A commonly used string searching tool is called Generalized Regular Expression Parser ("Grep"), and variants, which is a widely used tool on Unix systems.
SUMMARY OF THE INVENTION The present invention provides a new and unique method of operation of a data search and retrieval system which, in one preferred embodiment, preferably comprises entering one or more sets of alphanumeric characters into a user device; preferably transmitting said alphanumeric characters to a receiving device; preferably inputting said alphanumeric characters from said receiving device into a computer database search engine; preferably searching said distinct databases for data records containing said alphanumeric characters using said computer database search engine; preferably identifying data records containing said alphanumeric characters; and preferably transmitting identified data records to said transmission device.
This improved and novel method provides significant enhancements and unexpected results in efficiency as compared to the default behavior of known systems, and corresponds to an efficient "memory model" employed by the user. The present invention therefore allows a user to preferably input, for example, the first few letters of consecutive words known to be present in a desired data file. The user, being experienced and knowledgeable about the contents and structure of the data files, preferably applies this knowledge to minimize the input tasks necessary to retrieve selected data files for analysis.
The present invention is particularly useful where there is a significant benefit in minimizing user input tasks. For example, in known personal digital assistants, a small, difficult to use keyboard is employed, or the user input may be accomplished by use of a touchscreen with a virtual keyboard or through handwriting recognition. Newer models may also employ voice recognition of letters and numbers. In any case, the entry of even single characters may be difficult or cumbersome. Since this input is for defining a user query, typically the user is familiar with the database or databases which are being searched. The user may, for example, seek an appointment file with a specified individual. The user may recall the name of the individual, but not the time and date of the appointment. Accordingly, in order to retrieve this information, the user seeks to input the name of the person with whom the appointment is made. However, entry of the entire name may be unnecessary. If the user believes that the name is unique within the database, based on the user's knowledge of the data, the query may be defined using only a few characters. For example, the user may enter the first two letters of the person's first and last names, separated by a space. Thus, with five keystrokes or character entries, the search query may be defined.
A further advantage of the present invention is apparent when the user input terminal is connected over a wireless network to the database containing the files. In this case, the communication bandwidth may be limited, or communication costs may be related to amount of data transmitted. In either case, efficiency of query definition in terms of data transmission requirements may be advantageous.
When a user defines a search, typically the user has a data "domain" in mind. This domain consists of data having common characteristics, which may be in one or more data files. Therefore, the user advantageously selects the data domain prior to defining the particular query. In fact, in one preferred embodiment of the invention, the rules interpreting the query are preferably dependent on the data domain selected. The system may also be adaptive to the user, for example through explicit modification of the interface characteristics and behavior, or through an automated or "learning" process. The data retrieval application of the present invention need not be a simple text search engine, and may also be a so-called "intelligent agent" which autonomously interacts, in a complex way, with other automated systems, to perform an action or obtain information, with relatively little immediate instruction from the user.
One preferred embodiment of the invention preferably provides a user interface for a remote device through a wireless network, such as a cellular digital packet data (CDPD), personal communication service (PCS), or digital cellular service, on a PDA platform. The wireless communication may, for example, include GSM-900, GSM- 1800, GSM-1900, CDMA IS-95, TDMA IS-136, 3G systems - IMT-2000, UMTS, W-CDMA, wideband IS-95, or other wireless communications standards. In one preferred embodiment of the system, the system is compliant with the so-called Wireless Application Protocol (WAP, see http://www.xwap.com, WAP Architecture Draft version 0.9 (1997-09), which encompasses the Wireless Markup Language (WML, from the Wireless Application Development Committee), and subsumes the Handheld Device Markup Language (HDML, from Unwired Planet Inc.) and Tagged Text Markup Language (TTML, from Nokia Mobile Phones), and is extensible for future languages, extensions and protocols which may not be fully defined at present. WAP systems may also preferably interface with translation systems for converting HTML documents to HDML or its current implementation in WAP.
In one preferred embodiment, a user preferably inputs a query according to the present invention on a PDA, which communicates through a wireless service provider to a network, such as the Internet or an intranet. The service provider may also preferably host a closed network. The query is forwarded to a server, which translates the query into a defined search. In the example provided above, the two simple search strings are preferably interpreted as follows: each set of alphanumeric characters is treated as the first characters of a truncated string, present in the order of the sets in the query. The sets may be separated by an optional arbitrary string, which may encompass one or more words (a word being defined as a sequence of characters separated by white space).
Additionally, the interface text string input as processed and interpreted may be preferably further refined, as follows. Instead of allowing the query to include embedded strings, the sets of characters may be constrained as the beginning of words (character strings after a space). Second, instead of allowing an arbitrary intervening string, the intervening string is constrained to be a terminal portion of a word, i.e., the intervening string includes a single, terminal space. As will be appreciated by those skilled in the art, other constraints may also be applied to refine the behavior of the system. As stated above, in a preferred embodiment of the subject invention, the system provides an WAP interface on a wireless communication device. The device preferably communicates through the service provider, using http protocol, with at least one server. Typically, the system may preferably have access to the Internet, a global network of computers employing TCP/IP communication protocol which supports the World Wide Web, a communication network on the Internet which supports hypertext communication using Hypertext Markup Language (HTML).
The World Wide Web ("WWW") provides a robust, flexible, and ubiquitous model for information access. The adoption of the WWW as the preferred means of disseminating and accessing information from desktop personal computers and workstations has created a demand for access to the same information for other devices. These devices or "alternate platforms" include voice- and fax -based user agents, low-cost Network Computers, and handheld devices such as mobile phones and PDAs.
While the WWW's infrastructure and protocols fully support these alternate platforms, HTML itself does not. In particular, the navigation and display models inherent to HTML are unsatisfactory when applied to a typical handheld device. However, combining the use of standard web protocols and infrastructure (for example, URLs, HTTP, SSL plus CGI, Perl, commercial web servers) with an alternate but complementary hypertext language, such as WML or other languages which are adapted for use with typical mobile wireless devices, such as HDML and TTML, provides display and navigation models designed to provide intuitive data access using a modified transaction paradigm rather than the document- based paradigm of HTML.
While there are many types, styles, and classes of handheld devices, they generally have many characteristics in common: • Small Display Size. Handheld devices are characterized primarily by a limited display size. A typical display is capable of displaying 4-10 lines of text 12-20 characters wide and may be graphical (bitmapped) or text-only. PDA-style displays may comprise 64x128 pixels to 640x480 pixels, in monochrome or color • Limited Input Capabilities. Handheld devices may or may not have a full keyboard and may or may not have a pointing/selection device. As an example, the input keys on a data-ready mobile phone include only: The keys normally found on a telephone, usually numeric keys 0-9, *, #, with alphabet letters marked on numeric keys 2-9; Cursor/arrow keys (often just up and down or left and right); A number of dedicated function keys (SEND, END, etc.); and One or more "soft keys" with programmable labels.
• Alphanumeric pagers generally have even fewer keys.
• Input is severely constrained when compared to a typical PC or PDA. Item selection is accomplished through numbered lists or using cursor keys to highlight a choice from a range of options and then directing a function to be performed on that item. Full 2-D cursor control through pointing devices such as touch pads, touch screens, roller balls, are uncommon. A full "QWERTY" keyboard is also uncommon.
• Limited Bandwidth Path. Network bandwidth is usually low due to limitations of the network technology or economic.
• Limited resources such as memory, processing power, permanent storage.
The same is true of other devices: memory, processing power, even battery life. While some devices possess substantial memory or processing power, these devices are the exception, and this processing power may not be applied effectively for user communication with a remote database server.
Because of the smaller displays and limited input, bandwidth, and resources, handheld devices must be viewed as distinctly different from typical PCs, workstations, and other common platforms found on the Internet using the WWW over Hypertext Transfer Protocol (HTTP). For a handheld device to access data on the web, that data must be presented in a way which: makes sense on a small display, is easily navigable using the available context and input capabilities, does not strain the limited bandwidth, memory, or processing power available The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless, object- oriented protocol which can be used for many tasks, such as name servers and distributed object management systems, through extension of its request methods. A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.
HDML, in contrast to HTML, uses a "card" as the basic element, with a number of cards — a "deck" ~ required to make up an application. Cards are individual units of data that are often, on some level, "atomic," that is they cannot be subdivided into smaller pieces without losing context. A deck of cards is linked internally and can be referenced by other decks and cards. Because the card is designed to capture discrete units of data, there are multiple types of cards: cards for displaying data, cards for entering data, and cards for listing indexes or lists of choices. Putting multiple types of data on a single card would imply the user has enough visual context in order to understand the differences. With the small display, the context would be lost. WML, as does HTML, communicates using HTTP. This segregation of data motivated the inventor to develop a navigation model that is transaction based, rather than the navigation model found on the web today, browsing or surfing. A deck of cards may be designed to walk the user through a number of choices and data entry to result in a request which will return the specific data desired. The end result could be a stock quote, a weather report, or a phone number, with varying levels of input and selection required to result in a call for the desired information from the WWW. A deck of cards may be designed to walk the user through a number of choices and data entry to result in a request which will return the specific information requested. See, e.g., Berners-Lee, T., et. al, "Uniform Resource Locators (URL) ", RFC 1738, CERN, December 1994, http://www.internic.net/rfc/rfcl738.txt; Fielding, R., "Relative Uniform Resource Locators" , RFC 1808, UC Irvine, June 1995, http://www.internic.net/rfc/rfcl 808.txt; Fielding, R, et. al., "Hypertext Transfer Protocol - HTTP/LI", RFC 2068, UC Irvine, January 1997, http://www.internic.net/rfc/rfc2068.txt; HDML Specification, http://www.unwired- planet.com (1997), expressly incorporated herein by reference in their entirety.
In the present invention, the method for interpreting a user query is optimized to minimize the time the user must expend to input the query, while allowing focused requests for data. The method also preferably follows an intuitive model of data structure, retained by the user, so that it is not difficult to learn the command syntax. Thus, a user who is somewhat skilled in using the system and knowledgeable about the data to be retrieved will be able to efficiently use the system. Naive users, i.e., those who are not skilled in using the system, will preferably be presented with an intuitive, but shorthand style query input interface, which is easily learned. Users who are searching an unfamiliar data source will face similar obstacles that they would face using other type of searching systems, and indeed, the present invention may be altered from its default behavior to provide other styles of interface, i.e., Boolean, extended text searching commands, such as those provided by "GREP" style software or Lexis®/Nexis® systems.
In one preferred embodiment of the present invention, after one or more data files corresponding to the user query have been selected, the system will preferably offer a number of options. One option is to preferably make available the text string contained in the intermediate (preprocessed) data format, formatted for the wireless device, such as an HDML card or deck, for downloading to the user device. A second option is for the text search server to preferably communicate with the parent application host, translating user commands and formatting outputs in a form suitable for display on the wireless device, thus providing a single point of access to multiple servers. This, for example, preferably allows navigation of a single deck for the selected items, instead of one deck for each application. The text search server may also preferably select a set of "hits" or records corresponding to the user query, and with each card or element on a card, provide sufficient information for the mobile user device to communicate directly with each of the host servers. A hybrid approach may also be adopted, incorporating elements of each of the three options, or allowing the user to select from a host of other options which will be appreciated by those skilled in the art. Where the user communicates directly with a host application server, that server must also support WAP. One advantage of employing the present invention, however, is integration of multiple data sources as well as an optimized interface for the efficient retrieval of information.
Where the selected data file is large, the selection information may preferably also return an "offset" or location in the file where the user query criteria are met. Typically, in one embodiment of the present invention, large files will be preprocessed according to a predetermined protocol into smaller, manageable records, which are the subject of the search selection process.
In the present invention, the preprocessing may be defined as a template, or customized by the user in advance. Thus, where the preprocessing is user-defined, the user's familiarity with the format and content of the data source is particularly advantageous for greater efficiency in subsequent use of the system.
It is also understood that the preprocessing may also preferably encompass an index -based scheme. The source data is may be preferably processed to determine the word content of each file. While the order of terms may also be encoded, typically after each of the search terms is identified in a file or record, the content is searched to determine if it meets all criteria.
A preferred preprocessing of the raw data files removes extraneous spaces, normalizes the text to this upper case, and substitutes non-alphanumeric characters and punctuation with internal codes. Also, intra-record formatting is removed.
It should also be understood that the user may preferably include within the search domain a local database, which may be searched by a local processor. Further, while mass transmission of large data files may be impractical, limited amounts of data may be transmitted by the present invention to the user device for processing. Although the text search application need not reside on a central server or be physically separated from the user, the text search application is preferably separate from the user interface application.
The present invention preferably is capable of servicing the lowest common denominator, for example, a WAP mobile device with limited processing power and display capability. However, the system may also preferably provide enhanced features to more capable user devices. These features may be defined in a user profile or setup preferences, or may be determined adaptively by identifying the type and capabilities of the user device. For example, a user may have various systems, such as a desktop, laptop and PDA, through which the same data retrieval system is accessed. The computer system preferably communicates with the user device to determine its type and inherent capabilities.
It is therefore an object according to the present invention to preferably provide a method for communicating a user query to a computer system, comprising the steps of receiving, in at least one mode of operation, the user query as sets of alphanumeric characters; communicating the sets of characters to the computer system; applying the sets of alphanumeric characters as search parameters against a database; selecting database records in which all of the ordered sets of alphanumeric characters appear in order; and transmitting all or portions of the database records to the user device.
It is also an object of the invention to preferably provide a method wherein the database is generated by processing text documents as records, formatting information of the text documents being separated from semantic content of the text documents. The database may be remote from the user and accessed, for example through a network, through a packet data switched network (public or private), such as the Internet or an intranet, a telecommunication link, wireless link, cellular wireless link, or the like. The preferred content communication protocol is the Wireless Application Protocol (WAP). The user query may preferably be entered through by operation of a keyboard, a limited number of buttons, touchscreen, voice recognition or other means.
It is a further object of the invention to provide a method wherein the database system preferably has a plurality of modes of operation, comprising a mode of operation in which each of the ordered sets of alphanumeric characters is translated as an alphanumeric string sequence, an order of the sets of alphanumeric characters representing an order of strings represented by the ordered sets, separated by an intervening alphanumeric string sequence of arbitrary length, and other different modes.
According to a further object of the invention, the universe of search databases is preferably divided into data search domains, which are selected by the user for application of the search query. The data is preferably preprocessed to a normalized format, which may occur in advance or during processing of the user query. Source data may, for example, have differing data formats, which are converted to a common format during preprocessing.
According to another object of one embodiment of the invention, each of the ordered sets of alphanumeric characters is preferably translated as a beginning of a word in an alphanumeric string sequence. The intervening alphanumeric string sequence, for example, may have a single space located at a terminal end of the alphanumeric string sequence.
According to another object of the invention, the selected records preferably may be retrieved, and/or one or more selected records may be communicated to the user.
Preferably, the system does not transmit a large volume of information for display to the user at one time, but rather segments the selected records according to a ranking or priority and transmits information relating to a limited number of records, allowing the user to prompt for further records.
It is a further object according to the present invention to preferably communicate to the user information identifying selected records, which may be further examined by the user by retrieving the string data which is used by for selection, the source data, translated and formatted by the database system for communication to the user, or an identifier of the selected record, for example a URL or a URL and commands to recall the selected record from a separate server.
These and other objects will become clear through a review of the drawings and detailed description of the preferred embodiments.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will now be described with respect to the drawings, in which Fig. 1 is a block diagram depicting the operation of a preferred embodiment of the present invention, and Fig. 2 is a flow chart detailing a preferred embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS As shown in Fig. 1 , a user preferably inputs a search query consisting of the data matching parameters for the database search and an identification of the database(s) comprising the desired search domain into the user device 10 which transmits the search query 12 through a network 14 to the computer system 16. Depending upon the network protocol characteristics, a protocol interface 18 may preferably be employed to translate the data received from the network to a format which is compatible with the computer system. Upon receipt of the search query by the computer system, the search query is preferably initially processed by the search engine controller 20, which separates the search query into its components of search criteria 22 and the source database identifier 24. The engine controller preferably translates the search criteria, which provides the data matching parameters for the database search, to a formal search query which can be recognized by the database search engine 26, by the addition of truncator characters and order operators. The source database identifier 24, which preferably defines the database(s) comprising the search domain, is preferably communicated to the database search engine to direct selection of the appropriate databases to be accesses in the search operation. The databases 28 which can be accessed as required to accomplish the search include one or more Web servers, local databases, intranets, local area networks (LANs), wide area networks (WANs) or other searchable databases.
One preferred embodiment of this invention may also include a data preprocessor 30 which reformats large data files using a predefined protocol into smaller, more manageable records which are retained in a preprocessed database 32, for expedient access by the database search engine. Upon completion of the database search, the selected records or the addresses of their storage locations are preferably stored in the search records repository 34. The identifiers and/or predetermined portions of the contents of one or more selected records are preferably tabulated by the records index function 36, and communicated to the user in a selected display format on the user device 10. Again, depending upon the network protocol requirements, a protocol interface 18 may preferably be employed to translate the data received from the computer system to a format which is compatible with the network.
At this point, the user may request further data to be displayed from the selected records by responding to prompts or by other commands to display specified records which are executed by the records access function 38 and displayed on the user device. The user may continue to request additional data from the selected records to be displayed on the user device, or may elect to perform additional database searches.
Fig. 2 shows a flow diagram of a method according to the present invention. The method commences with the user optionally defining preferences and characteristics of system operation 50, such as user interface input and display formats, parameters used in search logic, database selection code definition and preprocessing format and methodology. In use, the user will preferably initially define a database search domain 52, which may be, for example, the user's personal information manager database. The user then preferably enters a search query 54, consisting of one or more alphanumeric strings. This user query is then preferably communicated to the computer system 56, and translated to a formal search query 58, by the addition of truncator characters and order operators. The system preferably executes the formal search and selects records 60. Information identifying one or more selected records is preferably communicated to the user 62, from which the user may determine whether to request further information 64. If the user requests such information, the content or an identifier of the content is preferably communicated to the user 66. The user may preferably elect to terminate the data retrieval process 68 or may interact further with the system 66, or alternatively 52 or 54.
For example, if the user has a scheduled meeting with a person named "Bob", as well as Bob's telephone number (or even just area code), e.g., (201) 555-1234, and needed additional data regarding the meeting from the user's personal calendar database, such as time, agenda, or location, the user would first initiate communication with the search engine using a wireless handheld portable device providing support for WAP, e.g., WML. After identifying himself (a step which may be automatic and not require user intervention), the user would first define the data domain of interest, in this case schedules and/or meetings, which may be included in a full contact and personal information manager interface. This data domain is predefined, and the data is either processed on demand or in advance to convert from a native format of the associated data management program to a text-only format of one record per string, omitting control characters, extended formatting, and the like, and presented using standard alphanumeric characters, for example in all capital letters.
While the data may be preprocessed in advance, it may also be preprocessed as needed, for example to provide current information. Therefore, the preprocessing software may provide translation from a wide variety of text source formats to a common intermediate format, preferably an intermediate format which may be converted to WAP (WML) for transmission to the user, avoiding the need to invoke the parent application. On the other hand, the formatted text may also include a reference, such as a uniform resource locator (URL), to the primary data or parent application and primary data, allowing the user to then communicate directly with the parent application, for example to update data or obtain further related data. There are a number of known methods for communicating parameters along with an identification of a server, for example common gateway interface (CGI) scripts. According to the present invention, a retrieved record or record identification may be accompanied by a complete search definition for a parent application, allowing the user to bypass screens of the parent application and avoiding inefficiency of passing communications through the text search application after selection. On the other hand, where the parent application is not WAP compatible, or where the data source is difficult to communicate with, the system according to the present invention may fully service the mobile client by formatting and transmitting all data requested.
The user knows that, in the data structure, name precedes telephone number, and types "BOB", space, "201", followed by an enter or confirm. The query "BOB 201" is transmitted through a wireless service provider to a private switched data network, which interfaces with the Internet. The WAP application and service provider pass through a Uniform Resource Locator (URL), with the user query appended as a parameter. Thus, the user query is directed to the user's data server. The user query is then parsed as a text string search command by a text search application. The text search application interprets the user query as a request for two sequential portions of the string, separated by an intervening string of arbitrary or zero size.
Based on the limited data domain, user's knowledge of the data set (for example, that there are only one or two records in the database relating to individuals named Bob with telephone area codes in northern New Jersey), the limited search string is likely to retrieve only a small number of records. If the number retrieved exceeds the user's expectations, then, before examining the content of the records, the user may refine the query by adding a further restriction, such as the remainder of the telephone number, date of meeting, or other pertinent known information, or a narrower data domain selection, until a small number of records is retrieved. The user, using WAP (WML) interface techniques, then is able to examine the small number of retrieved records, formatted for the display limitations of the handheld device.
In another preferred embodiment of the present invention, a handheld device preferably communicates through a wireless network. As one function of the device, voice services are preferably supported through a telephone value added services server. The handheld device may also preferably communicate using WML with a WAP proxy which receives input from a filter or a combined WAP proxy/filter. In either case, the communicated data is derived from a Web server, which communicates in HTML.
This preferred embodiment provides a native WML WAP search server, which preferably receives and translates a user query from the handheld device (or any other device from the wireless network or Internet), through the existing infrastructure, including the Web Server, and preferably provides a response to the handheld device. The WAP search server preferably searches the string database for matching records. The string database contains records preferably derived from a preprocessor, which may be physically the same system as the WAP search server. The preprocessor preferably obtains the raw data from one or more Web servers, local databases, intranets, and the like. Data domains may be defined by the source of the data, by content, or by other criteria. These criteria are defined by the user.
The following listing details the C++ language source code for the one text search method of normalized text of a preferred embodiment of the present invention: SOURCE CODE
II Copyright Crescent Technologies, Inc., 1997 // Product Name: SmartFinder // Description: Test program for a possible implementation of SmartFinder.
// Usage: sfind <pattern>
// Strings to search are read from standard input
// Date modified: 10/05/97
// Author: Skip Gilbrech iiiiiiiiiiiiuniiiiiiiiiiiiimiiiiiiiiiiiiiiiiiimiiiiiiiimuiii
#include <stdio.h> #include <ctype.h> #include <string.h> #include <stdlib.h>
static int pmatch(const char * pat, const char * str)
{ while (1)
{ int pat_char = *pat++;
switch (pat_char) { default: if (*str++ != pat_char) // fail if chars don't match return 0; break; case '*': while (*pat = '*') // Gobble runs of '*'
{
++pat;
} if (*pat = '\0') // succeed if end of pattern
{ return 1;
} while (*str) { if (pmatch(pat, str++)) // succeed if remainder of string matches
{ return 1 ;
} } return 0; break; case '?': if (*str++ = '\0') // fail if end of string { return 0;
} break; case '\0': return *str = *\0'; // succeed if both strings end at same time break;
} II helper for normalize static inline int isalnum_or_wild(int c) { return isalnum(c) || c = '*' || c = '?'; }
static void normalize(char * norm, const char * s, int ispattern)
{ if (ispattern) { int lastchar_star;
*norm++ = '*'; // begin with wildcard
for (lastchar_star = 0; *s; ++s)
{ if (isspace(*s) || ! isalnum_or_wild(*s))
{ if (! lastchar_star) {
*norm++ = '*'; lastchar_star = 1 ;
} } else // alpha-numeric or wild card
{ *norm++ = islower(*s) ? toupper(*s) : *s; lastchar_star = (*s = '*' ? 1 : 0);
} } *norm++ = '*'; // end with wildcard
} else // string to search, zap all but alpha-numeric & wild cards
{ for ( ; *s; ++s)
{ if (isalnum_or_wild(*s))
{
*norm++ = islower(*s) ? toupper(*s) : *s;
}
}
}
*norm = '\0';
}
static int match (const char * pnorm, const char * srchstring)
{ int matched; char * snorm;
// normalize string to be searched (pattern already normalized) snorm = (char *)malloc(strlen(srchstring) + 1); normalizefsnorm, srchstring, 0); matched = pmatch(pnorm, snorm); free(snorm); return matched;
int main (int argc, char** argv)
{ char search_string[256];
if (argc < 2) return 1;
// normalize pattern char * pnorm = (char *)malloc(strlen(argv[l]) + 3); normalize(pnorm, argv[l], 1);
while (1)
{ char * s;
if (! gets(search_string)) break;
// make sure there's something to look for for (s = search_string; *s; ++s) if (isalnum(*s)) break; if(!*s) continue; if (match(pnorm, search_string))
{ printf("%s\n", search_string); fflush(stdout); }
}
return 0;
}
Many changes, modifications, variations, combinations, subcombinations and other uses and applications of the subject invention will be and become apparent to those skilled in the art after considering this specification and the accompanying drawings which disclose the preferred embodiments thereof. All such changes, modifications, variations and other uses and applications which do not depart from the spirit and scope of the invention are deemed to be covered by the invention, which is to be limited only by the claims which follow.

Claims

What is claimed is:
1. A method for retrieving information from one or more predefined databases, said method comprising the steps of:
(a) entering one or more sets of alphanumeric characters corresponding to a search query into a transmission device;
(b) transmitting said alphanumeric characters to a computer system;
(c) entering said alphanumeric characters into a computer database search engine;
(d) searching said predefined databases for data records containing said alphanumeric characters using said computer database search engine;
(e) identifying data records containing said alphanumeric characters; and
(f) transmitting said identified data records to said transmission device.
2. The method of claim 1 further comprising the step of preprocessing the data records contained in one or more of said predefined databases to make said data records more readily searchable for said alphanumeric characters.
3. The method of claim 1, wherein at least one of said sets of alphanumeric characters corresponds to one or more distinct databases and one or more remaining sets of alphanumeric characters correspond to the information sought to be retrieved.
4. The method of claim 1 further comprising the step of converting said sets of alphanumeric characters from the protocol transmitted from said transmission device into a different protocol which is in a form usable by said computer system.
5. The method of claim 1, wherein said sets of alphanumeric characters are separated by one or more predefined symbols or alphanumeric characters.
6. The method of claim 1, wherein said alphanumeric characters are transmitted wirelessly.
7. The method of claim 1, wherein said transmission device is a wireless communication device.
8. The method of claim 7, wherein said wireless communication device uses a Wireless Application Protocol interface.
9. The method of claim 1, wherein said databases comprise the Internet.
10. The method of claim 1, wherein said databases comprise the World Wide Web.
11. A method for retrieving information from one or more databases, said method comprising the steps of:
(a) entering two or more ordered sets of alphanumeric characters into a transmission device, wherein at least one of said sets of alphanumeric characters corresponds to one or more distinct databases and one or more remaining sets of alphanumeric characters correspond to the information sought to be retrieved;
(b) transmitting said alphanumeric characters to a computer system;
(c) separating said ordered sets of alphanumeric characters between those which correspond to said distinct databases and those which correspond to the information sought to be retrieved; (d) inputting said ordered sets of alphanumeric characters which correspond to the information sought to be retrieved into a database search engine;
(e) searching said distinct databases for data records containing the alphanumeric characters using said computer database search engine;
(f) identifying data records containing said alphanumeric characters; and (g) transmitting said identified data records to said transmission device.
12. The method of Claim 11 further comprising the step of preprocessing the data records contained in one or more of said predefined databases to make said data records more readily searchable for said alphanumeric characters.
13. The method of Claim 11 further comprising the step of converting said sets of alphanumeric characters from the protocol transmitted from said transmission device into a different protocol which is in a form usable by said computer system.
14. The method of claim 1 1, wherein said sets of alphanumeric characters are separated by one or more predefined symbols or alphanumeric characters.
15. The method of claim 11, wherein said alphanumeric characters are transmitted wirelessly.
16. The method of claim 11, wherein said transmission device is a wireless communication device.
17. The method of claim 11, wherein said wireless communication device uses a Wireless Application Protocol interface.
18. A system for remotely searching for records of information stored in a computer database, the system comprising:
(a) a means for transmitting a search query comprised of one or more sets of alphanumeric characters from a remote device to a computer system, one or more of said sets of alphanumeric characters corresponding to one or more designated categories of data stored in said computer database;
(b) a means for searching all of the records of corresponding categories of data in said computer database for said sets of alphanumeric characters; and (c) a means for transmitting said search results and corresponding data to said remote device.
19. The system of claim 18, wherein one or more of the designated categories of data stored in said computer database have been preprocessed in order to reduce the amount of searchable data in said designated categories of data.
20. The system of claim 18, further comprising a means to convert said search query from the protocol transmitted from said remote device into a different protocol which is in a form searchable in said computer database.
21. The system of claim 18, wherein said sets of alphanumeric characters in said search query are separated by one or more predefined symbols or alphanumeric characters.
22. The system of claim 18, wherein said a means for transmitting a search query is wireless.
23. The system of claim 18, wherein said computer database comprises the Internet.
24. The system of claim 18, wherein said computer databases comprises the World Wide Web.
25. A method for communicating a search query to a computer system, comprising the steps of:
(a) receiving, in at least one mode of operation, the search query as ordered sets of alphanumeric characters, each of the ordered sets of alphanumeric characters being separated by a character or symbol;
SUBSTITUTE SHEET (RULE 25) (b) communicating the ordered sets of alphanumeric characters to the computer system;
(c) translating, in the computer system, each of the ordered sets of alphanumeric characters as a partial alphanumeric string sequence, an order of the sets of alphanumeric characters representing an order of strings represented by the ordered sets, separated by an intervening alphanumeric string sequence or arbitrary length; and
(d) applying the translated ordered sets of alphanumeric characters as search parameters against a database, selecting at least one database record in which all of the ordered sets of alphanumeric characters appear in order.
26. The method of claim 25, wherein the database is generated by processing text documents as records, formatting information of the text documents being separated from semantic content of the text documents.
27. The method of claim 25, wherein the search query is generated by a human user.
28. The method of claim 25, wherein the computer system is remote from location in which the search query is entered.
29. The method of claim 25, wherein the computer system has a plurality of modes of operation, comprising a first mode of operation in which each of the ordered sets of alphanumeric characters is translated as an alphanumeric string sequence, an order of the sets of alphanumeric characters representing an order of strings represented by the ordered sets, separated by an intervening alphanumeric string sequence of arbitrary length, and a second mode of operation in which each of the ordered sets of alphanumeric characters is not translated as an alphanumeric string sequence, an order of the sets of alphanumeric characters representing an order of strings represented by the ordered sets, separated by an intervening alphanumeric string sequence or arbitrary length.
30. The method of claim 25, wherein a subset of available data is designated upon which the search query is to be applied.
31. The method of claim 25, wherein each of the ordered sets of alphanumeric characters are translated as a beginning of a word in an alphanumeric string sequence.
32. The method of claim 25, wherein the intervening alphanumeric string sequence has a designated alphanumeric character or symbol located at a terminal end of the alphanumeric string sequence.
33. The method of claim 25, wherein the search query is communicated wirelessly.
34. The method of claim 25, wherein the search query is entered through a wireless handheld communications terminal.
35. The method of claim 25, wherein the search query is defined using Wireless Application Protocol interface.
36. The method of according to claim 25, wherein the search query is entered through a keyboard.
37. The method of claim 25, wherein the search query is entered through a touchscreen.
38. The method of claim 1, further comprising the step of retrieving the selected database records.
39. The method of claim 38, further comprising the step of communicating at least one of the selected database records to the source of the search query.
40. The method of claim 39, wherein the at least one selected database records are presented through a Wireless Application Protocol interface.
41. The method of claim 25, wherein the database comprises a plurality of preprocessed files having normalized data.
42. The method of claim 41 , wherein at least two of the plurality of preprocessed files have differing data formats.
43. The method of claim 25, wherein the ordered sets of alphanumeric characters are communicated over the Internet.
44. The method of claim 25, wherein the ordered sets of alphanumeric characters are communicated initially over a cellular wireless link and subsequently over a public packet switched data network.
45. The method of claim 25, wherein the computer system communicates an identifier of a source of a selected database record to the source of the search query.
46. The method of claim 25, wherein the computer system translates and communicates data from a database server in a first format to the user as data in a second format.
47. The method of claim 25, wherein the computer system translates the search query and communicates the translated search query to the source of the search query with an identification of the a selected data record.
PCT/US1998/021075 1998-10-07 1998-10-07 Database retrieval system Ceased WO2000020992A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU10689/99A AU1068999A (en) 1998-10-07 1998-10-07 Database retrieval system
PCT/US1998/021075 WO2000020992A1 (en) 1998-10-07 1998-10-07 Database retrieval system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US1998/021075 WO2000020992A1 (en) 1998-10-07 1998-10-07 Database retrieval system

Publications (1)

Publication Number Publication Date
WO2000020992A1 true WO2000020992A1 (en) 2000-04-13

Family

ID=22268040

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1998/021075 Ceased WO2000020992A1 (en) 1998-10-07 1998-10-07 Database retrieval system

Country Status (2)

Country Link
AU (1) AU1068999A (en)
WO (1) WO2000020992A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001086896A1 (en) * 2000-05-05 2001-11-15 James Ewing A method and a system relating to protocol communications within a pre-existing www server framework
DE10048743A1 (en) * 2000-09-29 2002-05-29 Siemens Ag automation system
GB2374757A (en) * 2001-01-05 2002-10-23 Michael John Stubbs Remote audio-video instructional database
WO2003102815A1 (en) * 2002-05-30 2003-12-11 Nokia Corporation Symbol-based query mechanism
EP1826943A1 (en) * 2006-07-31 2007-08-29 Siemens Aktiengesellschaft Method for searching information in a network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4255796A (en) * 1978-02-14 1981-03-10 Bell Telephone Laboratories, Incorporated Associative information retrieval continuously guided by search status feedback
US5253341A (en) * 1991-03-04 1993-10-12 Rozmanith Anthony I Remote query communication system
US5781773A (en) * 1995-05-10 1998-07-14 Minnesota Mining And Manufacturing Company Method for transforming and storing data for search and display and a searching system utilized therewith
US5804803A (en) * 1996-04-02 1998-09-08 International Business Machines Corporation Mechanism for retrieving information using data encoded on an object

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4255796A (en) * 1978-02-14 1981-03-10 Bell Telephone Laboratories, Incorporated Associative information retrieval continuously guided by search status feedback
US5253341A (en) * 1991-03-04 1993-10-12 Rozmanith Anthony I Remote query communication system
US5781773A (en) * 1995-05-10 1998-07-14 Minnesota Mining And Manufacturing Company Method for transforming and storing data for search and display and a searching system utilized therewith
US5804803A (en) * 1996-04-02 1998-09-08 International Business Machines Corporation Mechanism for retrieving information using data encoded on an object

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"WIRELESS APPLICATION PROTOCOL ARCHITECTURE SPECIFICATION", WAP ARCHITECTURE VERSION 30 APR 1998, XX, XX, 30 April 1998 (1998-04-30), XX, pages COMPLETE, XP002919030 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001086896A1 (en) * 2000-05-05 2001-11-15 James Ewing A method and a system relating to protocol communications within a pre-existing www server framework
DE10048743A1 (en) * 2000-09-29 2002-05-29 Siemens Ag automation system
DE10048743C2 (en) * 2000-09-29 2002-11-28 Siemens Ag automation system
US6915174B2 (en) 2000-09-29 2005-07-05 Siemens Aktiengesellschaft Automation installation
GB2374757A (en) * 2001-01-05 2002-10-23 Michael John Stubbs Remote audio-video instructional database
GB2374757B (en) * 2001-01-05 2003-06-11 Michael John Stubbs Remote audio visual instruction system
WO2003102815A1 (en) * 2002-05-30 2003-12-11 Nokia Corporation Symbol-based query mechanism
EP1826943A1 (en) * 2006-07-31 2007-08-29 Siemens Aktiengesellschaft Method for searching information in a network
WO2008014907A1 (en) * 2006-07-31 2008-02-07 Nokia Siemens Networks Gmbh & Co. Kg Method for searching information in a network

Also Published As

Publication number Publication date
AU1068999A (en) 2000-04-26

Similar Documents

Publication Publication Date Title
US7809716B2 (en) Method and apparatus for establishing relationship between documents
US7844594B1 (en) Information search, retrieval and distillation into knowledge objects
US7243095B2 (en) Prose feedback in information access system
US6745181B1 (en) Information access method
CA2578791C (en) Disambiguating ambiguous characters
US7376641B2 (en) Information retrieval from a collection of data
CA2643754C (en) Searching for commands to execute in applications
US7136846B2 (en) Wireless information retrieval
US20020103933A1 (en) Internet-access enabled device personalization
US20030093409A1 (en) Search engine interface and method of controlling client searches
WO2007070410A2 (en) Mobile device retrieval and navigation
EP2076855A1 (en) Personalized search using macros
US20040210560A1 (en) Method and system for searching a wide area network
US10255362B2 (en) Method for performing a search, and computer program product and user interface for same
Aridor et al. Knowledge encapsulation for focused search from pervasive devices
WO2000020992A1 (en) Database retrieval system
JP2002116983A (en) Web content conversion method and system
US8640017B1 (en) Bootstrapping in information access systems
EP1176518A2 (en) Method and system in a computer network for searching and linking web sites
JP4773761B2 (en) Information search server, information search method, information search program
KR100491254B1 (en) Method and System for Making a Text Introducing a Web Site Directory or Web Page into a Hypertext
KR20020062464A (en) A method of searching the Web using meta search engines and the system thereof
KR20020004041A (en) File search service system and method through the internet
Eskicioğlu A Search Engine for Turkish with Stemming

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: CA

122 Ep: pct application non-entry in european phase