WO2013096625A1

WO2013096625A1 - Proximity-based electronic record query system

Info

Publication number: WO2013096625A1
Application number: PCT/US2012/070955
Authority: WO
Inventors: Marek Tomaszewski
Original assignee: Marek Tomaszewski
Priority date: 2011-12-22
Filing date: 2012-12-20
Publication date: 2013-06-27

Abstract

A system compares a target case against a database of records, responsive to proximity between that target case and the database's records, returning records with highest proximity. Proximity is responsive to how many attributes match and how many attributes differ. Each attribute has an attribute value representing an index into that attribute's information. The searchable records are ordered by number of attributes: Records with few attributes precede those with many attributes. They are further ordered lexicographically by attribute value: Records with smaller attributes precede those having only larger attributes, as described herein. This allows rapid elimination of records whose proximity cannot match the least-best records thus found. Records are examined for attributes which most contribute to increased proximity, allowing search specificity improvement by updating those attributes.

Description

Proximity-based electronic record query system

BACKGROUND

The amount of information available electronically has become so large that it is not easy for one person to comprehend all of it. This problem is particularly acute in the field of medicine. Medical personnel often rely on known literature to decide on treatments for patients. Literature often relics on analysis of many cases from medical studies. It takes time to gather information, process that information, and publish the results in a form that medical personnel can use, or in a form that can be prepared as guidelines for medical and other personnel. One problem in the known art is that the number of possible new treatments, and the amount of information about those possible new treatments, can outstrip the ability of the literature to follow and disseminate that information to medical personnel.

In the medical field, another aspect relating to information is the increased use of electronic records to maintain information about patients. EHR (electronic health records), also known as EMHR (electronic medical health records) are becoming both more common and more standardized, and typically include medical information about the patient which can be of value to medical personnel. While there has been work in the area of using electronic health records for appointments, billing, and record-keeping, there is still value in comparing information from electronic health records with known case histories to assist medical personnel in diagnosis and treatment.

One problem in the known art is that clinical support systems for health-related information can involve construction of a set of rules from those electronic health records with known case histories. In such cases, the rules embody expert knowledge, which can be applied to electronic health records for new patients by an automated rule-based system. While these techniques can generally achieve the purpose of extracting information from earlier case histories, they are subject to several drawbacks. First, these techniques involve extensive expert assistance in interpreting those earlier case histories. That expert assistance is often only as good as theory known by the experts generating the rules to be applied in response to those earlier case histories. Thus, new discoveries, or emergent phenomena in patient care, are not addressed. Second, these techniques involve development of a new set of rules for each type of patient care. Expert knowledge applicable to, for example, heart ailments, is not often applicable to, for example, bone fractures. Thus, patient issues which might be common to multiple fields of medicine are not addressed. Third, these techniques substitute the judgment of a computing system for medical personnel. Expert knowledge which has been compiled from unknown sources is often untrusted by medical personnel. Thus, medical personnels' own impressions of particular patients are not addressed.

Another problem in the known art is that clinical support systems for health-related information can involve application of mathematical techniques, such as Bayesian statistics and artificial neural networks. In such cases, the techniques involve probability estimates of diagnoses and of treatments. While these techniques can generally achieve the purpose of extracting information from earlier case histories, they are subject to the drawback that medical personnel generally distrust probability estimates and related mathematical techniques. Medical personnel are generally more comfortable with exchanging infonnation with other medical personnel, collaborating on identifying important features of patient ailments, diagnoses, and treatments. In cases in which medical personnel distrust clinical support systems, the effectiveness of those support systems can be greatly reduced.

For example, the effect of combining experience from experts in more than one area can be seen in the creation of social networks in which physicians exchange information about cases and collaborate on ways to address those cases. Medical personnel are generally aware of the value of considering a relatively complete body of information about patients when performing diagnosis and treatment. This body of infonnation can include a wide variety of factors, including symptoms, test results, ailment and treatment history, hereditary information, current and fonner medications, and otherwise. Moreover, medical infonnation can be complex in nature and is often not easy to compare. Medical personnel enhance their experience by review of actual cases.

SUMMARY OF THE DESCRIPTION

A query system for electronic records compares a target case against a database of records which includes a relatively large number of known cases, at least some of which arc expected to be similar to the target case. Having obtained a set of known cases which are similar to the target case, the query system selects those cases with desired features. Some examples might include information about possible treatments, information about a likely length of treatment, information about likely resolution of the target case, and information about possible complications involved in treatment. Having identified those cases which are most similar to the target case and which preferably have a desired outcome, a user of the query system can determine a course of action which is deemed best to proceed with. The query for electronic records is responsive to a proximity measure between the target case and each one of the records found in the database. The proximity measure between a pair of records is determined in response to how many attributes are shared in common by those records and how many attributes differ. When the target case is compared with a relatively large database of records, the target case being compared rapidly against the database, and those records in the database with highest proximity are returned.

In one embodiment, those records with highest proximity are returned and examined for attributes which can contribute to increased proximity. The user is requested to add attribute values for the target case, which can then be updated with information about those attributes, with the effect of improving search specificity. Selecting some number of records with highest proximity, updating the target case, and repeating the operation of the query system with the updated target case, can be performed as an iterative process. As this iterative process proceeds, the target case is successively updated, with the effect of providing a successive set of alternative matches to the user.

In one embodiment, each attribute has an attribute value that represents an index into information associated with that attribute, with the attributes maintained in a data structure in increasing order of those attribute values. A set of records to be searched is ordered by how many attributes each record has present, so that records with a relatively smaller number of attributes precede those with a relatively larger number of attributes. That is, records with four attributes would always precede records with five attributes, regardless of what those attributes are.

In one embodiment, the set is ordered lexicographically by attribute value. For those records with the same number of attributes, for example, for all records having exactly five attributes, the ones whose lowest-valued attributes were relatively smaller would always precede those whose lowest-valued attributes were relatively larger. That is, records whose lowest-valued attributes had attribute value " 14" would always precede records whose lowest- valued attributes had attribute value " 15". Moreover, for records whose lowest-valued attributes were equal, the ones whose 2nd-lowest-vaIued attributes were relatively smaller would always precede those whose 2nd-lowest-values attributes were relatively larger, similar to a dictionary ordering of English words.

For example, the record including attribute values {345, 500, 999, 1001 } would precede the record including attribute values { 1 , 2, 3, 4, 5, 6} , as the former record includes fewer attribute values (4 attribute values) than the latter record (6 attribute values). Similarly, the record including attribute values {345, 500, 999, 1001 } would precede the record including attribute values {348, 350, 375, 400}, as the former record begins with a smaller value (attribute value 345) than the latter record (attribute value 348). Similarly, the record including attribute values {345, 500, 999, 1001 } would precede the record including attribute values {345, 502, 575, 1000}, as the former record begins with smaller earlier values (attribute values 345 and 500) than the latter record (attribute value 345 and 502).

As described herein, this ordering allows a search to rapidly eliminate those records which cannot have a proximity to the target case as good as the least-best record found so far.

In one aspect, the invention provides an efficient search technique combined with a novel mining data structure search technique. This has the effect that close matches of medical records to a query record can be resolved relatively quickly, even with a large set of earlier records to review. A user can request a selected number of those medical records which best match the query record, where the query record can represent an actual case which medical personnel are seeking to resolve. The search technique can readily obtain relatively close matches, while discarding those medical records which differ substantially from the query in tenns of symptoms, test results, medical history, hereditary factors, medications, and otherwise. The search technique can also indicate where the user can improve the search, by providing additional specific information to improve its determination of proximity. Having obtained those relatively close matches, the user can obtain specific information about each of them, such as for example information regarding treatments, medications, complications, counterindications, and otherwise. Moreover, the search technique provides the user with the ability to require matching cases to satisfy a set of selected constraints, such as for example to only return successfully resolved cases, or only cases for a specific gender. This has the effect that medical personnel can more intuitively access a large mining database of medical records and information.

BRIEF DESCRIPTION OF FIGURES

Figure 1 shows a conceptual diagram of a system.

Figure 2 shows a conceptual drawing of a data structure.

Figure 3 shows a conceptual drawing of a method.

Figure 4 shows a conceptual drawing of a portion of a method.

DESCRIPTION GENERALITY OF THE DESCRIPTION

Technologies shown or suggested by this description should also be thought of in their most general possible form. This includes, without limitation, the following:

The phrases and terms "constantly," "continually," "from time to time,"

"occasionally," "periodically", and similar phrases and terms, generally indicate any circumstance in which a method or technique, or an apparatus or system, operates over a duration of time, including without limitation any circumstance in which that operation occurs only part of that duration of time. For example and without limitation, these terms would include, without l imitation, methods which perform an operation as frequently as feasible, on a periodic schedule such as once per second or once per day, in response to an alarm or trigger such as a value reaching a threshold, in response to a request or an implication of a request, in response to operator intervention, otherwise, and to combinations and conjunctions thereof.

The phrases and terms "in one embodiment," "one embodiment," and similar phrases and terms, generally indicate that a particular characteristic, feature, or structure, described herein is included in at least one embodiment of the invention. Uses of these phrases herein do not necessarily all refer to the same embodiment. Rather, the specific particular characteristic, feature, or structure, described herein might be combined in any suitable manner into one or more distinct possible embodiments.

The phrases and terms "methods, physical articles, and systems," "techniques", and similar phrases and terms, generally indicate any material suitable for description, including without limitation all such material within the scope of patentable subject matter, or having ever been considered within the scope of patentable subject matter, or which might colorably be within the scope of patentable subject matter, notwithstanding most recent precedent. The term "relatively", and similar phrases and terms, generally indicates any relationship in which a comparison is possible, including without limitation "relatively less," "relatively more," and the like. In the context of the invention, where a measure or value is indicated to have a relationship "relatively," that relationship need not be precise, need not be well- defined, need not be by comparison with any particular or specific other measure or value. For example and without limitation, in circumstances in which a measure or value is "relatively increased" or "relatively more," that comparison need not be with respect to any known measure or value, but might be with respect to a measure or value held by that measurement or value at another place or time.

The term "substantially", and similar phrases and terms, generally indicates any circumstance in which a determination, measure, value, or otherwise, is equal, equivalent, nearly equal, nearly equivalent, or approximately, what the measure or value is recited. The terms "substantially all" and "substantially none" and similar phrases and terms, generally indicate any circumstance in which all but a relatively minor amount or number (for

"substantially all") or none but a relatively minor amount or number (for "substantially none") have the stated property. The terms "substantial effect" and similar phrases and terms, generally indicate any circumstance in which an effect might be detected or determined.

The phrases "this application," "this description", and similar phrases and terms, generally indicate any material shown or suggested by any portions of this application, individually or collectively, including all documents incorporated by reference or to which a claim of priority can be made or is made, and include all reasonable conclusions that might be drawn by those skilled in the art when this application is reviewed, even if those conclusions would not have been apparent at the time this application is originally filed.

The invention is not in any way limited to the specifics of any particular examples disclosed herein. After reading this application, many other variations are possible which remain within the content, scope and spirit of the invention; these variations would be clear to those skilled in the art, without undue experiment or new invention.

TERMS AND PHRASES Terms and phrases used herein are intended to be exemplary, not limiting in any way.

Some examples of terms and phrases used herein include the following:

The text "electronic record", sometimes abbreviated "ER" herein, and variants thereof, generally refers to any data structure disposed to be reviewed or searched as described herein, such as for ex-ample an "electronic health record" or an "electronic medical health record", the two latter sometimes abbreviated "EMHR" herein. Although this application primarily describes electronic records which relate to medical information, in the context of the invention, there is no particular requirement for any such limitation. For example, electronic records might relate to the automobile industry. The text "attribute value", and variants thereof, generally refers to any data element indicating or referring to an attribute that might be applicable to an electronic record. For example, for electronic records that relate to patients, an indicator of whether the patient is within the age range of 0-1 year would be an attribute value, an indicator of whether the patient is pregnant would be an attribute value, an indicator of whether the patient is currently hospitalized would be an attribute value, and so on. Although this application primarily describes attribute values which are integers, in the context of the invention, there is no particular requirement for any such limitation. For example, attribute values could be treated as pointers into an external data structure.

The text "current case", "query", "target case", and variants thereof, generally refers to a set of attributes which collectively describe what the user is searching for. i one embodiment, the target case represents a set of information relating to a patient whose case is being investigated, and results which match the target case represent particular sets of information relating to past cases for which information and results are available.

After reading this application, those skilled in the art would recognize and understand all terms and phrases used in combination or conjunction with these terms and phrases, without undue experimentation or new invention.

In one embodiment, a mining database of electronic records is constructed, in which each electronic record is identified by a set of attribute identifiers, each attribute identifier having an attribute value. Each electronic record has a proximity value defined with respect to a query. The query might itself be described as an electronic record itself, possibly an incomplete one, in which case the proximity value is defined with respect to pairs of electronic records and defines a proximity between each such pair. In one embodiment, the proximity value has properties which are convenient for rapidly ruling out electronic records which cannot be any closer to the query than a set of best matches which have already been found, with the effect of allowing rapid elimination of unfruitful searches and rapid termination of the search itself.

FIGURES AND TEXT

In one embodiment, the electronic records in the mining database are ordered, first b> number of attribute values, so that those electronic records with fewer attribute values are positioned earlier than those electronic records with more attribute values. Within those electronic records having the same number of at tribute values, the electronic records are ordered lexicographically by attribute values, as described herein with respect to the

"SUMMARY OF TH E INVENTION", and otherwise. A matching technique selects a group of currently-best matches, determines which electronic records have too distant a proximity value to be considered, and rules out those overly-distant electronic records from further search. When all electronic records have been either ruled out as too distant, or examined for their actual proximity value, the search will have found the set of best matches, which are returned in response to the query.

In one embodiment, the target case can be updated by the user to add or modify attributes, with the effect that those cases in the mining database which match the target case might have their matching order changed. When cases have their matching order changed, the new best matches are presented to the user. The user can further update the target case, iterating until the user determines that an adequate matching case was found, or until no further improvement in the set of best matches is achieved. Query System

FIGURE 1

Figure 1 shows a conceptual diagram of a system.

A query system 1 00 includes elements as shown in the figure, including at least one or more mining databases 1 10, one or more record sources 120a and 120b, one or more query servers 130, one or more user interface elements 140, and possibly other elements. The query system 100 might be accessed by one or more users 150.

Each mining database 1 10 includes memory or mass storage, maintaining a set of

ER's (electronic records) 1 1 1 . Although this application primarily describes electronic records which relate to medical information, in the context of the invention, there is no particular requirement for any such limitation. For example, electronic records might relate to the automobile industry. Those skilled in the art would recognize, after reading this application, that electronic records could relate to any field in which describing individual cases is relevant, as well as other fields.

Although this application primarily describes computing devices as having specific hardware and software components for each such device, in die context of the invention, there is no particular requirement for any such limitation. For example, one or more computing devices described herein could be implemented using a cloud computing or virtualization technique, in which specific hardware might be shared among more than one computing device, or in which more than one computing device might collectively share specific hardware. After reading this application, those skilled in art would realize that virtualization would actually be preferred for searching databases of particular size.

Parsing Records

The electronic records 1 1 1 can be placed directly into one or more mining databases 1 10, either directly by one or more data entry programs 121 or data entry clerks 122, as shown in the figure, or by parsing data records maintained in another format. In one embodiment, EMHR's (electronic medical health records) are maintained in XML format, or in another standardized format, in one or more medical record systems 112, from which they are retrieved and parsed by an EMHR parser 1 13. There might be several such EMHR parsers 1 13, which might operate on several such medical record systems 1 12. The EMHR parsers 1 13 might each operate on separate medical record systems 1 12, or the EMHR parsers 1 13 might collectively operate concurrently with respect to one or more medical record systems 1 12.

As described herein, this has the effect that one or more medical record systems 112 perform as EMHR repositories, which are accessed by one or more EMHR parsers 1 13, which provide electronic records 1 1 1 for a mining database. As described herein, the mining database includes records 1 1 1 for search by one or more query servers, in one embodiment, there could be multiple EMHR parsers 1 13, which operate on selected portions of those one or more medical record systems 1 12, with the effect of providing, in parallel, a unified mining database. In one embodiment, there could be multiple query servers, which operate on separate portions of the unified mining database, with the effect of providing, in parallel, a sel of query results.

For example, in one embodiment, each EMHR parser 1 13 is assigned a relatively small number of assigned 1CD (international classification of disease) codes, with the effect that each EMHR parser 1 13 can operate without becoming overwhelmed by the amount of data it processes from the medical record systems 1 12. Each EMHR parser 1 13 can maintain the resultant electronic records 1 1 1 in one or more mining databases 1 10. For example the mining databases 1 10 might distribute the records 1 1 1 so that each mining database 1 10 has assigned to it only those records 1 1 1 associated with a relatively small number of assigned ICD codes.

In one embodiment, the EMHR parsers 1 13 presume that EMHR's are maintained in a standard format.

The following standards, promulgated by ASTM International, are of interest:

— E1384-07, Standard Practice for Content and Structure of the Electronic Health Record (EHR)

—E l 633-08a, Standard Specification for Coded Values Used in Electronic Health Records

— HL-7, Health Level 7 Interface Standard

The HL-7 standard prescribes how EMHR's are retrieved from a medical database system, such as might be maintained by a hospital or a medical office.

Each electronic record 1 1 1 maintains a set of attribute values associated with that one record 1 1 1. As described herein, each attribute value indicates a fact about the case described by that record 1 1 1. For example, where a record 1 1 1 refers to a patient, an indicator of whether the patient is within the age range of 0-1 year would be an attribute value, an indicator of whether the patient is pregnant would be an attribute value, an indicator of whether the patient is currently hospitalized would be an attribute value, and so on.

Attribute values might be mutually exclusive, in that only one of a set of related attribute values could be time for a particular electronic record 1 1 1. For example, the attribute values {patient is 0-1 year old, patient is 1-2 years old, patient is 2-4 years old, ...}would be mutually exclusive. Similarly, the attribute values {patient is male, patient is female}would be mutually exclusive.

Attribute values might also be non-exclusive, in that none, one, or more than one of those attribute values might be true for a particular electronic record 1 1 1. For example, the attribute values {patient is male, patient is hospitalized} are not mutually exclusive, as any one of the four possible circumstances are possible.

Each user interface element 140 is disposed to receive queries 141 from users 150, and to respond to those queries 141 with search results 142. A "query" might also be referred to herein as a "current case", a "search query", a "target case", or an "unresolved case", while an electronic record 1 ] 1 maintained in the mining database 1 10 might be referred to herein as a "comparison case" or a "resolved case". A user 150 might also be referred to herein as a "query issuer" or a "searcher".

The user interface elements 140 send those queries 141 to one or more selected query servers 130. The query servers 130 are disposed to receive queries 141 using a

communication channel, such as for example a web interface using an HTTP protocol or a variant thereof. The query servers 130 obtain access to the mining database 1 10, retrieving electronic records 1 1 1 in response to the query 141 and in response to performance of methods described herein. The query servers 130 also maintain information relevant to the query 141, also in response to records 1 1 1 and in response to performance of methods described herein. The query servers 130 provide search result information which the user interface elements 140 can format as search results 142.

Attribute Values

Example Attributes

As described herein, electronic records 1 1 1 include medical health information.

Those electronic records 1 1 1 maintain attribute values, which represent information about the patient (or patient's ailment, or patient's course of treatment, or otherwise) described with respect to a particular record 1 1 1.

For a 1 st example, an attribute value could represent a particular fact, such as whether the patient is currently hospitalized, as described above. There are a relatively large number of such facts, such as whether the patient has been seen by medical personnel within the past six months, such as whether the patient is a burn victim, such as whether the patient is pregnant, and many other cases. In one embodiment, selection of these facts could be detemiined by what information medical personnel choose to record about the patient. As described herein, each such attribute value represents substantially a single fact, or possibly a cluster of related facts.

For a 2nd example, an attribute value could represent a range of data values associated with the patient (or a test associated with the patient, or otherwise), such as for example a blood cell count, which might be re solved to one of a relatively smaller number of distinct and mutually exclusive possibilities (low, normal, or high), or might alternatively be resolved to one of a relatively larger number of distinct and mutually exclusive possibilities (20-30, 31 -80, 81 -90, 91-100, and so on). Determining whether to use a smaller set of possibilities, which might result in unwarranted matches, or a larger set of possibilities, which might result in unwarranted misses, would be up to the discretion of medical personnel, or up to the discretion of expert advisors when selecting possible attribute values. In alternative embodiments, attribute values may be selected both for a relatively smaller number and a relatively larger number of possibilities.

These example attribute values do not rule out possible attribute values which have elements of both types. For example, whether the patient is male or female is a particular fact, which is mutually exclusive, and which does not lend itself to grouping into larger or smaller numbers of attribute values.

Attribute Classes

In one embodiment, attribute values are distinguished by class. Some example classes might include:

a 1 st class of attribute values could represent a set of demographic information, such as for example the patient's age, sex, socioeconomic class, ethnic background, and otherwise; a 2nd class of attribute values could represent a set of clinical data relating to earlier diagnoses, such as past illnesses and possible precursors to the patient's present ailment, a set of stressors or other exposure history which might relate to the patient's ailment, and possibly other medical history such as immunization records;

a 3rd class of attribute values could represent a set of medical history, such as for example earlier assessments of the patient, a set of information reported by the patient, clinical orders and laboratory tests, and otherwise;

a 4th class of attribute values could represent a set of diagnostic tests and medications prescribed or taken by the patient;

a 5th class of attribute values could represent a set of encounter information, including for example appointments the patient has had with medical personnel, telephone contacts between the patient and medical personnel, and emergency room visits by the patient;

a 6th class of attribute values could represent a set of symptoms presented by the patient, observations by medical personnel, and tentative analysis of those symptoms and observations by medical personnel; and

a 7th class of attribute values could represent a set of other indications of any kind.

In one embodiment, attribute values are represented using 32-bit unsigned integers, with the effect that there are a relatively large number of attribute values to be assigned to possible information about the patient and any evaluation thereof. In one embodiment, each class of attribute values could be drawn from a predetermined subset of possible 32-bit unsigned integers, such as for example assigning the range ( 1 , 1000) of values to the 1 st class described above, and with a reserved range of values (1001 , 4999) for possible expansion, assigning the range 5000, 20000) to the 2nd class described above, and with a reserved range of values (20001 , 29999) for possible expansion, and otherwise. While this application primarily describes an embodiment in which attribute values are represented in this manner, in the context of the invention, there is no particular requirement for any such limitation. Attribute values might be represented in any manner consistent with their use described herein. For example, attribute values might be represented as pointers into a data structure. Matching By Classes

In one embodiment, matching of attribute values could be conducted with respect to classes of attribute values, with the effect, for example, that matching attribute values in one class would be distinct from matching attribute values in a different class. Thus, for each pair of records to be compared (such as for example an electronic record in the mining data structure 201 when compared with a query 141), a number of attribute values which match and a number of attribute values which differ is separately determined for each class of attribute values. This has the effect that each class of attribute values does not distort the comparison, for that pair, for each other class of attribute values. For example, a pair of records might match well with respect to demographic information but might match poorly with respect to medical encounter information.

In alternative embodiments, each distinct class might have a distinct weight associated with matching attribute values for that class, with the effect, for example, that matching attribute values in demographic information might be given lesser weight than matching attribute values in symptoms and observations. However, in preferred embodiments, each class is weighted equally with each other class, and attribute values within each class are also weighted equally with each other attribute value within the same class, with the intended effect that matching does not involve the intervention of any expert option regarding which classes or which attribute values within those classes to regard as having lesser or greater importance.

In embodiments in which matching is conducted by classes, pairs of electronic records 1 1 1 1 and B would have more than one proximity value. There could be a distinct proximity value (A, B), for each class, with the effect that each class's proximity value could be combined, as described below. For example, for the several classes described above, there could be distinct proximity values for the patient's demographic information, for the patient's clinical data, and otherwise. This would provide for isolating similarity of records 1 1 1 so that one class of information does not dominate the search results. For example, it might not be best to cause all patients with the same socioeconomic status to be grouped together.

When comparing records, to determine a distinct proximity value for each class, the binary search technique described herein determines a position range for search, as described herein for determining a single proximity value. This is performed for each class of attribute values. In one embodiment, as described herein, attribute values are disposed so that they are clustered together in a range for each class. This has the effect that the high end HP of the position range [LP, HP], when considered separately for each class, starts with a value equal to the boundary between that class and the next class. When performed separately for each class, the binary search technique finds a position range [LP, HP] for each class, with the effect of providing a rapid evaluation of the proximity value, for each comparison, providing a set of per-class proximity values.

When performing a search of electronic records 1 1 1 for a query Q, the system (and the method of searching) would perform multi-variable optimization on the per-class proximity values, where each variable represents a proximity value for one class. The search would attempt to optim ize (minimize or maximize, as appropriate) a function of those multiple proximity values, possibly with constraints set such as that the other proximity values fall within selected bounds. For example, it could be valuable to find a linearized maximum sum of proximity for symptoms and test results, while maintaining at least some selected minimum for other proximity values. In alternative embodiments, determining a proximity value in response to a sum of squares of distance values would also be possible.

Since each electronic record 1 1 1 could include attribute values from more than one such class, in one embodiment, comparison of the query 141 with records 1 1 1 from the mining database no is performed by class, with the number of matching attribute values and the number of non-matching attribute values maintained by class. This has the effect that, for each class, a separate count of matching attribute values and of non-matching attribute values is maintained. In one embodiment, this is relatively easily performed, as each class includes attribute values which fall within non-overlapping ranges, as described herein. Constraints Class

In one embodiment, the user 1 50 can specify a particular class of attribute values which is sometimes referred to herein as a "constraints class". When presenting a query 141 , the user 150 can optionally require that the attribute values of the query 141 are to be compared against the constraint class. In such cases, only those electronic records 1 1 1 from the mining database 1 10 which include all the attribute values present in the constraint class are considered when perform ing the search, and only those records 1 1 1 would be returned with the search results 142. This has the effect that only those records 1 1 1 from the mining database 1 10 which include all the attribute values present in the constraint class can be returned as search results 142, regardless of how close those records 1 1 1 would be otherwise determined to be according to computed proximity values.

For example and without limitation, the user 1 50 might instruct the query system 1 00 to return search results 142 which only include electronic records 1 1 1 with respect to, say, pregnant females aged 21 -25 and having high blood pressure. In this example case, the query system 100 would provide results 142 which have the best proximity values to the query 141 , and which also have each of the attribute values described as constraints. This would have the effect that an records 1 1 1 in the search results 142 would satisfy the described constraints. Of records 1 1 1 which satisfy those constraints, the results 142 provided would be those which have the best proximity values.

In alternative embodiments, rather than requiring that all electronic records 1 1 1 fall within a single constraint class, one particular class could be designated as a primary class for proximity value determination. In such cases, 1 st, the primary class for determining proximity value would be used to determine whether records 1 1 1 have sufficient proximity value to be returned as search results, and 2nd, one or more other classes would be regarded as optimization constraints. This is distinct from a single constraint class, as records 1 1 1 successfully meeting the optimization constraints would still need to have sufficient proximity value according to the primary class to be considered as possible search results 142, and would then be returned as search results 142 only if they also met the designated optimization constraints.

Proximity Values

A proximity measure for two electronic records 1 1 1 , A and B, is defined in response to a comparison between those cases, as described below: — The proximity of two records 1 1 1 (considered as sets of attribute values) that are identical is infinity.

— The proximity of two records 1 1 1 that have no object in common is zero, that is, their distance is infinity.

— If two records 1 1 1 (considered as sets of attribute values) intersect and have a non-empty difference, their proximity is more than zero and less than infinity.

— The proximity of two records 1 1 1 is higher when those two records 1 1 1 have more items in common and differ by fewer items.

— If a 1 st pair of records 1 1 1 have the same fraction of items in common and the same fraction of items that are different as a 2nd pair of records 1 1 1 , the pair with more items in common has greater proximity, this is, that pair is closer.

One example of a proximity measure could be determined as an inverse of the distance measure, as shown in equation ( 191). One example of a distance measure satisfying these requirements is shown in equation ( 1 92). After reading this application, those of ordinary skill in the art would recognize that other distance measures and proximity measures would be workable, would fall substantially within the parameters described herein, and would be within the scope and spirit of the invention.

( 1 91 ) proximity (A, J?) = 1 / distance (A ,B) ( 192) distance (A, B) =

Thus, the distance between A and B = (number of elements which differ) / (number of elements in common)²,

... where (A - B) represents a set difference, that is, a set of attribute values found with respect to the record 1 1 \A but not with respect to the record 1 1 12?,

... where | C\ represents a value for a size of a set, such as a number of attribute values, and

... where z is an arbitrary constant > 1 (in one embodiment, z = 1 .2, in alternative embodiments, l < z < 1 .2).

After reading this application, those skilled in the art would real ize that the distance (A,B) is zero when the electronic records 1 1 1 1 and B share all the same attribute values, infinite when the records 1 1 1 A and B share none the same attribute values, larger when the records 1 1 1 A and B have more attribute values which differ, smaller when the records 1 1 1 A and B have more attribute values which are the same.

Other distance measure formulas are also possible. After reading this application, those of ordinary skill in the art would recognize that other distance measure formulas are within the scope and spirit of the invention.

In one embodiment, an administrator could specify their own distance metric, such as by creating a mapping function which assigns a distance value in response to a number of common attribute values and in response to a number of differing attribute values. In alternative embodiments, the administrator could specify a distance metric which is responsive to other factors as well.

For example and without limitation, the administrator could specify a 2-dimensional array, Proximity-Distance [NumberMatching] [NumberDiffering], in which each element of the array specifies a distance value associated with "NumberMatching" number of matching attribute values and "NumberDiffering" number of differing attribute values.

In one embodiment, the administrator could monitor statistics with respect to queries 141 submitted by the users 150 and could adjust the choice of proximity function in response thereto. The administrator could also provide an ad hoc database query, that is, a database query which does not use SQL or another database query language, but instead presents the user 150 with choices according to the user interface element 140, and in which the user interface element 140 provides the user 150 with choices of proximity value functionality as specified by the administrator. This would have the effect that the administrator could adjust the options available to users 150 for search, such as for example medical personnel who are searching, using the user interface element 140.

In alternative embodiments, such as for example, for use with the automobile industry or with the pharmaceutical industry, users 150 are more likely to be engineers or scientists, who might wish to have more control over parameters for their searches. This may be achieved by providing alternative modes for use of ad hoc database queries, such as for example a "normal" mode and an "expert" mode. In such embodiments, a normal mode would prov ide a set of options as described above, with possibly a further set of options for use by expert users 1 50, while an expert mode would provide a relatively expanded set of options. For example, an expert mode could provide additional options for controlling a search, or an expert mode could provide a query language, possibly similar to SQL but not necessarily, with parameters for describing at least some uses of the proximity value functionality. In one embodiment, the user 150 can specify a particular minimum proximity value that would be required for an electronic record 1 1 1 to be returned among the search results 142. Minimum proximity values need not be specified as actual values. They might be specified as differences or ratios with respect to proximity values for selected records 1 1 1. Alternatively, they might be specified, for example, as a description of a minimum number of matching attributes, or a maximum number of differing attributes, for a selected class of attribute values. For example, a user 150 might update a search query 141 by selecting one record 1 1 1 from the search results 142, and specifying an updated search query 141 that asks for results that have a better proximity value than the selected record 1 1 1.

Considering several examples of proximity values for pairs of electronic records 1 1 1 , in a 1st example, a pair of electronic records might have 14 attribute values in common and 14 attribute values which differ; their distance (1 / proximity value) would be 0.590. In a 2nd example, a pair of electronic records might have 14 attribute values in common and 15 attribute values which differ; their distance would be 0.632. In a 3rd example, a pair of electronic records might have 13 attribute values in common and 15 attribute values which differ; their distance would be 0.691. In a 4th example, a pair of electronic records might have 4 attribute values in common and 4 attribute values which differ; their distance would be 0.758.

While this application primarily describes operation of the method with respect to a particular proximity value, in the context of the invention, there is no particular requirement for any such limitation. Another proximity value, or another method for determining a proximity value, would be within the scope and spirit of 2the invention. The technique described for search with respect to the proximity value described herein would also be effective with respect to other methods of determining a proximity value. For example, a proximity value could be determined in response to a sum of conditional information / (A in response to B) + 1 (B in response to A), or using another technique.

After reading this application, those skilled in the art would realize that the ratio of attribute values in common to attribute values which differ is the same for the 1 st example described above (14 attribute values in common, 14 allribute values which differ), and for the 4th example described above (4 attribute values in common, 4 attribute values which differ), but that the choice of z > 1 causes the proximity value for the latter example pair of electronic records to be less. In one embodiment, in general, when the ratio of attribute values which match, to the number of attribute values which differ, is the same, the proximity value is closer for pairs which have a greater number of attribute values which match. FIGURE 2

Figure 2 shows a conceptual drawing of a data structure.

As described with respect to the figure 1 , the system 100 includes a mining database

1 10.

The mining database 1 10 includes a mining data structure 201 M. The mining data structure 2 1 M includes the electronic records 1 1 1 to be searched, ordered as described below. In one embodiment, these records 1 1 1 (or pointers to these records 1 1 1) might be disposed in an array, or a hash table, or another data structure suitable for performing the functions described herein. The records 1 1 1 are ordered as described herein. Those records 1 1 1 having a smaller number of specified attribute values precede those records 1 1 1 having a larger number of specified attribute values. Among those records 1 1 1 having an equal number of specified attribute values, those having attribute values with smaller indices precede those having attribute values with larger indices, as described herein.

The mining database 1 10 also includes a best match table 202 B, which includes those electronic records 1 1 1 found as the current list of best matches, ordered as described below. In one embodiment, these records 1 1 1 (or pointers to these records 1 1 1) might be disposed in a separate table, which might include an array, or a hash table, or another data structure suitable for performing the functions described herein. In the best match table 202, the records 1 1 1 are ordered as described herein, with records 1 1 1 having a superior proximity value preceding those records 1 1 1 having an inferior proximity value. A first record 1 1 1 has the best proximity value found so far, while a last record 1 1 1 has the least-best proximity value found so far.

The mining database 1 1 0 also includes an alternative best match table 203 B', which also includes those electronic records 1 1 I found as the current list of best matches,

(differently) ordered as described below. Similar to the mining data structure 201 , in one embodiment, these records 1 1 1 (or pointers lo these records 1 1 1 ) might be disposed in a separate lable, which m ight include an array, or a hash table, or another data structure suitable for performing the functions described herein. In the alternative best match table 203, the records 1 1 1 are ordered as described herein, with those records 1 1 1 having a smaller number of mismatched attribute values preceding those records 1 1 1 having a smal ler number of mismatched attribute values. As described herein, the method 300 operates with respect to a relatively large set of electronic records 1 1 1 . In one embodiment, a set of records 1 1 1 is disposed in the mining data structure 201 M. As described herein, each record 1 1 1 C in the mining data structure 201 M has a position, with notation for the location pos( = [M_CJ and notation for the actual record 1 1 1 C = [M]_c. In the mining data structure 201 M, the records 1 1 1 are disposed as follows:

— 1 ^sl, for each pair of records 1 1 1 A and B, the one with fewer attribute values is ordered earlier:

NumberAttribute Values (A)< NumberAttributeValues (B)— -*- Position (A in M) < Position (B in M)

This has the effect that, when the search is performed, it is known that records 1 1 1 which are positioned earlier in the mining data structure 201 necessarily have fewer specified attribute values. This allows the search technique to determine a maximum proximity value for those records 1 1 1 which are sufficiently earlier than the record 1 1 1 which has been most recently examined.

In one embodiment, finding those records 1 1 1 with a selected number of specified attribute values (for example, finding those records 1 1 1 with exactly 14 attribute values), a binary search is conducted over the position of those records 1 1 1 in the mining data structure 201. For example and without limitation, if the first record 1 1 1 in the mining data structure 201 has (for example) 3 attribute values, and the last record 1 1 1 in the mining data structure 201 has (for example) 56 attribute values, the binary search successively divides the mining data structure 201 in halves until it reaches a position where the record 1 1 1 at that position has the selected number of attribute values (in this example, 14). If the method 300 searches for a record 1 1 1 with n attribute values, but there are no records 1 1 1 with exactly that many attribute values, the method 300 finds the record 1 1 1 with the closest available number of attribute values.

— 2nd, for each pair of electronic records 1 1 \ A and B having the same number of attribute values, they are sorted lexicographical ly, as described herein with respect to the

"SUMMARY OF THE INVENTION", and otherwise, with the effect that the one whose least-valued attribute value is lower is ordered earlier:

MinimumAttributeValue (A) < MinimumAttributeValue (B)—►

Position (A in M) < Position (B in M) If two records 1 1 1 A and B have the same minimum attribute value, they are sorted by second-least attribute value, and so on, as described herein with respect to the

"SUMMARY OF THE INVENTION", and otherwise.

— 3rd , for each record JJlA,\ts attribute values are maintained sorted in increasing order:

AttributeValuel( ) < AttributeValue2 (A) —►

Position (AttributeValuel in A)< Position (AttributeValue2 in A)

This has the effect that, when the search is performed, it is known that attribute values which match in two records 1 1 1 necessarily are indexed in order.

As described herein, the method 300 also maintains a best match table 202 B, identifying those records 1 1 1 found so far which best match the query Q. In one embodiment, the number of electronic records 1 1 1 maintained in the best match table 202 B is set by a user 150. However, in the context of the invention, there is no particular requirement for any such limitation. For example, the number of records 1 1 1 maintained in the best match table 202 B might be set in response to the size of the query Q, or th e number of records 1 1 1 being 2searched, or otherwise.

The electronic records 1 1 1 in the best match table 202 are disposed in order of their proximity value, with the best match found so far being first, the next-best being second, and so on, until the least-best match is disposed last. In one embodiment, the records 11 lare disposed in the best match table 202 in a format which allows relatively easy insertion and movement, such as for example a linked list. This has the effects that (1) a new record 1 11 can be added to the best match table 202 at its position when found, and (2) the least-best matching record 1 1 lean be removed when a newly-found record 1 1 1 is added which is superior to it. In one embodiment, each record 1 1 1 has its proximity value maintained with it in the best match table 202, so that it is not necessary to recompute that proximity value.

The electronic records 1 1 1 in the alternative best match table 203 are disposed in order of the number of mismatched attribute values, with the record 1 1 1 with the smallest number of mismatched attribute values being first, the next-smallest being second, and so on. In one embodiment, the records 1 1 1 are disposed in the best match table 202 in a format which allows relatively easy movement, such as for example a linked list. This has the effect that records 1 1 1 can be moved within the alternative best match table 203 at to a new position when the number of mismatched attribute values for those records 1 1 l have changed. In one embodiment, each record 1 1 1 has its count of mismatched attribute values and its count of matched attribute values maintained with it in the alternative best match table 203, so that it is not necessary to recompute those values.

FIGURE3

Figure 3 shows a conceptual drawing of a method.

A method 300 includes a set of flow points and steps, and operates with respect to one or more data structures. In one embodiment, the method 300 is performed by a dedicated computing device, such as a server as described above, but may alternatively be performed by one or more virtualized computing devices, such as a cloud computing embodiment of a server as described above.

The method 300 described herein operates by comparing the target case Q with electronic records 1 1 1 in the mining data structure 201 M, maintaining a best match table 202 B. The best match table 202 B always includes a least-best matching record 1 1 1 , which indicates how many positions away from the center point, as described below, the method 300 needs to look. Beyond that set of positions, no record 1 1 1 can match the target case Q any better than already-compared records 1 1 1 , so no further search need be performed.

First, the method 300 finds the electronic record 1 1 1 in the mining data structure 201 M that has the closest length to the target case Q. This is considered the "center point" of the search. If there is more than one record 1 1 1 with the same number of attribute values, the method 300 selects a mid-point of the set of those records. If no such record 1 1 1 exists, the method 300 selects the position of the record 1 1 1 with the closest possible number of attribute values. Second, the method 300 fills the best match table 202 B by matching each record 1 1 1 in turn, stepping upward and downward from the center point. That is, the method 300 determines a proximity value between the target case Q and [B_cemer+]], [B_center-i], [B_ccnt(;r+2], [B_CCTter_₂], and so on, entering each such record 1 1 1 into the best match table 202 B until the best match table 202 B is full.

Having filled the best match table 202 B, the method 300 detennines a position range [LP, HP] in response to the least-best electronic record 1 1 1 in the best match table 202 B, where the lower position LP represents a lowest possible position in the mining data structure 201 M where cases might be found to be selected for the best match table 202 B, the higher position HP represents a highest possible position in the mining data structure 201 M where cases might be found to be so selected. Those records wh ich are outside the position range [LP, HP] either have too few or too many attribute values for their proximity values to possibly be good enough to be added to the best match table 202 B, so there is no need to determine proximity values for them.

Once the center point has been determined, and the best match table 202 B has been filled, the method 300 proceeds to compare additional electronic records 1 1 1 against the target case Q. As described herein, the method 300 selects a new record 1 1 1 to compare at the flow point 320. Each time the method 300 compares a record 1 1 1 with the target case Q, the method 300 proceeds from the flow point 3 10.

Comparing Single Case

At a flow point 310, the method 300 is ready to compare a query 141 , or target case Q, with a set of electronic records 1 1 1 C. In one embodiment, the target case Q could represent a current unresolved case, such as for example a real-world patient with a set of information in an record 1 1 l for that patient.

At a step 31 1, the method 300 compares the target case Q with a single case C from the mining data structure 201. To perform this step, the method 300 perfonns the sub-steps described with respect to the figure 4-

At a step 312, the method 300 determines the proximity value for the pair Q and C, using the values for k , length (Q), and length (C). The number of non-matching attribute values = length (Q) + length (C)- 2 k. In one embodiment, the proximity value is computed in response to the equation (191) and the equation (192) above. In alternative embodiments, the proximity value is determined in response to a 2-dimensional array, ProximityDistance [NumberMatchingJ [NumberDiffering], where values are pre-computed or otherwise selected for that 2-dimensional array in response to an administrator-specified proximity function, as described herein.

At a step 313, the method 300 determines if the proximity value for the pair Q and C is good enough that the electronic record 1 1 1 C should be maintained in the best match table 202 B. More specifically, the 300 determines if the proximity value for the pair Q and C is superior to the least-best matching record 1 1 1 in the best match table 202 B.

—As described herein, the method 300 maintains the best match table 202 Bin best-match order:

proximity (Q,A) >proximity (Q, B) ►

order (| B_A) <order ([B_B]) If the proximity (Q, C) is superior to the proximity value of the least-best matching electronic record 1 1 1 in the best match table 202 B, that is, proximity (Q, [B_max]), the method proceeds with the step 3 14. If the proximity (Q, C) is not good enough for the record 1 1 1 C to be entered into the best match table 202 B, that is, the proximity (Q, C) is not superior to the proximity value of the least-best matching record 1 1 1 in the best match table 202 B, the method 300 skips the step 3 14 and proceeds with the flow point 320.

At a step 314, the method 300 adds the electronic record 1 1 1 C to the best match table 202 B. In one embodiment, the method 300 performs a search of the best match table 202 B, and inserts the electronic record 1 1 1 C into the best match table 202 B in its correctly positioned order. As described above, the best match table 202 B is maintained using a structure which provides for relatively easy insertion of new records 1 1 1 , such as for example a linked list, although other data structures would also be suitable.

In one embodiment, the method 300 maintains a minimum and maximum possible proximity (Q, C) that might be determined in response to the comparison of Q and C. If at any time during the step 3 12, the method 300 determines, in response to the number of counted matching attribute values, the number of computed non-matching attribute values, and the size of each of Q and C, that the proximity (Q, C) cannot be better than the worst proximity value in the best match table 202 B, proximity (Q, [Bmax]), the method 300 aborts the comparison of Q and C, and moves to a new electronic record 1 1 l for comparison.

Comparing Next Case

At a flow point 320, the method 300 is ready to examine a next electron ic record 1 1 1 C in the mining database 201 .

At a step 321 , the method 300 examines the least-best electronic record 1 1 1 in the best match table 202 B, that is, C ^~ [B_max], which could be a new value if the just-matched record 1 1 1 C was in fact just added to the best match table 202 B. In response to this worst record 1 1 1 C, the method 300 determines a new position range [LP, HP] .

In one embodiment, method 300 determines the new position range [LP, HP] in the mining data structure 301 M, in response to the least-best possible match C in the best match table 202 B, that is, C = B_max] Since the distance (A,B) for two electronic records 1 1 1 4 and B is defined as described above in equation (202), the lowest possible distance (A, B) is at least I length (A) - length (B) | / minimum length (A), length (B)), where length (·) is defined as described above. Accordingly, the method 300 sets the new position range [LP, HP] so that the lowest possible values for distance (Q, [M_LP]) and distance (Q, [M_Hp]) are at least as great as distance (Q, [B_max]).

Because the mining data structure 301 M is sorted so that records outside the position range [LP, HP] cannot be any closer than the (currently known) worst possible match this has the effect that the method 2300 need not search outside that position range [LP, HP]. Any electronic records 1 1 1 outside that position range [LP, HP] are certain to be no closer to the target case Q than the (currently known) worst possible match.

At a step 322, the method 300 examines the new position range [LP, HP] and compares that new position range [LP, HP] with the position of the most recently examined electronic record 1 1 1 in the mining database 201 . If the new position range [LP, HP] includes only records 1 1 1 which have already been compared with the target case Q, this means that the method 300 has found all the best possible matches and continues with the flow point 330, where the method 300 attempts to improve the search using updated attribute values.

Otherwise, the method 300 continues with the step 323.

At a step 323, the method 300 selects a next unexamined electronic record 1 1 1 as near the center of the new position range as possible. In one embodiment, the method 300 finds the next unexamined record 1 1 1 by stepping away (that is, downward and upward) from the center point, as described above, until it finds a record 1 1 1 which has not yet been examined. If the method 300 is unable to find a record 1 1 1 which has not yet been examined, and which is also within the position range [LP, HP], this means that the method 300 has found all the best possible matches and continues with the flow point 330, where (as described above) the method 300 attempts to improve the search using updated attribute values. Otherwise, once the method 300 finds a next unexamined record 1 1 1 , it proceeds with the step 3 1 1 , where it compares the target case Q with the next unexamined record 1 1 1.

Additional Attribute Values

At a flow point 330, the method 300 is ready to request additional attribute values to improve the possible search.

When the user 150 requests a number N of best matches, the method 300 finds a number K > N of possible best matches, that is, a number K > N of entries maintained in the best match table 202 B. As noted above, the entries in the best match table 202 B are maintained in an order in which they best match the target case Q. In one embodiment, the K > N possible best matches are determined independent of classes of attr ibute values, that is, a single proximity value is determined which is responsive to all attribute values regardless of their class. In one embodiment, K > N possible best matches are initially selected to provide the possibility that cases which are originally not as good as the least-best TV entries in the best match table 202 B might be promoted to within the best N such entries. In alternative embodiments, it may be possible to perform supplemental searches, or other heuristics or techniques, which update the results of search in response to updating the query 141 provided by the user 1 0. After reviewing this application, those skilled in the art would recognize that the techniques described herein are part of a broad set of possibilities for updating the attribute values of the query and obtaining improved results.

In one embodiment, it is desired to improve the proximity value of at least some electronic records 1 1 1 in response to additional information about the query 141 . As the proximity value is responsive to the number of matching attribute values, and the number of differing attribute values, between each electronic record 1 1 1 and the query 141 Q, it may be valuable to see if additional (or altered) attribute values for the query 141 Q would provide new proximity values for one or more records 1 1 1 C. In cases in which additional attribute values are added to the query 141 Q, those attribute values which already differ from records 1 1 1 Care likely not good candidates for additional information. Accordingly, the method 300 reviews those attribute values which are present in records 11 1 C, but which are missing from the query 141 Q.

In one embodiment, the method 300 examines those electronic records 1 1 1 C with the least mismatch between C and the query 141 Q, so as to achieve the best possibility that when additional attribute values are added to the query 141 Q, the proximity value Proximity (C, Q) will be improved. Accordingly, the method 300 maintains an alternative best match table 203 B', in which it orders records 1 1 1 so that they are maintained in increasing order of mismatch, for attribute values of the target case Q, thus:

NumberMismatchedTargetAttributeValues (A) < NumberMismatchedTargetAttributeValues (B) >. Position (A in B') < Position (B in B')

At a step 331, the method 300 examines the electronic record 1 1 1 with the lowest amount of mismatch, that is, C = [B'„_lin] As described below, it could occur that the record 1 1 1 with the lowest amount of mismatch cannot be improved, in which case the method 300 examines the record 1 1 1 with the next-lowest amount of mismatch. The method 300 proceeds to examine successive records 1 1 1 with the next-lowest amount of mismatch until all such records 1 1 1 have been examined, after which the method 300 has made as many possible improvements as possible, and proceeds with the flow point 340, where the method 300 presents the search results 142 to the user 1 50.

At a step 332, the method 300 determines those attribute values in that electronic record 1 1 1 C which arc mismatched case attribute values, not target case attribute values. That is, the method 300 selects one or more attribute values from the electronic record 1 1 1 C which are not present in the query 141 , also called the "current case" or "target case" Q. The method 300 selects one or more of those mismatched case attribute values.

At a step 333, the method 300 presents those one or more selected mismatched case attribute values to the user 1 50, and requests possible updated attribute values, in response to those one or more mismatched case attribute values, for the target case Q. If the user 150 does not wish to update any attribute values, the method 300 proceeds with the flow point 340, where the method 300 presents the search results 142 to the user 150. If the user 150 is willing to update at least some attribute values, the method 300 proceeds with the step 334

At a step 334, the method 300 receives one or more updated attribute values from the user 150. In one embodiment, the method 300 repeats the step 332, the step 333, and the step 334, until a selected n umber of mismatched case attribute values have been presented to the user 1 50, and the user 150 declines to provide any further updated attribute values, after which the method 300 gives up on improvements in response to that electronic record 1 1 1 and returns to the step 33 l to proceed with another record 1 1 1.

At a step 335, the method 300 updates the target case Q, and recomputes the proximity value proximity (Q, C). If the proximity value proximity (Q, C) improves, the method 300 reorders the best match table 202 B and the alternative best match table 203 B'. The method 300 returns to the step 33 1 to select a new electronic record 1 1 1 with the lowest amount of mismatch (which might possibly be the same record 1 1 1 again ). If the proximity value proximity (Q, C) does not improve, no reordering is performed, and the method 300 proceeds to another record 1 1 1 at the next step 33 1 .

At a flow point 340, the method 300 has been unable to further improve the ordering of the best match table 202 B, or the user ] 50 has declined to provide any further attribute values. As the user 150 has requested the TV best matches, the method 300 presents those N best matches to the user 150 for review.

FIGURE 4

Figure 4 shows a conceptual drawing of a portion of a method.

The portion includes flow points and steps, as shown in the figure and operates with respect to one or more data structures. In one embodiment, the portion is performed by a dedicated computing device, such as a server as described above, but may alternatively be performed by one or more virtualized computing devices, such as a cloud computing embodiment of a server as described above.

A flow point 31 1 A indicates a beginning of comparing a pair of cases. The method 300 proceeds from this flow point 3 1 1A at the beginning of the step 31 1.

As described above, in this portion of the method 300, the target case Q is compared with a single case rom the mining data structure 201 . In this description of this portion of the method 300, for the pair of electronic records 1 1 1 _^4 and B to be compared (where at this step the method 300 starts with A = Q and B = C), the method 300 considers each record 1 1 1 A and B as including a set of attribute values each referenced by an index, with the notation Ai= index of the attribute value (A),-, and where the value of the index A,; is within the index range (A_mi„, A_max).

At a sub-step 351, the method 300 initializes values for data variables to be used.

Initially, the index range (A_min,A_mux) is set to (o, length (A)), where length (A) represents the size of the electronic record 1 1 1 A in number of attribute values. Similarly, the index range ( B_mim B_max) is set to (o, length (B) ). As described herein, later, the minimum value A_mi„ is set to a greater bound and the maximum value A_max is set to a lesser bound, where the index range (A_min., A_maJi) is dynamically adjusted. Similarly, as described herein, later, the minimum value Bmin is set to a greater bound and the maximum value B_max is set to a lesser bound, where the index range (/?„„^■„, B_max) is dynamically adjusted.

As it examines the pair of electronic records 1 1 1 A and B, the method 300 maintains a count k of how many attribute values of the pair ,4 and B are successfully matched. Before examining the pair A and B, the value k is initialized to zero. As described herein, from k and from the values length (A) and length (B), the method 300 can determine the proximity value described above with respect to the equation (191 ) and the distance value described above with respect to the equation ( 1 2).

At a sub-step 352, the method 300 examines the attribute value at the smallest possible index A,„_in that is, (A)_mil„ and compares it with the attribute value at the smallest possible index B„,_/„., that is, (B) ,„,-„. If the attribute value (A)_mi„ is less than the attribute value (B)_mi„, the method 300 proceeds with the sub-step 353. Otherwise, the method 300 proceeds with the sub-step 354.·

At a sub-step 353, the method 300 "flips" (that is, exchanges) the two electronic records 1 1 1 A and B. This has the effect that for the two records 1 1 1 being compared, A and B, the attribute value (A)„„·„ always has some smaller attribute values, and some greater attribute values, than the attribute values in B.

At a sub-step 354, the method 300 examines the target case attribute value (A)rnm, and prepares to compare that attribute value with the attribute values of the electronic record 1 1 1 B in the index range (B_min, B_miLX).

At a sub-step 355, the method 300 performs a binary search of B in the index range (B,„_/„, B_max) for the attribute value (A)_mirlTf a match is found for the attribute value (A)„„„ in B with the index range (B„„„, B„_iax),t e method 300 proceeds with the sub-step 356. If no match is found, the method 300 proceeds with the substep 357·

At the sub-step 356, the match count k is incremented by one. The method 300 proceeds with the next sub-step 357-

At the sub-step 357, the method 300 determines if there are any further attribute values to search in ,4 and B. There are no further attribute values to search if one of the ranges (A,„i„, A_max) or (Bmin, Bmax) is narrow enough that no further matches are possible. This would occur if one of the ranges has fewer than two elements, or if (A_min > A_max), or if (B_min

-^> B_max).

If there are any further attribute values to search, the method 300 proceeds with the sub-step 358. Otherwise, the method 300 proceeds with the flow point 3 1 I B, where this portion of the method is done.

At the sub-step 358, ^4 is then restricted to the index range (A_min+ A_max), and B is restricted to the index range (B_{new min}, B_max), where ^~B_HeW min is the smal 1 est index of B for which the attribute value JS„_en, m_m does not match the attribute value A„„-„. The method 300 then proceeds with the sub-step 352, where it continues to compare attribute values to be found in A and B.

A flow point 3 1 IB indicates an end of comparing a pair of cases. The method 300 proceeds with the step 3 12, which follows the step 3 1 1.

Example Embodiments

In the medical field, one embodiment would include a service in which users would access a mining data structure 201 M having a relatively large number of entries. A relatively large number of entries provides an opportunity for a relatively wide range of possible sets of information about patients and about medical events, with the effect of a relatively good likelihood that an electronic record 1 1 1 with a h igh proximity value would be of value to medical personnel. A relatively large number of entries could be achieved by parsing medical records, such as EMHR's, which are made available by medical databases, while preserving the privacy of individual patients. In one embodiment, the mining data structure 201 M would support a proximity-based query system using an "ad hoc" query, as described herein.

In the automobile and pharmaceutical industries, as described with respect to "Alternative Embodiments", a substantially robust mining data structure 201M could be achieved by parsing information available from databases, such as those maintained by automobile manufacturers and by pharmaceutical industry researchers. Having a sufficiently extensive mining data structure 201 M would provide the opportunity for automobile manufacturers and troubleshooters (in the automobile industry) and for pharmaceutical researchers and investigators (in the pharmaceutical industry) to use proximity searches in their search for problem solutions.

Alternative Embodiments

After reading this application, those skilled in the art would realize that the techniques described herein are not limited to medical personnel, patients or their ailments or conditions, or medical applications. Techniques described herein are generally applicable at least to any large database of records, to any set of semi-structured data, or otherwise. For example, techniques described herein are applicable at least to the following:

The automobile industry. Techniques described herein are applicable to performing tests on new models of automobiles and other vehicles that might be introduced. Tests include at least examination for feasibility of building models, repairs, troubleshooting designs, and otherwise. The automobile industry has a relatively large inventory of parts, from a relatively large number of distinct manufacturers, and a substantial number of possible failures, or failure modes, which might occur during testing. Moreover, the automobile industry has many of its data and results organized in relatively large databases, with the effect that a set of parsers associated with the organization of those databases would be capable of providing electronic records in formats suitable for techniques described herein.

The pharmaceutical industry. Techniques described herein are applicable to creation of new drugs, to drug-lead discovery, to research into symptoms and possible pharmaceutical solutions, to amelioration of side-effects, to revie of results of testing protocols, and otherwise. The pharmaceutical industry has a relatively large inventory of possible molecules which might serve either as effective drugs, or as precursors to effective drugs, and a substantial number of possible effects, both desired effects and undesired effects, which might be applicable to those molecules. Other research areas. Other research areas are also dependent on finding past known cases which are similar to a current, unresolved case.

These examples are not exhaustive. Techniques described herein could be used with search of substantially any large database for which individual elements have a number of possible attributes or factors of interest.

Claims

1 . A method, including steps of

generating a data structure in response to a set of information records,

wherein said data structure is responsive to a set of attribute identifiers selected in response to said information records, each said attribute identifier indicating the presence of a predetermined feature of an information record,

wherein said data structure orders said information records by increasing number of attribute identifiers associated with said information records, and orders said attribute identifiers associated with said information records by increasing number of an index associated with each said attribute identifier;

receiving a query from a user device;

generating a set of said query attribute identifiers in response to said query;

in response to a proximity metric, identifying a subset of said information records closest to said query, said subset having a selected number of said information records,

wherein said proximity metric defines a proximity between a first and second record, said proximity being smaller in response to a larger measure of a difference between said first and second record, and larger in response to a larger measure of a similarity between said first and second record,

wherein said steps of identifying include steps of (1) determining a worst- case distance between said query and information records to be identified in said subset, (2) limiting a range of records to examine in response to said worst-case distance, (3) determining a value for said proximity metric for between said query and one or more information records, repeating said steps (l)-(3) until said selected number of said information records have been identified;

reporting, to said user device, said information records closest to said query.

2. A method as in claim 1, wherein

said data structure includes a sequence of elements,

each said element indicating for an information record which attribute identifiers are associated with said information record,

said elements having a primary sort order responsive to a number of attribute identifiers associated therewith. said elements being secondary sort order responsive to a dictionary order of those attribute identifiers associated therewith.

3. A method as in claim 1 , wherein

said information records each indicate a one or more medical encounters, including information describing one or more symptoms or tests, one or more treatments, and one or more dispositions;

said query indicates one or more medical encounters, including one or more symptoms or tests;

said steps of reporting include one or more information records including treatments associated with said symptoms or tests.

4. - A method as in claim 3, wherein

said steps of reporting indicate one or more treatments associated with successful dispositions of encounters associated with said symptoms or tests.

5. A method as in claim 1, wherein

said information records are accessed from a plurality of servers each maintaining one or more medical records.

6. A method as in claim 1, wherein said query has one or more incomplete elements;

and including steps of

identifying one or more new query attribute identifiers not present in said query, one or more of said new query attribute identifiers providing altered values for said proximity between said query and said subset of information records closest to said query;

reporting, to said user device, one or more of said new query attribute identifiers.

7-A method as in claim 1 , wherein

said difference between said first and second record is proportional to a measure of a symmetric difference of sets of attribute identifiers between said first and second record; said similarity between said first and second record is proportional to a measure of an intersection of sets of attribute identifiers between said first and second record.

8. A method as in claim 7, wherein said difference between said first and second record is responsive to a ratio of: a measure of matching attribute values for said first and second record to a measure of differing attribute values; and

when said radio is substantially equal for a 1 ^st pair of records and a 2^nd pair of records, said difference indicates that said 1^st pair is closer when said measure of matching attribute values is larger for said ¹ pair.

9. A method as in claim 7, wherein

said difference between said first and second record is responsive to a ratio of a measure of attribute values which match for said first and second record, and a measure of attribute values which differ between said first and second record.

10. A method as in claim 7, wherein

said proportion to a measure of symmetric difference is responsive to an exponent of said measure of symmetric difference;

said proportion to a measure of intersection is responsive to an exponent of said measure of intersection;

wherein a proportional increase in both difference and intersection provides a smaller proximity measure.

1 1 . A method as in claim 1 , wherein

said steps of determining a worst-case distance include steps of

locating a first information record whose number of attribute identifiers is closest to said query's number of attribute identifiers,

determining a lowest bound for said proximity measure for a second information record, in response to said second information record's number of attribute identifiers;

said steps of limiting a range of records include steps of

in response to said lowest bound, determining whether to review information records after said second information record.

12. A method as in claim 1 1 , wherein

said steps of locating a first information record include steps of determining a proximity between said query and at least a first said information record;

determining a lower bound in response to said proximity between said query and said first information record.

13. A method as in claim 1, wherein

said attribute identifiers include one or more sets of mutually exclusive attributes, wherein each said information record includes only one attribute identifier from each said set of mutually exclusive attributes.

14. A method as in claim 1, wherein

said attribute identifiers include one or more sets of attribute values with a selected resolution,

wherein each said information record includes only one attribute identifier from each said set of mutually exclusive attributes.

15. A method as in claim 1 , wherein

said query includes a data structure which orders attribute identifiers associated with said information records by increasing number of an index associated with each said attribute identifier.

16. A method as in claim 1, wherein

said attribute identifiers are associated with distinct classes:

said proximity metric includes a set of proximity values, each said proximity value being responsive to attribute values in one of said distinct classes; and

said steps of identifying include determining said subset of said information records in response to said set of proximity values.

17. A method as in claim 1 , wherein

said attribute identifiers are associated with distinct classes;

said steps of identifying are responsive to one or more of:

whether said information records are excluded by a class excluded by said query; whether said information records are included in a class required by said query.

1 8. A method as in claim 1, wherein

said steps of generating a data structure including steps of:

maintaining a set of records of examples,

each said example including a set of attribute value representing information about observations of said example and a set of attribute values representing information about a resultant from said example;

wherein said subset of said information records closest to said query includes information pertinent to an action for an example represented by said query.

19. A method as in claim 1 , including steps of

said steps of identifying said subset of information records include steps of verifying a set of conditions for said subset of information records,

said conditions being responsive to information from said user device.

20. A method, including steps of

maintaining a data structure in response to a set of information records,

wherein said data structure is responsive to a set of attribute identifiers selected in response to said information records, each said attribute identifier indicating the presence of a predetermined feature of an information record;

in response to a proximity function, identifying a number of said information records closest to said query, said subset having a selected number of said information records;

wherein said data structure maintains said information records in a selected order, said selected order having the property that said proximity function of said query and a 1st said information record is greater than said proximity function of said query and a 2nd said information record,

whenever said 2nd said information record is sufficiently far in said order from said 1 st said information record;

wherein said steps of identifying a number of said information records include steps of: determining a proximity function of said query and a selected information record, limiting a range of records to examine in response to a value of said proximity function, and

repeating said steps of determining and limiting until said number of information records are found.

21 . A method as in claim 20, wherein

said data structure maintains attribute identifiers for each particular record each having a selected index associated with said particular record,

said selected index being responsive to values of said attribute identifiers;

said steps of determining a proximity function of said query and a selected information record include steps of:

determining whether a 1 st selected attribute identifier from said query with attribute identifiers has a match within a range of indices in said selected information record,

updating said range of indices, and

repeating said steps of determining and updating.

22. A method as in claim 20, wherein

said set of information records represents a set of professional experience, said professional experience including specific knowledge of real cases.

23. A method as in claim 20, wherein

said set of information records represents a plurality of real cases, each one case including application of professional experience by an expert; and

said number of found information records represent a combination of said professional experience by experts.

24. A method as in claim 20, including steps of

in response to said found information records, identifying further information applicable to said applying said proximity function to said found information records and i said query;

receiving said further information; and

l imiting said found information records in response to said proximity function applied to said found information records, and to said query with said further information.

25. A method as in claim 20, including steps of

in response to said found information records, receiving further information;

applying said proximity function to said found information records and to said query augmented by said further information; and

limiting said found information records in response to said query augmented by said further information.

26. Λ method as in claim 20, wherein

said proximity function is responsive to a measure of similarity associated with professional experience by said one or more professionals; and

said found information records are responsive to professional experience applied to said query, said professional experience indicating one or more information records representing real cases with said measure of similarity as appl ied to said query.

27. A method as in claim 20, wherein

said set of information records each represent an example problem and solution thereof; and

said set of found information records represents one or more examples of applying professional experience to a problem similar to said query;

said examples of applying professional experience providing a set of possible solutions to said query.

28. A method as in claim 20, wherein

said set of information records each represent an example real case encountered by one or more professionals;

said attribute identifiers each represent a factor associated with one or more said real cases;

said proximity function is responsive to a measure of sim ilarity associated with professional experience

by said one or more professionals; and

said found information records include examples of said one or more real cases which are responsive to

said professional experience as applied to said query.

29. A method as in claim 20, wherein

said set of information records represents a set of real cases;

said attribute identifiers represent factors associated with said real cases; and said proximity function is responsive to a set of professional experience associated with matching real cases to said query.