EP1346293A2 - Finding the most interesting patterns in a database quickly by using sequential sampling - Google Patents
Finding the most interesting patterns in a database quickly by using sequential samplingInfo
- Publication number
- EP1346293A2 EP1346293A2 EP01960677A EP01960677A EP1346293A2 EP 1346293 A2 EP1346293 A2 EP 1346293A2 EP 01960677 A EP01960677 A EP 01960677A EP 01960677 A EP01960677 A EP 01960677A EP 1346293 A2 EP1346293 A2 EP 1346293A2
- Authority
- EP
- European Patent Office
- Prior art keywords
- hypotheses
- error
- utility
- hypothesis
- probability
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99934—Query formulation, input preparation, or translation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99944—Object-oriented database structure
- Y10S707/99945—Object-oriented database structure processing
Definitions
- the invention relates to finding the most interesting patterns in a database quickly by using sequential sampling in particular to a method for sampling a database for obtaining the probably approximately n best hypotheses having the highest empirically utility of a group of potential hypotheses.
- KDD knowledge discovery in databases
- sampling must be combined with discovery algorithms in a fashion that allows us to give the user guarantees about how far the results obtained using sampling differ from the optimal (non-sampling based) results.
- the goal of a sampling discovery algorithm then is to guarantee this quality using the minimum amount of examples [5].
- Definition 1 (n-best hypotheses problem) Let D be a database of instances, H a set of possible hypotheses, f : H x D ⁇ R>0 a quality or utility function on H, and n, 1 ⁇ n ⁇ I H
- Equation 1 says that E provides a two-sided confidence interval on f(h;Q m ) with confidence ⁇ .
- the probability of drawing a sample Q m (when drawing m transactions independently and identically distributed from D), such that the difference between true and estimated utility of any hypothesis disagree by ⁇ or more (in either direction) lies below ⁇ .
- the confidence interval vanishes. In this case, we can shrink the confidence interval (at any confidence level ⁇ ) to arbitrarily low nonzero values by using a sufficiently large sample.
- a method for sampling a database for obtaining the probably approximately n best hypotheses having the highest empirically utility of a group of potential hypotheses comprising the steps of (see also Table 1 of this specification): a) generating all possible hypotheses based on specifications of a user as remaining hypotheses, b) checking whether enough data points were sampled so far to distinguish within the group of potential hypotheses all good looking hypotheses from bad looking hypotheses with sufficient confidence, c) if according to step b) enough data points were sampled so far then continue with step j) otherwise continue with step f), d) sampling one or more data points, e) calculating the utility of all the remaining hypotheses on the basis of the sampled data points, f) in the set of remaining hypotheses determining a set of good looking hypotheses by taking the n hypotheses that currently have the highest utility based on the data points sampled so far, whereas the other hypotheses are added to a set of bad looking hypotheses,
- step k) determining a utility confidence interval given a required maximum probability of error and a sample size based on the observed variance of each hypothesis' utility value.
- step a) the following step I) is performed (see step 2. of Table 1):
- step f)i) the following steps are performed (see step 3.(e)i.
- m) determine a locally allowed error probability based on the probability of error selected by the user divided by two and the number of remaining hypotheses and the smallest data point sample size that makes the error confidence interval according to step j) smaller than or equal to the user selected maximum error margin when determined based on the user selected error probability divided by two and the size of the initial set of hypotheses, n) for each of the bad looking hypotheses, determine the sum of its observed utility value and its error confidence interval size as determined according to step k) for the current data point sample size and the locally allowed error probability error according to step m), o) selecting the maximum value of all the sums determined in step n), p) subtracting the user selected maximum error margin from the value selected in step o), q) adding to the number obtained in step p) the size of the error confidence interval as determined according to step k) for the hypothesis currently considered for outputting based on the current data point sample size and the locally allowed error probability, r) if the number determined according to steps m) to q)
- step g)ii) the following steps are performed
- step 3(e)ii. of Table 1 determines (see step 3(e)ii. of Table 1): s) determine a locally allowed error probability based on the probability of error selected by the user divided by two and the number of remaining hypotheses and the smallest data point sample size that makes the error confidence interval according to step k) smaller than or equal to the user selected maximum error margin when determined based on the user selected error probability divided by two and the size of the initial set of hypotheses, t) for each good looking hypothesis determine the difference between its observed utility value and its error confidence interval as determined in step k) based on the current data point sample size and the locally allowed error probability as determined in step s), u) selecting the minimum of all the differences determined in step t), v) subtract from the value selected in step u) the size of the utility confidence interval of the hypothesis considered for the removal and determined according to step k) and the locally allowed error probability as determined in step s), w) if the number determined in steps s) to v) is not smaller than the observed utility of
- the general approach to designing a sampling method is to use an appropriate error probability bound to determine the required number of examples for a desired level of confidence and accuracy.
- Chernoff bounds that are used in PAC theory and many other areas of statistics and computer science can be used to determine appropriate sample bounds [5].
- the Chernoff bounds can be replaced by tighter normal or t distribution tables.
- step 3b we combine sequential sampling with the popular "loop reversal" technique found in many KDD algorithms. Instead of processing hypotheses one after another, and obtaining enough examples for each hypothesis to evaluate it sufficiently precisely, we keep obtaining examples (step 3b) and apply these to all remaining hypotheses simultaneously (step 3c).
- This strategy allows the algorithm to be easily implemented on top of database systems (assuming they are capable of drawing samples), and enables us to reach tighter bounds.
- step 3(e)i) outputs those where it can be sufficiently certain that the number of better hypotheses is no larger than the number of hypotheses still to be found (so they can all become solutions), or (step 3(e)ii) discards those hypotheses where it can be sufficiently certain that the number of better other hypotheses is at least the number of hypotheses still to be found (so it can be sure the current hypothesis does not need to be in the solutions).
- step 3(e)i) outputs those where it can be sufficiently certain that the number of better hypotheses is no larger than the number of hypotheses still to be found (so they can all become solutions), or (step 3(e)ii) discards those hypotheses where it can be sufficiently certain that the number of better other hypotheses is at least the number of hypotheses still to be found (so it can be sure the current hypothesis does not need to be in the solutions).
- Table 1 Sequential sampling algorithm for the n-best hypotheses problem Algorithm Generic Sequential Sampling.
- Input n (number of desired hypotheses), ⁇ and ⁇ (approximation and confidence parameters).
- Theorem 1 The algorithm will output a group G of exactly n hypotheses such that, with confidence 1 - ⁇ , no other hypothesis in ⁇ has a utility which is more than ⁇ higher than the utility of any hypothesis that has been returned: P ⁇ [3h € H ⁇ G : /( ⁇ ) > f min x ⁇ ] ⁇ ⁇ (2)
- the number of transactions in the database for which the rule makes a correct prediction is called the support, or the generality.
- the confidence is the fraction of correct predictions among those transactions for which a prediction is made.
- the accuracy quantifies the probability of a hypothesis conjecturing a correct attribute.
- the term accuracy is typically used in the context of classification and refers to the probability of a correct classification for a future transaction whereas the confidence refers to the database (i.e., the training data). From a sampling point of view, confidence and accuracy can be treated equally. In both cases, a relative frequency is measured on a small sample; from this frequency we want to derive claims on the underlying probability. It does not make a difference whether this probability is itself a frequency on a much larger instance space (confidence) or a "real" probability (accuracy), defined with respect to an underlying distribution on instances.
- Subgroups are of a more descriptive character. They describe that the value of an attribute differs from the global mean value within a particular subgroup of transactions without actually conjecturing the value of that attribute for a new transaction.
- the generality of a subgroup is the fraction of all transactions in the database that belong to that subgroup.
- the term statistical unusualness refers to the difference between the probability p 0 of an attribute in the whole database and the probability p of that attribute within the subgroup.
- subgroups are desired to be both general (large g) and statistically unusual (large I pO - pi ).
- There are many possible utility functions for subgroup discovery which trade generality against unusualness. Unfortunately, none of these functions can be expressed as the average (over all transactions) of an instance utility function.
- the relative frequency corresponding to the target probability is governed by the binomial distribution whereas, when the sample is drawn without replacement, it is governed by the hyper-geometrical distribution and we can specify a tighter bound.
- the only feasible way of calculating both the hyper-geometrical distribution and the binomial distribution is to use a normal approximation. But the normal approximation of both distributions are equal and so we cannot realize the small advantage that drawing without replacement seems to promise. The same situation arises with other utility functions.
- This simplest form of a utility function is the average, over all example instances, of some instance utility function fm st (h,q ⁇ ) where q, e D.
- the utility is then defined as
- Equation 4 satisfies this condition.
- Equation 5 we insert Equation 4 into Equation 1.
- Equation 3 we insert Equation 4 into Equation 1.
- Equation 7 we insert Equation 7 into Equation 7.
- Equation 7 we apply the Hoeffding inequality (Equation 3) in Equation 6 and obtain the desired result in Equation 7.
- the Hoeffding inequality is less suited since it is not very tight.
- f(h,Q m ) - f(h) is a random variable with mean value 0; we further know that f(h,Q m ) is bounded between zero and ⁇ .
- the normal distribution we need to refer to the true variance of our random variable. In step 3, the variance is not known since we do not refer to any particular hypothesis.
- Equation 1 f(h,Q m ) is the average of m values, namely
- Equation 8 satisfies Equation 1.
- z is the inverse standard normal distribution that can be looked up in a table.
- steps 3(e)i and 3(e)ii we refer to specific hypotheses h and can therefore determine the empirical variance of ff Qm).
- E h (m, ⁇ ) we can define E Equation 10.
- Equation 11 we expand our definition E.
- the ⁇ and log-terms cancel out in Equation 12; we can bound the confidence interval to ⁇ /2 in Equation 13 as required for the algorithm to exit in step 3e.
- the first class of nontrivial utility functions that we study weight the generality g of a subgroup and the deviation of the probability of a certain feature p from the default probability pO equally. Hence, these functions multiply generality and distributional unusualness of subgroups. Alternatively, we can use the absolute distance I p - p 0 1 between probability p and default probability p 0 .
- the multi-class version of this function is
- Equation 17 we insert Equation 14 into Equation 1.
- Equation 18 we refer to the union bound in Equation 18.
- Equation 20 we refer to the simple observation that g ⁇ 1 and (p - p 0 ) ⁇ 1 leads to Equation 20.
- Equations 21 and 22 are based on elementary transformations.
- Equation 23 we refer to the union bound again.
- the key observation here is that ab cannot be greater than (c + ⁇ ) (d + ⁇ ) unless at least a>c + ⁇ orb>d + ⁇ .
- Equation 25 Let us now prove the normal approximation of the above confidence bound.
- Equation 25 by inserting Equation 16 into Equation 1.
- s g and s p denote the variances of g and p, respectively.
- Equation 15 we also cover Equation 15 in this proof. The variances can be bounded from above:
- Equation 27 follows from g ⁇ 1 and p - p 0 ⁇ 1 and Equation 28 is just a factorization of
- Equation 30 we insert the sample bound (Equation 30) into the definition of E for linear functions (Equation 14), after the log-terms rule out each other in Equation 31 we obtain the desired bound of ⁇ /2.
- Equation 37 we start in Equation 37 by combining the definition of E (Equation 33) with Equation 1 which specifies the property of E that we would like to prove.
- Equation 38 we add g 2 (p - p 0 ) to both sides of the inequality an start factorizing.
- Equation 39 we have identified three factors.
- Equation 34 is a special case of Equation 35 (variance bounded from above).
- the variance of both g and p is at most
- Equation 35 This takes us from Equation 35 to Equation 42. Equation 43 equals Equation 34.
- Equation 44 we want to see if the normal approximation (Equation 35) satisfies the requirement of Equation 1.
- Equation 35 we add g 2 (p - p 0 ) to both sides of the equation and start factorizing the right hand side of the inequality in Equations 45 and 46.
- the union bound takes us to Equation 47; Equation 48 proves the claim.
- Pr[ ⁇ f(KQ m )-f(h) ⁇ >E(m,, ⁇ ) ⁇ (44) P ⁇ g 2 i ⁇ - Po) - (p - Po)
- Equation 33 each of the three terms in Equation 33 has to be below 1. Note that if ⁇ ⁇ 1 then ⁇ 2 ⁇ ⁇ . We can therefore bound E as in Equation 51.
- Equation 49 we insert the sample bound into the exit criterion in Equation 52.
- the log-terms rule out each other and the result is ⁇ /2 as desired.
- the Binomial test heuristic is based on elementary considerations. Suppose that the probability p is really equal to p 0 (i.e., the corresponding subgroup is really uninteresting). How likely is it, that the subgroup with generality g displays a frequency of on the sample Q with a greater difference 1 - p 0 1? For large I Q I x
- Equation 57 we insert Equation 54 into Equation 1 (the definition of E). We refer to the union bound in Equation 58 and exploit that
- Equation 59 we factor the right hand side of the inequality in Equation 59 and use the union bound in Equation 60. Now in Equation 61 we weaken the inequality a little. Note that
- Equation 61 subtracting the lengthy term in Equation 61 decreases the probability of the inequality (which we want to bound from above).
- the reason why we subtract this term is that we want to apply the binomial equation and factor
- Equation 55 we would like Equation 55 to be a special case of Equation 55 with the variances bounded from above. Equation 65 confirms that this is the case since
- Equation 70 Equation 70 and Equation 71.
- Basic manipulations and the Chernoff inequality complete the proof in Equation 73.
- Equation 54 The middle term of Equation 55 dominates the expression since, for ⁇ ⁇ 1 it Is true that ⁇ > > ⁇ .
- Equation 75 provides us with an easier bound.
- step 3 The algorithm terminates in step 3 when
- Equation 76 proves that this is the case with guarantee. Note that, since we bounded the confidence interval quite sloppily, we expect the algorithm to terminate considerably earlier. £
- non-sequential sampling algorithm determines a sample size M like our algorithm does in step 2, but using the full available error probability ⁇ rather than only ⁇ /2. Hence, the non-sequential sampling algorithm has a lower worst-case sample size than the sequential one but never exits or returns any hypothesis before that worst-case sample bound has been reached. Sequential and non-sequential sampling algorithm use the same normal approximation and come with identical guarantees on the quality of the returned solution.
- Figure 1 shows the sample size of the non-sequential algorithm as well as the sample size required before the sequential algorithm returned the first (out of ten) hypothesis and the sample size that the sequential algorithm required to return the last (tenth) hypothesis and terminate.
- the sequential sampling algorithm terminated significantly earlier than the non-sequential one, even though the latter possesses a lower worst-case sample bound.
- the relative benefit of sequential sampling can reach orders of magnitude.
- the data contains 95,412 records that describe mailings by a veterans organization. Each record contains 481 attributes describing one recipient of a previous mailing.
- the target fields note whether the person responded and how high his donation to the organization was.
- Our task was to find large subgroups of recipients that were particularly likely (or unlikely) to respond (we used the attribute "Target B" as target and deleted “Target D").
- the histogram of error rates of a set of decision trees or rule sets can be determined in time logarithmic in the number of hypotheses. We are confident that our sampling algorithm can be applied analogously for complex and structured hypothesis spaces without explicit representation of all hypotheses.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Optimization (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- Pure & Applied Mathematics (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Algebra (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Databases & Information Systems (AREA)
- Operations Research (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Complex Calculations (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Ultra Sonic Daignosis Equipment (AREA)
- Processing Or Creating Images (AREA)
Abstract
Description
Claims
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP01960677A EP1346293B1 (en) | 2000-08-19 | 2001-08-18 | Finding the most interesting patterns in a database quickly by using sequential sampling |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP00117900 | 2000-08-19 | ||
| EP00117900 | 2000-08-19 | ||
| EP01960677A EP1346293B1 (en) | 2000-08-19 | 2001-08-18 | Finding the most interesting patterns in a database quickly by using sequential sampling |
| PCT/EP2001/009541 WO2002017133A2 (en) | 2000-08-19 | 2001-08-18 | Finding the most interesting patterns in a database quickly by using sequential sampling |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| EP1346293A2 true EP1346293A2 (en) | 2003-09-24 |
| EP1346293B1 EP1346293B1 (en) | 2006-06-28 |
Family
ID=8169592
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| EP01960677A Expired - Lifetime EP1346293B1 (en) | 2000-08-19 | 2001-08-18 | Finding the most interesting patterns in a database quickly by using sequential sampling |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US7136844B2 (en) |
| EP (1) | EP1346293B1 (en) |
| AT (1) | ATE331990T1 (en) |
| DE (1) | DE60121214T2 (en) |
| WO (1) | WO2002017133A2 (en) |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7873724B2 (en) * | 2003-12-05 | 2011-01-18 | Microsoft Corporation | Systems and methods for guiding allocation of computational resources in automated perceptual systems |
| US7440586B2 (en) * | 2004-07-23 | 2008-10-21 | Mitsubishi Electric Research Laboratories, Inc. | Object classification using image segmentation |
| US8504606B2 (en) * | 2005-11-09 | 2013-08-06 | Tegic Communications | Learner for resource constrained devices |
| US7716144B2 (en) * | 2007-03-22 | 2010-05-11 | Microsoft Corporation | Consistent weighted sampling of multisets and distributions |
| US8412564B1 (en) * | 2007-04-25 | 2013-04-02 | Thomson Reuters | System and method for identifying excellence within a profession |
| US7912946B2 (en) * | 2008-01-25 | 2011-03-22 | International Business Machines Corporation | Method using footprints in system log files for monitoring transaction instances in real-time network |
| US7908365B2 (en) * | 2008-01-25 | 2011-03-15 | International Business Machines Corporation | System using footprints in system log files for monitoring transaction instances in real-time network |
| CN111352840B (en) * | 2020-02-28 | 2023-08-15 | 抖音视界有限公司 | Online behavior risk assessment method, device, equipment and readable storage medium |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5519608A (en) * | 1993-06-24 | 1996-05-21 | Xerox Corporation | Method for extracting from a text corpus answers to questions stated in natural language by using linguistic analysis and hypothesis generation |
| GB9401816D0 (en) * | 1994-01-31 | 1994-03-23 | Mckee Neil H | Accessing data held in large databases |
| US6278989B1 (en) * | 1998-08-25 | 2001-08-21 | Microsoft Corporation | Histogram construction using adaptive random sampling with cross-validation for database systems |
| US6282570B1 (en) * | 1998-12-07 | 2001-08-28 | International Business Machines Corporation | Monitoring a large parallel database through dynamic grouping and sequential sampling |
| US6532458B1 (en) * | 1999-03-15 | 2003-03-11 | Microsoft Corporation | Sampling for database systems |
| US6542886B1 (en) * | 1999-03-15 | 2003-04-01 | Microsoft Corporation | Sampling over joins for database systems |
-
2001
- 2001-08-18 DE DE60121214T patent/DE60121214T2/en not_active Expired - Lifetime
- 2001-08-18 EP EP01960677A patent/EP1346293B1/en not_active Expired - Lifetime
- 2001-08-18 WO PCT/EP2001/009541 patent/WO2002017133A2/en not_active Ceased
- 2001-08-18 US US10/344,852 patent/US7136844B2/en not_active Expired - Lifetime
- 2001-08-18 AT AT01960677T patent/ATE331990T1/en active
Non-Patent Citations (1)
| Title |
|---|
| See references of WO0217133A3 * |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2002017133A3 (en) | 2003-06-26 |
| US20050114284A1 (en) | 2005-05-26 |
| US7136844B2 (en) | 2006-11-14 |
| DE60121214T2 (en) | 2007-05-24 |
| DE60121214D1 (en) | 2006-08-10 |
| ATE331990T1 (en) | 2006-07-15 |
| WO2002017133A2 (en) | 2002-02-28 |
| EP1346293B1 (en) | 2006-06-28 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Scheffer et al. | Finding the most interesting patterns in a database quickly by using sequential sampling | |
| US8005769B2 (en) | Computer-implemented method of generating association rules from data stream and data mining system | |
| US6931418B1 (en) | Method and system for partial-order analysis of multi-dimensional data | |
| Jin et al. | Data discretization unification | |
| US7177854B2 (en) | Dynamic update cube and hybrid query search method for range-sum queries | |
| US7647293B2 (en) | Detecting correlation from data | |
| Ghoting et al. | Loaded: Link-based outlier and anomaly detection in evolving data sets | |
| US5713016A (en) | Process and system for determining relevance | |
| US7577638B2 (en) | Sampling for queries | |
| US8762393B2 (en) | Method and system of clustering for multi-dimensional data streams | |
| US5787424A (en) | Process and system for recursive document retrieval | |
| US20040003004A1 (en) | Time-bound database tuning | |
| US7149735B2 (en) | String predicate selectivity estimation | |
| EP1346293A2 (en) | Finding the most interesting patterns in a database quickly by using sequential sampling | |
| Pagallo et al. | Two algorithms that learn DNF by discovering relevant features | |
| Bhat et al. | Finding aliases on the web using latent semantic analysis | |
| Lin et al. | Interactive mining of probabilistic frequent patterns in uncertain databases | |
| Minartz et al. | Multivariate correlations discovery in static and streaming data | |
| US10482128B2 (en) | Scalable approach to information-theoretic string similarity using a guaranteed rank threshold | |
| Meretakis et al. | A study on the performance of large bayes classifier | |
| Gotoh et al. | Topic-based mixture language modelling | |
| Wang et al. | Near-linear time local polynomial nonparametric estimation with box kernels | |
| Müller | Selected problems in cardinality estimation | |
| Pan | Differentially-private Continual Releases against Dynamic Databases | |
| Jamil et al. | Recognizing credible experts in inaccurate databases |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
| 17P | Request for examination filed |
Effective date: 20030215 |
|
| AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
|
| RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: FRAUNHOFER-GESELLSCHAFT ZUR FOERDERUNG DERANGEWAND |
|
| 17Q | First examination report despatched |
Effective date: 20031217 |
|
| GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
| GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
| GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
| AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060628 Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT;WARNING: LAPSES OF ITALIAN PATENTS WITH EFFECTIVE DATE BEFORE 2007 MAY HAVE OCCURRED AT ANY TIME BEFORE 2007. THE CORRECT EFFECTIVE DATE MAY BE DIFFERENT FROM THE ONE RECORDED. Effective date: 20060628 Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060628 |
|
| REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D |
|
| REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP |
|
| REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D |
|
| REF | Corresponds to: |
Ref document number: 60121214 Country of ref document: DE Date of ref document: 20060810 Kind code of ref document: P |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20060818 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MC Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20060831 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060928 Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060928 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20061009 |
|
| REG | Reference to a national code |
Ref country code: CH Ref legal event code: NV Representative=s name: HEPP, WENGER & RYFFEL AG |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20061128 |
|
| ET | Fr: translation filed | ||
| NLV1 | Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act | ||
| RAP2 | Party data changed (patent owner data changed or rights of a patent transferred) |
Owner name: FRAUNHOFER-GESELLSCHAFT ZUR FOERDERUNG DER ANGEWAN |
|
| PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
| 26N | No opposition filed |
Effective date: 20070329 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060929 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: TR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060628 Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20060818 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: CY Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20060628 |
|
| REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 16 |
|
| REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 17 |
|
| REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 18 |
|
| REG | Reference to a national code |
Ref country code: DE Ref legal event code: R079 Ref document number: 60121214 Country of ref document: DE Free format text: PREVIOUS MAIN CLASS: G06F0017300000 Ipc: G06F0016000000 |
|
| PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20200825 Year of fee payment: 20 Ref country code: DE Payment date: 20200824 Year of fee payment: 20 Ref country code: FR Payment date: 20200820 Year of fee payment: 20 |
|
| PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: AT Payment date: 20200819 Year of fee payment: 20 Ref country code: CH Payment date: 20200825 Year of fee payment: 20 Ref country code: BE Payment date: 20200820 Year of fee payment: 20 |
|
| REG | Reference to a national code |
Ref country code: DE Ref legal event code: R071 Ref document number: 60121214 Country of ref document: DE |
|
| REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
| REG | Reference to a national code |
Ref country code: GB Ref legal event code: PE20 Expiry date: 20210817 |
|
| REG | Reference to a national code |
Ref country code: BE Ref legal event code: MK Effective date: 20210818 |
|
| REG | Reference to a national code |
Ref country code: AT Ref legal event code: MK07 Ref document number: 331990 Country of ref document: AT Kind code of ref document: T Effective date: 20210818 |
|
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GB Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION Effective date: 20210817 |