WO2019147968A1 - Real time multi variate time series search - Google Patents
- Publication number
- WO2019147968A1 (application PCT/US2019/015197)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- time series
- data
- query signal
- data set
- distance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2477—Temporal data queries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Testing And Monitoring For Control Systems (AREA)
Abstract
A string representation of a query signal of a data set storing a string representation of a stream of time series data is received. The time series data is generated by a machine asset. An indexed data set is searched for an occurrence of the query signal by at least determining a distance between the query signal and portions of the indexed data set. The occurrence of the query signal within the indexed data set is provided. Related apparatus, systems, techniques and articles are also described.
Description
Real Time Multi Variate Time Series Search
CROSS-REFERENCE TO RELATED APPLICATION
[0001] The present application claims priority to U.S. Provisional Application No. 62/622,739, filed on January 26, 2018 in the U.S. Patent and Trademark Office, the entire disclosure of which is hereby incorporated by reference.
BACKGROUND
[0002] In any industrial asset, there may be large quantities of data being acquired during operation, for example, from sensors and/or operational parameters. In some cases, up to about 97% of the acquired data can go unused due to lack of tools that can utilize the data for troubleshooting. As an example, troubleshooting can focus on specific installations where a problem has occurred around a failure time period.
[0003] In order to search time series data produced by industrial assets for patterns, a search can be performed by extracting the data from a database and then operating on it to search for patterns. Such searches are time consuming and require significant processing power.
SUMMARY
[0004] The subject matter described herein relates to real time multi variate time series search. In an aspect, a string representation of a query signal of a data set storing a string representation of a stream of time series data is received. The time series data is generated by a machine asset. An indexed data set is searched for an occurrence of the query signal by at least determining a distance between the query signal and portions of the indexed data set. The occurrence of the query signal within the indexed data set is provided. Related apparatus, systems, techniques and articles are also described.
[0005] Non-transitory computer program products (e.g., physically embodied computer program products) are also described that store instructions, which when executed by one or more data processors of one or more computing systems, cause at least one data processor to perform operations herein. Similarly, computer systems are also described that may include one or more data processors and memory coupled to the one or more data processors. The memory may temporarily or permanently store instructions that cause at least one processor to perform one or more of the operations described herein. In addition, methods can be implemented by one or more data processors either within a single computing system or distributed among two or more computing systems. Such computing systems can be connected and can exchange data and/or commands or other instructions or the like via one or more connections, including a connection over a network (e.g. the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, or the like), via a direct connection between one or more of the multiple computing systems, etc.
[0006] The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims.
DESCRIPTION OF DRAWINGS
[0007] FIG. 1 is a process flow diagram illustrating an example process of searching an indexed data set with a query signal (e.g., pattern) represented as a string, which is a compressed representation of the time series data.
[0008] FIG. 2 is a plot showing example time series data and the corresponding example string representation of that time series data.
[0009] FIG. 3 illustrates another example of time series data represented as a string.
[0010] FIG. 4 is a plot illustrating the distance between two signals quantized to levels corresponding to their string representation.
[0011] FIG. 5 illustrates the time series and query signal respective string representations.
[0012] FIG. 6 is a functional block diagram illustrating an example framework for a time series pattern search system.
[0013] FIG. 7 is two plots illustrating normalized reference data.
[0014] FIG. 8 is a plot illustrating an example L2 norm distance for an example query.
[0015] FIG. 9 illustrates example time series data with highlighted portions indicating that a query signal was found within the time series data.
[0016] Like reference symbols in the various drawings indicate like elements.
DETAILED DESCRIPTION
[0017] The current subject matter can enable generic time series pattern searches on datasets, including, in some instances, massive datasets. In some implementations, time series data relating to industrial machines (such as oil and gas wells, turbines, refineries, and the like) can be indexed as it is stored. As an example, the time series signal can be compressed to or represented as a string representation. Patterns can be detected by searching through the dataset using the string representation as an index. In some implementations, the system interface can allow a user to select any portion of a multi-variate time series signal and to quickly search through very large datasets to identify similar patterns. Some implementations can be orders of magnitude faster than currently available industry standard practices for searching through time series datasets. By having a rapid way of querying similar patterns in the time series, a wealth of information about the onset of industrial machine failures, or about troubling operation before it becomes severe, can be provided.
[0018] FIG. 1 is an example process flow diagram 100 illustrating a process for searching an indexed data set with a query signal (e.g., pattern) represented as a string, which is a compressed representation of the time series data. Because the process illustrated in FIG. 1 can be implemented as a rapid way of querying similar patterns in a time series, information about the onset of industrial machine failures or troubling operation prior to it becoming severe can be provided.
[0019] At 110, data characterizing a string representation of a query signal is received. The query signal can be of a data set that is storing a string representation of time series data generated by a machine asset such as, for example, an industrial machine asset. For example, if the time series data is of rotations per minute of a turbine that varies over a range of values that are represented as, e.g., float or double data types, those values can be transformed (e.g., encoded) into string values taken from a fixed set of string values.
[0020] FIG. 2 is a plot 200 showing example time series data and the corresponding example string representation of that time series data. By transforming the time series data to a string representation, the amplitude values of the time series data are effectively quantized to discrete levels corresponding to one of a number of string values. In the example of FIG. 2, the amplitude of the time series data is quantized to one of three values (“a”, “b”, or “c”).
[0021] FIG. 3 is a plot 300 illustrating another example of time series data represented as a string. The time series data (amplitude value on left axis) can be assigned one value from the set of “a”, “b”, “c”, “d”, “e”, and “f” (the assignment scheme is illustrated on the right axis). This approach can be advantageous as each symbol can require fewer bits than real numbers (e.g., float, double, and the like). In addition, in some implementations, nearly 100-fold compression can be possible without substantial loss in fidelity and the approach can be noise immune. This can be achieved because in some implementations, the time series data is not pre-filtered to suppress certain features or characteristics of the data.
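The transformation of amplitude values into string values described in paragraphs [0019] to [0021] can be sketched as follows. This is a minimal illustration only: the function name `to_string`, the equal-width binning between the observed minimum and maximum, and the sample RPM values are assumptions, since the description does not state how level boundaries are chosen.

```python
import string


def to_string(values, levels=3):
    """Quantize amplitudes into `levels` equal-width bins labelled 'a', 'b', ...

    Equal-width bins are an assumption made for this sketch.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / levels or 1.0  # fall back to 1.0 for a constant signal
    symbols = []
    for v in values:
        # Clamp so that the maximum value lands in the top bin.
        bin_index = min(int((v - lo) / width), levels - 1)
        symbols.append(string.ascii_lowercase[bin_index])
    return "".join(symbols)


# Hypothetical turbine speed samples (rotations per minute).
rpm = [3000.0, 3010.0, 3250.0, 3490.0, 3500.0, 3240.0]
encoded = to_string(rpm, levels=3)
print(encoded)
```

Increasing `levels` (up to 26 in this sketch) gives a finer approximation at the cost of more bits per symbol.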
[0022] Referring again to FIG. 1, at 120, the indexed data set is searched for an occurrence of the query signal. As an example, the indexed data set can be searched by at least determining a distance between the query signal and portions of the indexed data set. Distance may be computed as a distance between the string characters in the query signal and the indexed data set using a sliding window approach. The distance can include the Euclidean distance or the piecewise aggregate approximation (PAA) distance, which can be used when the time series is discretized and aggregated. Other measures of distance can be used. For example, FIG. 4 is a plot 400 illustrating the distance between two signals quantized to levels corresponding to their string representation and FIG. 5 is a plot 500 illustrating the time series and query signal respective string representations. The Euclidean distance between “a” and “b” is 1, whereas the Euclidean distance between “a” and “d” is 3. For the representation of FIG. 4, this can be expressed formally as

$$D(Q, C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2}$$

where Q is the query signal, C is the indexed data set, and q_i and c_i are their i-th values within a window of n samples. Where the signals are represented as strings, the Euclidean distance can be expressed formally as

$$D(\hat{Q}, \hat{C}) = \sqrt{\sum_{i=1}^{n} \operatorname{dist}(\hat{q}_i, \hat{c}_i)^2}$$

where dist() returns the integer separation between two strings. In some implementations, determining the distance between two strings can be implemented with a table lookup where the distance between each pair of strings or characters is included in the table. A table lookup can be an O(1) (constant time) evaluation, which can be performed quickly.
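The table lookup and sliding window comparison described above can be sketched as follows. The names `ALPHABET`, `DIST`, `string_distance`, and `sliding_distances` are hypothetical, and the six-symbol alphabet simply mirrors the example of FIG. 3; this is one possible arrangement under those assumptions.

```python
import math
from itertools import product

ALPHABET = "abcdef"

# Precomputed table: integer separation between every pair of symbols,
# so each per-symbol distance is a constant-time dictionary lookup.
DIST = {
    (x, y): abs(ALPHABET.index(x) - ALPHABET.index(y))
    for x, y in product(ALPHABET, repeat=2)
}


def string_distance(q, c):
    """Euclidean distance between two equal-length symbol strings."""
    if len(q) != len(c):
        raise ValueError("strings must have equal length")
    return math.sqrt(sum(DIST[a, b] ** 2 for a, b in zip(q, c)))


def sliding_distances(query, indexed):
    """Distance between the query and every window of the indexed string."""
    n = len(query)
    return [
        string_distance(query, indexed[i:i + n])
        for i in range(len(indexed) - n + 1)
    ]


distances = sliding_distances("abc", "aabccbabcf")
print(distances)
```

Here the query "abc" occurs exactly at window positions 1 and 6, where the computed distance is zero.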
[0023] Referring again to FIG. 1, at 130, the occurrence of the query signal within the indexed data set can be provided. The occurrence can be represented as, e.g., a time index within the data set at which the pattern occurs. The occurrence can include the time indices associated with the minimum computed distance. In some implementations, multiple occurrences are provided, for example the indices associated with all computed distances that are below a predetermined value or the smallest N distances, where N is predetermined.
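The three ways of reporting occurrences mentioned in paragraph [0023] (the minimum distance, all distances below a predetermined value, and the N smallest distances) can be sketched as follows. The function names and the sample distance values are illustrative assumptions.

```python
import heapq


def best_match(distances):
    """Index of the minimum distance (the first one if there are ties)."""
    return min(range(len(distances)), key=distances.__getitem__)


def matches_below(distances, threshold):
    """Indices of all distances strictly below a predetermined value."""
    return [i for i, d in enumerate(distances) if d < threshold]


def n_smallest(distances, n):
    """Indices of the n smallest distances, closest match first."""
    ranked = heapq.nsmallest(n, ((d, i) for i, d in enumerate(distances)))
    return [i for _, i in ranked]


# Hypothetical distances, one per sliding window position.
distances = [3.0, 2.0, 3.0, 1.0, 1.5, 0.5, 2.0]
print(best_match(distances))
print(matches_below(distances, 1.6))
print(n_smallest(distances, 2))
```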
[0024] Providing the occurrence can include displaying the time series data in a manner that highlights the portion of the time series that resulted in a match. For example, FIG. 9 illustrates example time series data with highlighted portions indicating that a query signal was found within the time series data.
[0025] The provided occurrence characterizes the presence or matching of a pattern within the data set, which can represent the occurrence of an event. This can be used, for example, where there are many assets in the field producing data over multiple years. Once a match of a pattern in a given asset is determined, it can be determined whether other assets had this same event previously, which assets, and when. How those identified assets deteriorated over time can be determined, as well as the consequence of the occurrence of the event. This analysis and learning can be used to take corrective action to prevent the current asset from having similar problems. In other words, the current subject matter can enable industrial machine operators to identify potential operational problems early and take appropriate action before further damage or performance loss occurs.
[0026] FIG. 6 is a functional block diagram 600 illustrating an example framework for a time series pattern search system. A query signal can be received in multi-variate form and transformed to a string representation, which is an approximation of the multi-variate query signal. In addition, incoming data is received (e.g., from the industrial machine), is transformed to a string representation, and both the incoming data and the string representation are stored in a database as an indexed data set. To perform the search, the query signal approximation can be compared using a sliding window to the indexed data set and a measure of distance is computed at each position of the sliding window. This approach enables indexing of time series data as it is received (e.g., indexing is ongoing and the dataset is preprocessed before a search is performed) and at scale (e.g., pattern matching can be performed across many large data sets simultaneously).
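The framework of FIG. 6, in which incoming data is indexed as it is received and later searched, can be sketched in memory as follows. The class `IndexedStore`, its fixed amplitude range (`lo`, `hi`), and the use of plain Python lists in place of a database are all assumptions made for illustration; a single variable is shown for brevity.

```python
class IndexedStore:
    """Keeps raw samples and their string index side by side (illustrative)."""

    def __init__(self, lo, hi, alphabet="abc"):
        # A fixed amplitude range is assumed so samples can be encoded on arrival.
        self.lo, self.hi, self.alphabet = lo, hi, alphabet
        self.raw = []
        self.index = []

    def _encode(self, value):
        span = (self.hi - self.lo) or 1.0
        k = int((value - self.lo) / span * len(self.alphabet))
        return self.alphabet[max(0, min(k, len(self.alphabet) - 1))]

    def ingest(self, value):
        """Store the raw sample and its symbol at ingestion time."""
        self.raw.append(value)
        self.index.append(self._encode(value))

    def search(self, query):
        """Return (start, distance) of the window closest to `query`."""
        text = "".join(self.index)
        n = len(query)
        scored = [
            (sum((ord(a) - ord(b)) ** 2
                 for a, b in zip(query, text[i:i + n])) ** 0.5, i)
            for i in range(len(text) - n + 1)
        ]
        distance, start = min(scored)
        return start, distance


store = IndexedStore(lo=0.0, hi=9.0)
for sample in [1, 2, 5, 8, 4, 1]:
    store.ingest(sample)
print("".join(store.index), store.search("bcb"))
```

Because the index is built during ingestion, `search` works only on the compact string and never re-reads the raw samples.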
[0027] The current subject matter can perform pattern searches over very large data sets quickly. This can be achieved by indexing the dataset at ingestion (e.g., receipt and storage) of the data set. By pre-indexing, significant gains in query speed can be achieved.
[0028] In some implementations, the window size can be determined dynamically or predetermined. The window size can be determined based on the particular application (e.g., the underlying time series data). For example, the window size can be determined based on a length of time of an event that is being searched for. In addition, the sliding window can have a varying stride length (e.g., the number of samples that the window moves between distance computations), which can impact both detection rates and query speed performance.
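The effect of window size and stride length can be illustrated with a short sketch. The helper names `window_size` and `window_starts`, and the idea of deriving the window from an event duration and a sampling rate, are assumptions consistent with, but not stated in, the paragraph above.

```python
def window_size(event_seconds, sample_rate_hz):
    """Number of samples spanned by an event of the given duration."""
    return max(1, round(event_seconds * sample_rate_hz))


def window_starts(series_length, window, stride=1):
    """Start indices at which a distance would be computed."""
    return list(range(0, series_length - window + 1, stride))


# A hypothetical 30 second event sampled once every 2 seconds.
size = window_size(event_seconds=30, sample_rate_hz=0.5)
dense = window_starts(series_length=10, window=4, stride=1)
sparse = window_starts(series_length=10, window=4, stride=3)
print(size, dense, sparse)
```

A larger stride reduces the number of distance computations (three instead of seven here), which speeds up a query but can skip the position where the pattern aligns best.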
[0029] The current subject matter can be advantageous in that the indexing can also serve as a compression (e.g., encoding) scheme. By changing the representation of the time series data to a string format, the indexed data set can be notably smaller in size than the original time series data. The level of compression and fidelity of the compression can be varied by changing the number of levels (e.g., quantization levels such as number of string characters used) used to represent the time series data in string format.
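The trade-off between the number of levels and the achievable compression can be made concrete with a small calculation. The sketch assumes ideal bit packing of symbols, 64-bit source values, and an optional aggregation of several samples into one symbol (in the spirit of the piecewise aggregate approximation mentioned earlier); none of these parameters are specified in the description.

```python
import math


def bits_per_symbol(levels):
    """Minimum whole number of bits needed to distinguish `levels` symbols."""
    return max(1, math.ceil(math.log2(levels)))


def compression_ratio(levels, source_bits=64, samples_per_symbol=1):
    """Ratio of raw size to encoded size under ideal bit packing."""
    return source_bits * samples_per_symbol / bits_per_symbol(levels)


print(compression_ratio(3))                        # three levels vs. doubles
print(compression_ratio(6))                        # six levels vs. doubles
print(compression_ratio(3, samples_per_symbol=3))  # with 3-sample aggregation
```

Under these assumptions three levels need 2 bits per symbol, giving a 32-fold reduction against 64-bit doubles, and aggregating three samples per symbol raises that to 96-fold. The last figure only illustrates how a ratio of that order could arise; the description does not state how its own figure was obtained.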
[0030] In some implementations, data can be scaled for unit variance. This can be performed because given time series data may have dynamic amplitude ranges and events may occur with different amplitudes. By scaling (e.g., normalizing) the time series data and query signal when creating the string representation, pattern detection can be improved. For example, FIG. 7 is two plots 700 and 750 illustrating normalized reference data.
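Scaling to unit variance can be sketched as follows. Subtracting the mean before dividing by the standard deviation is an assumption (the paragraph mentions only unit variance), as is the function name `scale_unit_variance`.

```python
import statistics


def scale_unit_variance(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # a constant signal carries no shape
    return [(v - mean) / std for v in values]


# The same event shape occurring at two different amplitudes.
small = scale_unit_variance([1.0, 2.0, 3.0])
large = scale_unit_variance([100.0, 200.0, 300.0])
print(small)
print(large)
```

After scaling, both signals map to the same values, so they would receive the same string representation and a query taken from one could match the other.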
[0031] FIG. 8 is a plot 800 illustrating an example L2 norm distance for an example query. The string representation transform results in flat level approximations of the data. As a result, the distance computation may not specifically identify the start of a pattern match (e.g., event). Additional processing can therefore be performed to detect the index corresponding to the start of the pattern. This can be achieved by detecting sequences of distances that have the same value and considering them as a single match. For example, if the distance is computed as “3, 2, 3, 1, 1, 1, 2, 3”, then the series of “1, 1, 1” can be treated as a single match (e.g., because it is flat) instead of three matches. Combining can be performed based on the window size to report the match location as a single match instead of a multiplicity of values.
[0032] In some implementations, flat level approximations of the data can be processed by identifying the minimum distance values; of all the indexes returned, a first order difference of index locations can be determined; and in response to the index location delta being equal to 1, those locations can be combined.
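The combining step of paragraphs [0031] and [0032] can be sketched as follows: find the indices of the minimum distance, take the first order difference of those indices, and merge neighbours whose difference is 1. The function name `merge_flat_matches` and the choice to report each match as a (first index, last index) pair are assumptions.

```python
def merge_flat_matches(distances):
    """Group consecutive minimum-distance indices into (start, end) matches."""
    best = min(distances)
    hits = [i for i, d in enumerate(distances) if d == best]
    runs = [[hits[0], hits[0]]]
    for prev, cur in zip(hits, hits[1:]):
        if cur - prev == 1:      # first order difference of 1: same flat match
            runs[-1][1] = cur
        else:                    # a gap starts a new, separate match
            runs.append([cur, cur])
    return [tuple(run) for run in runs]


# The example distance sequence from the description.
matches = merge_flat_matches([3, 2, 3, 1, 1, 1, 2, 3])
print(matches)
```

For the sequence from the description, the three consecutive minima at indices 3, 4, and 5 are reported as one match whose start index is 3.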
[0033] Although a few variations have been described in detail above, other modifications or additions are possible. For example, in some implementations, the search techniques can be parallelized to operate at scale. The current subject matter can be multivariable enabled and can include a framework to process data (including indexing and searching) quickly (e.g., in near real-time). Indexing can be performed independently for segments of the time series data and in parallel. Searching for a query in a time series index can be performed in parallel by splitting time series and parallelizing the processing.
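Splitting an indexed time series and searching the pieces in parallel can be sketched as follows. The chunking scheme (each chunk extended by the query length minus one so that no window is lost at a boundary), the function names, and the use of a thread pool are assumptions; for CPU-bound workloads a process pool or a distributed framework would likely be a better fit.

```python
from concurrent.futures import ThreadPoolExecutor


def search_chunk(task):
    """Compute (global_index, distance) for every window inside one chunk."""
    query, chunk, offset = task
    n = len(query)
    return [
        (offset + i,
         sum((ord(a) - ord(b)) ** 2
             for a, b in zip(query, chunk[i:i + n])) ** 0.5)
        for i in range(len(chunk) - n + 1)
    ]


def parallel_search(query, indexed, chunk_size=4, workers=2):
    """Split the indexed string into overlapping chunks and search them concurrently."""
    overlap = len(query) - 1  # keeps windows that straddle a chunk boundary
    tasks = [
        (query, indexed[start:start + chunk_size + overlap], start)
        for start in range(0, len(indexed), chunk_size)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(search_chunk, tasks)  # map preserves chunk order
    return [pair for chunk_result in results for pair in chunk_result]


pairs = parallel_search("abc", "aabccbabcf")
print(pairs)
```

Each window position is evaluated exactly once, so the combined result matches what a single sequential sweep would produce.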
[0034] The subject matter described herein can provide many technical advantages. For example, some implementations of the current subject matter can normalize and index datasets at the time of ingestion; assemble the query string and time series data based on physical reality of data (window size, stride length, etc.); perform univariate distance search in string domain; compute the multi variate distance using L2 norm; search for the minimum distance along the time series; highlight locations that satisfy the threshold of minimum; and the like.
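The sequence of a univariate distance search in the string domain followed by a multi-variate distance using the L2 norm can be sketched as follows. Combining per-variable distances at each window position by the square root of the sum of squares is one reading of the paragraph above; the function names and the two-variable example strings are assumptions.

```python
import math


def univariate_distances(query, indexed):
    """Per-window Euclidean distance for one variable, in the string domain."""
    n = len(query)
    return [
        math.sqrt(sum((ord(a) - ord(b)) ** 2
                      for a, b in zip(query, indexed[i:i + n])))
        for i in range(len(indexed) - n + 1)
    ]


def multivariate_distances(queries, indexed_channels):
    """L2 norm across variables of the univariate distances at each position."""
    per_channel = [
        univariate_distances(q, c) for q, c in zip(queries, indexed_channels)
    ]
    return [math.sqrt(sum(d ** 2 for d in column)) for column in zip(*per_channel)]


# Two hypothetical variables (e.g., speed and temperature), already indexed.
queries = ["ab", "cc"]
channels = ["aabab", "bccac"]
combined = multivariate_distances(queries, channels)
best = combined.index(min(combined))
print(combined, best)
```

Position 3 matches the first variable exactly but not the second, so its combined distance is nonzero; only position 1, where both variables match, yields a combined distance of zero.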
[0035] Further, some implementations of the current subject matter can enable compression of time series during ingestion by indexing, and rapid searching through the indexed data; an ability to determine a quantitative metric for distance between query and time series in a multivariate domain; accelerated computing at scale using parallel computing; and the like. The current subject matter can enable searching through massive time series data sets with a low computational burden and at scale (e.g., across a large range of data sets). Because the current subject matter does not rely on some transformations, such as the fast Fourier transform or wavelet representation, the current subject matter can be less prone to noise and it can scale better to larger data sets. Further, some implementations of the current subject matter can include normalizing the data set space and determining the string approximation for a preset number of levels; creating an index and storing the index as a function of time into the same dataset; and assembling the query and rolling it (e.g., sliding) through the indexed dataset. Some implementations can also include a multi-variate approach where a distance metric can be determined as the query sweeps through the dataset, in contrast to univariate based approaches.
[0036] One or more aspects or features of the subject matter described herein can be realized in digital electronic circuitry, integrated circuitry, specially designed application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), computer hardware, firmware, software, and/or combinations thereof. These various aspects or features can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. The programmable system or computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
[0037] These computer programs, which can also be referred to as programs, software, software applications, applications, components, or code, include machine instructions for a programmable processor, and can be implemented in a high-level procedural language, an object-oriented programming language, a functional programming language, a logical programming language, and/or in assembly/machine language. As used herein, the term “machine-readable medium” refers to any computer program product, apparatus and/or device, such as for example magnetic discs, optical disks, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor. The machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid-state memory or a magnetic hard drive or any equivalent storage medium. The machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example as would a processor cache or other random access memory associated with one or more physical processor cores.
[0038] To provide for interaction with a user, one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT) or a liquid crystal display (LCD) or a light emitting diode (LED) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including acoustic, speech, or tactile input. Other possible input devices include touch screens or other touch-sensitive devices such as single or multi-point resistive or capacitive trackpads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.
[0039] In the descriptions above and in the claims, phrases such as “at least one of” or “one or more of” may occur followed by a conjunctive list of elements or features. The term “and/or” may also occur in a list of two or more elements or features. Unless otherwise implicitly or explicitly contradicted by the context in which it is used, such a phrase is intended to mean any of the listed elements or features individually or any of the recited elements or features in combination with any of the other recited elements or features. For example, the phrases “at least one of A and B;” “one or more of A and B;” and “A and/or B” are each intended to mean “A alone, B alone, or A and B together.” A similar interpretation is also intended for lists including three or more items. For example, the phrases “at least one of A, B, and C;” “one or more of A, B, and C;” and “A, B, and/or C” are each intended to mean “A alone, B alone, C alone, A and B together, A and C together, B and C together, or A and B and C together.” In addition, use of the term “based on,” above and in the claims is intended to mean “based at least in part on,” such that an unrecited feature or element is also permissible.
[0040] The subject matter described herein can be embodied in systems, apparatus, methods, and/or articles depending on the desired configuration. The implementations set forth in the foregoing description do not represent all implementations consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects related to the described subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations can be provided in addition to those set forth herein. For example, the implementations described above can be directed to various combinations and sub-combinations of the disclosed features and/or combinations and sub-combinations of several further features disclosed above. In addition, the logic flows depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Other implementations may be within the scope of the following claims.
Claims
1. A method comprising:
receiving a string representation of a query signal of a data set storing a string representation of a stream of time series data, the time series data generated by a machine asset;
searching an indexed data set for an occurrence of the query signal by at least determining a distance between the query signal and portions of the indexed data set; and
providing the occurrence of the query signal within the indexed data set.
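The searching and providing steps of claim 1 can be pictured with a short Python sketch. It is a hedged illustration under stated assumptions, not the claimed method itself: the names `symbol_distance` and `search`, the per-symbol absolute-difference distance, the summation across variables, and the `max_distance` threshold are all hypothetical choices, and every query string is assumed to have the same length.

```python
def symbol_distance(a, b):
    """Sum of absolute differences between letter positions (assumed metric)."""
    return sum(abs(ord(x) - ord(y)) for x, y in zip(a, b))


def search(indexed, query, max_distance=0):
    """Slide a multivariate string query through an indexed data set.

    `indexed` and `query` map a variable name to its string representation.
    Returns (offset, distance) pairs for every window whose distance, summed
    over all query variables, does not exceed `max_distance`.
    """
    channels = list(query)
    m = len(query[channels[0]])                      # query length in symbols
    n = min(len(indexed[c]) for c in channels)       # shortest indexed series
    hits = []
    for start in range(n - m + 1):
        # Determine the distance between the query and this portion of the
        # indexed data set, accumulated across all variables.
        d = sum(
            symbol_distance(indexed[c][start:start + m], query[c])
            for c in channels
        )
        if d <= max_distance:
            hits.append((start, d))
    return hits


# Hypothetical two-variable example; the variable names are illustrative.
indexed = {"temp": "aabcdcba", "pressure": "ddcbabcd"}
query = {"temp": "bcd", "pressure": "cba"}
exact = search(indexed, query)            # exact occurrences only
near = search(indexed, query, 6)          # also windows within distance 6
```

Raising `max_distance` trades precision for recall: a threshold of zero reports only exact occurrences of the query strings, while a larger threshold also reports nearby windows along with their distances.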
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201980015856.7A CN111989661A (en) | 2018-01-26 | 2019-01-25 | Real-time multivariate time series search |
| SG11202007063PA SG11202007063PA (en) | 2018-01-26 | 2019-01-25 | Real time multi variate time series search |
| RU2020127289A RU2020127289A (en) | 2018-01-26 | 2019-01-25 | SEARCH IN A MULTIDIMENSIONAL TIME SERIES IN REAL TIME |
| EP19743633.0A EP3743825A4 (en) | 2018-01-26 | 2019-01-25 | Real time multi variate time series search |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201862622739P | 2018-01-26 | 2018-01-26 | |
| US62/622,739 | 2018-01-26 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019147968A1 (en) | 2019-08-01 |
Family
ID=67395083
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2019/015197 Ceased WO2019147968A1 (en) | 2018-01-26 | 2019-01-25 | Real time multi variate time series search |
Country Status (5)
| Country | Link |
|---|---|
| EP (1) | EP3743825A4 (en) |
| CN (1) | CN111989661A (en) |
| RU (1) | RU2020127289A (en) |
| SG (1) | SG11202007063PA (en) |
| WO (1) | WO2019147968A1 (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR100472948B1 (en) * | 2001-10-11 | 2005-03-08 | 한국전자통신연구원 | A method for optimizing the post-processing of sub-sequence matching in time-series databases |
| WO2008043082A2 (en) * | 2006-10-05 | 2008-04-10 | Splunk Inc. | Time series search engine |
| WO2013122338A1 (en) * | 2012-02-14 | 2013-08-22 | 주식회사 케이티 | Method for distributed indexing and searching for efficiently analyzing time series data in search systems |
| US20150120749A1 (en) * | 2013-10-30 | 2015-04-30 | Microsoft Corporation | Data management for connected devices |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9069824B2 (en) * | 2012-11-15 | 2015-06-30 | International Business Machines Corporation | Accelerating time series data base queries using dictionary based representations |
| CN104182460B (en) * | 2014-07-18 | 2017-06-13 | 浙江大学 | Time Series Similarity querying method based on inverted index |
| US9953065B2 (en) * | 2015-02-13 | 2018-04-24 | International Business Machines Corporation | Method for processing a database query |
2019
- 2019-01-25 WO PCT/US2019/015197 patent/WO2019147968A1/en not_active Ceased
- 2019-01-25 RU RU2020127289A patent/RU2020127289A/en unknown
- 2019-01-25 EP EP19743633.0A patent/EP3743825A4/en not_active Withdrawn
- 2019-01-25 CN CN201980015856.7A patent/CN111989661A/en active Pending
- 2019-01-25 SG SG11202007063PA patent/SG11202007063PA/en unknown
Non-Patent Citations (2)
| Title |
|---|
| EAMONN J. KEOGH ET AL.: "An Indexing Scheme for Fast Similarity Search in Large Time Series Databases", PROCEEDINGS. ELEVENTH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, 1999, pages 56 - 67, XP010348732, Retrieved from the Internet <URL:https://www.semanticscholar.org/paper/An-Indexing-Scheme-for-Fast-Similarity-Search-in-Keogh-Pazzani/5f5873a30755c6ca3dfdedcb4bdd6081f1cc792c#extracted> * |
| See also references of EP3743825A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN111989661A (en) | 2020-11-24 |
| RU2020127289A (en) | 2022-02-28 |
| RU2020127289A3 (en) | 2022-02-28 |
| SG11202007063PA (en) | 2020-08-28 |
| EP3743825A4 (en) | 2021-10-20 |
| EP3743825A1 (en) | 2020-12-02 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| KR102532396B1 (en) | Data set processing method, device, electronic equipment and storage medium | |
| US9946876B2 (en) | Wavelet decomposition of software entropy to identify malware | |
| CN107622072B (en) | Identification method for webpage operation behavior, server and terminal | |
| US12373557B2 (en) | Methods and systems for identifying anomalous computer events to detect security incidents | |
| CN110750615B (en) | Text repeatability judgment method and device, electronic equipment and storage medium | |
| CN111967262A (en) | Method and device for determining entity tag | |
| WO2009132263A2 (en) | Database systems and methods | |
| Xu et al. | An adaptive algorithm for online time series segmentation with error bound guarantee | |
| CN111291070A (en) | Abnormal SQL detection method, equipment and medium | |
| EP3084673A1 (en) | System alert correlation via deltas | |
| CN114528311B (en) | A method and device for detecting similarity of SQL statements | |
| CN111737966B (en) | Document repetition detection method, device, equipment and readable storage medium | |
| CN111399848A (en) | A hard-coded data detection method, device, electronic device and medium | |
| US11436241B2 (en) | Entity resolution based on character string frequency analysis | |
| US11880391B2 (en) | Clustering software codes in scalable manner | |
| CN108229358A (en) | Index establishing method and device, electronic equipment, computer storage media, program | |
| CN110472034B (en) | Detection method, device and equipment of question-answering system and computer readable storage medium | |
| US11593245B2 (en) | System, device and method for frozen period detection in sensor datasets | |
| CN115146083A (en) | Method and apparatus for determining target text, electronic device, computer-readable medium | |
| CN115114532A (en) | A kind of data search method, device and device based on user behavior | |
| CN114724146A (en) | Abnormal text recognition method and device, electronic equipment and storage medium | |
| WO2019147968A1 (en) | Real time multi variate time series search | |
| Megherbi et al. | Detection of advanced persistent threats using hashing and graph-based learning on streaming data: M. Walid et al. | |
| CN113792087A (en) | Data analysis method, apparatus, electronic device, and computer-readable storage medium | |
| US12500923B2 (en) | Identifying coordinated malicious activities using sequences of requests |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19743633; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2019743633; Country of ref document: EP; Effective date: 20200826 |