US20160203409A1 - Framework for calculating grouped optimization algorithms within a distributed data store - Google Patents
Framework for calculating grouped optimization algorithms within a distributed data store
- Publication number
- US20160203409A1 (application US15/074,063; US201615074063A)
- Authority
- US
- United States
- Prior art keywords
- data
- groups
- database
- iteration
- predictive model
- Prior art date
- 2013-06-29
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G06N7/005—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
- G06F16/244—Grouping and aggregation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G06F17/30412—
-
- G06F17/30598—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Probability & Statistics with Applications (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Fuzzy Systems (AREA)
- Medical Informatics (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Algebra (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
- This application is a continuation of co-pending U.S. patent application Ser. No. 13/931,876, entitled A FRAMEWORK FOR CALCULATING GROUPED OPTIMIZATION ALGORITHMS WITHIN A DISTRIBUTED DATA STORE, filed Jun. 29, 2013, which is incorporated herein by reference for all purposes.
- This application relates generally to analyzing data using machine learning algorithms to develop prediction models for generalization, and more particularly for applying iterative machine learning and other analytic algorithms directly on grouped data instances in databases.
- Companies and other enterprises store large amounts of data, generally in large distributed data stores (databases), and the successful ones use the data to their advantage. The data are not simply facts such as sales and transactional data. Rather, the data may comprise all relevant information within the purview of a company, which the company may acquire, explore, analyze and manipulate while searching for facts and insights that can lead to new business opportunities and leverage for its strategies. For instance, an airline company may have a great deal of data about ticket purchases, and sometimes even about traveling customers, but this information in and of itself does not permit an understanding of customer behavior, does not answer questions such as what motivates ticket purchases, and does not afford the company the insight to make predictions that take advantage of those motivations. To accomplish this, the company may need to run various analytics and machine learning algorithms (processes) on its data to derive models which can provide insight into the data and afford generalization.
- Database systems typically store data in data structures such as tables, and use query languages such as Structured Query Language (SQL) and the like for storing, manipulating, and accessing the data. Unfortunately, except for rather simplistic analytics such as max, min, average, sum, etc., SQL and other query languages cannot perform more complex analytics on data or run machine learning algorithms such as regression, classification, etc., which attempt to make predictions based upon generalizations from representations of data instances. Moreover, most machine learning algorithms require iteration on data, which SQL cannot do. This means that such analytics must be run by other programs and processes that may not operate within the database or interface well with SQL.
- Moreover, since data is typically stored in a database by mixing together and storing a variety of data elements having different parameters and values, it may be necessary to redistribute the data to group common elements together for analysis. While data may be redistributed using a SQL GROUPBY operation, data redistribution is expensive and undesirable. It is time-consuming, and it requires physically moving data around, which incurs high overhead and risks data loss or corruption.
- As a result, convenient, easy-to-use approaches are not available for safely and efficiently running data analytics and machine learning algorithms on stored data within a database to derive models that characterize the data and afford insight into the factors underlying the data so as to permit generalization and predictions.
- It is desirable to provide systems and methods that enable various analytic and machine learning processes to be applied directly to groups of data within a distributed database, without the necessity of redistributing the data, in order to analyze the data and derive models that characterize the data and which can be used for generalizations and predictions. It is to these ends that the present invention is directed.
- Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
- FIG. 1 is a diagrammatic view of the architecture of a distributed database system with which the invention may be used;
- FIG. 2 illustrates a preferred embodiment of a node of the database system of FIG. 1;
- FIG. 3 is a diagrammatic view that illustrates generally a process in accordance with the invention;
- FIG. 4 is a diagrammatic view illustrating the architecture of a preferred embodiment of a system in accordance with the invention; and
- FIG. 5 is a diagrammatic view of a process in accordance with a preferred embodiment of the invention.
- This invention is particularly well adapted for use with a large distributed relational database system such as a massively parallel processor (MPP) shared-nothing database system used for data warehousing or transaction processing, and will be described in that context. It will be appreciated, however, that this is illustrative of only one utility of the invention and that the invention has applicability to other types of data storage systems and methods.
- FIG. 1 is an overview of the architecture of one type of distributed database system 100 with which the invention may be employed, the database system shown being a massively parallel processor (MPP) shared-nothing distributed database system. The database system may include a master node 102 which connects to a plurality of segment nodes 104_A through 104_N. Each segment node may comprise one or more database (DB) instances. The master and segment nodes may comprise generally similar server systems and have similar architectures and process models. The nodes operate together to process user data requests (queries) and return results, and to perform other user directed processing operations, such as running analytics and machine learning algorithms, as will be described. The segments 104_1-104_N work together under the direction of the master 102 to process workloads.
- FIG. 2 illustrates an example of the architecture of a master node 202 of the database system that is configured to perform processes and operations in accordance with the invention. The master node and the segment nodes may have substantially similar architectures, as stated above. The master node 202 may comprise a host computer server system (which may comprise a single CPU 210 or which may be a multi-processor system comprising a plurality of CPUs) connected to input/output (I/O) devices 212 by a communications bus 214. The I/O devices may be standard computer system input and output devices. A network interface module 216 may also be connected to bus 214 to allow the master node to operate in a networked environment. The master node may further have storage 220 comprising non-transitory physical storage media connected to the bus that embodies executable instructions, such as an operating system and application programs, to control the operations of the computer system. Storage 220 may include a main memory 222 embodying control programs that control the CPU to operate in accordance with the invention, as will be described, and may contain other storage 224 including a data store that stores system configuration information, logs, applications and utilities, data and auxiliary storage 226 for storing the results of analytic processing algorithms, for instance, as will be described.
- The master 202, as will be described in more detail below, may be responsible for accepting queries from a client (user), planning queries, dispatching query plans to the segments for execution on the stored data in the distributed storage, and collecting the query results from the segments. The master may also accept directions from a user or other application programs to perform other processing operations, as will be described. In addition to interfacing the segment hosts to the master host, the network interconnect module 216 may also communicate data, instructions and results between execution processes on the master and segments.
- As will be described in more detail below, the invention affords systems and methods having a unique architecture that includes a library of machine learning and statistical tools (analytic algorithms) on top of a general data store, with an additional layer of abstraction above this structure. The abstraction layer may control the algorithms to operate on data in the data store and provide an iterative and grouping framework for the algorithms for merging groups of data and data sets from the data stores of different segments. This architecture enables the analytic algorithms to run iteratively and directly on selected grouped data instances in the data stores that have one or more common elements of interest, and to produce models corresponding to aggregated data values from which inferences and predictions (generalizations) can be made. By operating directly on selected data instances in the data stores, the invention avoids the necessity of data redistribution, and the iterative framework enables iterative analytic algorithms such as logistic regression and classification to be used for analyzing the data. Moreover, the architecture allows new analytic algorithms to be easily added to the library and existing algorithms to be readily changed to upgrade the functionality of the system.
- In order to facilitate an understanding of the invention, consider heterogeneous table data in a database comprising thousands of data instances (rows) having a plurality of attributes (columns), and the problem of running a machine learning algorithm, e.g., regression, on similar sets of data in the data store, i.e., on selected rows based upon the similarities between specific values of attributes, in order to generate a model which can be used to make predictions using the data. This requires running the algorithm on selected rows and a combination of grouped sets of columns having specific values of interest. One approach is to redistribute the data into sets by grouping rows of data using the SQL GROUPBY operation. The GROUPBY operation enables a user to apply various aggregation operations to heterogeneous data, collecting data across multiple records and grouping the results by one or more columns, on which machine learning algorithms may then be run. However, this approach has the disadvantage of physically redistributing the data, with the associated problems previously mentioned.
- The following illustrates an example of an application for the invention. Consider the following portion of a publicly available data set shown in Table 1 that relates the age of abalone shellfish to different measurable physical attributes.
- TABLE 1 (Abalone Shellfish Dataset)

Id | Gender | Length | Diameter | Height | Whole | Shucked | Viscera | Shell | Age
---|---|---|---|---|---|---|---|---|---
1 | M | 0.35 | 0.265 | 0.09 | 0.2255 | 0.0995 | 0.0485 | 0.07 | 7
2 | M | 0.44 | 0.365 | 0.125 | 0.516 | 0.2155 | 0.114 | 0.155 | 10
3 | M | 0.425 | 0.3 | 0.095 | 0.3515 | 0.141 | 0.0775 | 0.12 | 8
4 | F | 0.545 | 0.425 | 0.125 | 0.768 | 0.294 | 0.1495 | 0.26 | 16
5 | F | 0.55 | 0.44 | 0.15 | 0.8945 | 0.3145 | 0.151 | 0.32 | 19

- Users could determine the age of a particular fish by opening it up and counting rings inside the shell. However, they may wish to use the data in the Table to develop a model to predict age based upon gender and one or more of the physical attributes without having to destroy the fish. To generate models, a user may run a linear regression (or some other machine learning algorithm) on the data in the Table for all instances where "gender=M", and another where "gender=F".
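- By way of illustration only, the following is a minimal Python sketch of the two per-gender regressions described above, fitted over rows shaped like Table 1. The fit_group_models helper, the choice of length and diameter as predictors, and the in-memory data layout are assumptions made for this sketch; in the invention the equivalent computation is carried out inside the database rather than in client code.

```python
from collections import defaultdict
import numpy as np

# Rows mirroring Table 1: (id, gender, length, diameter, height, whole, shucked, viscera, shell, age)
ROWS = [
    (1, "M", 0.35, 0.265, 0.09, 0.2255, 0.0995, 0.0485, 0.07, 7),
    (2, "M", 0.44, 0.365, 0.125, 0.516, 0.2155, 0.114, 0.155, 10),
    (3, "M", 0.425, 0.3, 0.095, 0.3515, 0.141, 0.0775, 0.12, 8),
    (4, "F", 0.545, 0.425, 0.125, 0.768, 0.294, 0.1495, 0.26, 16),
    (5, "F", 0.55, 0.44, 0.15, 0.8945, 0.3145, 0.151, 0.32, 19),
]

def fit_group_models(rows):
    """Fit one least-squares model (intercept, length, diameter -> age) per gender value."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[r[1]].append(r)                      # group on the gender column
    models = {}
    for gender, members in grouped.items():
        X = np.array([[1.0, r[2], r[3]] for r in members])
        y = np.array([float(r[9]) for r in members])
        models[gender], *_ = np.linalg.lstsq(X, y, rcond=None)
    return models

print(fit_group_models(ROWS))                        # one coefficient vector for "M", one for "F"
```

- Running the sketch yields one coefficient vector keyed by "M" and one keyed by "F", mirroring the two models contemplated in the example above.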
- Stated more generally, the problem is to provide an approach for analyzing data in a database to obtain models for generalization by executing a machine learning or other analytic algorithm on table data, with grouping on sets of columns, to obtain as output multiple models, where each model corresponds to the aggregated data belonging to a specific value of the combination of grouping columns.
- FIG. 3 illustrates an approach of the invention to address the problem of developing multiple machine learning models corresponding to multiple groups in data without the necessity of redistributing the data based upon grouping columns and then applying the algorithms to the redistributed grouped data, as is required by known approaches. FIG. 3 shows a data store 300 that contains data instances having four different data groups G1-G4 dispersed throughout the data store. As will be described more fully, in an embodiment the invention may run a machine learning or other analytic algorithm 310, e.g., regression, for each of the four data groups, directly on the unsorted data in the data store without first redistributing the data. Four separate regression algorithms may be run for the four groups on the unsorted data. For the first regression on the first data group G1, the algorithm selects the first instance of data group G1 in the data store at 312. Then the algorithm may jump ahead to select the next instance of G1 at 314, skipping the intermediate data instances G4 and G2 at 316 and 318, respectively, and continue on to instances of G1 at 320 and 322. The first regression on data group G1 produces a first model M1, as indicated, which may be stored in auxiliary storage 324. Similarly, regressions are also run on each of the other data groups G2, G3 and G4 in the data store to produce and store corresponding models M2, M3 and M4, respectively. This generates four machine learning models M1-M4 for data groups G1-G4, respectively. The invention avoids the necessity for explicit grouping, in which a user must first distribute the data based upon grouping columns before applying the algorithm. Instead, the invention employs a novel approach of implicit grouping, using the SQL GROUPBY operation to select from the unsorted data instances each data instance belonging to a particular data group and to run analytics on that particular group. This process is performed for each of the different groups and on each database segment, and the resulting models are stored in auxiliary storage.
- As will be described in more detail below, the invention enables iterative algorithms to be run for each group. Each iteration uses the stored models in the auxiliary storage from the previous iteration. Each iteration updates and refines the stored models, and the updated models are used in a subsequent iteration to improve the solution. After a predetermined number of iterations, or when the model for a particular group no longer improves results, the models from the database segments may be merged into a final model.
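- For illustration, the implicit grouping of FIG. 3 and the subsequent merge across segments can be sketched in Python as a single pass over each segment's unsorted rows that accumulates per-group least-squares statistics, followed by a combine step that solves one model per group. The function names, the (group, features, target) row layout, and the use of normal-equation statistics are assumptions of this sketch, not the patented implementation.

```python
import numpy as np

def scan_segment(rows, n_features):
    """One pass over a segment's unsorted rows, accumulating X'X and X'y per group.
    Rows are visited where they already live; nothing is redistributed."""
    state = {}
    for group, x, y in rows:
        xtx, xty = state.setdefault(group, (np.zeros((n_features, n_features)),
                                            np.zeros(n_features)))
        x = np.asarray(x, dtype=float)
        state[group] = (xtx + np.outer(x, x), xty + y * x)
    return state

def merge_segments(segment_states):
    """Combine per-segment statistics and solve one model per group (M1..M4 for G1..G4)."""
    merged = {}
    for state in segment_states:
        for group, (xtx, xty) in state.items():
            mx, my = merged.setdefault(group, (np.zeros_like(xtx), np.zeros_like(xty)))
            merged[group] = (mx + xtx, my + xty)
    return {g: np.linalg.lstsq(xtx, xty, rcond=None)[0]
            for g, (xtx, xty) in merged.items()}

# Two segments with interleaved groups, loosely following FIG. 3.
seg1 = [("G1", [1.0, 0.35], 7.0), ("G4", [1.0, 0.52], 12.0), ("G1", [1.0, 0.44], 10.0)]
seg2 = [("G2", [1.0, 0.30], 6.0), ("G1", [1.0, 0.42], 8.0), ("G2", [1.0, 0.55], 19.0)]
print(merge_segments([scan_segment(seg1, 2), scan_segment(seg2, 2)]))
```

- The per-group statistics play the role of the models M1-M4 held in auxiliary storage; because they are additive, the segment-level results can be merged without ever moving the underlying rows.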
- FIG. 4 is a diagrammatic view of the architecture of a database system that is enhanced in accordance with the invention to perform analytic and machine learning processing. As shown in the figure, the database system may comprise a generally known general SQL data store 410 comprising relational database management system (RDBMS) query processing 412, for example, conventional PostgreSQL, and RDBMS built-in functions 414. The system may include, on top of the conventional SQL layer 412, two or more processing layers written in a programming language such as C++. These may include a low-level abstraction layer 416 that may provide matrix operations and a C++ to RDBMS bridge, and may include a function layer 418 that incorporates various machine learning and other analytic algorithms and statistical tools that may be used for analyzing the data in the database. The algorithms are preferably written in C++, which allows them to be easily updated and supplemented as needed. SQL may call the C++ functions within layers 416 and 418 to execute operations on the stored data in the database. The database system may further include a high-level abstraction layer 420 and a driver function layer 422 on top of the C++ layers. The high-level abstraction layer 420 may include a grouping controller and an iteration controller, as well as other functions such as convex optimizers. The driver function layer 422 may include SQL template calls to the grouping controller in layer 420 for executing the loops of iterative algorithms (as will be described) and may include optimizer invocations. Layers 420 and 422 are preferably written in a programming language such as Python. A SQL user interface layer 424 may be included over the Python layers for controlling the operation of the database system.
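- As a rough sketch of the role of the driver function layer 422, the Python helper below expands an SQL template that asks the database to fit one model per grouping value. The linregr_step aggregate name, the table and column names, and the exact SQL shape are illustrative assumptions only; the description does not prescribe them.

```python
def grouped_train_sql(source_table: str, output_table: str, dependent: str,
                      independents: list[str], grouping_cols: list[str]) -> str:
    """Build an SQL statement that trains one model per value of the grouping columns.
    `linregr_step` stands in for an aggregate assumed to live in the C++ function layer."""
    indep_expr = ", ".join(independents)
    group_expr = ", ".join(grouping_cols)
    return (
        f"CREATE TABLE {output_table} AS\n"
        f"SELECT {group_expr},\n"
        f"       linregr_step({dependent}, ARRAY[{indep_expr}]) AS model\n"
        f"FROM {source_table}\n"
        f"GROUP BY {group_expr};"
    )

# For the abalone example this emits one model row per gender value.
print(grouped_train_sql("abalone", "abalone_models", "age",
                        ["length", "diameter"], ["gender"]))
```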
- Most machine learning algorithms are iterative. As described above in connection with FIG. 3, and as will be described in more detail in connection with FIG. 5, the invention saves the result models of each iteration and passes these on for use by the algorithm in a subsequent iteration. However, there is no functionality in normal SQL for performing iterations or for handling iterative processes. Accordingly, the invention provides the abstraction Python layers, which are user accessible, to implement iterative controllers. The Python layers perform the iterative calls and use SQL in each iteration to call the analytic functionality in the C++ layer. Additionally, they implement a grouping (aggregation) framework for temporary storage. The Python abstraction layer 420 builds an auxiliary storage in the data store, such as auxiliary storage 226 (FIG. 2), for containing multiple models, one for each value of the sets of selected grouping columns. The algorithm runs directly on the selected groups in each segment, and updates each model concurrently upon each iteration. This does not require that the data groups be distributed to parallel nodes. This allows the creation of groups on-the-fly without the need to redistribute data, which saves significant time and effort.
- In effect, the Python layers call the SQL and C++ layers to execute an algorithm using an initial model selected for efficiency. The algorithm updates the model during the first iteration, stores the results back into the database in auxiliary storage, and keeps track of which iteration is currently proceeding. The Python abstraction layer determines how many iterations to perform and controls the C++ and SQL layers to perform those iterations. On each subsequent iteration of the algorithm, the results of the previous iteration, which are stored in auxiliary storage, are used as input to the algorithm, and the results of the subsequent iterations continually update the stored models.
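- A highly simplified sketch of such an iteration controller is shown below. It assumes a DB-API style database connection, an abalone table shaped like Table 1, and a hypothetical model_step aggregate exposed by the C++ function layer; none of these names or statements come from the patent, and a real controller would also handle grouping columns generically and test for convergence.

```python
def update_models_sql(iteration: int) -> str:
    """Hypothetical SQL template for one iteration: join each group's rows to that
    group's model from the previous iteration, run one pass of the (assumed) C++
    step aggregate, and write the refined model back to the auxiliary table."""
    return f"""
        INSERT INTO aux_models (group_key, iteration, model)
        SELECT d.gender, {iteration},
               model_step(prev.model, d.age, ARRAY[d.length, d.diameter])
        FROM abalone AS d
        JOIN aux_models AS prev
          ON prev.group_key = d.gender AND prev.iteration = {iteration} - 1
        GROUP BY d.gender, prev.model;
    """

def run_iterations(conn, max_iterations: int):
    """Python-side controller: one templated SQL call per iteration, with the
    per-group models persisted in auxiliary storage between iterations."""
    cur = conn.cursor()
    cur.execute("""
        CREATE TABLE IF NOT EXISTS aux_models (
            group_key text,
            iteration integer,
            model     double precision[],
            PRIMARY KEY (group_key, iteration)
        )
    """)
    # Iteration 0: a cheap starting model for every group (here simply zeros).
    cur.execute("""
        INSERT INTO aux_models
        SELECT gender, 0, ARRAY[0.0, 0.0] FROM abalone GROUP BY gender
        ON CONFLICT DO NOTHING
    """)
    for iteration in range(1, max_iterations + 1):
        cur.execute(update_models_sql(iteration))
        conn.commit()
    cur.execute("SELECT group_key, model FROM aux_models WHERE iteration = %s",
                (max_iterations,))
    return dict(cur.fetchall())
```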
- FIG. 5 is a flowchart that illustrates an embodiment of a process in accordance with the invention. The process is run concurrently on each segment of the database. Beginning at 510, the SQL GROUPBY operation may be used to select data instances, e.g., rows, within the data store of a segment to form the groups on which a machine learning or other analytic algorithm is to be run. A group may comprise data instances selected from among the various data instances in the data store that have data elements with a common characteristic or attribute value. The common characteristic may be the value of the data element of a particular column, e.g., gender, in the data instances, or it may be a combination of several columns in a data set. At 512, a starting model of the algorithm is initialized. The model may be the parameters or coefficients of the algorithm, and the initial model is preferably selected so that the initial iteration of the algorithm runs efficiently. At 514, an iteration of the algorithm may be run sequentially on all of the different groups of each segment, and the results of the iteration, comprising a model for each group, are collected at 516 and saved into auxiliary storage in a database 520. The first iteration of the algorithm is run with the initial model input at 512. Subsequent iterations of the algorithm will use an updated model corresponding to the model produced by the preceding iteration. Thus, at 522, the current model from the database that corresponds to the model for a group saved at 516 from the preceding iteration is looped back to the algorithm at 514, and another iteration of the algorithm is run on the data groups using that current model. Each iteration of the algorithm updates and refines the models to afford increasingly more accurate results. A counter may be implemented in a decision step 524 that counts the number of iterations, and the algorithm iteration loop may be repeated a predetermined number of times (N) for each group, or until the iteration results of the groups converge, i.e., the iterations cease to improve the solution. Since the groups do not necessarily have the same number of instances, the number N of iterations will likely be different for the different groups. For example, one group may have 200 instances, another may have 2000 instances, and yet another may have 20 instances. Once the number N for a group is reached or the groups converge, the process groups the models from the database 520 at 530, and returns all of the models for the groups to a user at 528. As new data instances are added to the data store, they may be analyzed using the same algorithm and the results merged with the stored models in the database to update and refine the stored models. The stored models may be used to predict future results.
- As may be appreciated from the foregoing, the invention affords a powerful and efficient approach to analyzing data sets directly in a distributed database using iterative machine learning and analytic algorithms in ways not possible with current databases.
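- The per-group control flow of FIG. 5 (initialization at 512, an iteration at 514, saving at 516, the loop back at 522, and the stop test at 524) can be paraphrased as the in-memory Python sketch below. The gradient-descent step function, the tolerance, and the data layout are illustrative assumptions; the patented process instead runs inside the database, with the models held in auxiliary storage rather than a dictionary.

```python
import numpy as np

def train_per_group(groups, step, initial_model, max_iter=100, tol=1e-6):
    """Run an iterative algorithm independently for each group.
    `groups` maps a group key to that group's rows; `step` refines a model from one
    pass over the rows. Each group may stop after a different number of iterations."""
    saved = {g: np.array(initial_model, dtype=float) for g in groups}   # stands in for storage 520
    for g, rows in groups.items():
        for _ in range(max_iter):                                       # iteration cap N, as at 524
            updated = step(saved[g], rows)                              # one iteration, as at 514
            converged = np.linalg.norm(updated - saved[g]) < tol
            saved[g] = updated                                          # saved and looped back (516/522)
            if converged:
                break
    return saved                                                        # models returned to the user (528)

def gd_step(model, rows, lr=0.1):
    """One gradient-descent update of a linear model age ~ w0 + w1 * length."""
    grad = np.zeros_like(model)
    for length, age in rows:
        x = np.array([1.0, length])
        grad += (x @ model - age) * x
    return model - lr * grad / len(rows)

groups = {"M": [(0.35, 7.0), (0.44, 10.0), (0.425, 8.0)],
          "F": [(0.545, 16.0), (0.55, 19.0)]}
print(train_per_group(groups, gd_step, initial_model=[0.0, 0.0]))
```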
- While the foregoing has been with respect to preferred embodiments of the invention, it will be appreciated that changes to these embodiments may be made without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims.
Claims (27)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/074,063 US20160203409A1 (en) | 2013-06-29 | 2016-03-18 | Framework for calculating grouped optimization algorithms within a distributed data store |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/931,876 US9324036B1 (en) | 2013-06-29 | 2013-06-29 | Framework for calculating grouped optimization algorithms within a distributed data store |
US15/074,063 US20160203409A1 (en) | 2013-06-29 | 2016-03-18 | Framework for calculating grouped optimization algorithms within a distributed data store |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/931,876 Continuation US9324036B1 (en) | 2013-06-29 | 2013-06-29 | Framework for calculating grouped optimization algorithms within a distributed data store |
Publications (1)
Publication Number | Publication Date |
---|---|
US20160203409A1 true US20160203409A1 (en) | 2016-07-14 |
Family
ID=55754698
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/931,876 Active 2034-02-18 US9324036B1 (en) | 2013-06-29 | 2013-06-29 | Framework for calculating grouped optimization algorithms within a distributed data store |
US15/074,063 Abandoned US20160203409A1 (en) | 2013-06-29 | 2016-03-18 | Framework for calculating grouped optimization algorithms within a distributed data store |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/931,876 Active 2034-02-18 US9324036B1 (en) | 2013-06-29 | 2013-06-29 | Framework for calculating grouped optimization algorithms within a distributed data store |
Country Status (1)
Country | Link |
---|---|
US (2) | US9324036B1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3107040A1 (en) * | 2015-06-19 | 2016-12-21 | Tata Consultancy Services Limited | Assurance-enabled linde buzo gray (albg) data clustering based segmentation |
US10642831B2 (en) | 2015-10-23 | 2020-05-05 | Oracle International Corporation | Static data caching for queries with a clause that requires multiple iterations to execute |
US10678792B2 (en) | 2015-10-23 | 2020-06-09 | Oracle International Corporation | Parallel execution of queries with a recursive clause |
US10783142B2 (en) * | 2015-10-23 | 2020-09-22 | Oracle International Corporation | Efficient data retrieval in staged use of in-memory cursor duration temporary tables |
CN113268494B (en) * | 2021-05-24 | 2023-06-02 | 中国联合网络通信集团有限公司 | Method and device for processing database statements to be optimized |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030041042A1 (en) * | 2001-08-22 | 2003-02-27 | Insyst Ltd | Method and apparatus for knowledge-driven data mining used for predictions |
US7451065B2 (en) * | 2002-03-11 | 2008-11-11 | International Business Machines Corporation | Method for constructing segmentation-based predictive models |
US7283982B2 (en) * | 2003-12-05 | 2007-10-16 | International Business Machines Corporation | Method and structure for transform regression |
US8438122B1 (en) * | 2010-05-14 | 2013-05-07 | Google Inc. | Predictive analytic modeling platform |
US8521664B1 (en) * | 2010-05-14 | 2013-08-27 | Google Inc. | Predictive analytical model matching |
US8595154B2 (en) * | 2011-01-26 | 2013-11-26 | Google Inc. | Dynamic predictive modeling platform |
US8626791B1 (en) * | 2011-06-14 | 2014-01-07 | Google Inc. | Predictive model caching |
US20140279784A1 (en) * | 2013-03-14 | 2014-09-18 | Kxen, Inc. | Partial predictive modeling |
- 2013-06-29: US application US13/931,876, patent US9324036B1 (Active)
- 2016-03-18: US application US15/074,063, publication US20160203409A1 (Abandoned)
Also Published As
Publication number | Publication date |
---|---|
US9324036B1 (en) | 2016-04-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11727001B2 (en) | Optimized data structures of a relational cache with a learning capability for accelerating query execution by a data system | |
US11036735B2 (en) | Dimension context propagation techniques for optimizing SQL query plans | |
US20220138199A1 (en) | Automated provisioning for database performance | |
Jankov et al. | Declarative recursive computation on an rdbms, or, why you should use a database for distributed machine learning | |
US11645294B2 (en) | Interactive identification of similar SQL queries | |
US11120361B1 (en) | Training data routing and prediction ensembling at time series prediction system | |
US9411853B1 (en) | In-memory aggregation system and method of multidimensional data processing for enhancing speed and scalability | |
US20190034485A1 (en) | System and method for optimizing large database management systems with multiple optimizers | |
US20190384759A1 (en) | Search Integration | |
US10706052B2 (en) | Method for performing in-memory hash join processing in relational database systems | |
US11243958B2 (en) | Implementing contract-based polymorphic and parallelizable SQL user-defined scalar and aggregate functions | |
Akdere et al. | The Case for Predictive Database Systems: Opportunities and Challenges. | |
US20160162521A1 (en) | Systems and Methods for Data Ingest in Interest-Driven Business Intelligence Systems | |
US9558240B2 (en) | Extending relational algebra for data management | |
US20160203409A1 (en) | Framework for calculating grouped optimization algorithms within a distributed data store | |
US20180336062A1 (en) | Dynamic parallelization of a calculation process | |
US20240221039A1 (en) | Auto-mated price performance offers for cloud database systems | |
US20230418824A1 (en) | Workload-aware column inprints | |
Damasio et al. | Guided automated learning for query workload re-optimization | |
US20240256426A1 (en) | Runtime error attribution for database queries specified using a declarative database query language | |
US20250165452A1 (en) | Relationship analysis using vector representations of database tables | |
Vrbić | DATA MINING AND CLOUD COMPUTING. | |
Chao-Qiang et al. | RDDShare: reusing results of spark RDD | |
Kougka et al. | Declarative expression and optimization of data-intensive flows | |
US20250200043A1 (en) | Index join query optimization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: EMC CORPORATION, MASSACHUSETTS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:IYER, RAHUL;QIAN, HAI;YANG, SHENGWEN;AND OTHERS;REEL/FRAME:038029/0130 Effective date: 20130625 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
AS | Assignment |
Owner name: CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH, AS COLLATERAL AGENT, NORTH CAROLINA Free format text: SECURITY AGREEMENT;ASSIGNORS:ASAP SOFTWARE EXPRESS, INC.;AVENTAIL LLC;CREDANT TECHNOLOGIES, INC.;AND OTHERS;REEL/FRAME:040134/0001 Effective date: 20160907 Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT, TEXAS Free format text: SECURITY AGREEMENT;ASSIGNORS:ASAP SOFTWARE EXPRESS, INC.;AVENTAIL LLC;CREDANT TECHNOLOGIES, INC.;AND OTHERS;REEL/FRAME:040136/0001 Effective date: 20160907 Owner name: CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH, AS COLLAT Free format text: SECURITY AGREEMENT;ASSIGNORS:ASAP SOFTWARE EXPRESS, INC.;AVENTAIL LLC;CREDANT TECHNOLOGIES, INC.;AND OTHERS;REEL/FRAME:040134/0001 Effective date: 20160907 Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., A Free format text: SECURITY AGREEMENT;ASSIGNORS:ASAP SOFTWARE EXPRESS, INC.;AVENTAIL LLC;CREDANT TECHNOLOGIES, INC.;AND OTHERS;REEL/FRAME:040136/0001 Effective date: 20160907 |
|
AS | Assignment |
Owner name: EMC IP HOLDING COMPANY LLC, MASSACHUSETTS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:EMC CORPORATION;REEL/FRAME:040203/0001 Effective date: 20160906 |
|
AS | Assignment |
Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., T Free format text: SECURITY AGREEMENT;ASSIGNORS:CREDANT TECHNOLOGIES, INC.;DELL INTERNATIONAL L.L.C.;DELL MARKETING L.P.;AND OTHERS;REEL/FRAME:049452/0223 Effective date: 20190320 Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., TEXAS Free format text: SECURITY AGREEMENT;ASSIGNORS:CREDANT TECHNOLOGIES, INC.;DELL INTERNATIONAL L.L.C.;DELL MARKETING L.P.;AND OTHERS;REEL/FRAME:049452/0223 Effective date: 20190320 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
AS | Assignment |
Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., TEXAS Free format text: SECURITY AGREEMENT;ASSIGNORS:CREDANT TECHNOLOGIES INC.;DELL INTERNATIONAL L.L.C.;DELL MARKETING L.P.;AND OTHERS;REEL/FRAME:053546/0001 Effective date: 20200409 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
|
AS | Assignment |
Owner name: WYSE TECHNOLOGY L.L.C., CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058216/0001 Effective date: 20211101 Owner name: SCALEIO LLC, MASSACHUSETTS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058216/0001 Effective date: 20211101 Owner name: MOZY, INC., WASHINGTON Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058216/0001 Effective date: 20211101 Owner name: MAGINATICS LLC, CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058216/0001 Effective date: 20211101 Owner name: FORCE10 NETWORKS, INC., CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058216/0001 Effective date: 20211101 Owner name: EMC IP HOLDING COMPANY LLC, TEXAS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058216/0001 Effective date: 20211101 Owner name: EMC CORPORATION, MASSACHUSETTS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058216/0001 Effective date: 20211101 Owner name: DELL SYSTEMS CORPORATION, TEXAS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058216/0001 Effective date: 20211101 Owner name: DELL SOFTWARE INC., CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058216/0001 Effective date: 20211101 Owner name: DELL PRODUCTS L.P., TEXAS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058216/0001 Effective date: 20211101 Owner name: DELL MARKETING L.P., TEXAS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058216/0001 Effective date: 20211101 Owner name: DELL INTERNATIONAL, L.L.C., TEXAS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058216/0001 Effective date: 20211101 Owner name: DELL USA L.P., TEXAS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058216/0001 Effective date: 20211101 Owner name: CREDANT TECHNOLOGIES, INC., TEXAS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058216/0001 Effective date: 20211101 Owner name: AVENTAIL LLC, CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058216/0001 Effective date: 20211101 Owner name: ASAP SOFTWARE EXPRESS, INC., ILLINOIS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH;REEL/FRAME:058216/0001 Effective date: 20211101 |
|
AS | Assignment |
Owner name: SCALEIO LLC, MASSACHUSETTS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (040136/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061324/0001 Effective date: 20220329 Owner name: EMC IP HOLDING COMPANY LLC (ON BEHALF OF ITSELF AND AS SUCCESSOR-IN-INTEREST TO MOZY, INC.), TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (040136/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061324/0001 Effective date: 20220329 Owner name: EMC CORPORATION (ON BEHALF OF ITSELF AND AS SUCCESSOR-IN-INTEREST TO MAGINATICS LLC), MASSACHUSETTS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (040136/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061324/0001 Effective date: 20220329 Owner name: DELL MARKETING CORPORATION (SUCCESSOR-IN-INTEREST TO FORCE10 NETWORKS, INC. AND WYSE TECHNOLOGY L.L.C.), TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (040136/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061324/0001 Effective date: 20220329 Owner name: DELL PRODUCTS L.P., TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (040136/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061324/0001 Effective date: 20220329 Owner name: DELL INTERNATIONAL L.L.C., TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (040136/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061324/0001 Effective date: 20220329 Owner name: DELL USA L.P., TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (040136/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061324/0001 Effective date: 20220329 Owner name: DELL MARKETING L.P. (ON BEHALF OF ITSELF AND AS SUCCESSOR-IN-INTEREST TO CREDANT TECHNOLOGIES, INC.), TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (040136/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061324/0001 Effective date: 20220329 Owner name: DELL MARKETING CORPORATION (SUCCESSOR-IN-INTEREST TO ASAP SOFTWARE EXPRESS, INC.), TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (040136/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061324/0001 Effective date: 20220329 |
|
AS | Assignment |
Owner name: SCALEIO LLC, MASSACHUSETTS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (045455/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061753/0001 Effective date: 20220329 Owner name: EMC IP HOLDING COMPANY LLC (ON BEHALF OF ITSELF AND AS SUCCESSOR-IN-INTEREST TO MOZY, INC.), TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (045455/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061753/0001 Effective date: 20220329 Owner name: EMC CORPORATION (ON BEHALF OF ITSELF AND AS SUCCESSOR-IN-INTEREST TO MAGINATICS LLC), MASSACHUSETTS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (045455/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061753/0001 Effective date: 20220329 Owner name: DELL MARKETING CORPORATION (SUCCESSOR-IN-INTEREST TO FORCE10 NETWORKS, INC. AND WYSE TECHNOLOGY L.L.C.), TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (045455/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061753/0001 Effective date: 20220329 Owner name: DELL PRODUCTS L.P., TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (045455/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061753/0001 Effective date: 20220329 Owner name: DELL INTERNATIONAL L.L.C., TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (045455/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061753/0001 Effective date: 20220329 Owner name: DELL USA L.P., TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (045455/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061753/0001 Effective date: 20220329 Owner name: DELL MARKETING L.P. (ON BEHALF OF ITSELF AND AS SUCCESSOR-IN-INTEREST TO CREDANT TECHNOLOGIES, INC.), TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (045455/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061753/0001 Effective date: 20220329 Owner name: DELL MARKETING CORPORATION (SUCCESSOR-IN-INTEREST TO ASAP SOFTWARE EXPRESS, INC.), TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (045455/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:061753/0001 Effective date: 20220329 |
|
AS | Assignment |
Owner name: DELL MARKETING L.P. (ON BEHALF OF ITSELF AND AS SUCCESSOR-IN-INTEREST TO CREDANT TECHNOLOGIES, INC.), TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (053546/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:071642/0001 Effective date: 20220329 Owner name: DELL INTERNATIONAL L.L.C., TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (053546/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:071642/0001 Effective date: 20220329 Owner name: DELL PRODUCTS L.P., TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (053546/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:071642/0001 Effective date: 20220329 Owner name: DELL USA L.P., TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (053546/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:071642/0001 Effective date: 20220329 Owner name: EMC CORPORATION, MASSACHUSETTS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (053546/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:071642/0001 Effective date: 20220329 Owner name: DELL MARKETING CORPORATION (SUCCESSOR-IN-INTEREST TO FORCE10 NETWORKS, INC. AND WYSE TECHNOLOGY L.L.C.), TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (053546/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:071642/0001 Effective date: 20220329 Owner name: EMC IP HOLDING COMPANY LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS PREVIOUSLY RECORDED AT REEL/FRAME (053546/0001);ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS NOTES COLLATERAL AGENT;REEL/FRAME:071642/0001 Effective date: 20220329 |