HK1186544A - Network server system and related method for the same

Info

Publication number: HK1186544A
Application number: HK13113814.2A
Authority: HK
Inventors: H．维卡萨洛
Original assignee: 奥比融移动有限公司
Filing date: 2010-06-24
Publication date: 2014-03-14

Description

Network server assembly for processing non-parametric, multi-dimensional, spatial and temporal human behavior or pervasively measured technical observations, and related methods

Technical Field

The present invention relates generally to wireless devices and communication networks. In particular, but not exclusively, the invention relates to processing and distributing data relating to observations performed in one or more mobile devices in a server-side system by means of hierarchical data processing activities and conversion of non-parametric data into parametric form, including the utilization of applicable techniques such as statistical filtering and semantic data structures.

Background

More and more data can be collected from mobile devices, such as mobile terminals (e.g., smartphones), and transactional feeds can be created from the associated observations. However, such feeds do not in themselves characterize the mobile device user exhaustively, or even sufficiently, although they can admittedly reveal some details about the related events (such as the user's movements in the course of daily life), which are transaction-oriented, tied to a point in time, and contextual (an event may be linked with attributes such as location or weather).

Second, current databases and data processing solutions are not optimized for processing behavioral data or technical observations when a number of factors are taken into account, such as processing speed, memory requirements, or the general availability of historical data for sophisticated further processing or statistical analysis.

Third, although a huge amount of information about people's lives is in principle available, contemporary systems unfortunately mostly do not consider the link between historical data/models and real-time data (i.e. real-world applications), and fail to confirm that their technical implementation is feasible given the widely available databases, storage and data processing hardware.

Nevertheless, several prior art publications describe how to collect data points, locate a user, or make context data points available locally to other applications of the mobile device. For example, prior art publication WO2008118119 discloses a mobile device and method for: transmitting positioning data of the mobile device to a server at periodic intervals; automatically generating, in the mobile device, a current location profile associated with the current geographic location of the device in response to the server; simultaneously generating in the mobile device a set of neighbor profiles provided by the server for directions away from the current geographic location of the mobile device; and refreshing the current location profile and the set of neighbor profiles at the periodic intervals in the mobile device.

Despite the various prior art solutions for storing mobile device related events, there is still room for improvement, and a need remains for a way to store and process multidimensional data, describing human behavior in particular, through a hierarchical mechanism such that not only is performance optimized and complex analysis enabled, but more meaningful semantic indicators and profiles are also generated from the data, and different abstraction layers are physically separated for both technical and legal reasons.

Disclosure of Invention

It is an object of the present invention to alleviate one or more of the aforementioned disadvantages of prior art solutions and preferably to meet the associated aforementioned needs.

This object is achieved by providing a more intelligent, flexible and adaptive alternative to plain data feeds for physically storing and technically analyzing human behavior, possibly continuously and with a hierarchical approach.

A server assembly according to embodiments of the present invention can be configured to receive and process observation data in a variety of coordinated ways, and the data can be further developed into an output that is understandable from the perspective of an observer and that advantageously contains relationships which can even be used for predictive purposes. In various additional or alternative embodiments, the metrics relating to the life of one or more users may preferably be generated with a feedback loop into the data processing activity, so that the technical process can be calibrated continuously or according to the particular needs or requirements of a triggering condition. Various embodiments of the present invention make it possible to determine how non-parametric data collected by a wireless device, for example in conjunction with the use of mobile services, can be used efficiently to build derived, more abstract (higher-level) data entities, such as vectors describing the user's usage and lifestyle or the technical elements surrounding the user. This information can be generated using multiple abstraction layers, which facilitates nearly any type of further aggregation process and reduces both the storage capacity and the number of operations required when processing data. Some embodiments of the proposed solution may in fact be equipped to convert raw-level data into higher-level data that may be used in various applications, including for example mobile advertising or network performance analysis/optimization. Further, the physical presence and (past) actions of the mobile user may be linked or compared in real time to patterns stored in a database based on previously received data, and the user's further behavior may be predicted.
The solution may be optimized for different, possibly continuous, data streams containing non-parametric, multi-dimensional data (e.g., sensor data) received from wireless mobile devices and/or other applicable devices acting as data sources or data intermediaries.

Thus, in one aspect of the invention, a network server assembly comprises:

a data input entity configured to receive multidimensional non-parametric data (e.g., sensor data) obtained from a number of mobile devices (e.g., smartphones);

a processing entity configured to parameterize the multi-dimensional non-parametric data;

a memory entity configured to store parameterized data as multi-layer data, preferably on a plurality of different abstraction layers;

an aggregation engine configured to perform a number of aggregation and/or data modeling activities (such as time series operations, averaging operations and/or summing operations) on batches of said parameterized data (optionally with respect to a certain time period, a certain location, a certain mobile application or application class, a certain mobile user and/or user group) in order to determine from the data batches a number of descriptive higher-level behavioral indicators and/or technical indicators, the execution of said activities preferably being triggered substantially at any particular moment when at least a predetermined sufficient amount of data or information becomes available or when a trigger fires; and

a data deriving entity, such as an API (application programming interface), configured to provide the number of behavior indicators and/or technical indicators, or information derived therefrom, to an external entity, such as a mobile marketing entity for selecting personalized advertisements for one or more mobile users, or to a network analysis or management entity for evaluating network performance and/or user experience and optionally enabling it to further optimize the performance and/or the user experience based on the evaluation, respectively.

The process of determining behavioral indicators may include various innovative items for ensuring smooth operation.

That is, in one embodiment, a generic ontology may be defined for stored processed data, which may be achieved by the data structuring features of the present invention that structure the received data into at least one specific data entity (e.g., a table) based on its content and/or dynamic attributes (e.g., location, user identification, or time), preferably adding process classification information to the data entity to facilitate later processing.

In another supplementary or alternative embodiment, non-parametric input data collected from one or several software modules running in the wireless device can be converted into richer, more structured and advantageously parametric data, and preferably several processes that can be executed on the fly are performed on the data at the same time, which relieves the load on possible other modules. This is achieved by an entity configured to process incoming data streams before handing them over to a memory module.

In another supplemental or alternative embodiment, a dynamic, time-stamped vector can be determined that reflects the true behavior of a mobile user in a given number of dimensions. This can be achieved by an entity that generates a rich variety of predefined statistics, for example through several scripts that process blocks of data in batches and apply advanced statistical techniques, processing activities, and/or other scripted actions to periodically generate user-level, time-stamped statistics. The vector advantageously facilitates simple later conversions, such as transforming a given set of day-level behavior vectors into a weekly vector by a given statistical method (e.g., the arithmetic mean).
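The day-level to weekly conversion mentioned above can be illustrated with a minimal Python sketch. The function name `to_weekly_vector` and the dimension names in the usage note are hypothetical and do not come from the disclosure; the sketch also assumes that a day with no data for a dimension is simply skipped rather than counted as zero.

```python
from statistics import mean


def to_weekly_vector(day_vectors):
    """Collapse day-level behavior vectors into one vector by arithmetic mean.

    day_vectors: list of dicts mapping a dimension name to that day's value.
    A day that lacks a dimension (no data collected) is skipped for that
    dimension; this handling is an assumption of the sketch.
    """
    dimensions = sorted({dim for day in day_vectors for dim in day})
    return {
        dim: mean(day[dim] for day in day_vectors if dim in day)
        for dim in dimensions
    }
```

For example, seven day-level vectors such as `{"calls": 2, "km_moved": 1.0}` would be passed as a list, and the returned dict would hold one averaged value per dimension.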

In yet another embodiment, the assembly may be configured to generate a more complete set of statistics by making intelligent use of behavior indicators and vectors that have already been calculated. To this end, a feature called vector aggregation may be applied, which can process, average and extrapolate earlier calculated, finer-grained data and output meaningful statistics of a slightly different scope, for example for different time periods or for a user population rather than a single user.

Still, in another supplementary or alternative embodiment, several measures may be calculated with respect to the dynamic behavior of a given user (trend analysis) or the difference between any two users of the set, which may be achieved by associated features called behavior vectors, which may essentially output measures conveying the type and reach of key differences between the entities under study (e.g., users or time periods).

In some embodiments, the present invention also seeks to understand significant differences and to generate alerts or actions based on these differences. This object is achieved by a feature called vector trigger, which is a set of predefined configurations that specify the conditions under which, after any two specific vectors are associated or a new behavior indicator is calculated, a certain alarm should be generated and signaled to an internal or external module.
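As a rough illustration of such a vector trigger, the following Python sketch compares two behavior vectors against a list of predefined configurations. The configuration keys (`name`, `dimension`, `min_abs_change`) and the absolute-change rule are assumptions made for this example only; the disclosure does not prescribe a particular condition format.

```python
def check_triggers(previous, current, triggers):
    """Return the names of all triggers whose condition is met.

    previous, current: behavior vectors as dicts (dimension -> value).
    triggers: list of hypothetical configurations, each a dict with
      "name", "dimension" and "min_abs_change".
    """
    fired = []
    for trigger in triggers:
        dim = trigger["dimension"]
        # A dimension missing from a vector is treated as 0.0 (assumption).
        change = current.get(dim, 0.0) - previous.get(dim, 0.0)
        if abs(change) >= trigger["min_abs_change"]:
            fired.append(trigger["name"])
    return fired
```

The returned names could then be passed to whatever internal or external module is responsible for raising the alarm.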

Still, according to some embodiments of the invention, the proposed solution may advantageously distinguish between various data sources related to user behavior. To generate semantic structures from, for example, a separate table, an ontology of the incoming data feed can be formed and stored in a separate database. In the background, there is logic to archive the data as larger batches with semantics in place, and preferably a multi-level aggregation process and/or averaging is applied to the incoming data, along with, for example, cluster analysis and/or pattern recognition. A multidimensional behavior vector may be computed for each user, which also includes the time dimension to enable dynamic applications. The vector may be calculated for a certain period of time, such as one week, and is multidimensional in the sense that, e.g., a so-called measure of activity (actions per period) and/or a frequency of use (the share of the smaller periods involved in the calculation of the vector during which some activity occurs) are incorporated into the same vector. The vectors reflect a semantic understanding of user behavior; exemplary vectors described include travel activity, movement activity, music consumption activity, stress levels, and sleep activity.
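The two measures named above, activity and frequency of use, can be sketched in Python as follows. The sketch assumes the period is split into days and that the input is a list of per-day action counts; both the function name and this choice of sub-period are illustrative.

```python
def activity_and_frequency(daily_counts):
    """Compute two dimensions of a behavior vector for one period.

    daily_counts: number of observed actions on each day of the period.
    Returns (activity, frequency), where activity is the mean number of
    actions per day and frequency is the share of days with any action.
    """
    days = len(daily_counts)
    activity = sum(daily_counts) / days
    frequency = sum(1 for count in daily_counts if count > 0) / days
    return activity, frequency
```

A user with counts `[0, 3, 0, 4, 0, 0, 7]` has the same activity as one with two actions every day, but a much lower frequency of use, which is why keeping both in the same vector is informative.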

Behavioral indicators (vectors) may be computed based on the technical routing and scheduling innovations described herein, taking into account the nature of data obtained from data sources such as smartphones. Such data may involve, for example, a significant number of blackout periods (i.e., time periods when no data is available), sporadic synchronization of data and, in many cases, incomplete and/or non-standardized data streams (i.e., sensor data typically collected by a separate client application) that may be in a non-parametric form without a predefined structure. The vectors can be calculated with respect to overlapping time periods, and the present invention proposes an applicable technique for storing dynamic vectors without consuming too much memory space. The behavior vector may furthermore be used to define a behavior class for each user based on the relative proportion of reference users who obtain a lower score than the considered user in a particular behavior dimension (in other words, the percentile of the current user within a larger group). The vectors of different users may also be correlated with each other (e.g., by Pearson correlation) to derive a metric, called a similarity index, for any pair of users, which in turn serves as a basis for a user segmentation model. Advantageously, the behavior vector can be computed automatically and dynamically as new information comes in, ensuring that the output of the assembly reflects the most recently available information content in an optimized form at any particular time. Significant changes in behavior may be identified by triggers related to analysis of the normalized vector. The present disclosure also describes how the proposed solution can be used to improve the intelligence and dynamic performance of mobile advertising.

Preferably, the proposed solution operates seamlessly at all times, even with an occasionally dense, non-standardized data stream. To this end, some embodiments of the invention include a feature called "caching" that directs the incoming data stream through one or more systematic pipelines, which ensure that the data passes through the structured processing chain in the correct order and that the parameterization can be supported in an optimal way. Caching also facilitates advantageous actions such as converting non-parametric data to parametric data, and a coordinated, well-managed process in which certain actions need to be completed and the input data may need to be organized in a particular manner (e.g., chronologically) before moving to the next action.

In some embodiments, for mission-critical purposes (such as mobile advertising or optimization actions based on real-time analysis), it may be desirable to compute meaningful behavioral metrics in substantially real time. This can be achieved by a feature called real-time processing, which is tied to the operation of the cache and which, based on predefined rules, computes simple indicators (such as Boolean variables) for certain behavioral events, or counters that reflect the frequency of certain actions.
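The rule-based Boolean indicators and counters described above might look roughly like the following Python sketch. The class name `RealTimeIndicators`, the event dictionary layout and the idea of passing rules as predicates are all assumptions made for illustration.

```python
from collections import Counter


class RealTimeIndicators:
    """Maintain Boolean flags and per-type counters as events stream in."""

    def __init__(self, flag_rules):
        # flag_rules: indicator name -> predicate taking an event dict
        self.flag_rules = flag_rules
        self.flags = {name: False for name in flag_rules}
        self.counts = Counter()

    def on_event(self, event):
        """Update the indicators for one incoming event (O(number of rules))."""
        self.counts[event["type"]] += 1
        for name, rule in self.flag_rules.items():
            if rule(event):
                self.flags[name] = True
```

Because each event only touches a counter and a handful of predicates, indicators of this kind can be kept current without waiting for batch aggregation.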

In order to separate different types of data from each other, and to divide the data points structurally based on requirements related to their utilization or on possible interactions with the various aggregation layers, so that computation load and required time can be optimized, various embodiments of the present invention can implement an advantageous feature called "hierarchical data mining using behavioral data". This feature manages data flow through a hierarchical model in which raw data is distinguished from more refined (polished) data, where refining (polishing) can refer to modifying, filtering and/or enriching transaction data in certain dimensions so that it is more understandable, more concise and easier to process in later steps. Refined transaction data can in turn be distinguished from aggregated and statistical data, which compresses the relevant information into more specific numbers and indicators, so that individual behavioral and/or technical patterns are better reflected and the information is easier for internal or external systems to use.

In another supplemental or alternative embodiment, a scalable component may be provided that accesses the behavior data and builds on it customized views or statistics. A feature referred to as a "middle-level table" may be configured to efficiently store at least partially aggregated data in a form that is readily directed to other systems for further aggregation or visualization.

In another supplementary or alternative embodiment, one of the associated goals may be to avoid fixing in advance, at the data processing or aggregation stage, what type of statistics are needed in the final output and/or report. Therefore, a "further aggregation" feature may be provided that builds on the behavioral indicators stored in the middle-level tables described below and produces statistics of the type desired for internal or external purposes.

In another supplementary or alternative embodiment, in order to minimize the required storage capacity, protect consumers' rights, and/or facilitate fast data processing, a feature called "periodic scrubbing" may be provided, whereby the solution automatically and periodically traverses the stored raw and derived data tables and removes unneeded data points from memory altogether according to predetermined criteria.
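One possible shape of a single scrubbing pass is sketched below in Python. The retention criterion used here (a maximum age plus an optional keep predicate) is an assumption; the disclosure only says that data is removed according to predetermined criteria.

```python
from datetime import datetime, timedelta


def scrub(rows, now, max_age, keep=lambda row: False):
    """Return the rows that survive one scrubbing pass.

    rows: list of dicts, each with a datetime under the key "ts".
    now: reference time of the pass.
    max_age: timedelta; older rows are dropped unless keep(row) is true.
    """
    cutoff = now - max_age
    return [row for row in rows if row["ts"] >= cutoff or keep(row)]
```

In a deployment, a scheduler would run such a pass at fixed intervals over both the raw and the derived tables, with a separate retention rule per table.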

In another additional or alternative embodiment, data processing and storage may be flexibly distributed. The proposed solution may include a feature to "manage distributed data mining" that effectively keeps track of where the user came from, where his or her data points are stored, and where data processing and storage should occur if, for example, a timestamp has an effect in some way.

Incoming data from a wireless device or other data source may first be stored in a database that is responsible for caching data sets and preparing them for batch processing. Because data delivered as, for example, XML (Extensible Markup Language) does not always have a predetermined target form when cached, the data may also be processed (e.g., sorted) at this step. After caching, the data can first be archived into a raw-level database that stores all the original data (the so-called "sensor database"), and second be directed to different analysis processes that, after processing, aggregation and/or averaging, typically store the data in an optimized form in so-called "middle-level" tables.

The aggregation and other processing actions required before data is stored in the middle-level tables may be triggered based on, for example, the amount and nature of the data already in cache storage. The middle-level tables may contain a more compact, simplified form of the data that can be analyzed more quickly and further aggregated in potentially complex ways. These middle-level tables can be used periodically or in real time to generate so-called "derived tables" containing easily understandable information and well-defined statistics.
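The flow from cache to sensor database to middle-level table can be sketched in Python as follows. This is a simplified, in-memory illustration under several assumptions: the trigger is a fixed batch size, records are dicts with `ts` and `user` keys, and the middle-level aggregate is a plain per-user event count. None of these specifics come from the disclosure.

```python
class Cache:
    """Buffer incoming records and release them as sorted batches."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = []

    def add(self, record):
        """Store a record; return a chronologically sorted batch once
        enough data has accumulated, otherwise None."""
        self.pending.append(record)
        if len(self.pending) < self.batch_size:
            return None
        batch = sorted(self.pending, key=lambda r: r["ts"])
        self.pending = []
        return batch


def ingest(cache, record, sensor_db, middle_level):
    """Archive released batches in raw form and update a compact aggregate."""
    batch = cache.add(record)
    if batch is None:
        return
    sensor_db.extend(batch)  # raw-level archive keeps every record
    for r in batch:          # middle level keeps only a per-user count
        middle_level[r["user"]] = middle_level.get(r["user"], 0) + 1
```

Sorting inside the cache means later steps can rely on chronological order even when devices synchronize sporadically.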

The derived tables can be used directly by external applications and preferably, they are periodically cleaned from old data entities. In this type of data structure, the data in the sensor database is also periodically cleaned up to hold only data that is meaningful enough and may be needed in further aggregations at some point in the future. Because a single instance of a larger database system may be implemented locally (e.g., in different countries), the overall architecture is designed to be scalable. At different levels, at physically separate levels of the data model, different levels of privacy (e.g., storage of personal ID information) may be guaranteed.

There may be a centralized system that knows which users' data are stored in which regional or functional database, and thus the load of incoming data may be distributed, as well as the load of data analysis. Similarly, the programming interface that extracts the data may use a centralized pointer to know where to search for the data. In this proposed system, the database servers advantageously distribute among themselves not only the storage of data but also the processing, according to data functionality. For example, the derived database may reside on a different server than the required middle-level data, and the servers may coordinate the data extraction and processing activities themselves. The whole system can be seen as a data pipe following logic such as FIFO (first in, first out) queuing, while at the same time applying new solutions for data processing and for the stepwise, partial reduction of the resolution of the stored data.
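A centralized pointer of this kind can be pictured as a small directory that maps each user to the store holding that user's data. The Python sketch below is illustrative only; the class name `UserDirectory`, the region labels and the fallback to a default region are assumptions.

```python
class UserDirectory:
    """Central pointer from a user ID to the database holding that user's data."""

    def __init__(self, default_region):
        self.default_region = default_region
        self._region_of = {}

    def assign(self, user_id, region):
        self._region_of[user_id] = region

    def locate(self, user_id):
        # Unknown users fall back to the default region (assumption).
        return self._region_of.get(user_id, self.default_region)


def route(directory, records):
    """Group incoming records by the regional store they should be written to."""
    by_region = {}
    for record in records:
        region = directory.locate(record["user"])
        by_region.setdefault(region, []).append(record)
    return by_region
```

The same `locate` lookup could serve the extraction API when it needs to know where to search for a given user's data.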

In another supplemental or alternative embodiment, convenient querying and computation of data points and statistics for many (e.g., hundreds of) users may be facilitated by a feature called "virtual access," which generates an abstraction of the users' behavioral indicators and virtualizes the middle-level tables to make them easier to access. The "virtual access" feature may connect multiple web servers together to provide a uniform user experience for a customer that is actively using the API.

According to another supplementary or alternative embodiment, a semantic data model may be built, whereby the proposed solution can tell different concepts apart (e.g. sleep or movement), preferably periodically attach important data points (e.g. positions and time periods) to them, and disregard the collected raw observation data. The "conversion feature" may add semantic information to the data points and enable more natural-language-oriented semantic queries.

According to one embodiment of the invention, filtering tasks and/or excluding tasks may be performed on the processed data. Because external users can request a large amount of information from the supplied assembly, it is preferable that there be a set of filtering and excluding tasks that can examine specific aspects in the data and discard or manipulate data points to make the output more structured and meaningful.

The proposed solution may generally define a platform that provides a virtual database interface with an external wireless device or a web server to access real-time behavioral and contextual information located in another web server. The platform can not only provide a single data point, but can also perform more intelligent complex actions on the data to reduce the processing time required on the querying device or reduce the functional processing requirements (complexity) on the querying device, and can provide semantic meaning to the output data through batch data processing.

According to an embodiment, a query language model for the interface is proposed, based on which the interface can deliver information actively (initiated by the requesting device) or passively (e.g., when a change occurs) and in practice deliver a prepared reply to the querying device in a timely manner. Instead of or in addition to providing, for example, the latest location, the interface may provide the distance traveled during a predetermined period of time (such as the past 60 minutes), or a location point from, for example, 60 minutes ago together with the current location point (from which the querying device may then calculate the required information).
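As an illustration of a server-side reply that is richer than a single data point, the following Python sketch computes the distance traveled within a time window from a list of location fixes. It uses the haversine great-circle formula, which is a choice made for this example; the tuple layout `(unix_ts, lat, lon)` and the function names are likewise assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def distance_travelled_km(fixes, now, window_s=3600):
    """Sum the leg distances between consecutive fixes inside the window.

    fixes: list of (unix_ts, lat, lon) tuples sorted by time.
    """
    recent = [(lat, lon) for ts, lat, lon in fixes if now - window_s <= ts <= now]
    return sum(haversine_km(p, q) for p, q in zip(recent, recent[1:]))
```

Returning the summed distance spares the querying device from fetching and processing every fix itself.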

So-called statistical filters can be embedded into the solution so that potentially complex data feeds can be directed through filters that pre-process most of the data, sometimes converting it from one form to another and running pre-written processes on it. This makes it easier to provide a profile-based solution for selected analytics, in which different types of filters and predefined analysis processes are performed based on the data points being queried and the identity of the data source (e.g., wireless device ID number), and a normalized vector is returned. The platform is adapted to support a variety of different physical data sources, as well as a variety of applications that need to be provided with analytical data.

In another supplemental or alternative embodiment, a feature called "abstraction" may be provided that effectively combines the available multidimensional behavior vectors (e.g., hour-level location dynamics), with the aim not only of understanding user behavior through metrics and time-stamped transactions but also of generating higher-level descriptors of behavior patterns. With this feature, vectors can be generated that can be characterized as behavior traces, each computed with slightly different parameters but still describing a certain behavior pattern. It should be noted that this type of aggregation-oriented data abstraction of behavior vectors, which are themselves already a type of abstraction, makes it easier to analyze users' lives through machine learning and pattern recognition.

In another embodiment, the goal is to predict what people are likely to do next, given their historical behavior and current context. To achieve this goal, a user behavior model is built dynamically that includes behavior abstractions as its elements and dynamics, such as those of a Markov chain, between the elements. As one use case, the predictive model may be used to dynamically compute the weights and likelihoods of the different transitions in the system and to provide, at almost any time, a vector of likelihoods for the next possible states of the system.
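A first-order Markov chain of the kind referred to above can be sketched in a few lines of Python. The sketch estimates transition likelihoods from an observed state sequence by simple counting; the state labels in the usage note are made up, and a real model would also have to handle context and unseen states.

```python
from collections import Counter, defaultdict


def fit_transitions(states):
    """Count observed transitions between consecutive states."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return counts


def next_state_likelihoods(counts, current):
    """Return a dict mapping each possible next state to its likelihood.

    An empty dict is returned when the current state was never observed
    with a successor.
    """
    outgoing = counts.get(current)
    if not outgoing:
        return {}
    total = sum(outgoing.values())
    return {state: n / total for state, n in outgoing.items()}
```

For a sequence such as `["home", "work", "gym", "home", "work", "home"]`, the likelihood vector for the state `"work"` splits evenly between `"gym"` and `"home"`.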

In some embodiments, learning from the incoming data may be implemented. A feature referred to as a "feedback loop" may be configured to update the predictive model, optionally continuously, and to compute, possibly continuously, a measure of how successful the model's predictions are at any given time. With certain selected thresholds, the performance of the prediction engine can be assessed in real time. The feedback loop enables the prediction engine to learn autonomously.
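One simple way to picture the success measure of such a feedback loop is a rolling hit rate compared against a threshold, as in the Python sketch below. The class name `PredictionFeedback`, the sliding window and the default values are assumptions for illustration; the disclosure does not specify how the measure is computed.

```python
from collections import deque


class PredictionFeedback:
    """Track how often recent predictions matched what actually happened."""

    def __init__(self, window=100, threshold=0.5):
        self.outcomes = deque(maxlen=window)  # only the latest outcomes count
        self.threshold = threshold

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def hit_rate(self):
        """Fraction of correct predictions in the window (0.0 if empty)."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def is_reliable(self):
        return self.hit_rate() >= self.threshold
```

A caller could, for instance, suppress prediction-driven actions whenever `is_reliable()` returns false and retrain the model on the newest data.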

In some embodiments, predictions may be given dynamically, for example, for the purpose of mobile advertising (context-dependent, predictive, targeted advertising). For such purposes, a state machine (e.g., a Markov model) may continuously give predictions of the next state (e.g., next location, name of the next person the user calls, music artist he will then listen to) based on dynamic queries, and through computed performance indicators (how likely the model is to be correct) and external or internal modules that provide a library of specified advertisements, the system may trigger a specific action (such as the pop-up of a certain advertisement) if the conditions are sufficiently predictive.

In another aspect, a method to be performed by an electronics assembly for processing observation data includes:

-receiving non-parametric multi-dimensional spatial and temporal human behavior and/or technical observations (e.g. sensor data) obtained from several mobile devices (e.g. smartphones);

-parameterizing, optionally classifying and/or structuring, the received data;

-performing several aggregation and/or data modeling activities on parameterized data in batches in order to determine several descriptive higher-level behavior indicators and/or technical indicators from the data batches; and

-providing the number of behavior indicators and/or technical indicators or information derived therefrom to an external entity, such as, respectively, to a mobile marketing entity for selecting personalized advertisements for one or more mobile users, or to a network analysis or management entity for evaluating network performance and/or user experience and optionally enabling it to further optimize the performance and/or the user experience based on the evaluation.
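The method steps above can be sketched end to end in Python. Everything specific here is an assumption for illustration: the raw input is taken to be loosely ordered `key=value` strings, parameterization means turning them into typed records, and the aggregation is a per-user, per-kind average.

```python
def parameterize(raw):
    """Turn a 'key=value;key=value' string into a typed record."""
    fields = dict(part.split("=", 1) for part in raw.split(";") if "=" in part)
    return {
        "user": fields["user"],
        "kind": fields["kind"],
        "value": float(fields["value"]),
    }


def aggregate(records):
    """Average the value per (user, kind) over one batch of records."""
    sums = {}
    for r in records:
        key = (r["user"], r["kind"])
        total, n = sums.get(key, (0.0, 0))
        sums[key] = (total + r["value"], n + 1)
    return {key: total / n for key, (total, n) in sums.items()}


def process_batch(raw_observations):
    """Receive, parameterize and aggregate one batch; return the indicators."""
    return aggregate(parameterize(raw) for raw in raw_observations)
```

The dictionary returned by `process_batch` stands in for the indicators that would be handed to an external entity through an API.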

As the skilled person will appreciate, the various considerations presented herein concerning embodiments of the assembly may be flexibly applied to embodiments of the method, mutatis mutandis, and vice versa.

Furthermore, with regard to the utility of the present invention, the present invention can be applied to a variety of usage scenarios, for example, in conjunction with systems where it is desirable to build accurate digital user profiles, for example, on a continuous basis, and to dynamically associate these profiles with one or more actions triggered by characteristics appearing in the data. Several semantic indicators and profiles may be determined based on observed data feeds of potentially logically and physically separate levels of abstraction. Metrics about the user's life or surrounding technical context may be constructed in a real-time manner. Behavioral processes, for example based on smartphone observations and related technical processes, are so equipped. Thus, feeds related to movement observations can be provided as inputs, and related behavior vectors generated by, for example, a combination of state machine methods and data clustering methods can be provided as outputs.

The proposed solution facilitates, e.g., batch processing of data blocks and the eventual removal of historical data (which is preferred for saving storage capacity). On the other hand, new incoming data is quickly ready for analysis, and even historical data can be used for analysis if needed. A novel technical database solution is thus provided that supports analysis processes and time series analysis and is able to partition data into different layers based on their processing requirements. Furthermore, for technical and legal reasons, the data stores may be physically distributed between different servers or other entities.

The sensor data can be physically separated from the more refined data, and sustainable automation can be built to continuously generate refreshed analytics about the life of the mobile device user. A large number of applications may require the use of behavioral and contextual data about human behavior. In order to perform meaningful operations on data, the proposed solution is configured to facilitate multiple types of data requests so as to reduce bandwidth requirements, meet real-time requirements, support more intelligent server-side queries that require dynamic data processing, and support partially automated operations that trigger actions and data distribution. Physically separate systems may exchange behavioral information and divide responsibility for the processing of data, especially where sensor data is collected from wireless devices and further processed by one or several network servers, the data containing multiple types of different data points and aggregate vectors.

Finally, returning to the availability and usability of historical data, the accumulation of databases of behavioral and contextual data makes it possible to build an understanding of a person's likely actions, in other words, to build predictive features into a commercial solution (e.g., a social network).

As a practical example of the applicability of the present invention, an external web application may be considered that automatically reflects a significant event occurring in the life of a selected user (e.g., by sending an email report to his/her friends when the user has visited at least 3 countries within any given 7 days).

Another application may be configured to send an automatic targeted advertisement to the mobile device user based on learning from the user's recent behavior (e.g., when a person is near a record store with a valid heavy metal discount, and the likelihood of the person listening to heavy metal in the next 10 days is determined to be above 2%, a heavy metal record discount coupon is given to him/her).

As yet another example, the present invention can be used to specify how different types of data should first be stored in a database so that they can be flexibly accessed by application programming interfaces located in different abstraction layers. As a practical example, the explanation below covers the storage of location information in a variety of forms (including cell tower ID, WiFi hotspot ID, and GPS fix) and also discloses ways to abstract the actual way data points are stored. Based on these descriptions, the key processing is further explained with respect to the recognition of context-sensitive repetitive patterns in user behavior and the computation of statistics reflecting the uniqueness and importance of the recognized patterns.

As further practical examples, the following are described: how the obtained data can be processed in a plurality of batches; and how physically separate information sources (e.g., the geographic coordinates of the cell tower and the precise transaction log of the cell tower) can be used in parallel for the processing and modeling processes. The output log of the user's lifestyle pattern (including behavioral indicators and associated aggregate data flows and behavioral or predictive models) may be dynamically tied to new incoming data, certain filters and/or triggers may be programmed to perform selected actions when one or more predetermined conditions are met, and the prediction engine may calculate the likelihood of something happening.

The associated signaling procedures are reviewed further in this document. The proposed solution enables matching of separately defined estimation models and e.g. derived Markov scenes with real-time data feeds, so that the next movement of the user is effectively inferred in real time. A physical mechanism may be provided that indicates to the prediction engine whether the prediction was successful.

The expression "behavior indicator" refers herein to, for example, a numerical or category value in the case of a one-dimensional specific indicator, or a plurality of values in the case of a multi-dimensional behavior indicator, such as an average moving distance in a certain day and an average direction of such movement, or as another example, a behavior vector describing, for example, the user's voice call frequency and average time spent per unit time voice call, which behavior indicators convey the user's behavioral activity, possibly including possible metrics and semantic classifications and/or labels reflecting frequency, activity, type and/or other kinds of metrics with respect to the action.

"on-the-fly" refers to substantially real-time processing.

"technical" is used herein with reference to data, aggregations, indicators and statistics related to the observed technical context or event (rather than the behavioral context or event), meaning, for example, parameters measured from the cellular network (including the signal strength and type of the network being accessed).

"unparameterized" refers to data points that are not directly related to other data points, in other words, data in a silo with each data entity from a particular group, with no explicitly defined relationship to any other data point.

"parameterized" refers to data points that are related to each other, e.g., network base station observations while also including measurements of current throughput and signal strength.

"internal module" refers to a logical module within a physical system or device assembly, or other entity depicted in the present invention.

An "external module" is accordingly a module that is external to the physical reflection of the implementation of the invention disclosed herein.

An "API" refers to an application programming interface, essentially a preferably programmable framework that pulls data from or pushes data to the assembly in a coordinated manner.

"analyzing" herein refers to making decisions based on factual information and/or quantitative information.

An "observer" herein refers to a process that is capable of generating data items based on, for example, queries and the use of the operating system capabilities of the wireless device. Observers, which are functionally, and sometimes physically, sensors that may not always reside in a wireless device and operate continuously, may automatically sense changes identified in, for example, cellular base station usage (e.g., when the device jumps from the coverage of one tower to the coverage of the next tower). An observer can also refer to a channel where a user generates content (e.g., a blog entry or composed text message).

"trigger" refers to the rules and processes that trigger (induce) an action. In particular, they may define how observations can be made more efficiently and automatically in a wireless device. The trigger may be based on a time interval, context changes and observations, external requests, or internal requests (e.g., where some other data point requires more data).

The concept of "intelligence" is used in this document to refer to a set of rules, algorithms, databases, and/or processes that coordinate the overall process or individual micro-processes (e.g., trigger logic) of associated entities. Intelligence is a matter of making the relevant systems work more skillfully in a more optimal way (e.g., saving energy and improving accuracy). It may be based on a fixed algorithm and/or an adaptive algorithm that learns autonomously, and also on external inputs.

"server" herein generally refers to a node or at least a logical aggregation of several nodes that reside in and are accessible through one or more networks (e.g., the internet). The server may provide services for clients, such as mobile agents running in the wireless devices and other entities, such as various network services. The client can thus communicate with one or more centralized servers. The client-server architecture is a common topology for building systems in the internet.

The concept of "processing" is used in this document to refer to various types of actions that may be performed on data in a static manner or a more dynamic, immediate manner. These actions include data transformation, formulation, composition, mashup enrichment, correlation, clustering, factoring, normalization, and/or filtering, among other actions. Some form of processing may be actively used in various embodiments of the present invention, including combining and mashup (e.g., linking data points together and building relational data structures), transformation (producing meaningful streams of, for example, information entities from unordered data items of the original hierarchy (such as observed location points)), enrichment (e.g., adding metadata and making the data more substantial than originally), and/or filtering (e.g., removing data that is not relevant or is no longer needed).

A "smart phone" is defined in this document as a wireless device that is capable of running an operating system that facilitates the installation of additional applications and enables connection of packet data to a target network, such as the internet.

An "assembly" herein refers to an entity such as a device (e.g., a server device) or a system of several at least functionally interconnected devices.

The expression "plurality" refers herein to any integer number starting from two (2) (e.g., two, three, or four).

The expression "a number of" refers herein to any integer number (e.g., one, two, or three) starting from one (1).

The expressions "entity" and "module" are used interchangeably herein.

Drawings

The invention is described in more detail below with reference to the attached drawing figures, wherein:

Fig. 1 shows the general concept and the main modules (i.e. the general architecture and design principles) of an embodiment of a server assembly according to the invention from a functional point of view.

Fig. 2 shows different features of an embodiment of the assembly, mainly focusing on the evolution of behavior indicators (e.g., vectors).

FIG. 3 is a combined block diagram and flow diagram of one embodiment of the described assembly, which primarily illustrates different aspects of the hierarchical data mining logic.

FIG. 4 is a combined block and flow diagram of one embodiment of a data output interface (e.g., a context/behavior application programming interface) that may be applied in the proposed assembly.

FIG. 5 is a combined block diagram and flow diagram of one embodiment of a data prediction module or prediction engine that may be applied in connection with a provided assembly.

Fig. 6 is a block diagram of an embodiment of a server assembly entity according to the present invention.

Fig. 7 is a flow chart disclosing an embodiment of the method according to the invention.

Detailed Description

In view of the foregoing, and with particular reference to fig. 1, the general inventive concept is described by way of an embodiment of a (network) server assembly (deployment) 102, wherein the assembly 102 comprises a data input entity 100 (such as a log reader) for inputting and caching data provided by a number of preferably wireless mobile devices 106 (optionally through at least one communication network 104 such as, for example, a mobile network or other access network and/or the internet), a processing entity 200 for processing data, a multi-tier memory entity 300 for storing data, a centralized logic module 400, and one or more output entities/modules 480 and 500 for organizing the analysis results, the centralized logic module 400 coordinating the data analysis, aggregation of the various tiers, advantageously also the hosting of units for querying and analyzing data based on triggers.

The input entity 100 may thus be configured to execute predetermined reconfigurable logic, for example, to physically structure data into different data tables and to process the entities in the correct order.

The processing entity 200 may be configured to ensure scalable reception and caching of incoming data in batches and may comprise or at least be functionally connected with, for example, a filtering module capable of modifying and processing incoming data to normalize the data flow to an internal analysis module or to a connected analysis module.

The centralized logical entity 400, also referred to as aggregation entity/module, may also be able to process, for example, data batches and preferably be able to determine a predefined number of indicators describing the batches. It may comprise or at least be functionally connected to a prediction entity/module 480 and/or comprise or at least be functionally connected to a feedback entity/module 480, said prediction entity/module 480 being able to find, preferably continuously, a mode vector and a so-called vector identifier, match this vector/these vectors with incoming real-time information and dynamically trigger the prediction, said feedback entity/module providing information back to the prediction module reflecting whether the prediction is correct, which prediction and feedback modules are described in more detail below.

Further, the assembly may include a database (management) entity 300 that is capable of storing data using various abstraction layers and, if desired, physically distributing the storage of data based on an aggregated hierarchy or based on other criteria (e.g., user's subdivisions) as will be described in more detail below.

Thus, for example, from a related assembly point of view, various embodiments of the present invention are generally applicable to defining a common ontology for substantially all stored processed data, which may be achieved by embodiments of the data structuring features of the present invention configured to structure all incoming data into at least one particular table based on their content and dynamic attributes (such as location, user identification, or time), preferably adding classification information during the process to facilitate later processing. Typical category classifications may include at least one category selected from the group consisting of:

1. Application usage data (click stream)

2. Mobile web browsing usage data (click stream)

3. Network performance data

4. Device feature usage data

5. Device system data (e.g., battery status)

6. WiFi network performance data

7. Memory system data

8. Alarm clock data

9. Calendar data

10. Telephone directory content

11. Message log, and

12. Voice call log

One or more of the entities of the invention, such as the processing entity 200 and/or an entity comprised in the processing entity 200 or an entity connected to the processing entity 200, may advantageously turn non-parameterized input data, which may be collected by using one or several software modules (e.g. agents) running in the wireless mobile device, into more robust, more structured and/or parameterized data on the network side and at the same time perform a process on this data that can be carried out on the fly, thereby reducing the load on other modules of the installation or on other modules outside the installation. The assembled entity (e.g., processing entity 200) may be assigned responsibility for processing incoming data streams prior to handing them over to the memory module.

For example, any one or more of the following actions may be taken in conjunction with parameterization:

1. Adding application classification information (application category, type, and class) to the application name by first mapping any particular application name to a uniform application ID (e.g., all the different localizations of the default web browser would be translated to one unique application ID) and then mapping the category name, application type, and class name to the same row,

2. Adding information about the web domain name (site/page category, etc.), and

3. Adding location tags to observation data
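The parameterization actions listed above can be sketched as a simple row-enrichment step. This is a minimal illustration under stated assumptions: the lookup tables `APP_NAME_TO_ID` and `APP_ID_TO_CLASS`, their contents, and the row field names are hypothetical and only stand in for whatever mapping tables an actual implementation would maintain.

```python
# Hypothetical lookup tables; names, IDs and classes are illustrative only.
APP_NAME_TO_ID = {"Web": 101, "Selain": 101, "Navegador": 101, "Maps": 202}
APP_ID_TO_CLASS = {
    101: {"category": "browsing", "app_type": "preinstalled"},
    202: {"category": "navigation", "app_type": "preinstalled"},
}


def parameterize(row, location_tag=None):
    """Return a copy of an observation row enriched with ID, class and location."""
    enriched = dict(row)
    # Localized names of the same application map to one uniform ID.
    app_id = APP_NAME_TO_ID.get(row["app_name"])
    enriched["app_id"] = app_id
    # Category and type are written to the same row as the observation.
    enriched.update(APP_ID_TO_CLASS.get(app_id, {}))
    if location_tag is not None:
        enriched["location"] = location_tag
    return enriched
```

For example, `parameterize({"user": "u1", "app_name": "Selain"}, (60.17, 24.94))` yields a row carrying the uniform application ID, its category and type, and the location tag.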

In the parameterization process, systematic relationships between different tables, established through location or temporal proximity or through heuristic procedures including the identification of other common denominators (e.g., technical data such as a network base station cell ID or WiFi hotspot index), can advantageously be used to combine independent non-parameterized observation data into more enriched parameterized data, which may also include parameters obtained from outside the system, such as weather data, geographical location names, and network status information.

Meaningful vectors may be continuously computed such that they reflect the true behavior of the mobile user, and modules (e.g., centralized logic/aggregation entity 400 and/or entities included in or connected to centralized logic/aggregation entity 400) may be configured to generate a rich variety of predefined statistics, for example by scripts that process blocks of data in batches, and to periodically apply advanced statistical techniques, processing activities, or other scripted actions to produce user-level timestamped statistics.

For example, any one or more of the following types of behavior indicators may be calculated based on data collected from the mobile device:

1. Average browsing face time, in a predetermined unit (e.g., minutes) of usage per predetermined period of time (e.g., one day),

2. Average sleep time, in hours, of a user during a predetermined period of time (e.g., December 2009),

3. Average daily movement span, in a predetermined unit (e.g., km or miles), per user per day, and

4. Average entropy of the location dynamics of a user over a day
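As a hedged illustration of the last indicator, the entropy of a user's location dynamics could be computed as follows. The text does not define the entropy measure; the sketch assumes Shannon entropy over discrete place identifiers, and the function names and the `(day, place_id)` input format are hypothetical.

```python
import math
from collections import Counter


def location_entropy(place_ids):
    """Shannon entropy (in bits) of the distribution of visited places."""
    counts = Counter(place_ids)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def average_daily_entropy(observations):
    """Mean of per-day entropies; observations are (day, place_id) pairs."""
    by_day = {}
    for day, place in observations:
        by_day.setdefault(day, []).append(place)
    return sum(location_entropy(p) for p in by_day.values()) / len(by_day)
```

A user who stays in one place all day contributes an entropy of zero for that day, while a user who divides observations evenly between several places contributes a higher value.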

The feasible metrics depend on the application and the requirements, but typically the metrics take the form of: an activity metric, conveying minutes, sessions, transactions, or other events per unit time; a frequency metric, conveying the relative incidence of events during a defined time period; and a likelihood metric, conveying the relative tendency for a certain event to occur, conditionally or unconditionally, relative to some other event, in which case the likelihood may be a more static number given a set of conditions and contexts (e.g., a time period). The key metrics are often meaningful in themselves, and they also feed all types of derived metrics (including, for example, Boolean variables that indicate whether some usage activity threshold is exceeded).

To generate a more complete set of statistics using already computed behavior indicators and/or vectors, embodiments of the assembly may include the aforementioned feature referred to as vector aggregation, which may process, average, and/or extrapolate previously computed finer grained data and may produce as output meaningful statistics of slightly different ranges, such as determining statistics related to different time periods or groups of users (rather than individual users), for example.

With regard to the above points, an embodiment of the assembly according to the present invention may be configured to: calculating e.g. daily statistics and deriving based on the daily statistics at least any of the following similar statistics, for example:

weekly statistics (usage activity, frequency, user penetration)

Monthly statistics

Annual statistics
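The derivation of coarser statistics from daily statistics can be sketched as below. This is a minimal example, assuming daily usage minutes keyed by `(user_id, date)` and ISO calendar weeks as the weekly unit; the function name and the output field names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def weekly_from_daily(daily):
    """Aggregate daily usage minutes into weekly statistics.

    daily: dict mapping (user_id, date) -> minutes of usage on that day.
    Returns a dict mapping (iso_year, iso_week) -> statistics dict.
    """
    minutes = defaultdict(float)
    user_days = defaultdict(int)
    users = defaultdict(set)
    for (user_id, day), value in daily.items():
        iso = day.isocalendar()
        week = (iso[0], iso[1])
        minutes[week] += value
        user_days[week] += 1
        users[week].add(user_id)
    return {
        week: {
            "total_minutes": minutes[week],
            "avg_minutes_per_user_day": minutes[week] / user_days[week],
            "active_users": len(users[week]),
        }
        for week in minutes
    }
```

The same pattern could be repeated with a month or year key to obtain monthly or annual statistics from the same daily table.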

In order to compute a measure of the dynamics of behavior (trend analysis) for a given user, or of the difference between any two users of the system, the correlation of the behavior vectors may be determined, which, as previously mentioned, may result in output measures expressing, for example, the type and/or extent of the key differences between the entities under study. The differences can be ascertained by subtracting the normalized vectors from each other. The correlation can be found, for example, by a multidimensional Pearson correlation coefficient.
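A minimal sketch of such a comparison follows. The text does not specify the normalization or the exact multidimensional form of the coefficient; the example assumes scaling to unit Euclidean length and a plain Pearson coefficient computed over the vector components, and all function names are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def difference(a, b):
    """Element-wise difference of the two normalized vectors."""
    return [x - y for x, y in zip(normalize(a), normalize(b))]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)
```

Two users whose behavior vectors differ only in overall scale produce a zero difference vector and a correlation of 1, whereas opposite usage profiles produce a correlation near -1. Note that `pearson` is undefined for a constant vector, which a real implementation would need to handle.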

To understand the differences in user behavior and/or generate alerts or actions based on these differences, vector triggers may be utilized. A vector trigger is a set of predefined configurations describing the conditions under which a certain alarm should be generated and optionally signaled to an internal or external module after any two specific vectors are correlated or a new behavior indicator is calculated. In practice, this type of trigger may be a trigger that reflects, for example, that the user has awakened, is moving, or is going to sleep for a while.
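A vector trigger of this kind may be pictured as a small set of declarative conditions evaluated whenever a new behavior indicator has been calculated. The sketch below is an assumption-laden illustration: the configuration keys (`name`, `indicator`, `op`, `threshold`) and the indicator names are hypothetical and not taken from the disclosure.

```python
def evaluate_triggers(triggers, indicators):
    """Return the names of triggers whose condition holds.

    triggers: list of dicts with hypothetical keys name/indicator/op/threshold.
    indicators: dict mapping indicator name -> latest computed value.
    """
    ops = {">": lambda v, t: v > t, "<": lambda v, t: v < t}
    fired = []
    for trig in triggers:
        value = indicators.get(trig["indicator"])
        # A trigger is skipped when its indicator has not been computed yet.
        if value is not None and ops[trig["op"]](value, trig["threshold"]):
            fired.append(trig["name"])
    return fired
```

The returned names could then be signaled to an internal or external module as alarms.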

Indeed, with reference to Fig. 2 (an embodiment disclosing features specifically relevant to the evolution of behavior indicators), the data processing entity 200 may be made responsible for first-hand data pre-processing activities and immediate transformations, while the next entity included in, or at least functionally connected to, the processing entity 200 (entity 210 for structuring, parameterizing and/or adding semantics) may be responsible for partitioning the data into several structured entities (e.g., tables) based on the content and attributes of the data. Entity 210 is preferably able to add, e.g., remotely received and/or locally generated parameters to the data using an internal or external support engine 220 (which may comprise modules such as a location provisioning or weather API). The adding may also comprise an optional process in which one or more data points from different data tables are mixed with other data points so as to either enrich the original data points or form completely new types of data points.

The memory module 300 may be responsible for managing multiple layers of data storage and other related functions, while the (centralized) logic for data aggregation 400 achieves advantageous features by being able to undergo pre-programmed or scripted activities, for example, while analyzing data in batches, for example, at discrete intervals. In data aggregation, one or more data points from one or several data entities (e.g., tables) may be processed in batches, where, for example, time series operations, averaging operations, and/or summing operations may be used to coincidentally derive meaningful statistics from transactional (time-stamped) data.

As previously mentioned, the data aggregation module 400 may include or at least be functionally connected to a number of distinct modules, including: vector calculation 410, which calculates statistics and behavior indicators and outputs a predefined vector that includes all such outputs; vector aggregation 420, which averages and aggregates the computed vectors for, e.g., a group of users or a certain time period; and vector correlation 430, which compares any two vectors to each other, either automatically or on request.

Finally, the vector trigger 440 explained above may define a number of actions that need to be taken if a predetermined correlation outputs some particular result.

Returning to support engine block 220, an example of a module that can enrich (raw) data as part of a pre-processing action targeted at receiving the data is provided below.

The location processing module may take raw data (including various forms of location-related information) as input and return the location data to any requesting module in a more standardized manner and/or format. In the location processing module, the location may be recorded in a particular location variable, for example, as latitude and longitude geographic coordinates (e.g., in decimal degrees with 4 decimal places). A so-called master location entity (e.g., a table) may be provided in which each individual location update is stored. Additionally, there may be entities (e.g., tables) in which locations are aggregated for a given time period (e.g., for a 5 minute time period) for each user, to facilitate aggregation and mapping with other tables and to exclude outliers, preferably by basic statistical methods.

With respect to location, the location processing module may enter, for example, each change to an active base station of the cellular network (and, in addition, input data covering a scan of visible base stations at a given frequency), periodic or aperiodic data for WiFi hotspots at a scan of a given frequency, periodic or aperiodic data for GPS fixes at a given frequency, and/or data from a location application programming interface of the mobile device.

The location processor advantageously systematically processes each individual piece of location information it receives. For incoming new, currently unknown base station or WiFi hotspot indices, the coordinates may be retrieved from other location processors, internal or external, that can map the base station index or hotspot index to geographic coordinates. In addition, the location processor may maintain its own database that maps base station indices and WiFi hotspot indices to geographic coordinates. The location processor may process substantially all of the incoming data to add tangible location coordinates to each incoming location-related observation (e.g., a parameter of a radio network hierarchy).

If GPS or precise location coordinates are received through the API of the mobile device, the location information for the currently operating base station and the WiFi hotspot operating at that time will be updated in the internal database of the location processor.
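The behavior of the location processor described above can be sketched as a small cache that maps base station and hotspot indices to coordinates. This is a minimal illustration only: the class name, the `external_lookup` callable, the string format of the indices, and the rounding to four decimal places are assumptions made for the example.

```python
class LocationProcessor:
    """Maps base station / WiFi hotspot indices to coordinates via a local cache."""

    def __init__(self, external_lookup=None):
        self.known = {}  # index -> (lat, lon)
        # Hypothetical callable: index -> (lat, lon) or None when unknown.
        self.external_lookup = external_lookup

    def resolve(self, index):
        """Return coordinates for an index, asking the external source if needed."""
        if index not in self.known and self.external_lookup is not None:
            coords = self.external_lookup(index)
            if coords is not None:
                self.known[index] = coords
        return self.known.get(index)

    def on_precise_fix(self, lat, lon, active_indices):
        """Update the cache for the base station and hotspots active at fix time."""
        for index in active_indices:
            self.known[index] = (round(lat, 4), round(lon, 4))
```

With this arrangement, an index first seen together with a GPS fix becomes resolvable later even when no GPS fix is available.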

In addition to the raw data, these location stamps can be collected into a dedicated location table indicating the user concerned, the time, the location point, and the accuracy. Typically, the location may be updated in the table each time a base station scan or change occurs, for example. Location names (including, for example, building/place names, addresses, areas, cities, and/or postal addresses) may be added to the table at the same time as new entries are created. For example, the location name may be retrieved from an external or internal module that returns a place name in response to geographic coordinates.

With respect to base station and WiFi based location lookup, there may also be other tables that store location names for respective coordinates so that they do not require additional location name lookup. For example, there may be a separate index table in which each base station index is mapped to an associated place name. With respect to GPS-based location lookup and wireless device API-based location lookup, location names may be retrieved from the internal/external modules in real-time.

By using a median function or similar function for each time period during all location observations, the location table may be further aggregated into a form in which locations are stored for a given time period (e.g., a 5 minute time period).
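The median-based aggregation into fixed time periods can be sketched as follows. The tuple layout of the input, the use of Unix timestamps, and the function name are assumptions made for the example; the 5 minute period follows the text.

```python
from collections import defaultdict
from statistics import median


def aggregate_locations(fixes, period_s=300):
    """Aggregate location observations into fixed periods using the median.

    fixes: iterable of (user_id, unix_ts, lat, lon).
    Returns dict (user_id, period_start_ts) -> (median_lat, median_lon).
    """
    buckets = defaultdict(list)
    for user_id, ts, lat, lon in fixes:
        # Each observation falls into the period that contains its timestamp.
        buckets[(user_id, ts - ts % period_s)].append((lat, lon))
    return {
        key: (median(p[0] for p in pts), median(p[1] for p in pts))
        for key, pts in buckets.items()
    }
```

Because the median is taken per coordinate, a single outlying observation within a period does not shift the aggregated location.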

As part of the overall data processing, various embodiments of the present invention may apply so-called queuing in selected cases, where data points are processed through two additional steps to facilitate intelligent mapping or matching of information between any two tables.

As previously disclosed, various embodiments of the present invention may also include the transformation and/or processing of non-parametric data, which is generally easier to collect in a standardized manner from various sources, into parametric observation data and more substantial information that is stored into a final table, from which more complex aggregations can be made.

As an example particularly relevant to location aggregation and parameterization processing, the process of matching location data into observations may be performed as follows.

1. Several different observation types are received in a large data block of human (user) behavior covering a predetermined period of time, e.g. several days, say 3 days.

2. After the first level of refinement, the data stream is directed to a 3-step process.

a. In the first step, substantially all of the data in a given data block is preferably sorted in time order, as it cannot always be assumed that the input data is sequential.

b. In a second step, the data in the data block is processed line by line, and only the data points related to position (e.g., GPS fix), base station change, base station scan, and WiFi scan are processed; a separate location processing module is used to map all of this information to geographical coordinates. As a result, the output of the location processing module (comprising normalized location stamps rather than the individual technical observations) is stored into a new table holding the updates for all locations. In addition, a more standardized location table is created in which the average location information is updated for defined time periods (e.g., for each 5 minute time period). Statistical methods (e.g., the median) can be used to derive a sufficiently good approximation of the location within the time period. Furthermore, even if there is no location update for a given time period, the processing may generate a location stamp for that missing time period, based on the assumption that, for example, the location most likely has not changed during the past 5 minutes or other predetermined time period, which may be heuristically determined from the data.

c. In a third step, all other data is guided through in a time sequence and the previously processed position data can be easily mapped to the respective observation data, so that parameterized data can be generated as output.
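The three steps above can be sketched in compact form as follows. This is a simplified illustration under stated assumptions: rows are dictionaries with hypothetical `ts` and `type` fields, `resolve` stands in for the location processing module, the last resolved location within a period is used instead of a median, and missing periods are filled by carrying the most recent known location forward without a time limit.

```python
from bisect import bisect_right

# Hypothetical names for the location-related observation types.
LOCATION_TYPES = {"gps_fix", "cell_change", "cell_scan", "wifi_scan"}


def attach_locations(rows, resolve, period_s=300):
    """Sort rows, build per-period location stamps, and tag other observations.

    rows: list of dicts with 'ts' and 'type' keys.
    resolve: callable mapping a location row to (lat, lon) or None.
    """
    rows = sorted(rows, key=lambda r: r["ts"])  # step a: time order
    stamps = {}  # step b: period start -> coordinates
    for r in rows:
        if r["type"] in LOCATION_TYPES:
            coords = resolve(r)
            if coords is not None:
                stamps[r["ts"] - r["ts"] % period_s] = coords
    periods = sorted(stamps)
    out = []
    for r in rows:  # step c: map locations to the remaining observations
        if r["type"] in LOCATION_TYPES:
            continue
        i = bisect_right(periods, r["ts"]) - 1  # latest period at or before ts
        coords = stamps[periods[i]] if i >= 0 else None
        out.append({**r, "location": coords})
    return out
```

The output rows are the parameterized observations: each non-location row now carries a location, or `None` when no location was known at or before its time period.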

As a preferred feature of the present invention, hierarchical data mining (described in more detail below) can initiate a process in which data is aggregated and statistical processes are applied to convert it into an output form that is more understandable to external systems than the raw transaction-level observation data.

Thus, as a related example, it is explained herein how to compute 410, aggregate 420, and correlate 430 behavior vectors of human behavior in terms of the use of a smartphone.

As input, this exemplary embodiment of the present invention receives a batch of data (e.g., log lines) about smartphone application usage. In the raw observation data, each row may describe, for example, the activation of a smartphone application in the user interface of the wireless device. Each line may have been processed earlier, meaning that a so-called mapping ID may have been appended to the raw-level technical name of the application, the idea being to give each application entity a unique identifier regardless of the logged raw-level name, which may depend, for example, on the language of the user interface of the wireless device. The mapping ID may additionally be supplemented with further data/tables that map each unique application identifier to a set of other variables (e.g., application type, application category, application subcategory, etc.).

Based on the mapping process, all application lines that do not represent actual applications (e.g., different types of menus, screen savers, and/or home screen applications) may be removed from the data. As part of the processing, outliers (including, for example, unusually long application sessions) are also preferably excluded. The refined data stream should also be cleared of duplicates: after refinement there may be two rows with very similar names but different timestamps due to, for example, an accidental jump to the home screen application during an application session, from which the user immediately returns to the original (actual) application. After excluding the rows that do not represent actual usage, there may thus be two consecutive rows for the same application, and they should be grouped together because they represent the same usage session. The pre-processed data stream on application usage may thus include, for example, a set of rows with a unique user ID, a timestamp, and/or some type of application identifier, and may also include additional information (e.g., application category, etc.).
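The cleaning steps described above (removing non-application rows, excluding outliers, and grouping consecutive rows of the same application) can be sketched as follows. The row layout with an explicit duration field, the `NON_APPS` identifiers, and the four-hour outlier limit are assumptions made for the example.

```python
NON_APPS = {"home_screen", "menu", "screensaver"}  # hypothetical identifiers


def clean_sessions(rows, max_session_s=4 * 3600):
    """Clean raw application rows into usage sessions.

    rows: list of (user_id, start_ts, duration_s, app_id), in any order.
    Drops non-application rows and overlong sessions, then merges consecutive
    rows of the same user and application into one usage session.
    """
    kept = sorted(
        r for r in rows if r[3] not in NON_APPS and r[2] <= max_session_s
    )
    merged = []
    for user, ts, dur, app in kept:
        if merged and merged[-1][0] == user and merged[-1][3] == app:
            prev = merged[-1]
            # Same application directly after itself: extend the session.
            merged[-1] = (user, prev[1], prev[2] + dur, app)
        else:
            merged.append((user, ts, dur, app))
    return merged
```

In this sketch, a browser session interrupted by a brief jump to the home screen is reported as one browser session whose duration is the sum of the two parts.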

In computing the behavior vectors from this type of prepared data, the vector calculation engines (410, 420, 430) can get the data blocks of these rows from the data aggregation entity 400.

The entity responsible for the process is operable to obtain, as parameters, a start time, an end time, and a set of user IDs that should be processed. After receiving the raw-level data, the entity may exclude data that does not fit its batch run parameters. Furthermore, a behavior indicator may have two key dimensions: the first is the aspect of behavior (the reflection and/or abstraction) that should be described, and the second is the time scale over which the activity should be reflected. The time scale may be, for example, a day or a week, which means that the indicator in question will be calculated such that it describes the average activity per day or per week, respectively, during the observation period.

The aggregation-related tasks that the entity may then perform may include calculating, for example, on how many distinct days or weeks, respectively, some usage of a particular application or device feature was observed. This sets a benchmark for calculating frequency-dependent statistics: once the units of time with possible usage have been determined (in other words, on how many days some data is available and the device was actually turned on), it is easier to calculate statistics reflecting the average daily behavior with respect to usage or other activities. As an example, a block of data corresponding to a time period of a year may be received, meaning that the first observation date is the first date of the year and the last observation date is the last date of the year.

However, in the middle 4 months, no data was received, possibly because the data collection feature was disabled. First, a simple aggregation process can be performed to determine how many different months are likely to be used, which in this case would lead to an 8 month result, with 8 months then serving as a benchmark.
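The benchmark calculation of this example can be sketched in a few lines of Python. The function name and the choice of one observation per month are assumptions made purely for illustration.

```python
from datetime import date

def active_months(observation_dates):
    """Count the distinct (year, month) pairs in which any data was observed."""
    return len({(d.year, d.month) for d in observation_dates})

# One observation per month of a year, except for a four-month gap (May to August).
dates = [date(2010, m, 15) for m in range(1, 13) if m not in (5, 6, 7, 8)]
benchmark = active_months(dates)  # 8 months of possible usage
```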

After aggregating the likely usage or activity times of the benchmarks, the process may proceed deeper into the calculations. In this particular example, the goal is for the acquisition of a tangible reflection of the application usage level. The original level of application data flow is not as much informed of this. Thus, there may be multiple types of different vectors that better describe the use of the application, and a key design goal may be to compute these vectors using a minimum number of rounds or batch runs. In this particular example, two such vectors are explained in more detail, which may be calculated during the same batch run.

The first vector may indicate application face time, that is, the time people spend with their mobile phone in front of a certain application. The second vector may reflect application usage frequency, that is, the relative incidence of usage. For the purposes of this particular example, it is assumed that the only concern is statistics on the daily level of application usage activity and on the monthly level of application usage frequency, but the data itself may cover some other period of time (such as, for example, the entire year). With respect to these variables, the process first aggregates an output file in which, for each user and for each calendar day, the cumulative face time spent on each application during that day is calculated. As a result, an aggregate data table will be constructed that contains information about: the applications used by each user on a daily basis; whether an application was used (basically meaning whether a row exists, because if no use is observed, no row exists for the application); and the usage activity (meaning the degree of use in terms of, for example, the face time or the number of sessions), which is stored as a variable per row. This type of aggregated table thus reflects both the presence or absence of use and the usage activity for all applications. This type of table also lends itself to further aggregation.

This information can then be further aggregated so that an aggregated file is finally constructed in which, for each user, for the whole calendar year, and for each application, there is information about: the total time spent on the application during the time period; and the total number of distinct days on which the application was used. A merge operation is then performed on this table, which means that the initially calculated information about possible usage days or activity days in that year is introduced. After this operation, it is possible to calculate by a simple division how many minutes on average any particular user spends on a particular application per possible usage day. By another division, in which the number of distinct days on which use of a particular application was observed is divided by the total number of possible usage days, we end up with a frequency vector that can have a value of 100% at the maximum and 0% at the minimum, indicating the relative likelihood of occurrence, that is, how regularly the user's use of the application is repeated.

As an output, these types of behavior vectors can be combined by different averaging processes or by simply accumulating the vectors, such that for a certain period of time (e.g., one day, one week, one month, or one year) the combined vector describes the usage activity by one or more metrics. The metrics or behavior indicators for each application or other activity under study thus collectively form a multi-dimensional vector, the number of dimensions being the number of different applications or activities. In this type of combining, averaging, or summing process, vectors at, for example, the day level are typically processed to give a week-level average of the observed behavior. It is important to recognize that in some cases information is lost when behavior indicators are derived. For example, once a behavior indicator for the average time spent on a web browser in a particular week has been calculated, it is not possible to derive from this metric a monthly-level measure of web browser usage frequency, since that type of calculation requires input data at the daily level and, at the same time, knowledge of the possible usage (meaning the number of distinct possible usage days within that particular month).

The same process may be repeated for different types of aggregation levels. For example, instead of an application, the basic aggregation entity may be an application category, an application subcategory, or some other entity, such as a domain name visited by the user (from a mobile web browsing log) or any particular device feature of interest (from a device feature log).
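The two-stage aggregation and the two divisions described above may be sketched as follows. This is a minimal illustration only; the tuple layout of the session rows and the `possible_days` mapping (the benchmark per user) are assumed names and structures, not part of the description.

```python
from collections import defaultdict

def usage_vectors(sessions, possible_days):
    """sessions: iterable of (user_id, day, app, minutes).
    possible_days: {user_id: number of days on which usage was possible}.
    Returns {(user, app): (avg_minutes_per_possible_day, frequency)}."""
    # Stage 1: daily table, one entry per user, day and application.
    daily = defaultdict(float)
    for user, day, app, minutes in sessions:
        daily[(user, day, app)] += minutes

    # Stage 2: period table with total face time and distinct days of use.
    total = defaultdict(float)
    days_used = defaultdict(int)
    for (user, day, app), minutes in daily.items():
        total[(user, app)] += minutes
        days_used[(user, app)] += 1

    # Merge with the benchmark and perform the two divisions.
    return {
        key: (total[key] / possible_days[key[0]],
              days_used[key] / possible_days[key[0]])
        for key in total
    }
```

For a user with 4 possible usage days who spent 60 minutes in a browser spread over 2 distinct days, the sketch yields 15 minutes per possible day and a frequency of 0.5 (50%).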

When computing a behavior vector (e.g., a behavior vector for application usage), the resulting vector can be run through a standard regression analysis in which, for example, time stamps are the key independent variable. With this type of method, possible temporal trends can be studied and, for example, the average slope of the trend can be determined.
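For illustration, the slope of such a trend can be obtained by ordinary least squares, with the time index as the independent variable. The function below is a plain sketch of that standard technique; its name and inputs are assumptions for this example.

```python
def trend_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs (e.g. day index)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx
```

A daily face time that grows from 10 to 16 minutes over four consecutive days in equal steps gives a slope of 2 minutes per day.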

As another example, a standard Pearson correlation coefficient or any other measure of similarity may be calculated between the behavior vectors of any two users at, for example, the year level, and a behavior similarity index may thus be determined.

As another example, how the behavior vectors can be computed 410, aggregated 420, and associated 430 is explained herein with respect to modeling of human location dynamics (in other words, movement).
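A behavior similarity index of this kind can be sketched with the textbook Pearson formula. The two input vectors are assumed to list the same indicators in the same order (for example, yearly face time per application) and not to be constant.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation coefficient between two equally long vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (spread_u * spread_v)
```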

First, a data block of location data is available, which typically contains all location updates that may have been derived during pre-processing, which in turn may combine data from several sources, including WiFi hotspot scanning, base station scanning, or GPS fix points. This location information, e.g. in the form of a table, typically forms a non-standardized data stream. The aggregation entity may first turn this location stream into a more standardized form; e.g., it may compute a table row for each time period of, e.g., 5 minutes, in which an approximate location is computed from the transaction-level data. This may be performed by statistical modeling, e.g., using a median function, to arrive at the best approximation, which typically also solves the problem of outliers. Heuristics may be added to this process so that, for example, if data is missing for a certain 5 minute period (perhaps because no location update was made, although it is clear from other data tables that the device was turned on), a location point for this period may be created based on the location point of the previous 5 minute period, to eventually obtain a more standardized location stream.

Then, behavioral indicators may be derived regarding, for example, the daily movement of the user. To do so, a simple clustering may be performed, during which all geographic coordinates that are close to each other according to the criteria used may be grouped into, for example, one salient location blob. This can be done efficiently by applying standard network analysis and clustering methods, so that, for example, an index identifying the distinct location can be built for each 5 minute time period. Thereafter, if the final focus is on a daily-level behavior vector for the user's movement, an aggregation process follows: for each user and for each day, for example, the 5th and 95th percentiles of the latitude coordinates and the corresponding 5th and 95th percentiles of the longitude coordinates may be calculated, followed by an index of the number of distinct places visited on that particular day.

By using percentiles, outliers may be excluded and/or a 4-point square may be formed, for example, to approximate the area in which the user is likely to have moved during the day. By calculating the geographical distance between the two furthest points (meaning the length of the diagonal), a measure called the moving sphere can be established, which reflects the area within which the user moves on average during the day. Additionally, a behavior indicator called place entropy may be computed, which simply reflects how many different places the user has visited on a particular day, counting only places at which the user spent at least 5 minutes. As a result, a two-dimensional vector describing his/her location pattern can be formed daily for each user. The dimensions of the two-dimensional vector reflect the breadth and the diversity of the location dynamics.
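The standardization step may be sketched as below. The sketch assumes that location updates arrive as (Unix time, latitude, longitude) tuples and, for brevity, that the device was switched on for the whole interval; the check against other data tables mentioned above is therefore omitted.

```python
from statistics import median

SLOT = 300  # seconds in one 5-minute period

def standardize(updates, start, end):
    """updates: list of (unix_time, lat, lon) from mixed sources.
    Returns one (slot_start, lat, lon) row per 5-minute slot in [start, end)."""
    slots = {}
    for t, lat, lon in updates:
        slots.setdefault((t - start) // SLOT, []).append((lat, lon))

    rows, previous = [], None
    for i in range((end - start) // SLOT):
        if i in slots:
            lats, lons = zip(*slots[i])
            previous = (median(lats), median(lons))  # median is robust to outliers
        # If the slot is empty, `previous` still holds the last known point,
        # which is carried forward; slots before the first fix are skipped.
        if previous is not None:
            rows.append((start + i * SLOT, *previous))
    return rows
```

In this sketch a single wildly wrong fix inside a slot does not move the slot's location, and a slot without any update inherits the location of the preceding slot.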
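The daily two-dimensional location vector may be computed along the following lines. The nearest-rank percentile definition, the haversine distance with a mean Earth radius of 6371 km, and the row layout (latitude, longitude, place index per 5-minute slot) are assumptions chosen for this sketch; other definitions would serve equally well.

```python
from math import asin, ceil, cos, radians, sin, sqrt

def percentile(values, p):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, ceil(len(ordered) * p / 100))
    return ordered[rank - 1]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two coordinates."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def daily_location_vector(rows):
    """rows: list of (lat, lon, place_id), one per 5-minute slot of one day.
    Returns (diagonal of the percentile box in km, number of distinct places)."""
    lats = [r[0] for r in rows]
    lons = [r[1] for r in rows]
    diagonal = haversine_km(percentile(lats, 5), percentile(lons, 5),
                            percentile(lats, 95), percentile(lons, 95))
    places = len({r[2] for r in rows})
    return diagonal, places
```

Because every row stands for one 5-minute slot, any place that appears in at least one row satisfies the 5-minute minimum stay.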

These merely exemplary location indicators may then be further aggregated. For example, a monthly-level average may be formed from these vectors or aggregated location behavior indicators, e.g., for a group of people. Furthermore, by correlation, it can be investigated whether a day of the week, for example, affects the breadth or diversity of location dynamics. For this purpose, standard analysis of variance tools may be used.

Various embodiments of the present invention are advantageously enabled to perform the following operations: separating different types of data from each other; and structurally partitioning the data points based on requirements related to utilization of the data points or based on possible interactions with the various aggregation layers such that computational load and required time may be optimized. These goals can be achieved by the foregoing feature, generally referred to as "hierarchical data mining using behavioral data," by which we mean managing data flow through a hierarchical model in which raw data is distinguished from more refined data, and refined transactional data is distinguished from aggregated and statistical data. In total, there may be at least the following types of layers with respect to data processing and storage:

1. raw-level data (e.g., transactional observation data received from a mobile device, possibly in unparameterized form),

2. metric data (e.g., processed, filtered, refined, and possibly parameterized data),

3. intermediate layer data (e.g., aggregated and/or reconstructed data), and

4. insights data (e.g., high-level aggregations such as ready-made behavior indicators or technology indicators).

Alternatively, for example, layer 3 may not be present, in which case its functions may be included in layer 2 and layer 4, depending on the nature of the relevant data. For example, where a technical indicator relating to the average time spent in the 3G network is calculated relative to all the time spent in the cellular network, the technical indicator for a certain day may be calculated directly from the metric data, without any aggregation in between. Multi-layer chained aggregation is used in the event that such activity satisfies either or both of the following two conditions:

1. The aggregation process simplifies the data or derives certain types of aggregate metrics or data structures that better reflect the details or nature of the observed technical or behavioral events.

2. The aggregation process leads, for example by averaging, to a situation in which the access or further processing of the output table is considerably accelerated.

A scalability component may be provided that accesses the behavior data and builds customized views or statistics on it. For this purpose, a feature called "middle-level table" is used to efficiently store at least partially aggregated data in a form that lends itself to refinement, further processing by statistical or more descriptive methods, and/or forwarding, for example, to other systems for further aggregation or visualization. The data may be stored in, for example, SQL (Structured Query Language) based tables (e.g., MySQL), but is preferably also readily accessible through SPSS (Statistical Package for the Social Sciences) or other widely used statistics software tools. The data may be stored in at least one relational database, and the number of relations may increase as more data is analyzed (keeping in mind that the data is collected in a non-parameterized manner).

Preferably, embodiments of the assembly are not configured to take a fixed view, in the task of data processing or aggregation, of what type of statistical data is needed in the final report. Instead, there is a feature referred to as "further aggregation", as described above, that can effectively rely on the behavioral indicators computed into the middle-level tables and produce whatever type of statistical data is desired for almost any internal or external purpose. Exemplary derived statistics may include:

1. Application stickiness: how many people use a particular application or application class daily among the people who use the application or application class weekly (i.e., an analysis of a shorter time period (more frequent users) versus a longer time period (less frequent users))

2. Relative attention share of mobile web sites: the absolute amount of time spent on a certain domain name during a certain period of time compared with all the time spent browsing the web

3. Ratio of good sleep to poor sleep (ratio of nights less than 6 hours in length to all nights for which measurements of the user are available)
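As an illustration of how such a derived statistic may be produced from an already aggregated table, the following sketch computes application stickiness for one week. Reading "daily user" as a person who used the application on every day of the week is an assumption of this example, as are the function and parameter names.

```python
def stickiness(usage_days_by_user, week_days):
    """usage_days_by_user: {user_id: set of days on which the app was used}.
    week_days: the days that make up the week under study.
    Returns the number of daily users divided by the number of weekly users."""
    week = set(week_days)
    # Weekly users: at least one day of use within the week.
    weekly = [u for u, days in usage_days_by_user.items() if days & week]
    # Daily users (assumed definition): use on every day of the week.
    daily = [u for u in weekly if week <= usage_days_by_user[u]]
    return len(daily) / len(weekly) if weekly else 0.0
```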

Some embodiments of the invention are designed with the goal of minimizing required storage capacity, protecting consumer interest, and/or facilitating fast data processing, whereby a feature called "periodic scrubbing" may be applied. During the process, the assembly may advantageously automatically and periodically traverse one or more stored raw-level data tables and/or higher-level data tables or other entities and dispose of unneeded data points/entities from storage altogether.

Additionally or alternatively, data processing and storage may be flexibly distributed in the context of embodiments of the present invention. To this end, the aforementioned feature known as "managing distributed data mining" can be utilized to effectively keep track of: for example, where the user came from; where his or her data points are stored; and where data processing and storage should occur if the timestamp has an impact in any way. The storage and subsequent processing of incoming data advantageously follows a centralized configuration of the system.

FIG. 3 depicts an embodiment of the hierarchical data mining aspect of the present invention. First, the cache 350 may be needed to ensure that the storage can serve all incoming requests and, if needed, perform, for example, important conversions and transformations on the incoming data in a coordinated manner. The storage entity 300 may take care of the core activities with respect to data storage (managing the distribution of the operational load and/or tasks and, most importantly, controlling all data). The storage may apply the "clean up" module explained above not only to remove outlier data points, but also to improve the quality and delivery of data to end customers (e.g., users of the data API), so that the information is as meaningful, well-structured, and/or rich as possible. Finally, the clean-up module may be configured to remove older, already analyzed data. Storage functionality 370 may be configured to manage a hierarchy of data that may be defined to include, but is not limited to, "observation data" 371, "metric data" 372, "intermediate tier" or "intermediate level" data 373, and "resolution (insights) data" 374, for example, as briefly reviewed above. The module 370 may proactively virtualize access to the constructed (mobile) observation information database. Again, the data aggregation 400 is configured to perform predefined actions on the received data and to process the data 450, for example, by ensuring batch processing 460 of the data or by more dynamically updating, for example, selected critical statistics.

As part of the hierarchical data mining logic, one embodiment of the present invention is described next to illustrate the implementation of the physical inputs and outputs of such a model.

One reason for layering data stores and further performing aggregation processes may be that such models can convert almost any amount of behavioral observations into various aggregate indicators in an efficient manner. In particular, because the correlation engines used to compute the behavior vectors can become very complex and the number of possible query and statistical operations can be very large, the hierarchical data mining model makes it possible to pre-aggregate the various tables in advance, so that the final steps of behavior vector calculus are performed as efficiently as possible and the vectors can in most cases even be generated in real time.

In applications where actual human behavior is continuously measured, but the expected output of the assembly includes a communication action that triggers, for example, a mobile advertising platform to send a message to a customer, the behavior vector calculus module may not in practice be able to perform operations that would take too much time or cover too many queries. It should therefore be possible to leverage tables that have already been aggregated when computing high-level averages of past behavior and a simple measure of whether the current behavior differs from the average behavior.

As an example, it is described herein how location data may be prepared by a hierarchical data mining model. In the first level of data, each location update is time stamped, and the amount of information can be very large. In the next step, after the first level of data processing, there is an output file in which approximate, smoothed locations are written for each 5 minute time period using heuristics and other processes (e.g., a support engine as specified in the present invention). In addition, the data is enriched so that, for example, place names (buildings, streets, cities, countries) are added to the rows to generate a data description with a multi-point semantic meaning.

In the next step, in hierarchical location data processing, there is a process that can be initiated at any particular time (e.g., every night) taking as input location data of a particular range (e.g., a time period between a particular start date and end date). This is so-called batch processing that processes data periodically, not in real time.

In practical applications, the process may be designed to run at a desired interval (e.g., every 24 hours), and on each run it may process the data of, for example, the past 3 days. Depending on the number of days covered, the aggregations can thus be (purposefully) made to overlap. If new data covering the past 3 days of behavior is received from a certain user only on the current day (instead of on the previous days), it is important that the batch processing on that day be able to fill in the gaps and update the key aggregations for this user not only for that day, but also for the past few days. The architecture may be designed such that, if there is overlapping data, the new aggregation overrides the old aggregation.

In the aggregation engine, the periodic process will complete several steps in turn:

1. It will calculate the aggregation entities (e.g. tables) in which, for each user, for each date, and for each hour, a row for each aggregation entity (e.g. city) will be calculated indicating how many 5 minute time periods, or any other time-related units, the user spent at that location.

2. Similar entities/tables will also be computed using the output aggregation of step 1 to end up with a table in which for each user, for each date, a similar place breakdown will be given.

3. Finally, following these steps, there may be an aggregation process that calculates information accurately reflecting the user's higher-level location patterns over a very long period of time (e.g., a year). Higher-level location patterns may be more interesting when, e.g., studying where users live, because the randomness and variations of daily life do not limit the analysis. In low-level data tables there is much noise (e.g., thousands of places that are visited only briefly, but also possible anomalies such as holidays). By aggregating statistics over longer periods of time and also filtering out non-significant places, it is easier to find the significant places, and the likelihood that temporary deviations in the user's life have any effect is much lower.
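The overlapping batch run with override semantics may be sketched as follows. The in-memory dictionary standing in for the aggregate table, the summed value per user and day, and the default window of 3 days are assumptions for illustration only.

```python
from datetime import date, timedelta

def run_batch(aggregates, observations, run_day, window_days=3):
    """aggregates: {(user, day): total}, updated in place.
    observations: iterable of (user, day, value) received so far,
    including data that arrived late for earlier days."""
    first_day = run_day - timedelta(days=window_days - 1)
    fresh = {}
    for user, day, value in observations:
        if first_day <= day <= run_day:
            fresh[(user, day)] = fresh.get((user, day), 0) + value
    # Recomputed aggregates for the window override the old ones;
    # rows outside the window are left untouched.
    aggregates.update(fresh)
    return aggregates
```

A run on a given day therefore repairs the aggregates of the two preceding days whenever late data for those days has arrived in the meantime.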
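Steps 1 and 2 above can be sketched as two chained aggregations, in which the second works only on the output of the first. The tuple layout of the 5-minute slots and the use of a city name as the aggregation entity are assumptions for this example.

```python
from collections import Counter

def hourly_table(slots):
    """Step 1. slots: iterable of (user, date, hour, city), one per 5-minute period.
    Returns {(user, date, hour, city): number of 5-minute periods}."""
    return Counter(slots)

def daily_table(hourly):
    """Step 2. Roll the step 1 output up to one row per user, date and city."""
    daily = Counter()
    for (user, day, hour, city), periods in hourly.items():
        daily[(user, day, city)] += periods
    return daily
```

A year-level table as in step 3 could be built in the same way from the daily table, without touching the raw location updates again.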

In the design of this type of multi-level data model, the output of the above steps is used to form so-called aggregate (derivative or multi-level) tables, which make further calculations easier. For example, based on the output of item 1, it is relatively simple to calculate the most typical (median) hour for each location entity for each week, which makes it possible to heuristically form a view on, for example, whether a place is an office location or a home location.
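A heuristic of this kind may look as follows. The hour cut-offs are purely illustrative assumptions, and a plain median ignores the wrap-around at midnight, so a production heuristic would likely need a circular statistic.

```python
from statistics import median

def label_places(hourly_rows):
    """hourly_rows: iterable of (place, hour_of_day) taken from the hourly table.
    Returns {place: (median_hour, label)} using illustrative cut-offs."""
    hours = {}
    for place, hour in hourly_rows:
        hours.setdefault(place, []).append(hour)

    labels = {}
    for place, observed in hours.items():
        m = median(observed)
        if 9 <= m <= 17:
            label = "office"   # typically visited during working hours
        elif m >= 20 or m <= 6:
            label = "home"     # typically visited late in the evening or at night
        else:
            label = "unknown"
        labels[place] = (m, label)
    return labels
```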

Further, these types of aggregated output (e.g., the output of item 2 (table)) can be used at any time to derive further aggregations that describe location rankings for each weekday so that the weekly pattern can be understood in terms of activity, movement trajectory, and time spent.

On top of the middle-layer tables, all types of behavioral calculations and/or processes in the data processing, including averaging, summing, variance estimation, derivation of correlation coefficients, measurement of entropy, etc., constitute the highest layer. For example, average usage activity (such as the face time spent on a web browser), the maximum monthly usage frequency of sending multimedia messages, the average variance of the user's location dynamics in terms of kilometers traveled in a day, and an aggregate indicator of the share of time spent under poor signal strength conditions are output variables that can typically be calculated for a certain period of time and can be used directly for relevant reporting or analysis purposes by performing only one level of averaging or combining, while the data itself is at the highest level in terms of information content. Meaningful statistics (e.g., the average time spent at home in a particular week) can be computed by simple queries and processes based on the aggregated tables. It would not be possible in practice to derive these quickly from the raw-level data, because the data would first need to be aggregated, time stamped, have the home location identified, etc., before the actual high-level metrics or indicators could be derived. The aggregation tables, dynamic load balancing, and division of responsibility enable the different entities of the aggregation and data mining functionality of the present invention to proceed independently of each other, and the output of one process (e.g., the estimated face time of using a web browser on a certain day) can be a direct input to other processes (e.g., processes that derive a measure of the variance in web browser usage time over multiple days).

With a batch processing method, in which incoming data is passed, e.g., periodically, through a process during which more meaningful indicators and metrics are derived, the key statistics for the most recent data become available in the shortest possible time, e.g., after each day, and in the form best suited to facilitate complex calculations (if needed). In other words, the design enables separation of the aggregation work from the statistics and behavior vector calculations, so that the system can process large amounts of data more efficiently while still being fast enough for the critical requirements imposed by applications such as mobile advertising or automatic user profiling solutions.

In a similar manner, the multi-layer aggregation and computation engine may be designed to handle, for example, application usage logs, web browser click streams, music consumption, sleep data, and even audio and video signal observation data.

As described above, the storage functionality 370 may be configured to manage different data layers:

"Observation data" (371), including, for example, raw-level transactions (application usage, voice calls, messages) and scans (WiFi scans, Bluetooth scans, memory file system scans, etc.) in a basic form,

"metric data" (372), including, for example, refined (processed/improved) data (outliers excluded, metadata added, data streams converted to parameterized form),

"intermediate layer" data (373), including more structured data such as aggregations and reorganizations (of the lower levels), sometimes enriched with attached supporting metrics, in which key information points are prepared for derivation of the final metrics,

"insights" data (374), including, for example, key statistics and final aggregated results.

Advantageously, the present invention can serve, for example, hundreds of customers willing to retrieve data from the provided assembly at any particular time, for example, by querying computed data points and statistics. The aforementioned feature referred to as "virtual access" may be configured to construct abstractions of the behavioral indicators of the users and to virtualize the middle-level tables to make them easier to access. The "virtual access" feature may connect one or more web servers together to provide, for example, a consistent user experience for a customer that is actively using the provided API. With virtualized access, the customer need not know how many servers have collected the data, the physical location of the servers, etc., as the described assembly may provide a consistent view for entering technical queries into the system.

Various embodiments of the present invention may advantageously be built to support a semantic data model, and thus may enable the provided assembly to describe concepts separately (e.g., sleep or move (user)), periodically append important data points (e.g., location and time period) to them, and ignore, for example, the collected raw observation data. For example, a related "conversion feature" implemented in connection with processing entity/module 210 may be configured to add semantic information to the data point and enable a more natural language-oriented semantic request. These semantic data points may include, among other data points, any one or more of the following semantic data points:

1. location names (NYC, Beijing) and descriptors (Zhongchang, Golf course),

2. the type of music being listened to (e.g., MP3, WAV) and/or genre (e.g., re-rock, blues, dance music, classical music),

3. information about important locations (e.g., "home" and "office")

Preferably, implementations of the present invention ensure that the required filtering and exclusion tasks can be performed on the analyzed and/or processed data. Because outsiders (i.e., customers) can request a large amount of information from the provided assembly, it is desirable to have a set of filtering and exclusion tasks that are able to examine predetermined specific things in the data and discard or manipulate the associated data points so that the output is in a preferred form, such as more structured and meaningful. For example, it may be desirable to derive certain statistics only for certain groups of users or only for a certain period of time.

FIG. 4 depicts an embodiment of a data output interface 500 (e.g., an Application Programming Interface (API)) and associated data distribution logic. In such processing, the data that is ready for output (including, for example, key statistics, indicators, and sometimes even middle tier metrics) may first be filtered, and optionally, the filter and data prediction module 480 may handle communications with the prediction engine 487, described below. The data API 500 may be configured to manage predetermined operations related to API usage, while the privacy engine 481 may dynamically provide criteria and/or settings regarding, for example, what types of data or statistics may be stored for any particular user or group of users. Similarly, the filtering engine 482 may specifically include rules for filtering outgoing data and/or unifying the data output (even for customer-specific purposes), for example, removing certain types of data points because of their low statistical significance, or limiting the output to a certain group of people for access or privacy-related reasons. The request processing module 520 may communicate with the client/user of the assembly (either a machine (via defined API commands) or a human (via ad hoc API requests)), and its primary purpose is advantageously to interpret which data points need to be passed along. The reporting module 510 may be responsible for generating, automatically or upon request, reports or data tables that contain a defined set of data points in a defined data structure. These reports may be stored in a client-specific download site 511 or other entity, or may be further delivered through a provisioning module 512, which may even send the output data (e.g., tables and reports) onward via email or some other supported medium.

FIG. 5 depicts an embodiment of a prediction engine according to the present invention. Advantageously, the prediction engine is configured to integrate the processing of the real-time behavior vectors in the assembly by means of an integration module 480. Abstractions (e.g., clusters of behavior vectors within a time frame) may be formed in the associated modules 486 before other actions with respect to predictions are performed. The prediction module 487 can consist of a complex multi-dimensional module that includes several state machines for different types of behavioral abstractions. The feedback loop 488 may introduce real-time data for performance evaluation purposes and continuously maintain indicators reflecting, for example, the likelihood of success of any particular prediction. Finally, the data input module 100 explained above may interface the observation data stream with an associated external module (such as, for example, an advertising network).

By applying the "abstraction" module to the multidimensional behavior vectors that are available (e.g., location dynamics at the hour level), vectors can be generated that can be characterized as behavior trails, which naturally vary considerably from one time unit to the next but still describe some behavior pattern, as discussed above. After abstraction, the user's life may be more easily analyzed by tools of machine learning and/or pattern recognition. An exemplary descriptor vector for a user may be: wake up at place X, move from X to Y, hit H, move from Y to Z.

To predict what people are likely to do next, a user behavior model 487 (i.e., a predictive model) may be dynamically constructed that includes abstractions of behavior as elements and dynamics between the elements, e.g., of the Markov chain type. As a further feature, the predictive model may be configured to dynamically compute the model weights and/or the likelihoods of different transitions in the underlying system (assembly) and to provide, at almost any time, a vector with the likelihoods of the next possible states of the system (assembly).
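A first-order Markov chain over abstracted states may be estimated from a behavior trail as sketched below. The state names and the simple count-based estimate of the transition likelihoods are assumptions for illustration; the description does not prescribe a particular estimation method.

```python
from collections import Counter, defaultdict

def fit_transitions(trail):
    """trail: ordered list of abstracted states, e.g. ['home', 'office', ...].
    Returns {state: {next_state: likelihood}} estimated from transition counts."""
    counts = defaultdict(Counter)
    for current, nxt in zip(trail, trail[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for state, followers in counts.items()
    }

def next_state_likelihoods(model, state):
    """Vector of likelihoods for the next possible states (empty if unseen)."""
    return model.get(state, {})
```

Refitting the model as new trail elements arrive corresponds to the dynamic computation of transition likelihoods mentioned above.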

A continuous learning process may be applied to the new arrival data. The feedback loop 488 may be configured to, for example: the predictive model 487 is updated and a (continuous) metric is computed that depicts, for example, how successful the prediction of the model was at any given time. With certain thresholds, the performance of the prediction engine can be addressed in real time. The feedback loop may enable the prediction engine to learn truly autonomously.
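One possible form of such a continuously maintained success metric is an exponentially smoothed hit rate that is compared against a threshold. The class name, the smoothing factor, and the threshold value are all assumptions of this sketch and are not specified in the description.

```python
class FeedbackLoop:
    """Tracks how often recent predictions matched what actually happened."""

    def __init__(self, smoothing=0.1, threshold=0.6):
        self.smoothing = smoothing  # weight given to the newest outcome
        self.threshold = threshold  # minimum hit rate considered reliable
        self.hit_rate = None        # no predictions evaluated yet

    def record(self, predicted, observed):
        """Update the smoothed hit rate with one prediction outcome."""
        hit = 1.0 if predicted == observed else 0.0
        if self.hit_rate is None:
            self.hit_rate = hit
        else:
            self.hit_rate += self.smoothing * (hit - self.hit_rate)
        return self.hit_rate

    def reliable(self):
        """True when the current hit rate meets the threshold."""
        return self.hit_rate is not None and self.hit_rate >= self.threshold
```

A consuming module could, for example, act on a prediction only while `reliable()` returns true.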

Predictions may be provided dynamically, for example, for the purpose of mobile advertising (context-dependent, predictive, targeted advertising) or network performance analysis and optional optimization. For the former purpose, the associated state machine (e.g. a Markov model) may be configured to (continuously) provide a prediction of the next state (e.g. the next location, the name of the next person the user calls, or the music artist the user listens to next). By means of a calculated performance indicator (how likely the model is to be correct) and an external or internal module providing a library of specified advertisements, the system may trigger a specific action (such as the pop-up of a certain advertisement) if the conditions are sufficiently predictive according to the criteria used.

Returning to the predictive model 487, it can be used to obtain educated guesses about possible arrivals and departures of people in the short term (e.g., in the next few minutes) or in the long term future (meaning, for example, in the next week). The predictive model 487 may be configured to maintain a relatively large network of (mobile) user states. The state may be multidimensional. For example, (home, sleep) and (home, in a meeting) may represent two-dimensional states on, for example, a location state and a social state being output by the behavioral data mining engine.

The prediction engine may be configured to enable (easily) updating the associated model, re-weighting the weighted edges (arrows) and/or updating the input data in a standardized manner without substantial data processing activity. As an example, the prediction engine may be enabled to input behavioral data and/or technical data over multiple dimensions (e.g., location, movement, meeting status, battery status, application usage, web browsing click streams, and proximity status), where for each dimension a categorical variable or a scale variable is used to distinguish between possible states. A multi-tiered relational database model is then created using the predictive model, which is optimized for network-oriented data storage and network modeling. From this storage, the prediction engine may then refresh the predictive model 487. The predictive models 487 may be, for example, very specific to location patterns, or they may be more complex and higher-dimensional, including things like location and social activity in the same model through multidimensional states. However, this does not change the basic concept of the predictive model 487, which is generally depicted as a Markov state machine or any other relevant model that can support a multidimensional network structure with bi-directional vectors describing the relationships.

In the predictive model 487, the links between nodes describing different states are weighted in both directions; the weights describe the likelihood that a mobile user moves between the states, assuming that a movement from the current state will occur. The predictive model 487 is not static, so new data may be entered at all times, and each observation that contributes to the weight of a given link is also stamped with attributes (e.g., time, day, social context, battery status, etc.). This allows the assembly to do two things:

1. First, to give a quick, high-level recommendation as to whether a certain event is likely to follow some other event. Because there is a feedback loop in the system, it can learn by trial and error at which critical thresholds an inferred preference of a person is more likely to be correct or incorrect. The model is generally able to indicate the person's possible patterns over the next few hours, and is able to calculate a high-level probability that the person will, for example, leave point A, visit point B, and finally end up at either point C or D in the next few hours. The same may apply to predicting whether a user is more likely to start moving or to start a meeting, for example, after talking with his wife. This approach is more static and is aimed more at profiling the user's context.

2. Second, to make more dynamic predictions that are aimed at short-term events. The system as described above is implemented such that, if it knows the user's current context (current state) and various other (important) variables (such as current location, time and day of the week), it can use more sophisticated statistical modeling to quickly estimate, for example, the likelihood of the user starting to move within the next 5 minutes, or the likelihood of the mobile device being turned off, given the current conditions. These more dynamic, intelligent predictions are possible because the historical data behind the observations reflected in each link is multidimensional and parameterized, which makes it possible to give more accurate answers to specific questions, provided that sufficient contextual data is available.
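By way of illustration only, the two uses listed above can be sketched with a single estimator that either ignores or applies the attributes stamped on each observation. The record layout, the attribute names (`hour`, `weekday`) and the function name `conditional_likelihoods` are assumptions made for this sketch; simple filtering stands in for the more sophisticated statistical modeling mentioned in the text.

```python
from collections import Counter

# Each observed transition keeps the attributes it was stamped with.
observations = [
    {"from": "home", "to": "work", "hour": 8, "weekday": True},
    {"from": "home", "to": "work", "hour": 8, "weekday": True},
    {"from": "home", "to": "gym", "hour": 8, "weekday": False},
    {"from": "home", "to": "gym", "hour": 18, "weekday": True},
    {"from": "home", "to": "home", "hour": 8, "weekday": False},
]


def conditional_likelihoods(observations, current_state, **context):
    """Estimate next-state probabilities using only matching observations."""
    matching = [
        obs for obs in observations
        if obs["from"] == current_state
        and all(obs.get(key) == value for key, value in context.items())
    ]
    counts = Counter(obs["to"] for obs in matching)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()} if total else {}


# Static, profile-like view: all history for the current state.
profile = conditional_likelihoods(observations, "home")
# Dynamic view: only history that matches the current context.
weekday_morning = conditional_likelihoods(
    observations, "home", hour=8, weekday=True
)
```

The narrower the context, the fewer observations remain, which is why the text ties the accuracy of such answers to the availability of sufficient contextual data.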

In one embodiment of the invention relating to predictive modeling, the assembly is able to calculate, for each link or group of links, a measure (e.g. predictability) of the link vector. This measure not only reflects the user's behavioral profile (for example, whether his/her movement pattern is highly irregular and unpredictable), but also serves as an input for processing a request and deciding whether a certain request can be reliably answered.
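By way of illustration only, one possible predictability measure is one minus the normalized Shannon entropy of the outgoing transition counts of a state. The choice of entropy, the function names and the threshold value are assumptions made for this sketch; the description does not fix a particular measure.

```python
import math


def predictability(transition_counts):
    """Return 1.0 for a fully predictable link group, 0.0 for a uniform one."""
    total = sum(transition_counts.values())
    outcomes = [n for n in transition_counts.values() if n > 0]
    if total == 0:
        return 0.0  # no observations: nothing can be reliably predicted
    if len(outcomes) < 2:
        return 1.0  # only one outcome was ever observed
    entropy = -sum((n / total) * math.log2(n / total) for n in outcomes)
    return 1.0 - entropy / math.log2(len(outcomes))


def can_answer(transition_counts, min_predictability=0.3):
    """Decide whether a request about this link group can be answered."""
    return predictability(transition_counts) >= min_predictability


regular_user = {"work": 9, "gym": 1}
erratic_user = {"work": 5, "gym": 5}
```

In this sketch a request concerning `regular_user` would be answered, whereas one concerning `erratic_user` would be declined as unreliable.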

In the prediction engine, the overall assembly is closely related to the database structure and multidimensional data mining using behavioral data. The predictive model is a product of the model, but it is tied to the real world through applications on mobile websites or other content providers (e.g., mobile advertising or real-time content optimization). Other applications may include, for example, adaptive services that can proactively alert the user to traffic congestion.

In weighted and probabilistic modeling of state machines, machine learning methods based on standard network models and Markov models can be used, with first-order, second-order or higher-order Markov models. Time series data, and thus more than just the current or previous state, may be used as input for any given prediction. In predicting a more specific single event, any known method (including linear and non-linear regression methods) may be used to fit existing data, estimate models, and use these to suggest what the likely outcome may be, or to estimate the time of an event, for example, given current and past behavior and/or technical state.
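By way of illustration only, the difference between a first-order and a second-order Markov model can be sketched with a single fitting function whose `order` parameter sets how many past states form the history. The function names and the sample trail are assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def fit_markov(sequence, order=1):
    """Count transitions from each tuple of `order` past states to the next."""
    counts = defaultdict(Counter)
    for i in range(order, len(sequence)):
        history = tuple(sequence[i - order:i])
        counts[history][sequence[i]] += 1
    return counts


def predict(counts, recent_states):
    """Return {next_state: probability} given the most recent states."""
    seen = counts.get(tuple(recent_states))
    if not seen:
        return {}
    total = sum(seen.values())
    return {state: n / total for state, n in seen.items()}


trail = ["home", "work", "gym", "work", "home",
         "work", "gym", "work", "home"]
first_order = fit_markov(trail, order=1)
second_order = fit_markov(trail, order=2)

after_work = predict(first_order, ["work"])
after_home_then_work = predict(second_order, ["home", "work"])
```

In this sample the first-order model cannot tell whether the user goes to the gym or home after work, while the second-order model resolves the ambiguity by also looking at the state before work.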

In the prediction engine, one aspect is the use of multiple different layers of data to best infer the likely future behavior of a person (e.g., the likelihood of changing from place A to place B in the next 60 minutes), and the possibility to correlate historical data and associated models with more real-time data from the mobile device and to establish a direct real-time feedback loop with real-world events. The key is a multi-dimensional state machine in which each link or behavior jump has enough background observations to facilitate more complex predictions. At the same time, the model itself, being a more static entity, may give specific output about the person's behavior pattern, or it may be used to send very targeted activity messages based on the subdivision model. The predictive model reflects past behavior and gives a likelihood as to what the future will look like if the past behavior is known.

Fig. 6 shows various technical aspects of the invention and the related assembly according to a possible embodiment. The server assembly 660 may be provided with one or more processing devices (e.g., one or more microprocessors, microcontrollers, DSPs (digital signal processors), programmable logic chips, etc.) capable of processing instructions and other data. The processing entity 650 as a functional entity may thus physically comprise, for example, a plurality of co-operating processors and/or a number of sub-processors connected to a central processing unit. The processing entity 650 is configured to execute code stored in a memory 652. Software 658 used to implement the observation data collection, processing, and analysis systems of the present invention may utilize a dedicated or shared processor 650 to perform its tasks. The software functionality 658 can be implemented as one or several software applications and/or modules that communicate with each other. Similarly, memory entity 652 may be divided among one or more physical memory chips or other storage elements. The memory 652 may also refer to, and may include, other storage media such as, preferably, a removable memory card, a floppy disk, a CD-ROM, or a fixed storage medium such as a hard drive. By nature, the memory 652 may be non-volatile (e.g., ROM (read only memory)) and/or volatile (e.g., RAM (random access memory)).

The UI (user interface) 656 may comprise a display and/or a connector to an external display or data projector, and a keyboard/keypad, or other applicable control input means (e.g., a touch screen or voice control input, or separate keys/buttons/knobs/switches) configured to provide its operator with practicable data visualization and device control methods. The UI 656 may include one or more speakers and associated circuitry, such as D/A (digital-to-analog) converters for audio output, and a microphone having an A/D converter for audio input. Additionally, entity 660 includes communication interfaces, such as wireless and/or wired interfaces for conventional communication with other entities and/or network infrastructure, such as one or more radio transceivers (e.g., WLAN) or wired transceivers/interfaces (e.g., Firewire, USB (universal serial bus), LAN (local area network) adapters (e.g., Ethernet adapters), etc.).

The software (product) 658 may be provided on a carrier medium such as a memory card, a memory stick, an optical disc (e.g., a CD-ROM or DVD), or some other storage carrier. The instructions needed to implement the application program may be stored in a carrier medium, either as an executable format or in some other (e.g., compressed) format, such that the software may be transferred to and installed in a target device via the carrier medium (e.g., in the target device's hard disk), or executed directly from the carrier medium in the target device, e.g., by loading the relevant instructions into the target device's memory until execution. Alternatively, the software may be transmitted over the air to the target device via a wireless transceiver or through a wired communication connection.

Fig. 7 discloses a simplified flow chart of a merely exemplary embodiment of a method according to the present invention.

At 714, a server assembly according to embodiments of the present invention is obtained and configured, for example, through installation and execution of associated software for managing observation data originating from a mobile device or other data source. At 716, observed (raw) data is received and stored. Optionally, supplemental data (e.g., metadata providing location information) is also received from several external data sources. At 718, the received data may be parameterized, classified, structured, etc. (i.e., further processed) in blocks or in batches. At 720, various aggregations, abstractions, and/or predictions may be derived based on the parameterized data. Different behavioral indicators and/or technical indicators that describe the data may be established, for example. Prediction tasks may be performed. An alarm and/or trigger as explained above may be activated. Advantageously, several (abstraction) layers are used to store data to facilitate faster future processing. At 722, external data queries are serviced by returning the queried indicators and/or other higher-level information. Alternatively, higher-level information may be pushed to one or more external parties based on a predetermined schedule or, for example, data service subscriptions. The dashed loop arrows depict the repeatability of the different method items according to the teachings set forth above. New raw data may be received and higher-level entities (e.g., aggregates) may be updated.
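By way of illustration only, method items 716 to 722 can be sketched as a small pipeline. The semicolon-separated record format, the field names and the function names are assumptions made for this sketch and do not reflect a data format disclosed herein.

```python
from collections import defaultdict
from statistics import mean

# Item 716: raw records as received (hypothetical format).
raw_observations = [
    "u1;2010-06-24T08:00;browser;120",
    "u1;2010-06-24T09:00;browser;60",
    "u2;2010-06-24T08:30;music;300",
]


def parameterize(record):
    """Item 718: turn a raw record into named, typed fields."""
    user, timestamp, app, seconds = record.split(";")
    return {"user": user, "day": timestamp[:10],
            "app": app, "seconds": int(seconds)}


def aggregate(rows):
    """Item 720: derive an average usage indicator per user and application."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["user"], row["app"])].append(row["seconds"])
    return {key: mean(values) for key, values in grouped.items()}


def query(indicators, user):
    """Item 722: answer an external query from the indicator layer only."""
    return {app: value for (u, app), value in indicators.items() if u == user}


parameterized = [parameterize(record) for record in raw_observations]
indicators = aggregate(parameterized)
answer = query(indicators, "u1")
```

Queries are served from the aggregated layer rather than from the raw records, which mirrors the use of several abstraction layers to speed up later processing.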

The skilled person realizes that the illustrated flow diagrams are merely exemplary in nature and that the nature and number of method steps (the mutual order of these method steps is also included) may be adjusted dynamically and/or use case specifically.

The scope of the invention can be found in the claims. Although various embodiments have been described in detail in the foregoing, it will be appreciated by those skilled in the art that different modifications may be introduced to the specifically disclosed solution without departing from the gist of the present invention as set forth herein and defined by the independent claims.

Claims

1. A network server assembly (102,658,660), comprising:

a data input entity (100,654) configured to receive multi-dimensional non-parametric data, such as sensor data, obtained from a number of mobile devices, such as smart phones;

a processing entity (200,210,650) configured to parameterize the multi-dimensional non-parametric data;

a memory entity (300,370,371,372,373,374,652) configured to store parameterized data as multi-layer data, preferably on a plurality of different abstraction layers;

an aggregation engine (400,410,420,430,460) configured to assign a number of aggregation and/or data modeling activities, such as time series operations, averaging operations and/or summing operations, to batches of said parameterized data, optionally with respect to a certain time period, a certain location, a certain mobile application or application category, a certain mobile user and/or user group, in order to determine from the data batches a number of descriptive higher level behavioral indicators and/or technical indicators, the execution of said indicators being activated substantially at any particular moment when at least a predetermined sufficient amount of data becomes available or when a trigger is released; and

a data exporting entity (500, 520), such as an API (application programming interface), configured to provide the number of behavior indicators and/or technical indicators or information derived therefrom to an external entity, such as a mobile marketing entity for selecting personalized advertisements for one or more mobile users, or to a network analysis or management entity for evaluating network performance and/or user experience and optionally enabling it to further optimize the performance and/or the user experience based on the evaluation, respectively.

2. The assembly of claim 1, configured to perform at least one processing and/or parameterization action on the received data selected from the group consisting of: classifying, structuring based on content and/or one or more attributes of the data, adding a location tag, adding web domain name data, adding mobile application data, and determining a uniform ID, such as an application ID or a mobile content entity ID.

3. An assembly according to any preceding claim, configured to classify received data using at least one category selected from the group consisting of: application usage data, web browsing usage data, network performance data, access network scan data, cellular network scan data, WiFi (wireless fidelity) scan data, memory usage data, device feature usage data, device system data, alarm clock data, calendar data, media usage data, content usage data, phonebook content, message logs, voice call logs, and location data.

4. The assembly of any preceding claim, comprising a prediction entity (480,486,487), the prediction entity (480,486,487) being configured to host a prediction model, such as a Markov state machine, for reflecting past behavior of a mobile user based on previously received data and optionally dynamically giving a likelihood of future states, such as future events, patterns, locations and/or times, with respect to predetermined events or patterns related to the user.

5. The assembly of claim 4, comprising a feedback entity (488), the feedback entity (488) being configured to provide information back to the prediction entity about whether the prediction was successful, to enable adaptation of the model.

6. The assembly of any preceding claim, further comprising a support engine (220), the support engine (220) being configured to add one or more remotely received and/or locally generated parameters to the received data, such as a standardized location stamp.

7. The assembly of any preceding claim, being arranged to determine at least one behavioural or statistical indicator selected from the group consisting of: an average browsing face time per a predetermined time period, such as a day, within a predetermined unit, such as minutes; an average sleep time during a predetermined time period; an average span of daily movements of a predetermined unit per predetermined time period; average entropy of location dynamics for a user over a period of time; application usage activity; and application usage frequency.

8. The assembly of any preceding claim, configured to determine a number of preferred multi-dimensional behavior vectors (410) of behavior indicator values, such as vectors indicative of travel activity, movement activity, music consumption activity, stress level and sleep activity.

9. The assembly of any preceding claim, configured to aggregate (420) previously determined behavior indicators and/or vectors to construct further statistical data, said aggregating (420) optionally comprising averaging.

10. The assembly of any preceding claim, configured to determine several measures (430) regarding the dynamic behavior of at least one given user (trend analysis) or the difference between at least two users at least by comparison techniques between at least two behavior vectors of behavior indicator values, such as correlation, determination of Pearson correlation coefficients and/or regression analysis, optionally to obtain a behavior similarity index or time trend data such as the average slope of the behavior similarity index.

11. The assembly of any preceding claim, configured to trigger (440) an alarm or other action based on a number of trigger conditions related to a result of a comparison between two predetermined behavior vectors of behavior indicator values, such as a correlation result, or to a calculation of a new behavior indicator.

12. The assembly of any preceding claim, configured to host a semantic data model that associates several semantic concepts, such as sleep, movement, location name, nature of location and/or consumed application or data type, with received data and/or data derived from received data to enable natural language oriented semantic data queries.

13. An assembly according to any preceding claim, configured to define a behaviour class for a user based on a percentile, the percentile being the percentile of users within a larger group that obtain a lower score than the considered user in a particular behavioural dimension according to the metric utilised.

14. An assembly according to any preceding claim, configured to periodically traverse the stored data and process away unwanted portions according to one or more predetermined criteria.

15. An assembly according to any preceding claim, configured to distribute data between a plurality of regional or functional databases to distribute associated processing and storage loads.

16. The assembly of any preceding claim, configured to provide a virtual database interface to an external device, the virtual database interface for accessing real-time behaviour and context information divided between a number of at least functionally connected devices.

17. The assembly of any preceding claim, configured to abstract the behavior data by determining a number of multidimensional behavior vectors based on already available behavior vectors of behavior indicator values to describe the behavior pattern (486).

18. An assembly according to any preceding claim, configured to determine a user's spatio-temporal behaviour pattern and/or relevant statistics, optionally by semantic data, to provide conceptual information, such as location names, and to apply heuristics to determine the nature of associated locations, such as workplaces and homes.

19. An assembly according to any preceding claim, configured to determine substantially continuous behavioural entities, such as vectors, based on discontinuous received data by automatically augmenting a missing portion taking into account data preceding and/or following the missing portion.

20. An assembly according to any preceding claim, configured to logically and/or physically separate data processing activities from storage activities, and/or to divide similar activities among a plurality of network servers based on user identifiers and/or the type of required computational or aggregation processes and/or time criteria.

21. The assembly of any preceding claim, comprising a hierarchical averaging module (450), the hierarchical averaging module (450) being adapted to perform dynamic averaging over a selected group of users or a selected period, optionally with rolling averaging, and preferably logically separated from the aggregation engine for the purpose of efficiently handling complex queries from outside the assembly, and optionally for keeping independent data sets separate for other reasons, including legal reasons.

22. The assembly of any preceding claim, configured to physically link a data layer with a number of server entities capable of enriching data with semantics, the semantics optionally being added to the data by a multidimensional matching process.

23. A method for processing observation data to be performed by an electronics assembly, comprising:

-receiving non-parametric multi-dimensional spatial and temporal human behavior and/or technical observations (716) obtained from several mobile devices, such as smartphones, the observations being, for example, sensor data;

-parameterizing, optionally classifying and/or structuring (718), the received data;

-performing a number of aggregation and/or data modeling activities on parameterized data in batches, in order to determine a number of descriptive higher level behavioral and/or technical indicators from the data batch (720); and

-providing the number of behavior indicators and/or technical indicators or information derived therefrom to an external entity (722), such as a mobile marketing entity for selecting personalized advertisements for one or more mobile users, or to a network analysis or management entity for evaluating network performance and/or user experience and optionally enabling it to further optimize the performance and/or the user experience based on the evaluation, respectively.

24. A computer program comprising code means adapted to perform the method of claim 23 when run on a computer.

25. A carrier medium, such as an optical disc, a floppy disk, a memory card or a memory stick, comprising a computer program according to claim 24.