HK1182811B

HK1182811B - Recommending data based on user and data attributes

Info

Publication number: HK1182811B
Application number: HK13110146.7A
Authority: HK
Inventors: J．芬尼根; H．斯瓦拉马克瑞希楠; A．N．比切
Original assignee: Microsoft Technology Licensing, LLC
Priority date: 2011-10-11
Filing date: 2013-08-30
Publication date: 2016-09-23

Description

Recommending data based on user and data attributes

Technical Field

The present application relates to data recommendation, and more particularly to recommending data based on user and data attributes.

Background

1. Background and Related Art

Computer systems and related technology affect many aspects of society. Indeed, the ability of computer systems to process information has transformed the way people live and work. Computer systems now commonly perform many tasks (e.g., word processing, scheduling, accounting, etc.) that were performed manually prior to the advent of the computer system. More recently, computer systems have been coupled to one another and to other electronic devices to form both wired and wireless computer networks over which the computer systems and other electronic devices can transfer electronic data. Thus, the performance of many computing tasks is distributed across multiple different computer systems and/or multiple different computing environments.

When a user manipulates a data set, the user typically has to go out and look for relevant data and/or data resources that may add value to the data set. Finding such data and/or data resources is generally a manual and somewhat cumbersome process. Furthermore, the user must know what data to search for and must also know that they want to search for it in the first place. That is, the user has to know the correct question to ask.

For example, when manipulating a data set, users typically leave their data application (e.g., word processor, spreadsheet, database, etc.) and use search tools (e.g., Web-based search engines) to find relevant data and/or data resources that they can bring into their data set to add value. The use of search tools also typically requires the user to provide relevant input to the search tool to cause the search tool to find relevant data and/or data resources. Moreover, search tools generally lack any information about the user (e.g., user context) that can be used to improve searches for related data and/or data resources.

Disclosure of Invention

The present invention extends to methods, systems, and computer program products for recommending data based on user and data attributes. It is detected that a user has accessed a data set within a data processing application. Source attributes are derived for the accessed data set. The source attributes are derived from one or more of: user information of the user and data information of the data.

Target attributes are identified for one or more target data sets and/or one or more target data services. The derived source attributes are used with the identified target attributes for at least one target data set and/or at least one target data service to determine the desirability of that target data set and/or target data service as a source of relevant data. At least one target data set and/or at least one target data service is recommended to the user as being able to provide relevant data, that is, data found to be useful to other users operating on data similar to the accessed data set.

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

Additional features and advantages will be set forth in the description which follows, and in part will be apparent to those skilled in the art from the description, or may be learned by practice of the teachings herein. The features and advantages of embodiments of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. Features of embodiments of the invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.

Drawings

In order to describe the manner in which the above-recited and other advantages and features of the invention can be obtained, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings, in which:

FIG. 1 illustrates an example computer architecture that facilitates recommending data based on user and data attributes.

FIG. 2 illustrates a flow diagram of an example method for recommending data based on user and data attributes.

Detailed Description

Embodiments of the present invention may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors, system memory, and the like, as discussed in greater detail below. Embodiments within the scope of the present invention also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Computer-readable media storing computer-executable instructions are computer storage media (devices). Computer-readable media bearing computer-executable instructions are transmission media. Thus, by way of example, and not limitation, embodiments of the invention can include at least two significantly different computer-readable media: computer storage media (devices) and transmission media.

Computer storage media (devices) includes RAM, ROM, EEPROM, CD-ROM, Solid State Drives (SSDs) (e.g., based on RAM), flash memory, Phase Change Memory (PCM), other types of memory, other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.

A "network" is defined as one or more data links that allow electronic data to be transferred between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.

Further, upon reaching various computer system components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to computer storage media (devices) (or vice versa). For example, computer-executable instructions or data structures received over a network or data link may be cached in RAM within a network interface module (e.g., a "NIC") and then ultimately transferred to computer system RAM and/or to less volatile computer storage media (devices) at a computer system. Thus, it should be understood that computer storage media (devices) can be included in computer system components that also utilize, or even primarily utilize, transmission media.

Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. The computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the features and acts described above are disclosed as example forms of implementing the claims.

Those skilled in the art will appreciate that the invention may be practiced in network computing environments with many types of computer system configurations, including, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, and the like. The invention may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.

Embodiments of the present invention include using identified attributes of the current user, of the source data, or of both to propose relevant target data and data services to the user. Attributes of the target data or data services are also used. The recommended target data and data services may be similar to those found useful by other users (e.g., users like the current user operating on data like the source data). Thus, the user may be provided with relevant data and/or data services without having to actively search. Further, recommendations on how to use the target data and/or data services may be provided.

FIG. 1 illustrates an example computer architecture 100 that facilitates recommending data based on user and data attributes. Referring to FIG. 1, computer architecture 100 includes application 101, analysis module 102, data store 108, data directory 112, and data service 113. Each of the depicted components can be connected to one another over (or be part of) a network, such as network 131, which can be, for example, a local area network ("LAN"), a wide area network ("WAN"), or even the Internet. Thus, each of the depicted components, as well as any other connected computer systems and their components, can create message related data and exchange message related data (e.g., Internet Protocol ("IP") datagrams and other higher layer protocols that utilize IP datagrams, such as Transmission Control Protocol ("TCP"), Hypertext Transfer Protocol ("HTTP"), Simple Mail Transfer Protocol ("SMTP"), etc.) over network 131.

Application 101 includes a user interface 119. Application 101 can be substantially any data processing application such as, for example, a spreadsheet application, a database application, a word processing program, and the like. User 107 can interact with user interface 119 to submit input to application 101 and observe output from application 101. The user 107 may interact with the user interface 119 to load a data set into the application 101 and manipulate data included in the data set loaded into the application 101. The user interface 119 may be presented on a display device.

Data directories 112 and data services 113 may be internal or external to an organization (e.g., a company) with which user 107 is associated.

In general, the analysis module 102 is configured to analyze the user and the data and match source attributes to target attributes to identify recommended data. Analysis module 102 can operate as a background process (e.g., automated). As such, analysis module 102 can have little, if any, performance impact on other processes within computer architecture 100 (e.g., at application 101). The analysis module 102 includes a source attribute derivation module 142, a target attribute derivation module 143, and a matching module 147.

The source attribute derivation module 142 is configured to derive source attributes from one or more of: source data set information, user information, and environmental conditions (e.g., environmental conditions of the operating system, tasks being performed, etc.). Target attribute derivation module 143 is configured to derive target attributes from data in data directory 112 and data available through data service 113. The matching module 147 can match the source attributes to the target attributes to identify data directories and/or data services that can provide data that adds value to the source data set.

Analysis module 102 can implement any of a variety of different mechanisms for recommending data. In some embodiments, analysis module 102 implements a statistical algorithm based on a transformation from a higher-dimensional attribute space to a lower-dimensional space (also referred to as a "property space"). The transformation from the higher-dimensional attribute space to the lower-dimensional space can be used to generate source attributes. In these embodiments, the analysis module learns the expected rating for each combination of values in the property space. Alternatively or in combination, analysis module 102 may use a secondary rule-based algorithm. The secondary rule-based algorithm may operate in both the higher-dimensional attribute space and the lower-dimensional space. It may patch the recommendations proposed by the statistical algorithm; patching may include adding, removing, or adjusting recommendations. The secondary rule-based algorithm thus allows editorial recommendations in addition to statistical ones.
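By way of illustration only, the statistical mechanism described above could be sketched as follows. The description does not name a particular transformation, so feature hashing is swapped in here as a simple stand-in, and the names `to_property_space` and `RatingModel`, the number of dimensions, and the use of a mean as the learned rating are all assumptions made for this sketch.

```python
import zlib
from collections import defaultdict


def to_property_space(attributes, dims=4):
    """Fold a sparse attribute dict into a short tuple of 0/1 coordinates.

    Feature hashing is an assumed stand-in for the unspecified transformation.
    """
    totals = [0.0] * dims
    for name, weight in attributes.items():
        # crc32 is used instead of hash() so bucket choice is stable across runs.
        totals[zlib.crc32(name.encode("utf-8")) % dims] += weight
    return tuple(1 if total > 0 else 0 for total in totals)


class RatingModel:
    """Learns the mean observed rating for each combination in the property space."""

    def __init__(self, dims=4, default=0.0):
        self.dims = dims
        self.default = default
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def observe(self, attributes, rating):
        key = to_property_space(attributes, self.dims)
        self._sum[key] += rating
        self._count[key] += 1

    def expected_rating(self, attributes):
        key = to_property_space(attributes, self.dims)
        count = self._count.get(key, 0)
        if count == 0:
            return self.default  # combination never observed
        return self._sum[key] / count
```

Attribute dicts that are observed with ratings feed `observe`; `expected_rating` then returns the learned value for whatever combination a new set of attributes maps to.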

The secondary rule-based algorithm operates by processing a tree of condition expressions over the attributes and evaluating it down to a Boolean value that indicates whether the rule should fire. Recommendations may carry ratings that allow them to be combined across systems. The recommendations may also be patched to rescale them based on how the recommendations from the statistical algorithm have performed.
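A minimal sketch of such a rule-based patching step is given below, under stated assumptions: the node types (`Leaf`, `And`, `Or`, `Not`), the `Rule` record, and the `patch` function are hypothetical names, and representing recommendations as a mapping from target name to rating is an illustrative choice rather than anything the description prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    attribute: str
    expected: object

    def evaluate(self, attrs):
        return attrs.get(self.attribute) == self.expected


@dataclass(frozen=True)
class And:
    children: tuple

    def evaluate(self, attrs):
        return all(child.evaluate(attrs) for child in self.children)


@dataclass(frozen=True)
class Or:
    children: tuple

    def evaluate(self, attrs):
        return any(child.evaluate(attrs) for child in self.children)


@dataclass(frozen=True)
class Not:
    child: object

    def evaluate(self, attrs):
        return not self.child.evaluate(attrs)


@dataclass(frozen=True)
class Rule:
    condition: object   # root of the condition tree
    action: str         # "add", "remove", or "adjust"
    target: str         # name of a data directory or data service
    value: float = 0.0  # rating for "add", scale factor for "adjust"


def patch(recommendations, rules, attrs):
    """Return a patched copy of {target: rating}; the input is left untouched."""
    patched = dict(recommendations)
    for rule in rules:
        if not rule.condition.evaluate(attrs):
            continue  # the tree evaluated to False, so the rule does not fire
        if rule.action == "add":
            patched[rule.target] = rule.value
        elif rule.action == "remove":
            patched.pop(rule.target, None)
        elif rule.action == "adjust" and rule.target in patched:
            patched[rule.target] *= rule.value
    return patched
```

Each rule's condition tree is evaluated against the combined attributes; only rules whose tree evaluates to true add, remove, or rescale an entry in the statistically proposed recommendations.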

FIG. 2 illustrates a flow diagram of an example method 200 for recommending data based on user and data attributes. The method 200 will be described with reference to the components and data of the computer architecture 100.

Method 200 includes an act of detecting that a user has accessed a data set within a data processing application as part of performing a specified task (act 201). For example, application 101 can detect that user 107 accessed data set 111 within application 101 as part of a task (e.g., adding data to a customer or product spreadsheet). User 107 can send (perhaps through user interface 119) access command 123 to data store 108 to load data set 111 into application 101.

Method 200 includes an act of deriving source attributes for the accessed data set, the source attributes derived from one or more of: user information of the user, data information of the data, and environmental conditions (act 202). For example, source attribute derivation module 142 can derive source attributes 144 (of data set 111) from one or more of: user information 148, data set 111, and environmental conditions 141 (e.g., conditions of an operating system, conditions of a specified task, etc.).

Method 200 includes an act of identifying target attributes for one or more target data sets and/or one or more target data services (act 203). For example, target attribute derivation module 143 can identify target attributes 146 of data in data directories 112A, 112B, 112C, etc., and of data available at data services 113A, 113B, 113C, etc.
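As a hedged illustration of act 202, the three kinds of input could be flattened into one set of namespaced attribute strings. The function name `derive_source_attributes`, the `namespace:name=value` encoding, and the example keys are assumptions made for this sketch only.

```python
def derive_source_attributes(user_info, data_info, environment):
    """Flatten three kinds of input into one set of namespaced attribute strings."""
    sources = (("user", user_info), ("data", data_info), ("env", environment))
    attributes = set()
    for namespace, info in sources:
        for name, value in info.items():
            # The namespace prefix keeps, e.g., a user "type" apart from a data "type".
            attributes.add(f"{namespace}:{name}={value}")
    return attributes


# Hypothetical inputs standing in for user information 148, data set 111,
# and environmental conditions 141.
source_attributes = derive_source_attributes(
    {"role": "analyst"},
    {"columns": 2},
    {"task": "edit"},
)
```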

Method 200 includes an act of using the derived source attributes with the identified target attributes for at least one of the one or more target data sets and/or at least one of the one or more target data services to determine a desirability of at least one of the one or more target data sets and/or at least one of the one or more target data services as a source of relevant data (act 204). For example, the matching module 147 uses the source attributes 144 and the identified target attributes 146 for at least one data directory 112 and/or at least one data service 113 to determine a desirability of the at least one data directory 112 and/or the at least one data service 113 as a source of data related to the data set 111. In some embodiments, the matching module 147 matches the source attributes 144 at least partially to the target attributes 146.
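One simple way to picture a partial match is as set overlap. The sketch below assumes attributes are plain strings and uses a Jaccard-style ratio as the desirability; that measure, the threshold, and the names `desirability` and `rank_targets` are illustrative assumptions, not the method the description specifies.

```python
def desirability(source_attrs, target_attrs):
    """Jaccard-style overlap between two attribute sets, in the range [0, 1]."""
    source, target = set(source_attrs), set(target_attrs)
    if not source or not target:
        return 0.0
    return len(source & target) / len(source | target)


def rank_targets(source_attrs, targets, threshold=0.1):
    """Score every target and keep those at or above the threshold, best first."""
    scored = [(name, desirability(source_attrs, attrs)) for name, attrs in targets.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical attribute sets for a source data set and three targets.
source = {"type:float", "semantic:location", "format:lat_long", "user:analyst"}
targets = {
    "directory_112A": {"type:float", "semantic:location", "format:lat_long"},
    "service_113A": {"semantic:location", "format:postal_code"},
    "service_113C": {"semantic:currency"},
}
ranking = rank_targets(source, targets)
```

Targets with no overlap fall below the threshold and are not recommended; the remaining ones are ordered by how much of their attribute set they share with the source.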

Method 200 includes an act of recommending to the user at least one target data set and/or at least one target data service as being able to provide data that is found to be useful to users operating on data similar to the accessed data set under similar environmental conditions (act 205). For example, the matching module 147 may send recommendations 116 to the user interface 119. Recommendations 116 include recommendations 117A, 118A, and 118B corresponding to data directory 112A and data services 113A and 113B, respectively. Each recommendation may indicate how the data directory and/or data service is related to the data set 111. Recommending a data directory or data service may include using statistical and/or rule-based algorithms. The recommendations 116 may also indicate how the recommended data directory and/or data service may be used to integrate data into the data set 111. For example, a recommendation may indicate that two columns of data provided by a data directory or data service are to be combined for inclusion in the data set 111.

The user 107 may then select one or more recommendations presented at the user interface 119. For example, user 107 can submit selection 121 to user interface 119 to select recommendation 118A. User interface 119 may receive selection 121. In response to selection of the recommendation 118A, the analysis module 102 may transfer the relevant data 122 from the data service 113A into the data set 111. Thus, relevant data 122 can be used within application 101 without user 107 having to leave application 101.
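The selection step could look roughly like the following sketch, in which a recommendation carries an integration hint (two target columns to combine) and applying it pulls the related data into the accessed data set. The `Recommendation` fields, the `apply_recommendation` function, and the sample rows are hypothetical and serve only to make the flow concrete.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Recommendation:
    target: str               # e.g. "data service 113A"
    join_key: str             # column shared with the accessed data set
    combine: Tuple[str, str]  # two target columns to merge into one
    into: str                 # name of the new column in the data set
    separator: str = ", "


def apply_recommendation(data_set, target_rows, rec):
    """Return a copy of data_set with the combined target columns added."""
    lookup = {row[rec.join_key]: row for row in target_rows}
    merged = []
    for row in data_set:
        new_row = dict(row)
        match = lookup.get(row.get(rec.join_key))
        if match is not None:
            first, second = rec.combine
            new_row[rec.into] = match[first] + rec.separator + match[second]
        merged.append(new_row)
    return merged


# Hypothetical accessed data set and data returned by the selected service.
data_set = [
    {"customer": "Contoso", "postal_code": "98052"},
    {"customer": "Fabrikam", "postal_code": "00000"},
]
service_rows = [{"postal_code": "98052", "city": "Redmond", "state": "WA"}]
rec = Recommendation("data service 113A", "postal_code", ("city", "state"), "city_state")
updated = apply_recommendation(data_set, service_rows, rec)
```

Rows without a matching key are carried over unchanged, so accepting a recommendation never removes data the user already had.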

In some embodiments, user attributes are collected and periodically updated based on inferences about user behavior and explicit labeling of the user. For example, analysis module 102 can collect and periodically update user attributes of user 107 based on inferences about the behavior of user 107 and/or explicit labeling of user 107.
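A minimal sketch of such a user attribute store follows, assuming that behavior is recorded as named actions, that an action seen at least a few times counts as an inferred attribute, and that explicit labels take precedence. The class `UserAttributes`, its methods, and the `min_count` threshold are hypothetical.

```python
from collections import Counter


class UserAttributes:
    """Combines explicit labels with attributes inferred from recorded behavior."""

    def __init__(self):
        self.explicit = {}
        self._actions = Counter()

    def label(self, name, value):
        """Record an explicit label for the user."""
        self.explicit[name] = value

    def record_action(self, action):
        """Record one observed user action."""
        self._actions[action] += 1

    def refresh(self, min_count=3):
        """Periodic update: infer attributes from behavior, then overlay explicit labels."""
        inferred = {
            f"frequent:{action}": True
            for action, count in self._actions.items()
            if count >= min_count
        }
        return {**inferred, **self.explicit}
```

Calling `refresh` on a schedule yields the current attribute view that a component such as analysis module 102 could consume.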

Data set attributes may be collected through a pipeline. The pipeline may examine attributes identified in one or more of: the raw data (or a view of the data), a characteristic sample set, or an aggregation of the data (collectively, the sampled data). In aggregate, data sets may be meant to be used together (e.g., worksheets in an Excel file). Thus, each portion of the data being operated on within the data set is analyzed. Data set attributes are identified across portions (e.g., lists, tables, and sets of tables). The data set attributes are fed to an analysis module (e.g., analysis module 102) that processes the attributes and makes recommendations for data and data services.
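The pipeline described above might, for instance, walk every table and column of a data set and derive attributes from a sample of each column. In the sketch below the two derived attributes (a coarse type and a distinct-value count), the sample size, and the function names are assumptions chosen for illustration.

```python
def infer_type(values):
    """Classify a sample as "number" if every non-empty value parses as a float."""
    def is_float(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    non_empty = [value for value in values if value != ""]
    if non_empty and all(is_float(value) for value in non_empty):
        return "number"
    return "text"


def derive_data_set_attributes(tables, sample_size=100):
    """Derive attributes for each column of each table from a leading sample."""
    attributes = {}
    for table_name, columns in tables.items():
        for column_name, values in columns.items():
            sample = values[:sample_size]
            attributes[f"{table_name}.{column_name}"] = {
                "type": infer_type(sample),
                "distinct": len(set(sample)),
            }
    return attributes


# Hypothetical workbook with two worksheets that are meant to be used together.
workbook = {
    "Sheet1": {"lat": ["45.4", "46.8"], "city": ["Ottawa", "Quebec"]},
    "Sheet2": {"amount": ["10", "10", "x"]},
}
data_set_attributes = derive_data_set_attributes(workbook)
```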

In some embodiments, the user may use the recommended data directly with the accessed data set. In other embodiments, the recommended data is converted for direct use. When transforming recommended data for direct use, a conversion module (not shown) may consider one or more of: data type, semantic meaning, data format, and domain coverage.

For example, there may be two columns of real-valued numbers (type) (e.g., in a spreadsheet) that represent locations (semantic meaning) in southeast Canada (domain coverage) as latitude and longitude (format). Data sources of high interest and high quality that line up directly with these columns can simply be proposed for joining on columns with the correct attributes. Transformations may also be chained to reach data that is unexpected but useful to the user. For example, given a log of IP addresses for a mobile application, the IP addresses can be translated into location information. The location information may then be converted into demographic information and social media sentiment related to the mobile application.
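Chaining of transformations can be illustrated with a short sketch. The lookup tables below are hypothetical stand-ins for real data services, the addresses come from ranges reserved for documentation, and the demographic values are placeholders rather than real data.

```python
def chain(*steps):
    """Compose conversion steps; stop early if any step cannot convert its input."""
    def run(value):
        for step in steps:
            if value is None:
                return None
            value = step(value)
        return value
    return run


# Hypothetical lookup tables standing in for data services.
IP_TO_REGION = {
    "203.0.113.7": "southeast-canada",
    "198.51.100.4": "pacific-northwest",
}
REGION_TO_DEMOGRAPHICS = {
    "southeast-canada": {"median_age_band": "placeholder-A"},
}

# IP address -> location -> demographic information.
ip_to_demographics = chain(IP_TO_REGION.get, REGION_TO_DEMOGRAPHICS.get)
```

Because each step maps to `None` when it has no answer, a gap anywhere in the chain yields no result instead of an error, which makes it straightforward to compare alternative chains.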

Based on the granularity of the portions of the data set (e.g., columns of a spreadsheet), the distribution of data within those portions, and the associated semantics, a conversion pipeline may be constructed to facilitate direct use of a data source with minimal degradation of the data. Granularity can be used to limit information loss: the demographics of a country are less valuable than the demographics of a zip code. Domain coverage can be used to evaluate the joined product: if the data sets hardly overlap in zip code data but overlap completely in country data, it may be better to use the country data even though zip codes are more local. Thus, by scoring pipelines by granularity and/or by the joined product, a series of transformations using data sets and data services can be chosen that minimizes information loss.
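The zip code versus country trade-off can be made concrete with a small scoring sketch. Multiplying a granularity weight by the fraction of source keys covered is an assumed scoring rule, and the weights, key values, and function names are hypothetical.

```python
# Higher weight means finer granularity; the values are illustrative only.
GRANULARITY = {"country": 1, "state": 2, "zip_code": 3}


def coverage(source_keys, target_keys):
    """Fraction of distinct source keys that also appear in the target."""
    source = set(source_keys)
    if not source:
        return 0.0
    return len(source & set(target_keys)) / len(source)


def score(level, source_keys, target_keys):
    """Assumed rule: granularity weight times domain coverage."""
    return GRANULARITY[level] * coverage(source_keys, target_keys)


def best_join_level(source, target):
    """Pick the level whose join is expected to lose the least information."""
    levels = sorted(set(source) & set(target))
    return max(levels, key=lambda level: score(level, source[level], target[level]))


# Hypothetical keys: zip codes barely overlap, countries overlap completely.
source = {"zip_code": ["98052", "98101", "10001", "60601"], "country": ["US", "US", "US", "US"]}
target = {"zip_code": ["98052"], "country": ["US", "CA"]}
chosen = best_join_level(source, target)
```

Here the zip code join scores 3 × 0.25 = 0.75 and the country join scores 1 × 1.0 = 1.0, so the coarser but fully covering country data is preferred.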

Data services can themselves be viewed as data sets, where the data analyzed is the expected schema and a sampling of values across the supported data. The output of a data service may be considered part of the accessed data set (e.g., when the tables line up row by row), supplemental to the accessed data set (e.g., when the data lines up by column but represents new rows), or a new data source (e.g., when the data lines up neither by row nor by column).
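One possible reading of the three cases is sketched below, assuming that rows are identified by keys and that "lining up" means the key sets or column sets are equal. The function name, the labels it returns, and the equality test are assumptions; an implementation could equally use partial overlap.

```python
def classify_output(accessed_columns, accessed_keys, output_columns, output_keys):
    """Classify a data service output relative to the accessed data set."""
    same_rows = set(output_keys) == set(accessed_keys)
    same_columns = set(output_columns) == set(accessed_columns)
    if same_rows:
        return "part"          # same rows: new columns for existing rows
    if same_columns:
        return "supplemental"  # same columns: additional rows
    return "new_source"        # lines up neither by row nor by column
```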

As such, embodiments of the present invention include periodically (and possibly automatically) accessing and updating user information and accessed data sets. The source attributes are derived from the user information and the accessed data set. The target attributes are derived from the data directory and the data service. The source attributes are compared to the target attributes. When the source attributes of the accessed data set match the target attributes of a data directory or data service, that data directory or data service may be recommended to the user as having data related to the accessed data set. Thus, relevant data may be recommended to the user without the user having to explicitly search for relevant data or even know that relevant data exists.

The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

1. At a computer system including one or more processors, a system memory, and a display device, a method for recommending data related to a data set for use in a data processing application, the method comprising:

an act of detecting that a user has accessed a data set within the data processing application as part of performing a specified task;

an act of deriving source attributes for the accessed data set, the source attributes derived from one or more of: user information of the user and data information of the data;

an act of identifying target attributes for one or more target data sets and/or one or more target data services;

an act of matching the derived source attributes with the identified target attributes for at least one of the one or more target data sets and/or at least one of the one or more target data services to determine a desirability of at least one of the one or more target data sets and/or at least one of the one or more target data services as a source of relevant data, wherein the matching includes submitting the source attributes to a statistical system based on a transformation from a high-dimensional attribute space to a lower-dimensional space; and

an act of recommending the at least one target data set and/or the at least one target data service to the user as being able to provide data that is found to be useful to other users operating on data similar to the accessed data set.

2. The method as recited in claim 1, wherein the act of deriving source attributes for the accessed dataset comprises an act of deriving source attributes from raw data included in the accessed dataset.

3. The method as recited in claim 1, wherein the act of deriving source attributes for the accessed dataset comprises an act of deriving source attributes from a set of aggregations of the accessed dataset.

4. The method as recited in claim 1, wherein the act of deriving source attributes for the accessed dataset comprises an act of deriving source attributes from user-related information.

5. The method as recited in claim 1, wherein the act of matching the derived source attributes with the identified target attributes for at least one of the one or more target data sets and/or at least one of the one or more target data services further comprises an act of learning a desired rating for each combination in the lower-dimensional space.

6. The method of claim 5, further comprising:

an act of submitting the desired rating for each combination in the lower dimensional space to a rule-based system; and

an act of the rule-based system operating on the source attributes and the target attributes to patch recommendations proposed by the statistical system, including one or more of adding recommendations, removing recommendations, and adjusting recommendations.

7. The method of claim 6, wherein the act of the rule-based system operating on the source attributes and the target attributes to patch recommendations proposed by the statistical system comprises an act of adding editorial recommendations to the recommendations from the statistical system.

8. The method as recited in claim 1, wherein the act of matching the derived source attributes with the identified target attributes for at least one of the one or more target data sets and/or at least one of the one or more target data services comprises identifying the at least one of the one or more target data sets and/or the at least one of the one or more target data services based on one or more of: data type, semantic meaning, format, and domain coverage.

9. The method as recited in claim 1, further comprising an act of indicating to the user how the recommended at least one target data set and/or the recommended at least one data service may be used to integrate data into the accessed data set.

10. A system for recommending data related to a data set for use within a data processing application, comprising:

means for detecting that a user accessed a data set within the data processing application as part of performing a specified task;

means for deriving source attributes of the accessed data set, the source attributes derived from one or more of: user information of the user and data information of the data;

means for identifying target attributes for one or more target data sets and/or one or more target data services;

means for using the derived source attributes with the identified target attributes for at least one of the one or more target data sets and/or at least one of the one or more target data services to determine a desirability of at least one of the one or more target data sets and/or at least one of the one or more target data services as a source of relevant data, including means for submitting the source attributes to a statistical system based on a transformation from a higher-dimensional attribute space to a lower-dimensional space; and

means for recommending the at least one target data set and/or the at least one target data service to the user as being able to provide data found to be useful to other users operating on data similar to the accessed data set.

11. The system of claim 10, wherein the means for using the derived source attributes with the identified target attributes for at least one of the one or more target data sets and/or at least one of the one or more target data services further comprises means for learning a desired rating for each combination in the lower-dimensional space.

12. The system of claim 11, further comprising:

means for submitting the desired rating for each combination in the lower dimensional space to a rule-based system; and

means for operating on the source and target attributes to patch recommendations proposed by the statistical system, including one or more of adding recommendations, removing recommendations, and adjusting recommendations.

13. The system of claim 12, wherein the means for operating on the source and target attributes to patch recommendations proposed by the statistical system comprises means for adding editorial recommendations to the recommendations from the statistical system.

14. The system of claim 10, further comprising means for indicating to the user how the recommended at least one target data set and/or the recommended at least one data service may be used to integrate data into the accessed data set.

15. At a computer system including one or more processors, a system memory, and a display device, a method for recommending data related to a data set for use in a data processing application, the method comprising:

an act of automatically accessing user information about a user of the computer system at a specified time;

an act of detecting that the user has accessed a data set within a data processing application at the computer system;

an act of deriving source attributes for the accessed data set, the source attributes derived from the accessed user information, the accessed data set, and environmental conditions associated with the data processing application;

an act of matching the derived source attributes with attributes of the identified at least one of the one or more target data sets and/or at least one of the one or more target data services, the matching comprising submitting the source attributes to a statistical system based on a transformation from a high-dimensional attribute space to a lower-dimensional space; and

an act of recommending the at least one target data set and/or the at least one target data service to the user as being able to provide data found to be useful to other users operating on data similar to the accessed data set under similar environmental conditions.