WO2022071946A1

WO2022071946A1 - Data transformations based on policies

Info

Publication number: WO2022071946A1
Application number: PCT/US2020/053580
Authority: WO
Inventors: Christoph Graham
Original assignee: Hewlett-Packard Development Company, L.P.
Priority date: 2020-09-30
Filing date: 2020-09-30
Publication date: 2022-04-07

Abstract

An example system comprises a policy management engine. The example policy management engine identifies a data transfer operation for a dataset, determines a first plurality of characteristics of a destination entity, and transforms a copy of the data to be transferred based on a comparison of the first plurality of characteristics to a policy. An example method of managing a data operation includes receiving a data request for a dataset stored on a memory resource, performing a data transformation on a copy of the dataset in accordance with a first policy based on a source characteristic and a destination characteristic, and providing the transformed copy of the dataset to a destination entity based on a second policy with an access restriction corresponding to the destination characteristic.

Description

DATA TRANSFORMATIONS BASED ON POLICIES

BACKGROUND

[0001] Data is processed for use with computing machines. Administrators or other users may be responsible for managing data access for endpoint devices for authorized use such as for configuration, updating, monitoring, and other purposes. An administrator may manage an endpoint device via secure methods of allowing a client device (such as a desktop, laptop, or notebook computer, a smartphone, a tablet computing device, etc.) to access a network where the data is held.

BRIEF DESCRIPTION OF THE DRAWINGS

[0002] Figures 1 and 2 are block diagrams depicting example data systems.

[0003] Figure 3 depicts an example environment in which an example data system may be implemented.

[0004] Figure 4 depicts example components useable to implement an example data system.

[0005] Figures 5-7 are flow diagrams depicting example methods of managing a data operation.

DETAILED DESCRIPTION

[0006] In the following description and figures, some example implementations of apparatus, data systems, and/or methods of managing data operations are described. A data system, as used herein, is a combination of circuitry and executable instructions to manage data operations. A data system may include a general-purpose computer (or a specialized computer) with executable program instructions to allow for processing a dataset, transferring the dataset, or otherwise performing operations with the dataset. As used herein, a dataset is information in a digital format and may include a group or multiple items of information (e.g., data) or a single item of information (e.g., a datum).

[0007] People use computers to generate and share data and metadata. A single person may use multiple compute systems such as a laptop, a phone, and a tablet and may want to access data across multiple devices and across multiple hosted services. Computer applications and services may allow storage of data and access of data, for example. In some examples, data is stored across multiple services. It may often be desirable to maintain data outside the purview of a specific application or service for the ability and agility to transition this information to another application, another device, or another software ecosystem for further processing.

[0008] A user may want to share their data with others. The user may set access restrictions for other users, such as restricting the public from accessing any of the user’s data, allowing friends in their social media group to access their social media posts (e.g., personal data), and allowing those in their company workgroup to access their work data, but not personal data.

[0009] A user may have a fluid lifestyle where work and personal data appear to mix, and it may not be sufficient to solely enforce access or denial restrictions on data they would like to share or keep private. There appears to be a need to provide sensitive information to some people while others may deserve access only to general information. There is also a worry that as soon as data is transferred, the owner may lack further ability to restrict subsequent distribution of that data. Indeed, some relationships among entities may be more complex than a simple election of access to or restriction from data.

[0010] Various examples described below relate to using a data policy to determine how to transform data being transferred between entities. By relating data operations to characteristics of the entities at the source and destination, a data policy can be enforced with flexibility across user groups and infrastructures while maintaining data security. Indeed, a data system may thereby have the ability to apply policy regarding what data, data types, and applications can be used to share data across users under the same policy system, translate data across data types to be used in different applications, and share content across diverse applications in different ecosystems.

[0011] Figures 1 and 2 are block diagrams depicting example data systems 100 and 200. Referring to Figure 1, the example data system 100 generally includes a memory resource 102, a policy management engine 104, and a delivery engine 106. In general, the delivery engine 106 delivers data to or from the memory resource 102 as transformed by the policy management engine 104.

[0012] The policy management engine 104 represents any circuitry or combination of circuitry and executable instructions to transform a dataset based on a comparison of characteristics of a data operation to a policy. For example, the policy management engine 104 may be a combination of circuitry and executable instructions to identify a data transfer operation for a dataset to be stored on the memory resource 102 or transferred from the memory resource 102, determine a plurality of characteristics of a destination entity, and transform a version of the dataset to be transferred based on a comparison of the plurality of characteristics to a policy. In some examples, the policy management engine 104 may be an electronic service or agent that monitors function calls for any data requests, intercedes on behalf of the data request to perform the data request according to a policy, transforms the dataset according to the policy, and passes the dataset to the delivery engine 106 to provide the result (e.g., the transformed dataset) to the destination of the data request. The policy associated with a dataset may contain multiple characteristics for comparison with a request for data. For example, both the policy and the request may contain elements of preferred and available formats for translation of the dataset as part of the request, the target application for the request, the target device of the request, the time of the request, the location of the request, or the requesting identity (e.g., the user). These transforms could include encoding a dataset into a different form, augmenting a dataset with additional metadata, minifying a dataset for size and/or most relevant information, redacting a dataset based on sensitivity, or providing additional security constraints shared with the requestor. In some examples, the transformed dataset may include a policy, such as a rule and condition of the policy, to enforce policy when separated from system 100.

[0013] The policy management engine 104 may tag a dataset based on a sensitivity level of information in the dataset. For example, a document may include information that is public information (e.g., information about a public location), group information (e.g., information about a specific group of people), and personal information (e.g., personally identifiable information) and each class of data may be tagged with a corresponding sensitivity level. Such tags may allow for the policy management engine 104 to identify the portions of the dataset to transform according to policy, such as by removing all personally identifiable information, changing the names of any users outside the group, and providing photos in higher resolution to users in the group over public users. Sensitivity levels may correspond to rules of the policy regarding ownership (e.g., who can transfer the document and to whom), authenticity (e.g., adding a watermark), location (e.g., from which geographic location or network the dataset is accessible), or expiry (e.g., a time limit to view the dataset). Tags may be saved with the dataset at the storage location of the dataset or may be dynamically identified when a dataset is queued for retrieval. Such tags may also allow for improved searchability for a particular dataset in a data catalog, for example.
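The sensitivity tagging and redaction described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the level names, the `TaggedItem` class, and the `redact` function are hypothetical, and a strict ordering of sensitivity levels is assumed for simplicity.

```python
from dataclasses import dataclass

# Hypothetical sensitivity levels, ordered from least to most sensitive.
LEVELS = {"public": 0, "group": 1, "personal": 2}


@dataclass(frozen=True)
class TaggedItem:
    value: str
    sensitivity: str  # one of the keys of LEVELS


def redact(items, authorization):
    """Return a transformed copy; items above the requester's level are masked."""
    limit = LEVELS[authorization]
    return [
        item.value if LEVELS[item.sensitivity] <= limit else "[REDACTED]"
        for item in items
    ]


document = [
    TaggedItem("City park, open 9-5", "public"),
    TaggedItem("Team offsite is on Friday", "group"),
    TaggedItem("Home address: 12 Example Road", "personal"),
]

public_view = redact(document, "public")
group_view = redact(document, "group")
```

Here the same tagged document yields a different version for a public requester than for a group member, while the stored document itself is left unchanged.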

[0014] As used herein, a policy is a data structure that represents a set of rules to apply to manage the availability, usability, integrity, and/or security of a dataset to be accessed and/or operated with. For example, a policy may define how a dataset can be used by authorized personnel of an enterprise, such as enumerating the functionality allowed with a dataset. An example policy may include a number of constraints and/or conditions, functions and/or variables, such as: an expiration date, marshaling rules, preferred destinations, transforms, number of allowed copies, a number or list of allowed recipients, and the like.
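As an illustration only, a policy of this kind might be represented in Python as sketched below. The field names and the `permits` check are assumptions chosen to mirror the constraints listed above (expiration date, number of allowed copies, allowed recipients, transforms); they are not taken from the described system.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Policy:
    expiration: date                      # last day the dataset may be provided
    max_copies: int                       # number of allowed copies
    allowed_recipients: set = field(default_factory=set)
    transforms: list = field(default_factory=list)  # names of transforms to apply

    def permits(self, recipient, copies_made, today):
        """Check the access-related constraints for one request."""
        return (
            today <= self.expiration
            and copies_made < self.max_copies
            and recipient in self.allowed_recipients
        )


policy = Policy(
    expiration=date(2030, 1, 1),
    max_copies=2,
    allowed_recipients={"alice", "bob"},
    transforms=["redact_personal", "add_watermark"],
)
```

Keeping the transform names inside the same structure as the access constraints is one way a rule and its condition could travel with a dataset.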

[0015] Example policies discussed herein include a data governance policy and an infrastructure policy. A data governance policy is a policy that represents a set of rules for how an owner (e.g., creator or user of the data source) limits the availability, usability, integrity, and/or security of the dataset. For example, a transformed version of a dataset may be modified by the policy management engine based on the authorization level of the data environment in accordance with the data governance policy regarding that data environment and authorization level. By using a data governance policy with owner limitations, restrictions enforced by transformation may stay with the dataset as the dataset is transferred. An infrastructure policy represents a set of rules for how a service or device is to operate or move the dataset. An infrastructure policy may allow for compatibility across platforms (e.g., across different applications, services, device architectures, etc.). For example, the policy management engine 104 may be a combination of circuitry and executable instructions to decode an application-specific data type into a raw data format for storage on the memory resource and encode a raw data format into an application-specific data type.

[0016] As used herein, a characteristic is an attribute of an entity. An entity may be a user, a device, or an application and an attribute may be a status or classification of the user, device, or application. The characteristic may be stored on a data structure associated with the entity, such as a user profile, a device configuration, or application metadata. A plurality of characteristics may span multiple entities associated with the data request. For example, when a dataset request is derived from a user executing an application, the dataset request may include a user characteristic (i.e., a characteristic of a user) and an application characteristic (i.e., a characteristic of an application). Indeed, metadata about the request from each of the user, the application, and the device of the source of the request as well as the user, the application, and the device of the destination of the dataset operation may be used to identify a rule of the policy to determine how to transform the dataset, as an example. A characteristic of a compute environment may include a characteristic about the technology of the compute environment, such as an application characteristic or a device characteristic. The policy management engine 104 may be a combination of circuitry and executable instructions to determine a first plurality of characteristics of a destination entity and a second plurality of characteristics of a source entity and cause a transformation of the dataset to be based on a dissimilarity (e.g., a difference, a delta, a change, or a transform) between the first plurality of characteristics and the second plurality of characteristics (as the rules of the policy dictate according to the dissimilarity).
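One possible reading of the dissimilarity-based selection is sketched below in Python. The characteristic names, the `RULES` table, and both functions are hypothetical; the sketch assumes characteristics are simple key-value pairs and that each rule is keyed by the name of a differing characteristic.

```python
def dissimilarity(source, destination):
    """Map each differing characteristic to its (source, destination) values."""
    keys = source.keys() | destination.keys()
    return {
        key: (source.get(key), destination.get(key))
        for key in sorted(keys)
        if source.get(key) != destination.get(key)
    }


# Hypothetical rule table: differing characteristic -> transform to apply.
RULES = {"application": "convert_format", "network": "encrypt"}


def select_transforms(source, destination):
    """Pick the transforms whose rules match a differing characteristic."""
    delta = dissimilarity(source, destination)
    return [RULES[key] for key in delta if key in RULES]


source = {"user": "alice", "application": "notes", "network": "corporate"}
destination = {"user": "alice", "application": "slides", "network": "public"}
selected = select_transforms(source, destination)
```

In this sketch the user is the same at both ends, so only the application and network differences contribute transforms.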

[0017] A transformation operation of a dataset may include any change to the dataset itself, and the change is generally embodied as a resulting version of the dataset derived from a transformation performed on the dataset. A change to a dataset may be a constructive, destructive, aesthetic, or structural change that alters the format, structure, or value of the dataset. A dataset is transformed upon transfer by creating a version of the dataset (with changes) at the destination where the transformed version of the dataset is to be stored (e.g., during serialization, marshaling, etc.). In this respect, a “copy” of the dataset may be loaded onto the destination of a data operation, where the transform creates a difference between the version of the dataset at the source and the “copy” of the dataset at the destination, for example. In some examples, the dataset at the source is deleted upon transfer to the destination. Example transformation operations include altering a value of the dataset, removing a datum from the dataset, adding a datum to the dataset, changing a format of the dataset to a different data type, reordering or resequencing the dataset, managing metadata of the dataset (e.g., adding, removing, or updating metadata), or encrypting the dataset. Another example transformation operation may include providing policy instructions to the requestor to use when communicating with downstream requestors or to preserve policy integrity to pass to downstream systems. For example, the policy management engine 104 may allow for encryption upon storage of a dataset or encryption for each retrieval. For another example, the policy management engine 104 may translate voice data to text data to update a form of an application or convert text data into a chart. For yet another example, raw text may be converted to rich text. Transformation operations may correspond to different rules of the policy and may correspond to different levels of sensitivity of the dataset. In this manner, a dataset may be transformed multiple times during a data transfer using the system 100, such as based on separate rules of the same policy or based on multiple policies.
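Applying several transformation operations to a copy of a dataset can be sketched as follows. This is a simplified Python illustration under the assumption that a dataset is a dictionary; the helper names `remove_field`, `add_metadata`, and `apply_transforms` are hypothetical.

```python
import copy


def remove_field(name):
    """Build a transform that removes one datum from the dataset."""
    def transform(dataset):
        dataset.pop(name, None)
        return dataset
    return transform


def add_metadata(key, value):
    """Build a transform that adds or updates one metadata entry."""
    def transform(dataset):
        dataset.setdefault("metadata", {})[key] = value
        return dataset
    return transform


def apply_transforms(dataset, transforms):
    """Apply each transform in order to a deep copy; the source is untouched."""
    result = copy.deepcopy(dataset)
    for transform in transforms:
        result = transform(result)
    return result


original = {
    "title": "Trip notes",
    "ssn": "000-00-0000",
    "metadata": {"owner": "alice"},
}
transformed = apply_transforms(
    original,
    [remove_field("ssn"), add_metadata("watermark", "shared-copy")],
)
```

Working on a deep copy keeps the version at the source distinct from the version created for the destination, which matches the "copy" framing above.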

[0018] The delivery engine 106 represents any circuitry or combination of circuitry and executable instructions to provide a copy of a dataset to a destination entity. For example, the delivery engine 106 may be a combination of circuitry and executable instructions to provide a transformed version of a dataset to a destination entity when the destination entity has authorization to access the transformed version of the dataset. The delivery engine 106 may hook into a function call to provide a dataset to the function call. The delivery engine 106 may provide a dataset to the function call that was intercepted by the policy management engine 104. Indeed, the policy management engine 104 and the delivery engine 106 may use cross-application interfaces and have the capability to route a data operation from and to a function call or route the data on behalf of the function call. The delivery engine 106 may include circuitry or a combination of circuitry and executable instructions to generate an audit record corresponding to a transfer of the dataset to a destination entity. For example, the delivery engine 106 may include circuitry to append metadata to a transformed version corresponding to the transformation(s) and limitation(s) of the original owner and update a digital ledger with an entry corresponding to that dataset, the source of the transfer, and the destination of the transfer. In this manner, metadata regarding a data operation (including information of the transformation performed on the dataset) may be associated with a version of the dataset as it is transformed and/or transferred. When saving a dataset for retrieval, especially when in shared storage space, it may be unknown who may access the data and for what purpose. Thus, it may be beneficial to automatically transform the dataset for enforcing limitations (i.e., defined by policy rules) as set when storing the dataset in order to allow for particular transforms to occur for each retrieval, for example.
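The audit record and digital ledger update can be sketched in Python as shown below. The description does not specify a ledger format; the hash chaining used here is an assumption added for illustration, and `append_entry` and `verify` are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first entry


def _digest(body):
    """Hash an entry body in a stable, key-sorted JSON form."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(ledger, dataset_id, source, destination, transforms):
    """Record one transfer, linking the entry to the previous one by hash."""
    body = {
        "dataset": dataset_id,
        "source": source,
        "destination": destination,
        "transforms": transforms,
        "previous_hash": ledger[-1]["hash"] if ledger else GENESIS,
    }
    entry = {**body, "hash": _digest(body)}
    ledger.append(entry)
    return entry


def verify(ledger):
    """Return True when every entry is intact and correctly chained."""
    previous_hash = GENESIS
    for entry in ledger:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["previous_hash"] != previous_hash or entry["hash"] != _digest(body):
            return False
        previous_hash = entry["hash"]
    return True


ledger = []
append_entry(ledger, "doc-1", "alice-laptop", "bob-phone", ["redact_personal"])
append_entry(ledger, "doc-1", "bob-phone", "carol-tablet", ["add_watermark"])
```

Each entry names the dataset, the source and destination of the transfer, and the transforms applied, so later changes to a recorded entry are detectable by `verify`.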

[0019] In some examples, functionalities described herein in relation to any of Figures 1-3 may be provided in combination with functionalities described herein in relation to any of Figures 4-7.

[0020] Figure 2 depicts the example system 200, which may comprise a memory resource 220 operatively coupled to a processor resource 222. Referring to Figure 2, the memory resource 220 may contain a set of instructions that are executable by the processor resource 222. The set of instructions are operable to cause the processor resource 222 to perform operations of the system 200 when the set of instructions are executed by the processor resource 222. The set of instructions stored on the memory resource 220 may be represented as a policy management module 204 and a delivery module 206. The policy management module 204 and the delivery module 206 represent program instructions that when executed cause the functions of the policy management engine 104 and the delivery engine 106 of Figure 1, respectively. The processor resource 222 may carry out a set of instructions to execute the modules 204, 206, and/or any other appropriate operations among and/or associated with the modules of the system 200.

[0021] For example, the processor resource 222 may carry out a set of instructions to determine a plurality of characteristics of a destination compute environment, compare a sensitivity level of the dataset to an authorization level of the destination compute environment corresponding to a data governance policy based on the plurality of characteristics of a destination compute environment, transfer a transformed version of the dataset to the destination compute environment in accordance with an infrastructure policy, and generate an audit record corresponding to the transfer and metadata corresponding to the transformed version.

[0022] For another example, the processor resource 222 may carry out a set of instructions to tag a dataset with a sensitivity characteristic; transform the datum based on the sensitivity characteristic according to the data governance policy; identify the destination compute environment is part of a managed computing system in which the data governance policy and the infrastructure policy are able to be applied; add a time restriction or an event restriction to the transformed version in the destination compute environment; and generate an audit record by adding a digital mark to the transformed version of the dataset, sending a notification to an owner of the dataset in response to the transfer of the transformed version of the dataset, and/or updating a digital ledger with an entry corresponding to a transfer of the transformed version of the dataset.

[0023] For yet another example, the processor resource 222 may carry out a set of instructions to monitor a destination compute environment for a data request, route the data request to a cloud service to obtain a transformed version of a dataset, and create the transformed version of the dataset by converting the dataset to a data type corresponding to an application of the destination compute environment to receive the transformed version.
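Converting a dataset to a data type suited to the receiving application can be sketched as a lookup of encoders keyed by an application characteristic. The application names and the `ENCODERS` table below are hypothetical, and the sketch assumes a non-empty list of rows with identical keys.

```python
import csv
import io
import json


def to_json(rows):
    """Encode rows as a JSON array with stable key order."""
    return json.dumps(rows, sort_keys=True)


def to_csv(rows):
    """Encode rows as CSV text with a header line (rows must be non-empty)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical mapping from an application characteristic to an encoder.
ENCODERS = {"web_dashboard": to_json, "spreadsheet": to_csv}


def encode_for(application, rows):
    """Create the transformed version for the requesting application."""
    return ENCODERS[application](rows)


rows = [{"name": "Ada", "score": 90}, {"name": "Lin", "score": 85}]
csv_version = encode_for("spreadsheet", rows)
json_version = encode_for("web_dashboard", rows)
```

The same stored rows produce CSV text for one application and JSON text for another, without changing the stored form.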

[0024] Although these particular modules and various other modules are illustrated and discussed in relation to Figure 2 and other example implementations, other combinations or sub-combinations of modules may be included within other implementations. Said differently, although the modules illustrated in Figure 2 and discussed in other example implementations perform specific functionalities in the examples discussed herein, these and other functionalities may be accomplished, implemented, or realized at different modules or at combinations of modules. For example, two or more modules illustrated and/or discussed as separate may be combined into a module that performs the functionalities discussed in relation to the two modules. As another example, functionalities performed at one module as discussed in relation to these examples may be performed at a different module or different modules. Figure 4 depicts yet another example of how functionality may be organized into modules.

[0025] A processor resource is any appropriate circuitry capable of processing (e.g., computing) instructions, such as one or multiple processing elements capable of retrieving instructions from a memory resource and executing those instructions. For example, the processor resource 222 may be a central processing unit (CPU) that enables data management by fetching, decoding, and executing modules 204 and 206. Example processor resources include at least one CPU, a semiconductor-based microprocessor, a programmable logic device (PLD), and the like. Example PLDs include an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a programmable array logic (PAL), a complex programmable logic device (CPLD), and an erasable programmable logic device (EPLD). A processor resource may include multiple processing elements that are integrated in a single device or distributed across devices. A processor resource may process the instructions serially, concurrently, or in partial concurrence.

[0026] A memory resource represents a medium to store data utilized and/or produced by the system 200. The medium is any non-transitory medium or combination of non-transitory media able to electronically store data, such as modules of the system 200 and/or data used by the system 200. For example, the medium may be a storage medium, which is distinct from a transitory transmission medium, such as a signal. The medium may be machine-readable, such as computer-readable. The medium may be an electronic, magnetic, optical, or other physical storage device that is capable of containing (i.e., storing) executable instructions. A memory resource may be said to store program instructions that when executed by a processor resource cause the processor resource to implement functionality of the system 200 of Figure 2. A memory resource may be integrated in the same device as a processor resource or it may be separate but accessible to that device and the processor resource. A memory resource may be distributed across devices.

[0027] In the discussion herein, the engines 104 and 106 of Figure 1 and the modules 204 and 206 of Figure 2 have been described as circuitry or a combination of circuitry and executable instructions. Such components may be implemented in a number of fashions. Looking at Figure 2, the executable instructions may be processor-executable instructions, such as program instructions, stored on the memory resource 220, which is a tangible, non-transitory computer-readable storage medium, and the circuitry may be electronic circuitry, such as processor resource 222, for executing those instructions. The instructions residing on a memory resource may comprise any set of instructions to be executed directly (such as machine code) or indirectly (such as a script) by a processor resource.

[0028] In some examples, the executable instructions of the system 200 may be part of an installation package that when installed may be executed by a processor resource to perform operations of the system 200, such as methods described with regards to Figures 4-7. In that example, a memory resource may be a portable medium such as a compact disc, a digital video disc, a flash drive, or memory maintained by a computer device, such as a service device 334 of Figure 3, from which the installation package may be downloaded and installed. In another example, the executable instructions may be part of an application or applications already installed. A memory resource may be a non-volatile memory resource such as read-only memory (ROM), a volatile memory resource such as random-access memory (RAM), non-volatile random-access memory (NVRAM), a storage device, or a combination thereof. Example forms of a memory resource include static RAM (SRAM), dynamic RAM (DRAM), electrically erasable programmable ROM (EEPROM), flash memory, or the like. A memory resource may include integrated memory such as a hard drive (HD), a solid-state drive (SSD), embedded multi-media controller (eMMC) memory, or an optical drive.

[0029] Figure 3 depicts an example environment 390 in which an example data system 300 may be implemented. The system 300 (described herein with respect to Figures 1 and 2) may represent generally any circuitry or combination of circuitry and executable instructions to manage a data transfer. The system 300 includes a policy management engine 304, a delivery engine 306, and a function call monitor 302. The policy management engine 304 and the delivery engine 306 may be the same as the policy management engine 104 and the delivery engine 106 of Figure 1, respectively, and the associated descriptions are not repeated for brevity. As shown in Figure 3, the engines 304 and 306 may be integrated into a same ecosystem or compute device, such as a distributed cloud service. The engines 304 and 306 may be integrated via circuitry or as installed instructions into a memory resource of the compute device.

[0030] The function call monitor 302 includes a combination of circuitry and executable instructions to monitor and/or intercept operations. For example, the function call monitor 302 may be a combination of circuitry and executable instructions to monitor operations of a compute environment for a data transfer operation and route the request to the policy management engine 304 to retrieve the requested dataset. For another example, the function call monitor 302 may be a combination of circuitry and executable instructions to hook into a cross-system application 326 using a cross-system application interface 324 and route an operation of a cross-system application 326 to the policy management engine 304 for accessing a dataset 308 (and route the resulting, transformed dataset back to the cross-system application 326). The function call monitor 302 may include circuitry or a combination of circuitry and executable instructions to provide a programming interface to an external application as a service, driver, firmware, or circuitry. For example, the function call monitor 302 may include a combination of circuitry and executable instructions to provide a driver to interface with a specific storage solution to perform requests based on input/output (I/O) requests to the storage subsystem through an interface driver, such as a filter driver.

[0031] The policy management engine 304 may use the tags 316 and the characteristics of the application 326 to determine how to transform the dataset 308 in accordance with the data governance policy 312 and the infrastructure policy 314. The dataset 308 may be part of a personalized data catalog stored at a cloud location. A data catalog is a metadata management tool designed to help organizations find and manage sizeable amounts of data. In some examples, the policy management engine 304 includes a combination of circuitry and executable instructions to maintain collections of shared content across different contexts, such as separation of saved browser links from saved pictures. The example system 300 uses datasets at memory resources 310 and 320, which, in some examples, are a memory resource distributed across devices. In other examples, the memory resources 310 and 320 may be the same memory resource.

[0032] The example environment 390 may include compute devices, such as developer devices 332, service devices 334, and user devices 336. A first set of instructions may be developed and/or modified on a developer device 332. For example, an application may be developed and modified on a developer device 332 and stored onto a web server, such as a service device 334. The service devices 334 represent generally any compute devices to respond to a network request received from a user device 336, whether virtual or real. For example, the service device 334 may operate a combination of circuitry and executable instructions to provide a network packet in response to a request for a page or functionality of an application. The user devices 336 represent generally any compute devices to communicate a network request and receive and/or process the corresponding responses. For example, a browser application may be installed on the user device 336 to receive the network packet from the service device 334 and utilize the payload of the packet to display an element of a page via the browser application.

[0033] The compute devices may be located on separate networks 330 or part of the same network 330. The example environment 390 may include any appropriate number of networks 330 and any number of the networks 330 may include a cloud compute environment. A cloud compute environment may include a virtual shared pool of compute resources. For example, networks 330 may be distributed networks comprising virtual computing resources. Any appropriate combination of the system 300 and compute devices may be a virtual instance of a resource of a virtual shared pool of resources. The engines and/or modules of the system 300 herein may reside and/or execute “on the cloud” (e.g., reside and/or execute on a virtual shared pool of resources).

[0034] A link 338 generally represents one or a combination of a cable, wireless connection, fiber optic connection, or remote connections via a telecommunications link, an infrared link, a radio frequency link, or any other connectors of systems that provide electronic communication. The link 338 may include, at least in part, intranet, the Internet, or a combination of both. The link 338 may also include intermediate proxies, routers, switches, load balancers, and the like.

[0035] The policy management engine 304 may provide operations offered as a cloud service, for example. Such a service may allow for data accesses (and their transformations) to be individualized for the compute environment of the request and the state of the destination entity. For example, when data is copied from an application and pasted back into the application, a data transform may occur using the system 300, such as to translate or change (potentially improve) the format or structure of the dataset even when pasted into the same application from which the dataset originated. In an example, a dataset may be stored at a cloud storage location and retrieved for multiple, different applications executing on the same destination compute environment where each application may receive a differently transformed version of the dataset based on each different application (e.g., based on an application characteristic) even though the applications are accessed by the same user on the same device. In this manner, datasets may be individually transformed for each request according to the policies enforced by the system 300.

[0036] Referring to Figures 1-3, the engines 104 and 106 of Figure 1, the modules 204 and 206 of Figure 2, and/or the engines 304 and 306 may be distributed across devices 332, 334, 336, or a combination thereof. The engines and/or modules may complete or assist completion of operations described in relation to another engine and/or module. For example, the policy management engine 304 of Figure 3 may request, complete, or perform the methods or operations described with the policy management engine 104 of Figure 1 as well as the delivery engine 106 of Figure 1 and the function call monitor 302 of Figure 3. Thus, although the various engines and modules are shown as separate engines in Figures 1 and 2, in other implementations, the functionality of multiple engines and/or modules may be implemented as a single engine and/or module or divided in a variety of engines and/or modules. In some examples, the engines of the system 300 may perform example methods described in connection with Figures 4-7.

[0037] Figure 4 depicts example components useable to implement an example data system 400. Referring to Figure 4, the example components of Figure 4 generally include a function call monitor 402, a policy management engine 404, and a delivery engine 406. The example components of Figure 4 may be implemented on a compute device, such as service device 334 of Figure 3. The components may be a processor resource programmed to fetch, decode, and execute instructions of the modules of Figure 4 to cause transformation of a dataset 466 based on a policy 470.

[0038] A data request 458 is made to operate with a dataset 466. The function call monitor 402 is able to identify the data request 458 and use a policy to determine how to transform the dataset 466 of the data request 458. The function call monitor 402 includes program instructions, such as an identifier module 440 and an interceptor module 442, to assist monitoring of function calls to a managed dataset 466. The identifier module 440 represents program instructions that, when executed, cause a processor resource to identify request parameters 460, such as a source and destination of the data request 458. The interceptor module 442 represents program instructions that, when executed, cause a processor resource to use an interface 462, such as an application programming interface (API), to intercede on behalf of the data operation (e.g., storage or retrieval) using the policy-managed version of the requested dataset, such as by initiating action by a policy management engine 404 to perform the data operation on behalf of the data request 458.
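The division of labor between an identifier and an interceptor can be illustrated with a minimal sketch. It is not the implementation of the function call monitor 402; the names `RequestParameters`, `identify`, `intercept`, and `toy_policy_engine`, as well as the destination labels, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RequestParameters:
    """Hypothetical stand-in for the request parameters of a data request."""
    operation: str      # "store" or "retrieve"
    source: str
    destination: str


def identify(operation: str, source: str, destination: str) -> RequestParameters:
    """Identifier analogue: collect the source and destination of a request."""
    return RequestParameters(operation, source, destination)


def intercept(data_fn: Callable[[str], str],
              policy_engine: Callable[[str, RequestParameters], str],
              source: str) -> Callable[[str, str], str]:
    """Interceptor analogue: wrap a retrieval function so that its result
    is handed to a policy engine before it reaches the caller."""
    def wrapped(dataset_id: str, destination: str) -> str:
        params = identify("retrieve", source, destination)
        raw = data_fn(dataset_id)
        return policy_engine(raw, params)
    return wrapped


# Toy store and toy policy engine, both purely illustrative.
_store = {"doc-1": "quarterly figures"}


def toy_policy_engine(dataset: str, params: RequestParameters) -> str:
    # Upper-case the dataset for one made-up destination; pass it through otherwise.
    return dataset.upper() if params.destination == "presentation-app" else dataset


managed_read = intercept(_store.__getitem__, toy_policy_engine, source="cloud-storage")
```

In this sketch, a caller uses `managed_read("doc-1", "presentation-app")` in place of the raw lookup, so the same stored dataset may come back in a different form for each destination.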

[0039] The policy management engine 404 includes program instructions that may be executable by a processor resource, which may be separate from or the same as the processor resource used to execute the modules of the function call monitor 402. The modules of program instructions of the policy management engine 404 include an operation module 444, a characteristics module 446, a sensitivity module 448, a user transformation module 450, and an infrastructure transformation module 452.

[0040] The operation module 444 represents program instructions that, when executed, cause a processor resource to identify an operation to be performed and the dataset 466 on which the operation is to be performed. The characteristics module 446 represents program instructions that, when executed, cause a processor resource to identify any source characteristics 464 and destination characteristics 472 of the data request 458. The sensitivity module 448 represents program instructions that, when executed, cause a processor resource to identify a sensitivity level of the dataset 466 based on tags 468 of the dataset 466. In some examples, execution of the sensitivity module 448 causes a dataset 466 to be associated with a tag. The user transformation module 450 represents program instructions that, when executed, cause a processor resource to perform a transformation operation on the dataset 466 based on a condition of the policy 470 associated with a user-selected limitation on how to access or use the dataset 466. The sensitivity level of the tags 468 may be used via execution of the user transformation module 450 to determine an appropriate transform in accordance with the policy 470 as compared to the authorized sensitivity level corresponding to the destination entity (e.g., as identified by the destination characteristic 472). The infrastructure transformation module 452 represents program instructions that, when executed, cause a processor resource to perform a transformation operation on the dataset 466 based on a condition of the policy 470 associated with a destination characteristic, such as modifying the dataset 466 to fit the structural format used by the destination device or destination application.
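As a rough illustration of comparing a tag-derived sensitivity level with the authorized level of a destination entity, consider the following sketch. The level names, their ordering, and the two transform labels are assumptions made for this example and are not defined by the description.

```python
# Ordered sensitivity levels; the names and ordering are assumptions for illustration.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}


def sensitivity_level(tags: list[str]) -> int:
    """Sensitivity analogue: treat the dataset as being as sensitive as its
    most sensitive recognized tag (untagged data defaults to level 0)."""
    return max((LEVELS[t] for t in tags if t in LEVELS), default=0)


def select_transform(tags: list[str], authorized_level: str) -> str:
    """Choose a transform by comparing the dataset's sensitivity level with
    the level the destination entity is authorized to receive."""
    if sensitivity_level(tags) <= LEVELS[authorized_level]:
        return "pass-through"
    return "redact"
```

Here a dataset tagged both `internal` and `confidential` would be redacted for a destination authorized only for `internal` data, while a destination authorized for `confidential` data would receive it unchanged.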

[0041] The delivery engine 406 includes program instructions, such as an audit module 454 and a return module 456, to assist delivery of the transformed dataset 476 to the function call intercepted by the function call monitor 402. The audit module 454 represents program instructions that, when executed, cause a processor resource to perform an audit operation, such as generating an audit record corresponding to the data transformation of the dataset 466, the transfer operation of the data request 458, and/or the destination of the data request 458. The return module 456 represents program instructions that, when executed, cause a processor resource to hook the transformed dataset 476 to return to the data request 458. For example, an API may be used that tracks a request reference 474 to point the dataset 476 to the function call of the data request 458. An example of a data structure used herein, such as one appended to a transformed dataset, may include the following generic example of variables, conditions, and/or functions:

[0042] Policy

[0043] {
[0044]     Unique Identifier
[0045]     Version
[0046]     Timestamp
[0047]     Transformation
[0048]     Data
[0049]     Source Data Metadata
[0050]     {
[0051]         Source device
[0052]         Location
[0053]         Timestamp
[0054]         Creator
[0055]         Source Application
[0056]         Data Type
[0057]         Constraints {
[0058]             Expires
[0059]             Marshaling Rules
[0060]             Preferred Destinations
[0061]             Required Transforms
[0062]             Allowed Copies
[0063]             Allowed Recipients
[0064]         }
[0065]         Thumbprint
[0066]     }
[0067]     Previous Id (previous policy in digital ledger - previous row)
[0068]     Source Id (previous policy used in transform - linked Policy)
[0069]     Previous Thumbprint
[0070]     Source Thumbprint
[0071]     New Thumbprint
[0072] }
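One possible in-memory rendering of the generic Policy structure listed above is sketched below. The field names follow the listing, but the field types, the default values, the nesting of Thumbprint inside Source Data Metadata, and the use of JSON for serialization are assumptions made for illustration; the sample values are invented.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Constraints:
    expires: Optional[str] = None                      # assumed ISO 8601 timestamp
    marshaling_rules: list[str] = field(default_factory=list)
    preferred_destinations: list[str] = field(default_factory=list)
    required_transforms: list[str] = field(default_factory=list)
    allowed_copies: Optional[int] = None
    allowed_recipients: list[str] = field(default_factory=list)


@dataclass
class SourceDataMetadata:
    source_device: str
    location: str
    timestamp: str
    creator: str
    source_application: str
    data_type: str
    constraints: Constraints = field(default_factory=Constraints)
    thumbprint: str = ""


@dataclass
class Policy:
    unique_identifier: str
    version: int
    timestamp: str
    transformation: str
    data: str
    source_data_metadata: SourceDataMetadata
    previous_id: Optional[str] = None    # previous policy in the digital ledger
    source_id: Optional[str] = None      # policy used in the transform (linked Policy)
    previous_thumbprint: str = ""
    source_thumbprint: str = ""
    new_thumbprint: str = ""


# Invented sample values, for illustration only.
record = Policy(
    unique_identifier="policy-0001",
    version=1,
    timestamp="2020-09-30T00:00:00Z",
    transformation="redact",
    data="example payload",
    source_data_metadata=SourceDataMetadata(
        source_device="laptop-01",
        location="office",
        timestamp="2020-09-30T00:00:00Z",
        creator="user-a",
        source_application="editor",
        data_type="text/plain",
        constraints=Constraints(allowed_copies=1, allowed_recipients=["user-b"]),
    ),
)

# Serialize with sorted keys so the same record always yields the same text.
serialized = json.dumps(asdict(record), sort_keys=True)
```

A deterministic serialization such as `serialized` is convenient when the record is appended to a dataset or used as the input to a thumbprint.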

[0073] Figures 5-7 are flow diagrams depicting example methods 500, 600, and 700 of managing a data operation. Referring to Figure 5, example methods of managing a data operation may generally comprise receiving a data request for a dataset, performing a data transformation of the dataset, and providing a version of the transformed dataset. The operations of the method 500 are performable by execution of a data system as described herein, such as via execution of a policy management engine 104 and a delivery engine 106.

[0074] At block 502, a data request for a dataset is received. The data request for a dataset stored (or to be stored) on a memory resource may include a source characteristic (e.g., a characteristic of the source of the data and/or source of the data request) and a destination characteristic (e.g., a characteristic of the destination compute environment or destination entity to receive a version of the dataset). The operation at block 502 may include parsing the data request to identify the source characteristic(s) and the destination characteristic(s) used to identify a rule regarding transformation as defined by a policy.

[0075] At block 504, a data transformation is performed on a version of the dataset in accordance with a policy. The transformation may be determined via the policy using the source characteristic and the destination characteristic of the data request. Multiple transformations of the dataset may be performed for a data request and a dataset may be transformed a number of times for a number of data requests. For example, a first data transformation on a dataset may be performed in response to a store request to store the dataset in a first transformation state (i.e., a state of the dataset with regards to a transformation performed on the dataset) at a cloud storage location, and a second data transformation on the dataset may be performed for each retrieval of the dataset (as transformed upon being stored in the cloud storage location).
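Selecting a transformation from a source characteristic and a destination characteristic can be pictured as a rule lookup. The sketch below is one such arrangement under stated assumptions: the characteristic labels, the rule names, the masking transform, and the choice to deny unlisted pairs by default are all hypothetical.

```python
# Hypothetical policy: (source characteristic, destination characteristic) -> rule.
RULES = {
    ("corporate-laptop", "corporate-laptop"): "none",
    ("corporate-laptop", "personal-phone"): "mask",
}


def select_rule(source: str, destination: str) -> str:
    """Look up the rule for a request; unlisted pairs are denied (an assumption)."""
    return RULES.get((source, destination), "deny")


def mask(dataset: str) -> str:
    """Hide all but the last four characters of the dataset."""
    return "*" * max(len(dataset) - 4, 0) + dataset[-4:]


def apply_rule(rule: str, dataset: str) -> str:
    """Apply the selected rule to a copy of the dataset."""
    if rule == "deny":
        raise PermissionError("policy does not permit this transfer")
    return mask(dataset) if rule == "mask" else dataset
```

With these rules, the same stored value is returned intact to a corporate laptop and masked when the destination is a personal phone.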

[0076] At block 506, a transformed version of the dataset is provided to a destination entity based on a policy with an access restriction corresponding to the destination characteristic. For example, a delivery engine, such as delivery engine 106 of Figure 1, may provide a version of the dataset as transformed for the destination entity (based on a destination characteristic of the destination entity) when the destination entity is authorized to view the data. The policy used to determine how to provide the dataset at block 506 may be different from the policy used to transform the dataset at block 504. In some examples, the authorization level of the destination entity may be considered by the policy management engine in the determination of how to transform the version of the dataset (in accordance with and defined by the policy). In some examples, the policy used to transform the dataset at block 504 may be separate from, and in addition to, an access restriction of the dataset for a destination entity. In this manner, the transformation of the dataset as managed by a policy management engine may be used with security levels of authorization of an enterprise ecosystem, for example. A data transformation may be different for the dataset when sending the dataset to different destination entities based on ownership and/or access to the destination entities, such that the access to transformed datasets may be based on the user and other characteristics of the request and not just for application translations or dataset transformations for compatibility with the destination compute environment, for example. A local or network clipboard application, for example, may use the policy management engine to transform the dataset of the clipboard whenever a paste operation is performed based on the destination of the paste operation; thus, each application could have a different transformed version of the clipboard dataset when pasted into each application.
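The clipboard example may be sketched as a paste operation that selects a transform by destination application. The application names, the mapping `PASTE_TRANSFORMS`, and the fallback to plain text for unknown destinations are assumptions for illustration, and the markup stripping is deliberately naive.

```python
import html
import re
from typing import Callable


def to_plain_text(dataset: str) -> str:
    """Strip markup tags and decode entities (a deliberately naive conversion)."""
    return html.unescape(re.sub(r"<[^>]+>", "", dataset))


# Hypothetical per-application transforms for a paste operation.
PASTE_TRANSFORMS: dict[str, Callable[[str], str]] = {
    "html-editor": lambda dataset: dataset,   # keeps the markup as copied
    "text-editor": to_plain_text,             # receives plain text only
}


def paste(clipboard: str, destination_app: str) -> str:
    """Return the clipboard dataset as transformed for the destination application.
    Unknown destinations fall back to plain text (an assumption)."""
    transform = PASTE_TRANSFORMS.get(destination_app, to_plain_text)
    return transform(clipboard)
```

Pasting the same clipboard content `"<b>Total</b> &amp; tax"` therefore yields the original markup in the hypothetical HTML editor and `"Total & tax"` in the text editor.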

[0077] By routing data requests to a policy management engine that considers the source characteristic(s) and the destination characteristic(s), the data transformation may be different for the dataset when sending to different destination entities based on ownership or access to the destination entities, for example. In this manner, an owner of data may individually customize how and what data is accessed and viewed for each destination entity, such as based on classes of users, applications, and/or devices. Ownership may be a user characteristic and may be tracked as a device-level characteristic. Access restrictions, such as denial of access to a dataset, may be implemented in addition to performing a transformation of a dataset in response to a data retrieval request as well as based on who is making the data retrieval request or other destination characteristic.

[0078] Referring to Figure 6, example method 600 of managing a data operation may generally comprise receiving a data request to store a dataset, performing a data transformation of the dataset upon storage, receiving a data request for retrieval of the dataset, performing a data transformation of the dataset upon retrieval, providing a version of the transformed dataset, and managing an audit record. The operations of the method 600 are performable by execution of a data system as described herein, such as via execution of a policy management engine 104 and a delivery engine 106. As some operations of the method 600 may be similar to the operations of the method 500, their respective descriptions may not be repeated in their entirety, for brevity.

[0079] At block 602, a data store request is received, and the source and destination of the data store request are identified. At block 604, a dataset is tagged with a sensitivity level. For example, a regular expression search may be used to parse a dataset for types of information to be classified (with a tag), and any dataset identified as meeting the conditions of the rules of the tag is updated with the tag in corresponding metadata. In this manner, the tagging of the dataset may occur upon storing the dataset in a cloud storage location. This may allow for the policy engine to identify the pieces of data to transform when retrieved, for example.
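A regular expression search used for tagging might look like the following sketch. The tag names and the two patterns (an e-mail-like pattern and a pattern shaped like a U.S. Social Security number) are illustrative assumptions, not rules taken from the description, and the patterns are intentionally simple.

```python
import re

# Hypothetical tag rules: tag name -> pattern whose presence triggers the tag.
TAG_RULES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def tag_dataset(text: str, metadata: dict) -> dict:
    """Return a copy of the metadata with a sorted list of the tags whose
    patterns are found in the dataset text."""
    tags = sorted(name for name, pattern in TAG_RULES.items() if pattern.search(text))
    return {**metadata, "tags": tags}
```

The tags are written to metadata rather than into the dataset itself, so a later retrieval can consult them without re-scanning the content.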

[0080] At block 606, the dataset tagged at block 604 is transformed based on a first policy using a characteristic of the data store request (e.g., a source characteristic and/or a destination characteristic of the cloud storage service). In this example, the dataset is being tagged for transformation upon storage, such as by scrubbing out sensitive information at the time of storage, so that any retrieval of the dataset will never have the sensitive information. Automatic transformations upon storage may be helpful for users of live media or other processing, for example. In an example, a regular expression search is performed to transform the tagged dataset based on a sensitivity level of the destination entity (e.g., the cloud storage location of a service where the service may be characterized for public or private use).
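Scrubbing at the time of storage can be sketched as a regular expression substitution driven by the dataset's tags and by whether the storage destination is characterized as public. The patterns, replacement strings, and the boolean `destination_is_public` parameter are assumptions made for this example.

```python
import re

# Hypothetical redaction rules: tag name -> (pattern, replacement text).
REDACTIONS = {
    "ssn": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED-SSN]"),
    "email": (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[REDACTED-EMAIL]"),
}


def scrub_for_storage(text: str, tags: list[str], destination_is_public: bool) -> str:
    """Remove tagged sensitive values before the dataset is stored at a public
    destination; a private destination receives the text unchanged."""
    if not destination_is_public:
        return text
    for tag in tags:
        if tag in REDACTIONS:
            pattern, replacement = REDACTIONS[tag]
            text = pattern.sub(replacement, text)
    return text
```

Because the substitution happens before the dataset is written, no later retrieval from that location can return the scrubbed values.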

[0081] At block 608, an audit record is generated. For example, an audit record corresponding to the transfer may be generated by updating a digital ledger and metadata corresponding to the transformed version of the dataset to be stored at block 610.
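One way to picture an audit record kept in a digital ledger is an append-only list in which each entry carries a thumbprint of its own content and the thumbprint of the previous entry. The description does not specify how thumbprints are computed; the use of SHA-256 over a JSON serialization, and the function names below, are assumptions for illustration.

```python
import hashlib
import json


def thumbprint(payload: dict) -> str:
    """Hash a deterministic JSON serialization of the payload (assumed scheme)."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(ledger: list[dict], event: dict) -> dict:
    """Append an audit entry linked to the previous entry's thumbprint."""
    previous = ledger[-1]["thumbprint"] if ledger else ""
    body = {"event": event, "previous_thumbprint": previous}
    entry = {**body, "thumbprint": thumbprint(body)}
    ledger.append(entry)
    return entry


def verify(ledger: list[dict]) -> bool:
    """Recompute every thumbprint and check that each entry links to the one before it."""
    previous = ""
    for entry in ledger:
        body = {"event": entry["event"], "previous_thumbprint": entry["previous_thumbprint"]}
        if entry["previous_thumbprint"] != previous or entry["thumbprint"] != thumbprint(body):
            return False
        previous = entry["thumbprint"]
    return True
```

Linking entries this way means that altering an earlier audit entry changes its thumbprint and causes `verify` to fail for the ledger as a whole.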

[0082] At block 612, a data retrieval request is received, and the source and destination of the data retrieval request are identified. In this example, the dataset stored at block 610 is the dataset to be retrieved by the data retrieval request at block 612. At block 614, the dataset (which was transformed based on a first policy upon storage at the cloud storage location) is transformed (again) based on a second policy using a characteristic of the data retrieval request (e.g., a source characteristic of the cloud storage service or a destination characteristic). In this manner, the dataset may be transformed at least twice, such as upon storage in the cloud system and upon retrieval from the cloud system. In some examples, the dataset may be transformed at any or all transfers of the dataset between devices, such as a transfer between ROM and RAM. At block 616, a version of the dataset (as transformed at block 614) is provided to a destination entity of the data retrieval request and the audit record is updated. For example, the digital ledger is updated with an entry regarding the destination entity receiving a version of the dataset and the metadata appended to the dataset upon storage at the destination entity is updated with transformation information and/or security limitations such as a time limit to access the dataset.
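The store-then-retrieve sequence, with one transformation per transfer and a time limit recorded in metadata, can be sketched as follows. The in-memory dictionary standing in for cloud storage, the two toy policies, the destination label, and the metadata keys are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

cloud_storage: dict[str, str] = {}   # stand-in for a cloud storage location


def first_policy(dataset: str) -> str:
    """Storage-time transform: blank out a made-up sensitive marker."""
    return dataset.replace("SECRET", "***")


def second_policy(dataset: str, destination: str) -> str:
    """Retrieval-time transform: adapt the stored dataset to the destination."""
    return dataset.upper() if destination == "kiosk-display" else dataset


def store(key: str, dataset: str) -> None:
    """Transform the dataset under the first policy, then store it."""
    cloud_storage[key] = first_policy(dataset)


def retrieve(key: str, destination: str, now: datetime,
             access_window: timedelta) -> tuple[str, dict]:
    """Transform the stored dataset under the second policy and attach
    metadata that records the destination and an access time limit."""
    transformed = second_policy(cloud_storage[key], destination)
    metadata = {
        "destination": destination,
        "expires": (now + access_window).isoformat(),
    }
    return transformed, metadata
```

In this sketch the stored copy never contains the marker removed at storage time, and each retrieval produces its own destination-specific version together with an expiry value.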

[0083] Referring to Figure 7, example method 700 of managing a data operation may generally comprise monitoring computer operations for function calls to a dataset managed by policy, performing transformations on the dataset when the characteristics of the request meet policy conditions based on the sensitivity level of the dataset, and routing the transformed dataset to the monitored function call. The operations of the method 700 are performable by execution of a data system as described herein, such as via execution of a policy management engine 104 and a delivery engine 106. As some operations of the method 700 may be similar to the operations of the method 500 and/or method 600, their respective descriptions may not be repeated in their entirety, for brevity.

[0084] At block 702, an interface is used to hook into an operating system. For example, an API or other method may be used to track system calls that utilize data from a data catalog managed by a policy management engine. At block 704, function calls are monitored for data transfer of a dataset controlled by policy. Such function calls are routed to the policy management engine at block 706.

[0085] The policy management engine may identify characteristics of the data transfer operation at block 708 and determine a sensitivity level of a datum of the dataset to be transferred at block 710. If the sensitivity level of the datum does not meet a condition of a policy managed by the policy management engine, then the dataset may be routed at block 724 to the function call for further processing. If the sensitivity level of the datum does meet a condition of a policy managed by the policy management engine, then the dataset is transformed before being routed as a return value for the monitored call at block 724, starting at block 714.

[0086] At block 714, a transformation operation is selected in accordance with the sensitivity level and user limitations set on the dataset. At block 716, the dataset is transformed in accordance with a data governance policy using the transformation selected at block 714.

[0087] At block 718, a transformation operation is selected in accordance with a destination characteristic. At block 720, the dataset is transformed in accordance with an infrastructure policy using the transformation selected at block 718.

[0088] At block 722, a time restriction or an event restriction may be added to the dataset to limit the amount of time or how the data is allowed to be further transferred. With the dataset transformed based on the data governance policy and the infrastructure policy (and with further security or privacy restrictions such as based on time or events), the dataset is ready to be routed as a return value for the function call or otherwise passed to the function call at block 724. In this manner, the methods described herein allow for datasets to be managed by a policy that generates transforms of the datasets that are individualized for each data operation on the dataset based on the characteristics of the data operation.
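The sequence of blocks 710 through 724 may be summarized by the following sketch, in which a governance transform is followed by an infrastructure transform and a time restriction. The numeric sensitivity threshold, the `SENSITIVE_FIELDS` rule, the two output formats, and the shape of the returned value are assumptions for illustration.

```python
import json

SENSITIVE_FIELDS = {"salary"}   # hypothetical data governance rule


def governance_transform(record: dict) -> dict:
    """Blocks 714-716 analogue: drop fields the governance rule marks as sensitive."""
    return {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}


def infrastructure_transform(record: dict, destination_format: str) -> str:
    """Blocks 718-720 analogue: shape the record for the destination's format."""
    if destination_format == "json":
        return json.dumps(record, sort_keys=True)
    return ",".join(f"{k}={record[k]}" for k in sorted(record))


def handle_call(record: dict, sensitivity: int, policy_threshold: int,
                destination_format: str, ttl_seconds: int) -> dict:
    """Route a monitored call: pass the record through when the policy condition
    is not met, otherwise transform it and attach a time restriction."""
    if sensitivity < policy_threshold:                              # block 710 check
        return {"payload": record, "restrictions": {}}              # block 724
    governed = governance_transform(record)
    shaped = infrastructure_transform(governed, destination_format)
    restrictions = {"ttl_seconds": ttl_seconds}                     # block 722
    return {"payload": shaped, "restrictions": restrictions}       # block 724
```

Keeping the governance step and the infrastructure step as separate functions mirrors the separation between the data governance policy and the infrastructure policy.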

[0089] Although the flow diagrams of Figures 4-7 illustrate specific orders of execution, the execution order may differ from that which is illustrated. For example, the execution order of the blocks may be scrambled relative to the order shown. Also, the blocks shown in succession may be executed concurrently or with partial concurrence. All such variations are within the scope of the present description.

[0090] All the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all the elements of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or elements are mutually exclusive.

[0091] The terms “include,” “have,” and variations thereof, as used herein, mean the same as the term “comprise” or appropriate variation thereof. Furthermore, the term “based on,” as used herein, means “based at least in part on.” Thus, a feature described as based on some stimulus may be based only on the stimulus or a combination of stimuli including the stimulus. The article “a” as used herein does not limit the element to a single element and may represent multiples of that element. Furthermore, the words “first,” “second,” or related terms in the claims are not used to limit the claim elements to an order or location, but are merely used to distinguish separate claim elements.

[0092] The present description has been shown and described with reference to the foregoing examples. It is understood that other forms, details, and examples may be made without departing from the spirit and scope of the following claims.

Claims

What is claimed is:

1. A data system comprising: a memory resource to store a dataset; a policy management engine to: identify a data transfer operation for the dataset; determine a first plurality of characteristics of a destination entity including a user characteristic and an application characteristic; and transform the dataset to be transferred based on a comparison of the first plurality of characteristics to a policy; and a delivery engine to: provide the transformed dataset to the destination entity.

2. The data system of claim 1, wherein the dataset is transformed by: altering a value of the dataset; removing a datum from the dataset; changing a format of the dataset to a different data type; reordering the datum within the dataset; resequencing the dataset; or encrypting the dataset.

3. The data system of claim 2, wherein the first plurality of characteristics of the destination entity includes a device characteristic, and the policy management engine is further to: determine a second plurality of characteristics of a source entity; and wherein the transformation of the dataset is based on a dissimilarity between the first plurality of characteristics and the second plurality of characteristics.

4. The data system of claim 1, wherein the policy management agent is further to include: a function call monitor to hook into a cross-system application interface and route an operation of a cross-system application to the policy management engine for accessing the dataset.

5. The data system of claim 1, wherein: the policy management engine is further to decode an application-specific data type into a raw data format for storage on the memory resource; and encode a raw data format into an application-specific data type.

6. A non-transitory computer-readable storage medium (NTCRSM) comprising a set of instructions executable by a processor resource to: determine a plurality of characteristics of a destination compute environment; compare a sensitivity level of a dataset to an authorization level of the destination compute environment corresponding to a data governance policy based on the plurality of characteristics of the destination compute environment; transfer a transformed version of the dataset to the destination compute environment in accordance with an infrastructure policy, the transformed version of the dataset being modified based on the authorization level of the destination compute environment and the data governance policy; generate an audit record corresponding to the transfer of the transformed version of the dataset; and generate metadata corresponding to the transformed version of the dataset.

7. The NTCRSM of claim 6, wherein the set of instructions is executable by the processor resource to: tag the dataset with a sensitivity characteristic corresponding to ownership, authenticity, location, or expiry; and transform the dataset based on the sensitivity characteristic according to the data governance policy.

8. The NTCRSM of claim 6, wherein the set of instructions is executable by the processor resource to: create the transformed version of the dataset by converting the dataset to a data type corresponding to an application of the destination compute environment to receive the dataset.

9. The NTCRSM of claim 6, wherein the set of instructions is executable by the processor resource to: monitor the destination compute environment for a data request that includes the dataset; and route the data request to a cloud service to obtain the transformed version of the dataset.

10. The NTCRSM of claim 6, wherein the set of instructions is executable by the processor resource to: add a digital mark to the transformed version of the dataset; send a notification to an owner of the dataset in response to the transfer of the transformed version of the dataset; or update a digital ledger with an entry corresponding to a transfer of the transformed version of the dataset.

11. The NTCRSM of claim 6, wherein the set of instructions is executable by the processor resource to: identify the destination compute environment is part of a managed computing system in which the data governance policy and the infrastructure policy are applied; and add a time restriction or an event restriction to the transformed version of the dataset in the destination compute environment.

12. A method of managing a data operation, the method comprising: receiving a data request for a dataset stored on a memory resource, the data request including a source characteristic and a destination characteristic; performing a data transformation on the dataset in accordance with a first policy based on the source characteristic and the destination characteristic; and providing the transformed dataset to a destination entity based on a second policy with an access restriction corresponding to the destination characteristic.

13. The method of claim 12, wherein the data transformation is different for the dataset when sending to different destination entities based on ownership or access to the destination entities.

14. The method of claim 12, comprising: tagging the dataset upon storing the dataset at a cloud storage location; performing a regular expression search to transform the tagged dataset based on a sensitivity level of the destination entity; generating an audit record corresponding to a transfer of the dataset; and appending metadata to the transformed dataset corresponding to the transfer of the transformed dataset, the first policy, the second policy, and the access restriction.

15. The method of claim 12, comprising: performing a first data transformation on the dataset in response to a store request to store the dataset in a first transformation state at a cloud storage location; and performing a second data transformation on the dataset for each retrieval of the dataset as transformed upon being stored in the cloud storage location.