WO2024192525A1

WO2024192525A1 - System and method of integrating generative ai with a data network

Info

Publication number: WO2024192525A1
Application number: PCT/CA2024/050341
Authority: WO
Inventors: Karanjot SINGH; Daniel Demers; Connor FOWLIE
Original assignee: Cinchy Inc.
Priority date: 2023-03-21
Filing date: 2024-03-21
Publication date: 2024-09-26

Abstract

The description pertains to the integration of an Artificial Intelligence (AI) program with a data network. This allows for the blending of data in a highly controlled, collaborative system with the power of interacting with this data through natural language using generative AI. The description further pertains to customizing the input data available to the AI such that the data the AI can access is identical to the access of the user. This is implemented by limiting the data input into the AI according to the security limitations of the user, allowing for a context specific AI experience.

Description

SYSTEM AND METHOD OF INTEGRATING GENERATIVE Al WITH A DATA

NETWORK

FIELD OF THE INVENTION

[0001] The invention pertains to integrating Al with a data network.

DESCRIPTION OF THE PRIOR ART

[0002] Artificial Intelligence (Al) chatbots have traditionally been trained on publicly available knowledge. For example, known chatbots such as ChatGPT™, have been trained using the entirety of wikipedia™ and other publicly available data. The applications of these chatbots are limited by the data included in the chatbot’s knowledge base. For example, if you were to ask a chatbot about the minor or internal details of a particular business, person or organization, the chatbot would likely be unable to provide an answer as the requested data was likely not included in the chatbot’s knowledge base. This limits the application of chatbots in settings, such as businesses, wherein a user would ask questions based on lesser-known data or data known only to members of a particular business or company.

[0003] Furthermore, it would be an enormous risk to integrate Al programs, as they currently exist, within business networks or in enterprise environments. Al is typically given an entire body of data as an input and then relies on rules or boundaries integrated within its script to decide what data it can provided to a user, and what data should be kept from a user. For example, an Al chatbot script implemented in a business application may be programed to limit a user’s access to other employees’ salary information. However, a cleaver user may be able to work around the boundaries set in the script. For example, a user could ask a question relating to if a certain employee could afford a particular car or house. If the Al has the employee’s salary information, it could return an answer that would give the user insight into the particular employee’s salary. However, if the Al was not provided with any salary information as input, there would be no way for it to return answers that could give a user hints regarding prohibited data. There remains a need for a system and method for limiting the data available to an Al script based on personalized user settings, for example, a user’s data security clearance. SUMMARY

One embodiment pertains to a method of querying a computer database using generative Artificial Intelligence (Al) comprising the steps of: a) receiving at an Al interface a question related to the data in the database; b) converting the question into a query format compatible with the database; c) running the query on the database; d) generating the query output; e) providing the output to Al; and f) the Al returns a response.

In a further embodiment, the Al bases it’s answer with only the data contained within the database.

In yet a further embodiment, after the Al receives a question, a specific knowledge base is used as context for answering the question.

In yet a further embodiment, the specific knowledge base is based on a user’s permissions to access data within the database.

In yet a further embodiment, the generative Al is ChatGPT™.

In yet a further embodiment, the database comprises different user access rights to different information within the database and specific knowledge base is only the data in the database that is available to a user making the query; the method further includes the steps of: accessing only the data available to the user making the query; the Al using only the data available to the current user to answer the question.

In yet a further embodiment, the database is collaborative data network. In yet a further embodiment, the collaborative data network is a computer implemented system comprising a processing unit and a memory unit, said processing unit for creating, managing and storing data in said memory unit, said creation, management and storage of data being organized in a network of data nodes, the system comprising: a first node having: a first dataset containing data; an access controls layer limiting user access to the first dataset; a metadata layer defining characteristics of the first dataset and connecting to at least one subsequent node having: a subsequent dataset containing further data; a subsequent access controls layer limiting user access to the second dataset; a metadata layer defining characteristics of the subsequent dataset; wherein one or more links are created to associate the first node with the subsequent node to create the network of data nodes; wherein the network of data nodes comprises a query layer to interact with the first dataset and at least one subsequent dataset; and wherein a plurality of applications access the data and the further data in the network through the query layer, thereby eliminating the need for data silos and data access control by the application.

DETAILED DESCRIPTION OF THE INVENTION

[0004] The description pertains to the integration of an Artificial Intelligence (Al) program with a data network. This allows for the blending of data in a highly controlled, collaborative system with the power of interacting with this data through natural language using generative Al. The description further pertains to customizing the input data available to the Al such that the data the Al can access is identical to the access of the user. This is implemented by limiting the data input into the Al according to the security limitations of the user, allowing for a context specific Al experience.

[0005] The description refers to example Al programs and example data Networks. In particular, ChatGPT™ is used throughout the description as the example Al program. However, it can be appreciated that the invention disclosed herewith is using ChatGPT™ as an example only and that other Al programs would be known to a person skilled in the art and could be used without departing from the inventive concept. Furthermore, the description refers to Cinchy™, (disclosed in US Patent Application Number 17/307,571, incorporated herein by reference), as an example data network. While the preferred embodiment integrates a collaborative data network such as Cinchy™, it can be appreciated that other data networks, data fabrics or databases could be used to implement the inventive concept.

[0006] Figure 1 shows a flowchart of the integration of an Al script with a data network. For background, the data network referred to herein, has multiple users, each of which has personalized access to either all or a custom subset of the data in the data network based on their user permissions. For the purposes of this disclosure, we refer to the data to which a particular user can access as a personalized data set. A user 2 is logged into their data network and has a set of permissions associated with their account which determines their access to data in the network. When the user 2 would like to ask a question pertaining to data in the data network. In one preferred embodiment the user would type it into the data network platform (step 4), however other methods of passing the question to the Al would be known to a person skilled in the art. At step 4 of figure 1, the user 2 asks a question. The question is then converted into a query at step 6. This conversion to a query can be done within the data network, by the Al or by any other method known to a person skilled in the art. At step 8, the query is run under the user permissions of user 2, meaning the only possible data that can be returned as output 10 at step 11 is from the user’s 2 personalized data set. The output 10 is converted at step 12, if necessary, into a format that is readable by the Al. In the example provided wherein the data network is Cinchy™ and the Al is ChatGPT™, the output 10 data is converted to JSON. At step 14, the API for the Al is called by asking a question. While the actual question can vary, one example of a question could be “Please answer the following question using only the information provided”. Alternatively, the built-in knowledge base of the Al could be used and the input data from the data network could be used to augment the knowledge base. In this example, one example prompt could be phrased as “Augment your response using the data provided”. Various other suitable prompts would be known to a person skilled in the art. In this embodiment, specific context is being passed into the prompt to allow the Al to have access to data network information. The output 10 is also provided to the Al. The Al then returns a response (step 16) based only on the information provided thereto. In one preferred embodiment, the Al would be context aware such that the Al would only have no alternative knowledge base other than that data accessible by a particular user. This would eliminate the need for the system to prompt the Al to collect relevant data.

[0007] In a preferred embodiment, the query of step 8 returns only data that is potentially relevant to the question asked by user 2. In an alternative embodiment, the entirety of the user’s 2 personalized data set is provided to the Al avoiding the need to convert the user’s 2 question to a query, as there is no need to retrieve a subset of the user’s 2 accessible data if the Al already access to the entirety of the user’s 2 accessible data when formulating its response. Figure 2 depicts an alternative flow wherein the knowledge base of the Al is user specific. This knowledge base is kept updated to reflect any changes to user permissions 22, preferably in near- real time. In this embodiment, the user 2 asks a question 4 and the API for the Al is called. The knowledge base 20 is retrieved and used as a context for the Al (Step 24) to answer the user’s question. The Al then returns a context specific response at step 16.

[0008] In some embodiments, the user’s 2 permissions may change over time, thus the data provided to the Al is based on the user’s permissions and user’s personalized data set at the time of query.

[0009] It can be appreciated that the Al is incapable of providing data to which the user does not have access as it has not been provided with any data that lies outside a user’s 2 personalized data set. This eliminates the possibility that a user could trick the Al into returning an answer that discloses any data to which the user does not have access. In the embodiments disclosed herewith, generative Al tools can be implemented in enterprise environments as the data to which the Al is exposed is controlled.

[0010] The primary advantage of doing this, is to allow data to be access controlled. Data access control is important in the context of generative Al because it ensures that the Al is only able to access the data that it needs to generate the desired results. Without access control, the Al could potentially access data that it shouldn't, which could lead to inaccurate and incomplete results or even malicious behavior. Access control also helps to protect the privacy of the data, as it ensures that only authorized users can access the data. By controlling who has access to the data when interfacing with generative Al, organizations can ensure that the data is not misused, abused, or leaked.

[0011] The embodiment described above combines universal data access controls and unlimited sources of real-time intelligence within a single conversation. These are capabilities that have been identified as foundational to the introduction of Al technologies to diverse and highly regulated sectors such as, but not limited to, finance and healthcare.

Example Application

[0012] Figures 3 to 9 relate to an example the integration of generative Al with a data network. Specifically, figures 3 to 9 show the example of ChatGPT™ incorporated in an employee 360-degree experience hosted on the Cinchy Data Collaboration Platform. In particular, this example is used to demonstrate the power of connected data when plugged into ChatGPT™. While this is one use and one way in which an Al program can be integrated within a data network, it can be appreciated that this depicts only a simple example of how the integration can be preformed. A person skilled in the art can appreciate that the concept can applied in a variety of different ways without departing from the scope of the invention.

[0013] Figure 3 shows an example chart from Cinchy™, which shows a list of employees from the perspective of the user logged into the Cinchy Data Collaboration Platform. The chart will only show information which the user has permission to see. The user can select a particular employee to open a 360 experience. In this example, the 360 experience for employee “Dan DeMers” was selected by clicking the Employee 360 link 30 associated with employee “Dan DeMers”. This link leads to the interface shown in figure 4. As can be seen in figure 3, various information regarding employee “Dan DeMers” is gathered. The information pulled when the Employee 360 link is activated is shown on the left as reference 32. In this example, the data includes, the employee biography, career goals, personality profiles, address information, engagement, open tasks, out of office, performance, agreements, current compensation, compensation reviews, emergency information and other data pertaining to the employee “Dan DeMers”. This data is only based permissions of the user. Thus, different data may be available to different users based on their own data permissions.

[0014] Figure 5 shows a further section of the Employee 360 experience titled “Ask Me Anything”. This Ask Me Anything section 34 is designed for a user to ask a question about Dan DeMers and based on the data the user has permission to view, get an answer. The Ask Me Anything section 34 of the Employee 360 experience is connected to chat GPT but with the context of what data the user has access to that is connected via Cinchy™. The question in the Ask Me Anything section 34 can be changed by clicking the edit button 36. A pop up 38 is displayed as shown in figure 6 and the question can be changed. It should be noted that in this example, the Ask Me Anything section 34 is set up in a manner that a user is asking a digital twin of the employee the question. In the example described herein, the question is changed to “Tell me about yourself’, which is asking the Al to tell the user about the employee Dan DeMers.

[0015] The answer 40 provided by the Al is shown in figure 7. In this preferred embodiment, the answer is proved from the point of view of the digital twin of Dan DeMers. However, it can be appreciated that this is simply a preference, and the answer could be a different perspective. The answer reads “Hi. My name is Dan, CEO of Cinchy, and my current engagement score of 77. My personality type is medium steadiness, and I have medium claculativeness, low influence and medium dominance. I’m very extroverted, conscientious and agreeable. I’m a good teammate, reliable, and calm and patient. My long term career goals are to retire wealthy and fulfilled knowing I helped create a successful business that improved the lives of all employees and made a net positive contribution to society. My short-term career goals include getting Cinchy to series C. My skillset includes SDLC, Program Management and Capital Markets. You can find me on Linkedln. . . ” [0016] It is notable that when the same question is entered into ChatGPT™ alone (not in combination with a data network), ChatGPT™ could not return any insightful or useful data pertaining to the user’s question. ChatGPT™ simply doesn’t know the answers as it does not have access to the network data. Combining an Al program with a data network allows for more relevant and useful answers to user questions within a specific context.

[0017] To generate this answer, a question was asked by a user (tell me about Dan DeMers). This was converted into a query which returned the output of all the data available to this particular user in this employee 360 experience. The data returned in the output is defined and limited by the user’s data permissions on the Cinchy™ platform. The output is converted into a format acceptable to ChatGPT™ (in this case JSON). The ChatGPT™ API is called with a question such as “Please answer the following question using only the information provided’ and the question and output is fed into ChatGPT™. ChatGPT™ processes the data and returns the answer based on only the data provided thereto.

[0018] One example of how to integrate ChatGPT™ with Cinchy™ to provide the functionality described in the example above is shown in Appendix A.

[0019] This exemplifies how Al can be used in combination with a data network to ask anything that has any context. In this example, the context is not just about the information of Dan as an employee (which would be limited to the data in the Employee data node 42 in figure 8), but any piece of data that Dan is connected to and to which the user asking the question has permission to access. In the preferred embodiment of the invention where Cinchy™ is the data network, the connections between data nodes (shown graphically in figure 9) can be leveraged.

[0020] The connection of Al and a data network allows the questions to be asked in user specific and data set specific contexts. It allows questions to be answers in the context of who’s asking and what are they asking. The blending of Al with a data network in a highly controlled system, allows a user to ask questions relating to data in their data network and receive useful and relevant answers using only data within their permissions. This allows for the blending of data in a highly controlled, collaborative system with the power of interacting with this data through natural language using generative Al. [0021] Cinchy™, when it is combined with Generative Al, ensures that all the data permissions and controls, for a given user, applied within Cinchy are respected by the Al. Therefore, the user who is interacting with the Al is still bound by the permissions that they have been granted to the underlying data.

[0022] The above example demonstrates that it is possible maintain full control of proprietary and sensitive data while harnessing a diverse range of human and machine intelligence to avoid reliance on single sources.

[0023] In a preferred embodiment, there are at least 3 capabilities that improve the deployment of conversational Al experiences in enterprise environments:

1. Data Liberation: Enterprise Architects would benefit from simplifying their IT ecosystems to support greater control and agility. The Cinchy Data Collaboration Platform ‘de-silos’ data trapped in applications by connecting it to a network that makes app-specific databases or copy -based data integration obsolete.

2. Data Productization: CDOs and compliance leaders are seeking ways to empower teams that hold domain-specific knowledge (e.g., finance and HR) with more control. The Cinchy Data Collaboration Platform supports the creation of data products that help them set governance boundaries and data-layer access controls that are universally enforced.

3. Data Collaboration: CEOs and CIOs expect all sources of operational intelligence to be activated to deliver better business outcomes. The Cinchy Data Collaboration Platform enables IT, data science, business teams, and unlimited systems (including AI/ML and SaaS tools) to co-develop robust data models that power unlimited digital solutions, including apps, dashboards, analytics, and automations.

[0024] Although the invention has been described with reference to certain specific embodiments, various modifications thereof will be apparent to those skilled in the art without departing from the spirit and scope of the invention as outlined in the claims appended hereto. The entire disclosures of all references recited above are incorporated herein by reference.

Claims

Claims:

1. A method of querying a computer database using generative Artificial Intelligence (Al) comprising the steps of: receiving at an Al interface a question related to the data in the database; converting the question into a query format compatible with the database; running the query on the database; generating the query output; output is provided to Al; and the Al returns a response.

2. The method of claim 1 wherein the Al bases it’s answer with only the data contained within the database.

3. The method of claim 1 wherein after the Al receives a question, a specific knowledge base is used as context for answering the question.

4. The method of claim 3 wherein the specific knowledge base is based on a user’s permissions to access data within the database.

5. The method of claim 4 wherein the generative Al is ChatGPT™.

6. The method of claim 3 wherein the database comprises different user access rights to different information within the database and specific knowledge base is only the data in the database that is available to a user making the query; the method further includes the steps of: accessing only the data available to the user making the query; the Al using only the data available to the current user to answer the question.

7. The method of claim 6 wherein the database is collaborative data network. The method of claim 7 wherein the collaborative data network is a computer implemented system comprising a processing unit and a memory unit, said processing unit for creating, managing and storing data in said memory unit, said creation, management and storage of data being organized in a network of data nodes, the system comprising: a first node having: a first dataset containing data; an access controls layer limiting user access to the first dataset; a metadata layer defining characteristics of the first dataset and connecting to at least one subsequent node having: a subsequent dataset containing further data; a subsequent access controls layer limiting user access to the second dataset; a metadata layer defining characteristics of the subsequent dataset; wherein one or more links are created to associate the first node with the subsequent node to create the network of data nodes; wherein the network of data nodes comprises a query layer to interact with the first dataset and at least one subsequent dataset; and wherein a plurality of applications access the data and the further data in the network through the query layer, thereby eliminating the need for data silos and data access control by the application.