US20250139087A1 - Semantically matching natural language queries with parameterized questions - Google Patents
- Publication number: US20250139087A1 (Application No. US 18/822,050)
- Authority: US (United States)
- Prior art keywords
- natural language
- insights
- generated
- predefined
- insight
- Legal status: Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/243—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Definitions
- the disclosed embodiments relate generally to data analytics and, more specifically, to systems and methods for semantically matching natural language queries with parameterized questions.
- Data analysis is the process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, drawing conclusions, and supporting decision-making. It involves the use of various techniques, methods, and tools to examine and interpret data, uncover patterns, and extract insights.
- the primary objective of data analysis is to gain a better understanding of the underlying trends, relationships, and characteristics within the data.
- Data analysis is widely used across various industries and domains, including business, finance, healthcare, science, and technology. It plays a crucial role in extracting meaningful information from large and complex datasets, helping organizations make informed decisions and gain a competitive advantage.
- the disclosed embodiments make business insights accessible to business users and other users (e.g., in sales, marketing, HR, finance, or others) without the need for data analysts or scientists to manually create KPIs, metrics, data visualizations, or other business insights.
- the consumers of business insights have the need to make data-driven decisions but typically rely on others to manually create and track metrics for a selected data source. For example, a data analyst manually selects or creates various metadata that is used to provide business context for a metric. This process can be time consuming and inefficient.
- a method is executed at a computer system having one or more processors and memory storing one or more programs configured for execution by the one or more processors.
- the method includes receiving a natural language query directed to a data source and comparing the natural language query with one or more predefined questions and one or more generated insights associated with the data source.
- the method further includes, in accordance with a determination that the natural language query semantically matches a respective predefined question of the one or more predefined questions, selecting the respective predefined question and associated respective generated insights of the one or more generated insights, and in accordance with a determination that the natural language query semantically matches a respective generated insight of the one or more generated insights, selecting the respective generated insight and associated respective predefined question of the one or more predefined questions.
- the method further includes generating instructions for displaying, on a display communicatively connected to the computer system, the selected respective predefined question and associated respective generated insights and/or the selected respective generated insight and associated predefined question.
- the computer system includes one or more input devices, one or more processors, and memory storing one or more programs.
- the one or more programs are configured to be executed by the one or more processors and the one or more programs include instructions for performing or causing performance of the operations of any of the methods described herein.
- a non-transitory computer readable storage medium has stored therein instructions, which, when executed by one or more processors, cause a computer system to perform or cause performance of the operations of any of the methods described herein.
- the disclosed methods, systems, and databases provide for semantic matching of natural language queries with parameterized questions.
- FIG. 1 illustrates a schematic system diagram for automatically generating metrics using a machine learning model, in accordance with some embodiments.
- FIGS. 2 - 7 illustrate user interfaces for viewing and creating metrics, according to some embodiments.
- FIG. 8 illustrates a schematic system for generating metric objects using a machine learning model, in accordance with some embodiments.
- FIG. 9 illustrates a process for automatically generating metrics using a ML model, in accordance with some embodiments.
- FIG. 10 illustrates suggested metrics generated using a machine learning model, in accordance with some embodiments.
- FIG. 11 illustrates a user interface for viewing and creating metrics using machine learning models, in accordance with some embodiments.
- FIG. 12 is a flowchart diagram of a process for automatically generating metric objects, in accordance with some embodiments.
- FIG. 13 illustrates a schematic system for automatically generating insight summaries using a machine learning model, in accordance with some embodiments.
- FIG. 14 illustrates a summary of a bundle of insights for multiple metrics, in accordance with some embodiments.
- FIG. 15 is a flowchart diagram of a process for generating and validating data insights using machine learning models, in accordance with some embodiments.
- FIGS. 16 A- 16 F illustrate a user interface for querying and viewing insights via natural language, in accordance with some embodiments.
- FIG. 17 illustrates semantic comparisons between a natural language query and predefined questions and generated insights, in accordance with some embodiments.
- FIG. 18 illustrates a process for semantically matching natural language queries with predefined questions, in accordance with some embodiments.
- FIG. 19 is a flowchart diagram of a process for semantically matching natural language queries with predefined questions and/or generated insights, in accordance with some embodiments.
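This excerpt does not spell out how the semantic comparison of FIG. 17 is computed. Purely as an assumption, one plausible realization embeds the natural language query, the predefined questions, and the generated insights into a shared vector space and picks the best match by cosine similarity; `embed` below is a hypothetical stand-in for any sentence-embedding model, not a component named in the patent.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical sentence-embedding function (e.g., an encoder model)."""
    raise NotImplementedError

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_semantic_match(query: str, questions: list[str], insights: list[str]):
    """Compare a query against predefined questions and generated insights,
    returning the best-matching candidate and whether it is a question or insight."""
    q = embed(query)
    candidates = [(t, "predefined question") for t in questions] + \
                 [(t, "generated insight") for t in insights]
    return max(candidates, key=lambda c: cosine(q, embed(c[0])))
```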
- Metric definitions for metrics have typically been created manually by data analysts or other users with specialized knowledge in data analysis, data science, or other expert skills.
- the described methods and systems provide a technique for automatically generating and recommending metric objects (e.g., respective metric definitions or metric data structures) using a machine learning model (e.g., a large language model).
- a data analytics tool such as Tableau Pulse provides insights about data based on predefined metrics. After a metric is created, members of an organization can be added as followers of the metric. In some embodiments, such members can receive a regular email or a digest about metrics to which they are subscribed. Such emails or digests surface trends, outliers, and other changes, keeping followers up to date on relevant data. To learn more about the data, users can investigate a metric (e.g., on a system such as Tableau Cloud) and see how different factors contribute to changes in the data. Such insights into relevant data allow users to make data-driven decisions without requiring complex analysis and configuration.
- Metrics are analytical objects that can be interacted with and viewed in a user interface.
- Metric definitions have an underlying data structure that represents a respective metric.
- Table 1 illustrates an example of a metric definition.
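Table 1 itself is not reproduced in this excerpt. As a rough sketch only (the field names below are hypothetical, inferred from the data fields described in relation to FIGS. 4-6), a metric definition and a scoped metric built on top of it might be represented as:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MetricDefinition:
    # Core data fields of a metric definition.
    name: str                     # e.g., "Superstore Sales"
    measure: str                  # a data source column, e.g., "Sales"
    aggregation: str              # e.g., "SUM", "AVG", "COUNT"
    time_dimension: str           # date/time field, e.g., "Order Date"
    # Optional or additional contextual data fields.
    description: Optional[str] = None
    number_format: Optional[str] = None
    adjustable_filters: list[str] = field(default_factory=list)  # e.g., ["Region", "Category"]

@dataclass
class Metric:
    # A metric applies filters and a time context on top of its parent definition.
    definition: MetricDefinition
    filters: dict[str, str] = field(default_factory=dict)  # e.g., {"Category": "Technology"}
    time_granularity: str = "Quarter to date"
```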
- metrics are created when additional data fields (e.g., business context data fields) associated with a metric are adjusted or configured. This occurs, for example, when respective time context options (e.g., time granularity) or filters are specified.
- Tables 2 and 3 provide an example of the options configured for related metrics. These options are applied on top of the core value that is specified by a respective metric definition.
- a metric definition captures the core value that is being tracked. At a basic level, this value is an aggregate measure tracked based on a time dimension. The definition also specifies options such as the dimensions that viewers are able to filter by, the way the value is formatted, and the types of insights displayed.
- the system (e.g., Tableau Pulse) automatically creates an initial related metric.
- the initial metric created for a definition has no filters applied.
- when users of an organization adjust the metric filters or time context in a new way, the system creates an additional related metric. For example, a member of a sales organization and/or other users of that organization may need to track metrics across different territories and product lines.
- a metric definition can be created that includes the core value of the sum of daily sales with adjustable metric filters for region and product line. Then, a user can create related metrics for each region and product line. Additionally, members of the organization can be added as followers to the related metrics to view where and what is being sold.
- metric definitions allow for managing data for related metrics from a single parent definition. If a field in a data source changes, the metric definition can be updated to reflect this change, and all metrics related to that definition will automatically reflect the change (e.g., without the need to update each related metric individually).
- a relationship 100 between a metric definition for a metric object labeled “Superstore Sales” described in Table 1 and the two metrics 102 and 104 that are based on the metric definition “Superstore Sales” described in Tables 2 and 3, respectively, is illustrated in FIG. 1 .
- Portion 120 illustrates values of the data fields (i) measure, (ii) aggregation, and (iii) time dimension for the metric object labeled “Superstore Sales.”
- Portion 130 illustrates metrics 102 and 104 .
- Metrics 102 and 104 include additional business context generated based on additional contextual fields that are specified.
- metric 102 includes the sum of all sales from Quarter to date for a category “Technology,” where the value 106 of the aggregated measure is displayed in the metric 102 (e.g., $243.1k).
- Metric 104 includes the sum of all sales Year to Date for category “Office Supplies,” where the value 108 of the aggregated measure is displayed in the metric 104 (e.g., $153.3k).
- Metrics 102 and 104 are interactive objects that further include textual descriptions of how the metric is performing over time (e.g., compared to previous year, previous quarter, previous month, week, or day) or other types of performance indicators.
- metrics 102 and 104 may include a miniature data visualization (e.g., a chart) that illustrates the aggregated measure across the selected time dimension, and optionally at the specified time granularity for selected dimensions.
- metric 102 includes a line chart 1010 (e.g., a sparkline) that illustrates how the sum of sales for category “Technology” changed over a period of time from January to November.
- metric 104 includes a line chart 1012 (e.g., a sparkline) that illustrates how the sum of sales for category “Office Supplies” changed over a period of time from January to November.
- metrics such as the metrics 102 and 104 , for which additional contextual data is specified (e.g., filters, time granularity, time comparisons, and other data fields) are referred to as scoped metrics.
- FIGS. 2 - 7 illustrate user interfaces for viewing and creating metrics, according to some embodiments.
- a manual process for creating metric definitions and/or respective metrics based on the metric definitions is illustrated in FIGS. 5 - 7 .
- FIG. 2 illustrates a user interface 200 for viewing and creating metrics, according to some embodiments.
- User interface 200 includes control 202 “All Metrics” that, when selected, causes a computer system to display a list 220 of metrics (e.g., metrics to which a respective user has access) that are stored and available for viewing.
- the attributes (also referred to as data fields, for example, in the context of metric definitions) include a name 204 that corresponds to a label or name of each metric.
- the metrics with names “Delivery Time” 204 a, “Resource Utilization” 204 b, and “ATV” 204 c are visible.
- the attributes further include a data source 206 from which the respective metrics are created.
- the metric labeled “Delivery Time” 204 a is created for the data source “Shipping Record-Salesforce” 206 a; the metric labeled “Resource Utilization” 204 b is generated for the data source “HR Analytics” 206 b; and the metric labeled “ATV” 204 c is generated for the data source “Marketing Cloud Sales” 206 c .
- the attributes further include a time granularity 208 that corresponds to contextual time frames over which a respective measure associated with the respective metric is aggregated.
- the metric labeled “Delivery Time” 204 a is aggregated for the period “Month to Date” 208 a; the metric labeled “Resource Utilization” 204 b is aggregated for the period “Day to Date” 208 b; and the metric labeled “ATV” 204 c is aggregated for the period “Week to date” 208 c .
- the attributes further include filter options 210 that specify filters that are used to select the records that are used in aggregating the respective measure associated with the respective metric.
- the metric labeled “Delivery Time” 204 a is aggregated across “Product” and “Region” dimensions 210 a; the metric labeled “Resource Utilization” 204 b is aggregated across “Employee” and “Department” dimensions 210 b; and the metric labeled “ATV” 204 c is aggregated across “Category” and “Customer” dimensions 210 c.
- the attributes further include an author 212 of respective metrics (e.g., the user that generated the respective metric manually or automatically).
- the attributes further include a followers list 214 for respective metrics (e.g., users in an organization that have subscribed to or follow performance of the metric).
- the attributes further include an actions list 216 , which specifies the actions that can be performed with the respective metric.
- a user input 218 is detected selecting the “Create New” control 230 for creating new metrics.
- in response, a user interface 300 for selecting a data source is displayed, as illustrated in FIG. 3 .
- Data sources 312 that are available for selection are listed in user interface 300 .
- for each data source, a name 302 and the number of potential metrics 304 are displayed in user interface 300 .
- for example, for a respective data source, the system determines that potentially 5 metrics can be generated.
- a metrics service (e.g., the metrics service 812 , described in relation to FIG. 8 ) determines the number of potential metrics 304 for each data source based on the number of data fields in the respective data source that correspond to measures.
- FIG. 3 further illustrates input 308 that selects the data source 306 named “ESG at NTO.”
- the user interface 400 for creating a metric definition 402 is displayed in FIG. 4 .
- the data source 306 named “ESG at NTO” is already pre-populated in the metric definition 402 in response to the user input 308 .
- the metric definition 402 includes a number of data fields 404 - 414 . Some data fields, such as name 404 , measure 408 , and time dimension 410 , correspond to core data fields, and other data fields, such as the description 406 , number format 412 , and metric filter 414 , are optional or additional contextual data fields.
- a name 404 is a label for the metric that is generated based on the created metric definition, such as “Delivery Time,” “Resource Utilization,” “Average of Partner Happiness,” “Sum of Emissions,” “Sum of Energy Use,” “Partner PayGap,” “Appliance Sales,” “Branch Revenue,” “Campaign ROI,” and/or other textual description of a respective metric.
- a measure 408 corresponds to a data field in the data source, such as a column in a relational database table (e.g., Revenue, Expenses, or other measures depending on the data source). For example, measures that can be selected are fetched from the selected data source 306 .
- fetched measures and associated aggregation types can be selected together.
- aggregation types include, but are not limited to, SUM, AVG, MAX, MIN, Median, Percentile, Standard Deviation, and COUNT.
- a time dimension 410 corresponds to a data field in the data source that includes date and/or time (e.g., order date) by which the measure is aggregated.
- FIGS. 5 - 7 illustrate a user manually selecting respective values for the core data fields 404 , 408 , and 410 of the metric definition 402 .
- a user can select a “Suggest Metrics” control 416 for automatically generating suggested metrics, and a metric service prompts a machine learning model to generate suggested or recommended metrics for the selected data source 306 .
- This process for automatically suggesting metrics using a machine learning model is described in further detail in relation to FIGS. 8 - 9 .
- FIG. 5 illustrates a menu 500 for selecting or searching for measures in the selected data source 306 .
- the menu 500 is displayed in response to a user input selecting a data field 408 .
- a number of measures are displayed, including a first measure 416 “Customer Satisfaction,” a second measure 420 “Partner Happiness,” a third measure 422 “Product Quality,” a fourth measure 424 “Reliability Perception,” a fifth measure 426 “Energy Use,” and a sixth measure 428 “Pay.”
- the user has selected the first measure 416 “Customer Satisfaction,” and the aggregate type 418 is pre-populated by default.
- the pre-populated aggregation type 418 is “Average.”
- FIG. 5 illustrates user input 430 selecting the aggregate type 418 “Average.”
- FIG. 6 illustrates a transition from FIG. 5 in response to the user input 430 selecting the aggregate type 418 “Average,” and further input selecting the data field 410 that corresponds to a time dimension.
- the menu 600 for selecting or searching for dimensions in the selected data source 306 is displayed.
- a number of dimensions are displayed, including a first dimension 602 “Survey Data,” a second dimension 604 “Transaction,” and a third dimension “Date” 606 .
- the user has selected the first dimension 602 “Survey Data,” and the time granularity 608 is pre-populated by default.
- the pre-populated time granularity 608 is “Monthly.”
- FIG. 6 illustrates user input 610 selecting the time granularity 608 “Monthly.”
- FIG. 7 illustrates a metric “Customer Satisfaction Score” 700 generated based on the metric definition 402 created manually by the user, as illustrated in FIGS. 4 - 6 .
- the metric “Customer Satisfaction Score” 700 illustrates average customer satisfaction based on survey data in a line chart 702 . Further, a textual description 704 of the performance or trend of the average customer satisfaction is also included in the metric “Customer Satisfaction Score” 700 .
- FIG. 8 illustrates a system 800 for generating metric objects using a machine learning model, and FIG. 9 illustrates a process 900 for generating metric objects using a machine learning model.
- FIG. 8 illustrates a schematic system 800 for generating metric objects using a machine learning model.
- a user 802 , via a user interface 804 , can initiate a process for creating one or more metrics.
- the user 802 may select a control in user interface 804 , and in response, the metric service 812 in the metrics layer 806 retrieves data from one or more data sources, such as data source 814 a and data source 814 b .
- the metrics layer 806 accesses or retrieves data from a variety of data sources including, but not limited to, Comma-Separated Values (CSV) files, Excel Spreadsheets, relational databases (e.g., MySQL, PostgreSQL, Microsoft SQL Server, or Oracle Databases), cloud-based data sources (e.g., Google BigQuery, Amazon Redshift), NoSQL Databases (e.g., MongoDB), Web Data Connectors, and other multidimensional, relational, and/or hierarchical data sources.
- a user can select data retrieved from the one or more data sources.
- a user 802 can manually create a metric definition (e.g., the metric definition 402 in FIG. 4 ) for a metric (also referred to as a metric object) (e.g., the “Customer Satisfaction Score” 700 ) by providing values for data fields (e.g., one or more of data fields 404 - 414 ) that are included in a metric definition.
- the manual process for creating a metric object is further described in relation to FIGS. 5 - 7 .
- the user 802 , via the user interface 804 , can request that metrics be generated automatically.
- the process for automatically generating suggested metrics is further described in relation to FIG. 9 .
- the suggested metrics can be generated without the user 802 specifying any data field (or attribute or metadata) of the metric definition.
- the user 802 can specify some data fields of the metric definition, and the remaining data fields can be automatically generated (e.g., by the metrics service 812 ).
- the metrics service 812 retrieves respective data fields from the selected data source.
- the metric service 812 requests fetching of data fields (e.g., all data fields) from the selected data source, and determines a subset of data fields from the fetched data fields that correspond to measures.
- a metrics service 812 sends a prompt (e.g., a request), via application programming interface (API) 810 , to a machine learning (ML) model 816 to generate a respective metric definition for each of the subset of measures in the fetched data fields.
- the metrics service 812 (e.g., conceptually part of a metrics layer) is called or used by various analytical tools and applications (e.g., by Tableau Pulse). In some embodiments, metrics service 812 makes one prompt request per metric.
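As a minimal sketch of this flow (assuming hypothetical helpers `fetch_data_fields`, `is_measure`, and `call_llm`, which are placeholders rather than APIs named in the patent), the metrics service might loop over the measures and send one prompt per metric:

```python
from typing import Any

def fetch_data_fields(data_source_id: str) -> list[dict[str, Any]]:
    """Hypothetical: fetch the data fields of the selected data source."""
    raise NotImplementedError

def is_measure(data_field: dict[str, Any]) -> bool:
    """Hypothetical: identify data fields that correspond to measures."""
    return data_field.get("role") == "measure"

def call_llm(prompt: str) -> dict[str, Any]:
    """Hypothetical wrapper around the ML model API."""
    raise NotImplementedError

def suggest_metric_definitions(data_source_id: str) -> list[dict[str, Any]]:
    """Fetch data fields, keep the measures, and send one prompt per measure."""
    fields = fetch_data_fields(data_source_id)
    measures = [f for f in fields if is_measure(f)]
    return [
        call_llm(
            f"Suggest a metric definition (name, aggregation type, time dimension, "
            f"related dimensions) for measure '{m['name']}' in '{data_source_id}'."
        )
        for m in measures
    ]
```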
- the ML model 816 is a large language model, such as a generative pre-trained transformer. In some embodiments, the ML model 816 is pre-trained on a data source that already includes metadata and/or semantics that have been pre-configured or predetermined by a user (e.g., a data analyst or other user that has domain knowledge).
- examples of such semantics and/or metadata include, but are not limited to, columns that are labeled as measures; names or labels for the measures; usage of the measures across different workbooks, including labels or descriptions of those workbooks (e.g., the measure Sales has been used in a workbook described or labeled as Sales over time), pre-existing aggregations of the measures, calculations in which the measures were used, and/or breaking down of the measures by different dimensions; formatting, styling, and/or visual encodings associated with the measures; and data visualizations in which the measures have been used.
- such semantics and/or metadata in the data source provide domain specific knowledge on which the ML model 816 is trained.
- such semantics and/or metadata are provided to the ML model 816 from the selected data source in the prompt request (e.g., provided as input to the machine learning model).
- the ML model 816 is further trained on textual corpuses produced by humans, such as publications and resources on the Internet.
- training the ML model 816 on such textual corpuses is advantageous because the ML model 816 can determine additional semantic and business context that is not typically available in the selected data source. For example, the ML model 816 can determine whether a particular change of an aggregated measure over time is positive, neutral, or negative, thereby generating values for a data field of a metric definition that corresponds to a favorability indicator.
- the ML model 816 outputs a number of metric definitions for suggested metrics, and the generated metric definitions are returned to the metrics service 812 via the API 810 .
- the metrics service 812 , after transforming and/or de-duplicating the generated metric definitions, sends a final list of suggested metrics to user interface 804 .
- the user 802 may modify, save, and/or discard some of the suggested metrics.
- the generated metrics (and respective metric definitions) are stored in a metrics database 808 and used by the metrics service 812 .
- metric definitions generated by ML model 816 are cached per data source to reduce the computational cost (e.g., the number of requests) of using the ML model 816 .
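A sketch of that caching idea, assuming the `suggest_metric_definitions` helper from the previous sketch; keying the cache by data source avoids repeated model requests:

```python
_definitions_cache: dict[str, list] = {}

def suggest_with_cache(data_source_id: str) -> list:
    # One ML round-trip per data source; later calls reuse the cached definitions.
    # A real system would also invalidate entries when the data source changes.
    if data_source_id not in _definitions_cache:
        _definitions_cache[data_source_id] = suggest_metric_definitions(data_source_id)
    return _definitions_cache[data_source_id]
```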
- metrics service 812 prompts the ML model 816 in response to user requests.
- the metrics service 812 scans the available data sources for changes (e.g., newly added measures), and can automatically suggest metrics to the user 802 without additional user input requesting the suggested metrics.
- the user 802 can export charts as metrics, thereby creating metric definitions without entering values for the data fields and without prompting the ML model 816 .
- a user may be viewing a line chart with measures and dimensions in a workbook, and the metrics service 812 may prompt the user to generate a metric from the line chart and the respective measure and dimension.
- the ML model 816 infers or generates the remaining data fields of the metric definition that remain uncompleted when the line chart is exported as a metric.
- FIG. 9 illustrates a process 900 for automatically generating metrics using a ML model.
- Process 900 is initiated by user 902 at step 914 by selecting or clicking on a control for creating metrics (e.g., user input 218 selecting control 230 for creating new metrics, as in FIG. 2 ).
- user interface 904 sends a request to retrieve or load available data sources 908 .
- the loaded data sources are displayed to user 902 (e.g., in user interface 300 for selecting a data source, FIG. 3 ).
- user 902 selects a respective data source from the displayed data sources (e.g., user input 308 in FIG. 3 ). Further, at step 923 , user 902 selects a control that requests that suggested metrics be generated (e.g., control 416 in FIG. 4 ). In response, at step 924 , a request is sent from the user interface 904 to metric service 906 to infer or determine metrics from the selected data source (e.g., data source 306 selected from data sources 312 in FIG. 3 ). In response to the request, at step 926 , metric service 906 sends a request to the selected data source in data sources 908 to fetch the data fields in the data source.
- the selected data source in data sources 908 fetches or sends the data fields (e.g., data field names) in the selected data source.
- metric service 906 selects each measure in the fetched data fields (e.g., loops over a subset of data fields that correspond to measures in the fetched data fields) and, at step 930 , sends a prompt for each measure (e.g., a separate prompt is optionally sent for each measure) in the fetched data fields to machine learning model 910 (e.g., a large language model).
- machine learning model 910 responds and sends to metric service 906 generated metric definitions for each measure in the fetched data fields.
- metric service 906 selects each metric definition (e.g., optionally one by one in a loop cycle) and performs post-processing operations. For example, at step 934 , metric service 906 parses and transforms each metric definition. Additionally, at step 936 , metric service 906 checks whether a respective metric definition in the generated metric definitions is a duplicate of another metric definition that is already stored in metrics database 912 (e.g., the check for duplicates can be performed using a hash function). At step 938 , the metrics database 912 returns to metric service 906 a determination of whether a duplicate exists for each respective metric definition.
- metric service 906 removes any duplicates from the list of generated metric definitions.
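A sketch of the hash-based duplicate check, under the assumption that definitions are dictionaries and that the metrics database exposes the hashes of stored definitions (simplified here to a set):

```python
import hashlib
import json

def definition_hash(definition: dict) -> str:
    # Canonical JSON (sorted keys) so equivalent definitions hash identically.
    canonical = json.dumps(definition, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def remove_duplicates(generated: list[dict], stored_hashes: set[str]) -> list[dict]:
    unique = []
    for definition in generated:
        h = definition_hash(definition)
        if h not in stored_hashes:  # drop definitions already in the metrics database
            stored_hashes.add(h)
            unique.append(definition)
    return unique
```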
- metric service 906 returns a list of suggested metrics (e.g., a list of the generated metric definitions that are transformed and without duplicates) to user interface 904 .
- the list of suggested metrics is displayed in user interface 904 (e.g., available for viewing by user 902 ). For example, a list of suggested metrics 1002 is illustrated in FIG. 10 .
- user 902 selects some or all of the suggested metrics for saving in the metrics database 912 (e.g., by checking or unchecking a selection box, such as selection boxes 1004 c, 1006 c, 1008 c, and 1010 c in FIG. 10 ).
- the metric service 906 receives a request from user interface 904 to generate and store metric definitions in the metrics database 912 for all metrics selected in step 946 and, in response, stores those metrics in the metrics database 912 .
- metrics that are based on metric definitions generated by a machine learning model can be tagged to indicate to a user that a further review or validation is necessary.
- user 902 can edit, validate, or otherwise interact with the metrics based on metric definitions generated by the machine learning model.
- FIG. 10 illustrates suggested metrics 1002 generated using a machine learning model in accordance with process 900 .
- the machine learning model generated and provided as output suggested metrics 1002 based on available measures in the selected data source 306 .
- the list of suggested metrics 1002 includes metric 1004 “Partner Happiness Score”, metric 1006 “Scope 1 Emissions”, metric 1008 “Renewable Energy Usage”, and metric 1010 “Partner Pay Gap.”
- Each metric can be selected or unselected for storage in the metrics database 912 using respective controls 1004 c, 1006 c, 1008 c, 1010 c.
- values for the aggregated measure and time dimension are displayed for each respective metric in the list of suggested metrics 1002 .
- for metric 1004 “Partner Happiness Score”, the aggregate measure 1004 a is “Average of Partner Happiness” and the time dimension 1004 b is “Survey Data”;
- for metric 1006 “Scope 1 Emissions”, the aggregate measure 1006 a is “Sum of Emissions” and the time dimension 1006 b is “Date”;
- for metric 1008 “Renewable Energy Usage”, the aggregate measure 1008 a is “Sum of Energy Use” and the time dimension 1008 b is “Date”; and
- for metric 1010 “Partner Pay Gap”, the aggregate measure 1010 a is “Maximum (Pay) − Minimum (Pay)” and the time dimension 1010 b is “Review Data.”
- the suggested metrics 1002 are all saved in the metrics database 912 in response to user input 1014 selecting button 1012 for saving/storing selected metrics in the list of suggested metrics 1002 .
- FIG. 11 illustrates user interface 200 for viewing and creating metrics that displays newly created metrics generated using a machine learning model.
- the newly generated metrics appear in user interface 200 for viewing and creating metrics.
- the newly created metrics are tagged as such so that the user can recognize which metrics have been recently added. For example, metric 1004 “Partner Happiness Score”, metric 1006 “Scope 1 Emissions”, metric 1008 “Renewable Energy Usage”, and metric 1010 “Partner Pay Gap” are all tagged as new in user interface 200 .
- metrics cards are displayed in a user interface (e.g., a user interface in a web browser application, a desktop application, or a mobile application) for metrics and insights.
- metric cards include (i) a value of an aggregated measure of the respective metric; (ii) a data visualization (e.g., a line chart) that visualizes the respective aggregated metric; and (iii) an insight based on the aggregated metric that includes a textual description of the aggregated metric.
- metric cards (e.g., metric cards 1420 , 1422 , and 1424 ) are illustrated in FIG. 14 .
- a data model can represent relationships between the metrics.
- relationships between the metrics include, but are not limited to, a “leading indicator relationship” (e.g., one metric being a leading indicator for a change in another metric); a “lagging indicator relationship” (e.g., one metric being a lagging indicator for a change in another metric); a “positive influencer relationship” (e.g., one metric being a positive influencer for a change in another metric); a “negative influencer relationship” (e.g., one metric being a negative influencer for a change in another metric); and a “component of” relationship.
- the machine learning model is trained on metrics and various relationships that exist between the metrics.
- an output of the machine learning model includes a list of tuples, each corresponding to a Relationship that likely holds between two of the metrics.
- each tuple is in a format of (Metric, Relationship Type, Metric), e.g., (Open Pipeline, Leading Indicator, Revenue).
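Typed out as a sketch, such tuples might look like the following; the enum values simply mirror the relationship types listed above, and the structure is an illustration rather than the patent's data model:

```python
from enum import Enum

class RelationshipType(Enum):
    LEADING_INDICATOR = "Leading Indicator"
    LAGGING_INDICATOR = "Lagging Indicator"
    POSITIVE_INFLUENCER = "Positive Influencer"
    NEGATIVE_INFLUENCER = "Negative Influencer"
    COMPONENT_OF = "Component Of"

# Model output: relationships likely to hold between pairs of metrics.
relationships: list[tuple[str, RelationshipType, str]] = [
    ("Open Pipeline", RelationshipType.LEADING_INDICATOR, "Revenue"),
]
```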
- an insight generator module can generate insights based on relationships between the metrics (e.g., based on a data model of the relationships between metrics).
- a data model of the relationships can be stored in a metrics database (e.g., metrics database 808 ) or a different database to which metrics service 812 has access.
- the following description provides examples of different types of metric relationships.
- the format that is used to describe the type of relationships is: “Metric [relationship type] Target Measure or Dimension.”
- relationships can be modeled manually.
- relationships can be built by a user using templates (e.g., a package of relationship for a respective domain).
- the user can map metrics and dimensions of a selected data source to metric definitions and templated relationships (e.g., in the package).
- the relationships can be detected and suggested automatically by an insights service. For example, for a selected metric, the insights service inspects the data source and returns possible relationships of the selected metric to other measures and dimensions in the data source using statistics.
- modeling metric relationships can be bootstrapped using a service that scrapes key measures, dimensions, and their relationships from existing views of data in existing dashboards, and imports them as modeled relationships in the metrics layer. In some embodiments, as part of the import workflow, users can select or de-select different mappings and relationships.
- FIG. 12 illustrates a process 1200 for automatically generating metric objects.
- a plurality of data fields are obtained ( 1204 ) from a selected data source.
- a first subset of the plurality of data fields corresponds to a plurality of measures and a second subset of the plurality of data fields corresponds to a plurality of dimensions.
- a user (e.g., an analytics professional or a business user) selects a data source.
- a metrics service automatically scans for available data sources.
- some data fields in the obtained plurality of data fields are measures and some are dimensions.
- a machine learning model is prompted ( 1206 ) to generate a plurality of suggested metric objects.
- a metric service may send a prompt (or a request) to the machine learning model requesting that the machine learning model suggest metric objects.
- the metric service may generate and send the prompt in response to a user input or an automatically generated request from a computer system.
- the machine learning model is a large language model, such as a generative pre-trained transformer.
- the machine learning model is trained on a data source that already includes metadata and/or semantics that have been pre-configured or predetermined by a user (e.g., a data analyst or other user that has domain knowledge).
- examples of such semantics and/or metadata include, but are not limited to, columns that are labeled as measures; names or labels for the measures; usage of the measures across different workbooks, including labels or descriptions of those workbooks (e.g., the measure Sales has been used in a workbook described or labeled as Sales over time), pre-existing aggregations of the measures, calculations in which the measures were used, and/or breaking down of the measures by different dimensions; formatting, styling, and/or visual encodings associated with the measures; and data visualizations in which the measures have been used.
- such semantics and/or metadata in the data source provide domain specific knowledge on which the machine learning model is trained.
- such semantics and/or metadata are provided to the machine learning model from the selected data source in the prompt request (e.g., provided as input to the machine learning model).
- a respective metric definition is generated ( 1208 ) for each measure in the plurality of measures, wherein each generated respective metric definition includes a plurality of data fields, including: (i) a name (e.g., a metric label or a textual description of a metric, such as “Delivery Time,” “Resource Utilization,” “Average of Partner Happiness,” “Sum of Emissions,” “Sum of Energy Use,” “Partner PayGap,” “Appliance Sales,” “Branch Revenue,” “Campaign ROI,” and/or other textual description of a respective metric); (ii) a measure; (iii) a time dimension; and (iv) an aggregation type (e.g., SUM, AVG, MAX, MIN, Median, Percentile, Standard Deviation, COUNT).
- the measure corresponds to a data field in the data source, such as a column in a relational database table (e.g., Revenue, Expenses, or other measures depending on the data source).
- the measure is already existing and stored in the data source.
- the measure may be a calculated measure.
- a data field may be created using a calculation, optionally from other already existing or calculated measures (e.g., Revenue − Expenses to calculate a Profit measure).
- a metric service sends a prompt to the machine learning model requesting a metric definition for each identified data field in the data source that corresponds to a measure.
- the time dimension corresponds to a data field in the data source that includes date and/or time (e.g., order date) by which the measure is aggregated.
- metric objects are defined by the respective metric definition.
- metrics can be used in informational snippets referred to as insights that provide contextual (and optionally personalized) information for a respective metric, including optionally information about performance of the metric in relation to other metrics and/or across different dimensions.
- metric objects are entities that are subject to analysis, e.g., to gain insights about data and/or make informed decisions.
- metric objects can be generated manually. For example, a user can select a measure, time dimension, an aggregation type, name, and/or other fields that are included in a definition of a metric.
- some data fields of a metric definition can be manually generated while others can be bootstrapped or otherwise generated automatically by using the machine learning model.
- a plurality of metrics are predefined (e.g., templates with preset metric definitions).
- the metrics definitions are stored in a database, and a metric service retrieves, changes, and/or adds to the stored metrics.
- some of the plurality of fields can be provided by a user and a remainder of the data fields can be generated or suggested by the machine learning model (e.g., on the fly).
- the plurality of data fields that are generated and/or suggested by the machine learning model can be validated by a user or another machine learning model.
- the machine learning model can generate personalized metric definitions based on metadata and/or usage of the measures by a respective user.
- the plurality of data fields include ( 1210 ) additional contextual fields, including one or more related dimensions.
- the one or more related dimensions correspond to candidate dimensions by which the measure can be analyzed meaningfully (e.g., the measure Revenue can be meaningfully analyzed by the dimensions Region and Product, whereas analyzing Revenue by Order Id is not helpful).
- the one or more related dimensions are predicted by the machine learning model to likely be useful for breaking down or filtering the metric.
- a threshold number of dimensions can be included in a metric definition (e.g., no more than five dimensions may be returned by the machine learning model).
- one or more related dimensions can be selected (e.g., inferred) and/or generated by the machine learning model or by a user.
- a metric that has dimensions associated with it is referred to as a scoped metric.
- the plurality of data fields include ( 1212 ) additional contextual fields, including time granularity.
- different time granularities are appropriate for the respective aggregation of the measure associated with the generated metric.
- the measure can be aggregated by an hour, a day, a week, a month, the last ten days, the last 30 days, the last 6 months, or any other time frame that is suitable for the respective measure aggregation.
- for example, sales may be aggregated for the day or, depending on the data in the data sources, sales should not be aggregated across a time frame shorter than a week.
- time granularity can be selected and/or generated by the machine learning model or by a user.
- a metric that has time granularity associated with it is referred to as a scoped metric.
- the plurality of data fields includes ( 1214 ) additional contextual fields, including a favorability indicator.
- the machine learning model generates, infers, or suggests a favorability indicator related to performance of the respective measure. For example, if a value of the aggregated measure is going up (e.g., historically or over specified comparative time frames), the machine learning model can infer whether such a change is positive (e.g., good), negative (e.g., bad), or neutral, and whether the change is normal or unusual.
- the favorability indicator controls contextual information related to the metric such as color (e.g., red to indicate negative change, green to indicate positive change, and a neutral color such as blue or grey to indicate a neutral change); and language that can be used to describe the metric in digests and/or insights (e.g., “Sales improved . . . ” for a positive change vs. “Sales increased . . . ” for a neutral change).
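Sketched as a lookup (the keys, colors, and verbs below are illustrative values following the examples above, not the patent's actual encoding):

```python
# Hypothetical mapping from the favorability indicator to display color and verb.
FAVORABILITY_STYLE = {
    "positive": {"color": "green", "verb": "improved"},
    "negative": {"color": "red", "verb": "declined"},
    "neutral": {"color": "grey", "verb": "increased"},
}

def describe_change(metric_name: str, favorability: str, percent: float) -> str:
    style = FAVORABILITY_STYLE[favorability]
    return f"{metric_name} {style['verb']} by {percent:.1f}%"  # e.g., "Sales improved by 4.2%"
```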
- additional contextual fields that the machine learning model can generate or infer include a description of the metric in natural language (e.g., a sentence-long description of the metric in natural language, e.g., description for a non-technical person) and/or formatting or appropriate styling for the measure.
- the machine learning model is ( 1216 ) a generative artificial intelligence model.
- the generative artificial intelligence model can generate textual description for the metric (e.g., based on the respective metric, and data and other data fields in the data source).
- the machine learning model that is used to generate the metric definition is a first machine learning model, and the metric definitions generated by the first machine learning model are validated ( 1218 ) using a second machine learning model.
- one or more suggested metric objects are displayed ( 1220 ) in a user interface, where each suggested metric object is based on a respective metric definition generated by the machine learning model. For each of the one or more suggested metric objects, an option for selecting the respective suggested metric object to be saved in a metrics database that includes other metric objects is displayed in the user interface.
- FIG. 13 illustrates a schematic system 1300 for automatically generating insight summaries using machine learning model 1340 , in accordance with some embodiments.
- insight generation platform 1310 is conceptually a layer on top of the metric layer 1308 , and both have access (direct or indirect) to data source(s) 1324 (e.g., data sources 814 a and 814 b in FIG. 8 ).
- metric layer 1308 corresponds to metric layer 806 , the metrics service 1322 corresponds to metrics service 812 , and metrics database 1320 corresponds to metrics database 808 , described in further detail with reference to FIG. 8 .
- Metric bootstrapper 1365 is a module (e.g., a program) for generating metrics using machine learning model 1340 (e.g., in response to user input, or autonomously without additional user input) via application programming interface (API) 1360 (e.g., API 1360 corresponds to API 810 in FIG. 8 ).
- metric bootstrapper 1365 automatically generates metrics using the machine learning model 1340 in accordance with process 900 (described in relation to FIG. 9 ) and process 1200 (described in relation to FIG. 12 ).
- metric definitions are created manually (e.g., by a business analyst) or automatically (e.g., by metric bootstrapper 1365 ) and stored in metrics database 1320 .
- An example metric definition for a sales metric has one or more values for each respective data field in the metric definition, such as (i) name: Sales; (ii) measure: SUM(Order_Value); (iii) Definitional Filters: “where Order_Value>$100”; (iv) Time Dimension: Order_Closed_Date; (v) Dimensional Filter(s): Region, Product Category, and Sales Channel; (vi) Time Granularities: weekly, monthly, quarterly.
- an example scoped metric based on this definition has a time granularity of “weekly” and a time comparison of “previous period.”
- insight generation platform 1310 includes insight service 1328 and insights database 1326 .
- insight service 1328 generates insights for selected metrics (e.g., a metric selected from the metrics database 1320 ).
- insights are natural language expressions that explain or provide description of changes (or other information) related to a selected metric that are occurring in a respective (e.g., selected) data source (e.g., performance of an aggregated metric for a specific time frame, and/or across different dimensions).
- the insight type generator 1350 receives as input scoped metrics (e.g., the insight service 1328 provides the metrics scoped by the user to the insight type generator 1350 ).
- the insight type generator 1350 generates one or more different types of insights for the scoped metric using one or more different insight type templates (e.g., stored in insights database 1326 ).
- insight type generator 1350 can generate a Top Drivers insight for the Sales metric, where the Top Drivers insight provides information about which dimension is the main cause (e.g., the top driver) of the changes occurring for the aggregated measure in the Sales metric.
- the insight type generator 1350 calculates, for each dimension member (i) the contribution amount for a previous time range; and (ii) the contribution amount for a starting time range. It then determines the amount of change driven/explained by calculating the difference in contribution.
- An example template is: “During the last [scope]: [metric_name] increased by [−/+percent]. [Dimension_member_list] increased the most,” and an example insight based on that Top Driver template is: “During the last period, Sales increased by $365K. ePhones, Simpson Phones, and Smart Home increased the most.”
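A sketch of the Top Drivers computation and template fill described above; the per-dimension-member contribution amounts would come from querying the data source, which is elided here, and the formatting choices are assumptions:

```python
def top_drivers_insight(metric_name: str, scope: str,
                        prev_contrib: dict[str, float],
                        curr_contrib: dict[str, float],
                        top_n: int = 3) -> str:
    """Compute the change driven by each dimension member (difference in
    contribution amounts) and fill the Top Drivers template."""
    members = set(prev_contrib) | set(curr_contrib)
    deltas = {m: curr_contrib.get(m, 0.0) - prev_contrib.get(m, 0.0) for m in members}
    total = sum(deltas.values())
    direction = "increased" if total >= 0 else "decreased"
    # Members driving the change the most, in the direction of the total change.
    top = sorted(deltas, key=deltas.get, reverse=(total >= 0))[:top_n]
    return (f"During the last {scope}: {metric_name} {direction} by ${abs(total) / 1000:.0f}K. "
            f"{', '.join(top)} {direction} the most.")
```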
- the generated insights are stored in insights database 1326 .
- various types of insight templates are stored in an insights database.
- Example types of insights include “Profit Changed” Insight (e.g., with respective associated metric being Profit); a “Revenue changed” Insight; Top Driver Insight, and other types.
- Example insights that are generated based on the data in the data source and respective insight templates include, but are not limited to, “Gross margin is 14.4% greater than last month;” “Delivery Day increased by 5.6% compared to the same time last month;” and other natural language descriptions of the performance of a respective aggregated measure associated with a respective metric.
- a collection of insights each generated for a metric of a set of selected metrics (e.g., metrics that the user is following) are stored using a data structure that is referred to as a bundle (e.g., bundles are optionally stored in insights database 1326 ).
- a bundle is a collection of insights (e.g., optionally generated using templates) for multiple metrics to which a respective user is subscribed or which the user is following.
- bundles are predefined or created dynamically. For example, a “Metric Change” bundle can contain a first insight about a metric that changed and a second insight that explains the change.
- a bundle can include relationships between insights included in the bundle.
- bundles can define or include semantic relationships between the insights included in the bundle.
- the semantic relationships can be expressed with text.
- a bundle may contain a “Profit Changed” Insight and a “Revenue changed” Insight, with a “caused by” Relationship between the two insights to capture a business insight, such as “Profit rose, driven by an increase in revenue.”
- the templated output of a bundle can include that relationship, e.g., “Profit rose 15%. This was caused by: Revenue rose 18%”.
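A minimal sketch of a bundle that carries a “caused by” relationship between two insights and renders the templated output shown above; the structure is an illustration, not the patent's data model:

```python
from dataclasses import dataclass, field

@dataclass
class Bundle:
    insights: list[str]
    # (i, relationship, j): insights[i] stands in `relationship` to insights[j].
    relationships: list[tuple[int, str, int]] = field(default_factory=list)

    def render(self) -> str:
        parts = []
        for i, rel, j in self.relationships:
            if rel == "caused by":
                parts.append(f"{self.insights[i]} This was caused by: {self.insights[j]}")
        return " ".join(parts)

bundle = Bundle(
    insights=["Profit rose 15%.", "Revenue rose 18%."],
    relationships=[(0, "caused by", 1)],
)
# bundle.render() -> "Profit rose 15%. This was caused by: Revenue rose 18%."
```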
- a bundle of insights is generated by insight type generator 1350 and stored in insights database 1326 .
- the bundle of insights is provided to insight summarization module 1330 to generate a summary of insights for all metrics that a user is following.
- the insight summarization module 1330 provides the bundle of insights as input to machine learning model 1340 optionally using application programming interface (API) 1360 .
- the machine learning model 1340 is used by metrics bootstrapper 1365 to automatically generate metric definitions, and by insights summarization module 1330 to summarize all insights in a respective bundle of insights (e.g., for all metrics a user is following).
- metrics bootstrapper 1365 and insights summarization module 1330 use different machine learning models.
- insight summarization module 1330 ranks respective insights included in the bundle of insights according to relevancy of respective metrics associated with the respective insights, and provides the ranked insights in the bundle as input to the machine learning model 1340 .
- the insight summarization module 1330 generates a summary of the bundle of insights using the machine learning model 1340 .
- the insight summarization module 1330 validates the summary of the bundle of insights using regular expressions and heuristics, or using a second machine learning model.
- the summary of the bundle of insights includes a natural language expression describing top insights included in the bundle for metrics, which the user is following. For example, a summary can be: “Sales is seeing unusual spike since the beginning of this week, while quarterly Regional Revenue and monthly Campaign ROI are steadily increasing. Additionally, 7 out of 12 other metrics have changed, 4 favorably and 3 unfavourably.”
- validating accuracy of the first summary of the bundle of insights includes comparing a respective aggregated measure included in a respective insight in the bundle of insights to a respective aggregated measure included in the summary and determining if there are any changes in the respective aggregated measure. For example, an insight summarization module determines whether an aggregated measure in the summary is the same as the respective aggregated measure in the original insight.
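One way the regular-expression validation might work, as a sketch: pull every numeric value out of the summary and check that it appears verbatim in at least one insight of the bundle. The regex and matching strategy here are assumptions, not the patent's implementation.

```python
import re

_NUMBER = re.compile(r"[-+]?\d[\d,]*\.?\d*%?")

def summary_numbers_supported(summary: str, insights: list[str]) -> bool:
    """Heuristic: each numeric value quoted in the summary must occur
    verbatim somewhere in the insights it summarizes."""
    insight_text = " ".join(insights)
    return all(num in insight_text for num in _NUMBER.findall(summary))
```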
- selectable business questions can be generated and displayed for a user 1302 in user interface(s) 1304 via data experiences platform 1306 .
- a summary of the insights is provided in user interface(s) 1304 via data experiences platform 1306 .
- FIG. 14 illustrates a user interface 1400 that includes a summary 1404 of insights related to metrics that a user is following, in accordance with some embodiments.
- the summary 1404 informs the user of the following: “Appliance Sales is seeing an unusual spike, while Branch Revenue and Campaign ROI are steadily increasing. Of the 12 metrics you are following, 2 is unusual.”
- the top metrics in summary 1404 include “Appliance Sales” 1406 , “Branch Revenue” 1408 , and “Campaign ROI” 1410 .
- a user can hover over each of the underlying metrics to obtain additional information and/or insights about the respective metric (e.g., in a pop-up window).
- metric card 1420 provides information related to the metric “Appliance Sales” 1406 , including a data visualization 1420 a, the calculated measure 1420 d which corresponds to the amount of appliance sales (e.g., 1,675 units), the amount of change 1420 e compared to a previous period (e.g., +365 units (+27% compared to a previous period)); the time dimension and/or other selected filters 1420 f (e.g. Week to date); and insight 1420 b that states: “The week to date Appliance Sales has seen an unusual increase of +356 (+27%) over the last week and is now 17% above the normal range.”
- Metric card 1420 further includes an indicator 1420 c for whether the change is usual or unusual. The colors for the indicators are selected based on the favorability indicator in the metric definition for the metric “Appliance Sales” 1406 .
- User interface 1400 further includes metric card 1422 for the metric “Branch Revenue” 1408 , which provides data and/or information related to metric “Branch Revenue” 1408 , including a data visualization 1422 a, the calculated measure 1422 d, which corresponds to the amount of Branch Revenue (e.g., 10.7M pounds), the amount of change 1422 e compared to a previous period (e.g., +0.7M pounds (+7% quarter to date)); and the time dimension and/or other selected filters 1422 f (e.g. Quarter to date, Cambridge); and insight 1422 b that states: “The quarter to date Branch Revenue had an unusual increase of +1.2% over the equivalent quarter to date value a year ago, mainly attributed to Home Appliance.”
- User interface 1400 further includes metric card 1424 for the metric “Campaign ROI” 1410 , which provides data and/or information related to the metric “Campaign ROI” 1410 , including a data visualization 1424 a, the calculated measure 1424 d which corresponds to the amount of Campaign ROI (e.g., 379%), the amount of change 1424 e compared to a previous period (e.g., +1.1 percentage points since last month); the time dimension and/or other selected filters 1424 f (e.g. Monthly, Cambridge); and insight 1424 b that states: “The monthly Campaign ROI has been increasing at a rate of 1 percentage point per month for the past 6 months, in line with change in rate of increase on Social Media campaigns.”
- summary 1404 includes summaries of insights 1420 b , 1422 b, and 1424 b.
- the metric cards include indicators that specify how normal or unusual the metric data is currently (e.g., indicators 1420 c, 1422 c, and 1424 c, which are also color coded to identify the extent of unusual data).
- FIG. 15 is a flowchart diagram of a process 1500 for generating and validating data insights using machine learning models, in accordance with some embodiments.
- a bundle of insights generated based on insight templates is obtained ( 1504 ).
- metrics (optionally including scoped metrics) are associated with respective insights.
- an insight is generated for a respective aggregated measure in a respective metric.
- one insight is displayed in a metric object (e.g., a metric card displayed in a user interface).
- insights are natural language expressions that explain or provide descriptions of changes related to a selected metric that are occurring in a respective (e.g., selected) data source.
- insights are created using templates.
- various types of insight templates are stored in an insights database.
- Example types of insights include “Profit Changed” insights (e.g., with the respective associated metric being Profit) and “Revenue Changed” insights.
- Example insights that are generated based on the data in the data source and respective insight templates include, but are not limited to, “Gross margin is 14.4% greater than last month;” “Delivery Day increased by 5.6% compared to the same time last month;” and other natural language descriptions of the performance of a respective aggregated measure associated with a respective metric.
- a collection of insights is stored (e.g., in an insights database) using a data structure that is referred to as a bundle.
- a bundle is a collection of insights (e.g., optionally generated using templates) for multiple metrics to which a respective user is subscribed or that the respective user is following.
- bundles are predefined or created dynamically. For example, a “Metric Change” bundle can contain a first insight about a metric that changed and a second insight that explains the change.
- Example insights included in bundles include, but are not limited to: “The week to date Appliance Sales has seen an unusual increase of +356 (+27%) over the last week and is now 17% above the normal range”; “The quarter to date Branch Revenue had an unusual increase of +1.2% over the equivalent quarter to date value a year ago, mainly attributed to Home Appliance”; and/or “The monthly Campaign ROI has been increasing at a rate of 1 percentage point per month for the past 6 months, in line with change in rate of increase on Social Media campaigns.”
- a bundle can be defined or generated “on the fly” or implicitly.
- a bundle can be implicitly created as the user explores or interacts with the metric and associated insights.
- a bundle includes relationships between insights included in the bundle.
- bundles can define or include semantic relationships between the insights included in the bundle.
- the semantic relationships can be expressed with text.
- a bundle may contain a “Profit Changed” insight and a “Revenue Changed” insight, with a “caused by” relationship between the two insights to capture a business insight, such as “Profit rose, driven by an increase in revenue.”
- the templated output of a bundle can include that relationship, such as “Profit rose 15%. This was caused by: Revenue rose 18%”.
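For illustration only, a bundle and its insight relationships might be represented with a small data structure along the following lines. This is a minimal Python sketch; the class and field names are assumptions rather than the data structure disclosed herein.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Insight:
    text: str    # templated natural language expression, e.g. "Profit rose 15%."
    metric: str  # the metric the insight describes, e.g. "Profit"

@dataclass
class Bundle:
    insights: List[Insight]
    # (source index, relationship text, target index), e.g. (0, "caused by", 1)
    relationships: List[Tuple[int, str, int]] = field(default_factory=list)

    def templated_output(self) -> str:
        """Render the bundle, folding relationship text into the output."""
        if not self.relationships:
            return " ".join(i.text for i in self.insights)
        return " ".join(
            f"{self.insights[src].text} This was {rel}: {self.insights[dst].text}"
            for src, rel, dst in self.relationships)

bundle = Bundle([Insight("Profit rose 15%.", "Profit"),
                 Insight("Revenue rose 18%.", "Revenue")],
                [(0, "caused by", 1)])
print(bundle.templated_output())
# -> Profit rose 15%. This was caused by: Revenue rose 18%.
```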
- the bundle of insights is provided ( 1506 ) as input to a machine learning model.
- the machine learning model is a large language processing model.
- the machine learning model is trained on insights generated using templates (e.g., created by an analytics professional or other users skilled in data analytics) for different data sources.
- the machine learning model is a generative pre-trained transformer.
- the machine learning model is trained on a data source that already includes metadata and/or semantics that have been pre-configured or predetermined by a user (e.g., a data analyst or other user that has domain knowledge).
- examples of such semantics and/or metadata include, but are not limited to: columns that are labeled as measures; names or labels for the measures; usage of the measures across different workbooks, including labels or descriptions of those workbooks (e.g., the measure Sales has been used in a workbook described or labeled as “Sales over time”); pre-existing aggregations of the measures; calculations in which the measures were used and/or breakdowns of the measures by different dimensions; formatting, styling, and/or visual encodings associated with the measures; and data visualizations in which the measures have been used.
- such semantics and/or metadata in the data source provide domain specific knowledge on which the machine learning model is trained.
- a summary of the bundle of insights is generated ( 1508 ) using the machine learning model (e.g., the machine learning model 1340 in FIG. 13 ).
- the bundle includes a single insight.
- the summary of the bundle improves the templated language in the bundle expression using the machine learning model.
- the summary of the bundle of insights (e.g., summary 1404 in FIG. 14 ) corresponds to ( 1510 ) a natural language expression describing respective insights and relationships between the respective insights in the bundle.
- generating the summary of the bundle of insights includes ( 1512 ): generating a first summary of the bundle of insights using the machine learning model; validating accuracy of the first summary of the bundle of insights, including identifying one or more inaccuracies; and generating a second summary of the bundle of insights correcting the identified one or more inaccuracies.
- validating accuracy of the first summary of the bundle of insights is performed ( 1514 ) using regular expressions and heuristics.
- validating accuracy of the first summary of the bundle of insights includes comparing a respective aggregated measure included in a respective insight in the bundle of insights to a respective aggregated measure included in the summary and determining whether there are any changes in the respective aggregated measure.
- an insight summarization model (e.g., insight summarization module 1330 in FIG. 13 ) determines whether an aggregated measure in the summary is the same as a respective aggregated measure in the original insight.
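As a rough illustration of this regular-expression and heuristic validation (step 1514), the Python sketch below extracts numeric aggregated measures from the original insights and flags any that do not reappear unchanged in the generated summary. The regular expression, unit list, and function names are illustrative assumptions, not the disclosed implementation.

```python
import re
from typing import List

# Matches numbers with optional sign, thousands separators, decimals, and a unit.
NUMBER = re.compile(r"[-+]?\d[\d,]*\.?\d*\s*(?:%|K|M|units|pounds)?", re.I)

def extract_measures(text: str) -> List[str]:
    """Pull numeric aggregated measures (with units) out of templated text."""
    return [m.group().strip() for m in NUMBER.finditer(text)]

def validate_summary(insights: List[str], summary: str) -> List[str]:
    """Heuristic check: every measure in the original insights should
    reappear, unchanged, in the model-generated summary."""
    inaccuracies = []
    for insight in insights:
        for measure in extract_measures(insight):
            if measure not in summary:
                inaccuracies.append(
                    f"measure {measure!r} from {insight!r} is missing or altered")
    return inaccuracies

insights = ["Gross margin is 14.4% greater than last month."]
summary = "Gross margin improved by 14.5% over last month."  # altered value
for issue in validate_summary(insights, summary):
    print(issue)  # inaccuracies feed the second generation pass
```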
- the machine learning model used to generate the summary of the bundle of insights is ( 1516 ) a first machine learning model, and validating accuracy of the first summary of the bundle of insights is performed using a second machine learning model.
- the process 1500 further includes ( 1518 ): ranking respective insights in the bundle of insights according to relevancy of respective metrics associated with the respective insights; and providing the ranked insights in the bundle as input to the machine learning model, including providing the ranking of the respective insights in the bundle.
- At least some metrics that are associated with respective insights in the bundle of insights have ( 1520 ) predetermined relationships to other metrics stored in a metrics database.
- FIG. 16 A illustrates a user interface 1600 for querying a data source, via natural language, and viewing results from the query (e.g., a dashboard that includes at least one metric and/or at least one insight).
- a user has selected the Device Sales dashboard 1602 (shown in user interface 1600 ), which is a dashboard that the user follows (e.g., receives updates associated with the dashboard).
- the user can view the Device Sales dashboard via a natural language query (as discussed in more detail below with respect to FIGS. 16 B- 16 E ).
- the Device Sales dashboard 1602 includes a metric 1606 (e.g., 1,675 units) associated within a time interval 1608 (e.g., Sep. 10-Sep. 12, 2023).
- a respective dashboard (e.g., a metric detail page) includes more than one metric and/or includes one or more metrics within one or more time intervals.
- a respective dashboard includes a metric associated with device sales and a metric associated with inventory fill rate.
- the type of metric and/or the time interval may be predetermined by the user, or may be automatically selected based on the data associated with device sales.
- the metric 1606 can be filtered (or otherwise further processed) via one or more filters 1604, including a week-to-date filter 1604 a, a vs-previous-period filter 1604 b, a region filter 1604 c, and/or other filters 1604 d.
- the week-to-date filter is selected, and the metric 1606 is filtered to include data between Sep. 10 and Sep. 12, 2023.
- the vs-previous-period filter is selected, and the metric 1606 between Sep. 10 and Sep. 12, 2023 is compared to the previous time interval (e.g., Sep. 3 to Sep. 5, 2023).
- the comparison 1610 indicates that the number of units sold has increased by +356 units (+27%) when compared to the previous period.
- the region filter is selected, and the metric 1606 is filtered to the North American region.
- the Device Sales dashboard 1602 further includes an indicator 1612 of a trend (e.g., current trend, unusual change) associated with the metric 1606 and/or one or more filters 1604 .
- the indicator 1612 is “High” because the units sold increased by +356 units (+27%) compared to the previous period.
- the indicator 1612 includes information associated with record-level outliers, top contributors, bottom contributors, concentrated contribution alert (risky monopoly), top drivers, and/or top detractors.
- the indicator 1612 may be based on the metric 1606 and the one or more filters 1604 .
- the Device Sales dashboard 1602 further includes an insight 1613 , which includes a natural language explanation 1614 of the insight 1613 and a data visualization 1616 of the insight 1613 .
- the insight is that the device sales for September 10-September 12 is now 1,675 units and is above the expected range of 1.2K to 1.4K.
- the insight includes comparing the number of units sold to the number of units sold in the past to determine that the current number of sales is above the expected range based on previous sales. This information is generated as a natural language explanation 1614.
- a data visualization 1616 is also generated to visualize that the current number of sales (e.g., 1,675 units) is above the expected range (e.g., 1.2K to 1.4K) for the period between September 10 and September 12.
- the Device Sales dashboard 1602 further includes affordances 1618 for feedback from the user. For example, when the thumbs up affordance is selected, the user is indicating that the insight 1613 is useful and/or relevant to the metric 1606, and when the thumbs down affordance is selected, the user is indicating that the insight 1613 is not useful and/or not relevant to the metric 1606. In some embodiments, in response to selection of the thumbs down affordance, insight 1613 is replaced with a different insight associated with the metric 1606.
- the Device Sales dashboard 1602 further includes one or more recommended natural language queries 1620 (e.g., recommended questions), which are recommended based on at least the metric 1606 and/or the insight 1613 .
- the recommended natural language queries 1620 are “Which products drove this sudden increase” 1620 a, “How has the trend changed” 1620 b, and “What are the outliers” 1620 c.
- These recommended natural language queries 1620 are related to the sale of units (e.g., type of products, and sales trends).
- the Device Sales dashboard 1602 further includes an “Ask Questions” affordance 1621 , which can be selected to display a query input field for entering a natural language query in addition to the recommended natural language queries 1620 . Furthermore, the Ask Questions affordance 1621 can be selected to view additional recommended natural language queries 1620 .
- a natural language query provided via the input field queries the data source based on the currently displayed metric 1606 and/or insight 1613 . In some embodiments, a natural language query provided via the input field queries all data in the data source, independent of the currently displayed metric 1606 and/or the currently displayed insight 1613 .
- FIG. 16 B transitions from FIG. 16 A in response to selection of the recommended natural language query 1620 a.
- natural language query 1622 is inputted, and one or more insights are generated based on one or more predefined types of analyses (e.g., record-level outliers, period over period change, top contributors, bottom contributors, concentrated contribution alert (risky monopoly), top drivers, top detractors, unusual change, current trend, and trend change alert).
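Because the set of analysis types is fixed, it could be modeled as an enumeration. The Python sketch below simply names the types listed above; the enum itself is illustrative and is not recited in this disclosure.

```python
from enum import Enum, auto

class AnalysisType(Enum):
    """Predefined analysis types that drive insight generation."""
    RECORD_LEVEL_OUTLIERS = auto()
    PERIOD_OVER_PERIOD_CHANGE = auto()
    TOP_CONTRIBUTORS = auto()
    BOTTOM_CONTRIBUTORS = auto()
    CONCENTRATED_CONTRIBUTION_ALERT = auto()  # "risky monopoly"
    TOP_DRIVERS = auto()
    TOP_DETRACTORS = auto()
    UNUSUAL_CHANGE = auto()
    CURRENT_TREND = auto()
    TREND_CHANGE_ALERT = auto()
```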
- the one or more insights are generated (or updated) in response to a query (e.g., a natural language query, a structured query, and/or other types of queries), such that the one or more insights include real-time and up-to-date information. Stated differently, the one or more insights are generated or updated in real time (e.g., in response to the query) to reduce data staleness.
- the insight 1623 includes a natural language explanation 1624 and a data visualization 1626 .
- the natural language explanation 1624 includes information that, during the last week (e.g., the current period), Device Sales increased by 356 units, and that ePhones, Simpson Phones, and Smart Home increased the most.
- the insight 1623 is that the ePhones, Simpson Phones, and Smart Home product categories contribute to the increase in Device Sales.
- the data visualization 1626 shows the same, with ePhones increasing by +200 units, Simpson Phones increasing by +72 units, Smart Home products increasing by +21 units, and the other 21 remaining product categories, on average, increasing by +3 units as compared to the previous reporting period.
- the one or more recommended natural language queries 1620 displayed in the Device Sales dashboard 1602 are updated in response to selection of the recommended natural language query 1620 a. Based on the selection of the recommended natural language query 1620 a, the recommended natural language queries 1620 now further include the recommended natural language query “Which channels had low Device Sales” 1620 d. Generally, the recommended natural language queries 1620 are updated based on the query history, including selection of a recommended natural language query, input of a natural language query, or other types of queries.
- the one or more insights are associated with one or more predefined questions (e.g., parametrized questions) for the data source.
- a respective insight is associated with a respective predefined question.
- the predefined question may include a parameterized representation of questions that are associated with the insight (e.g., questions a user may ask when attempting to gain information associated with the insight).
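A parameterized question might be represented as a template with named slots that are bound to a concrete metric and dimension when the insight is generated. A minimal Python sketch, with hypothetical class and parameter names:

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class ParameterizedQuestion:
    """A question template with named parameters, e.g.
    'Which {dimension} drove the {direction} in {metric}?'"""
    template: str

    def instantiate(self, params: Dict[str, str]) -> str:
        return self.template.format(**params)

q = ParameterizedQuestion("Which {dimension} drove the {direction} in {metric}?")
# Bound to the Device Sales example above:
print(q.instantiate({"dimension": "products",
                     "direction": "increase",
                     "metric": "Device Sales"}))
# -> Which products drove the increase in Device Sales?
```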
- the natural language query 1622 is semantically compared to respective natural language explanations of the one or more insights and/or respective predefined questions associated with the one or more insights.
- semantic comparison is performed by embedding the natural language query and the predefined questions and comparing the mathematical similarity between the embedding of the natural language query and the embedding of the predefined question. The semantic comparison is discussed in more detail with respect to FIG. 18 .
- the semantically matching insight is selected.
- the natural language explanation 1624 semantically matches the natural language query 1620 a.
- the natural language query 1622 is “Which Products drove this sudden increase,” and the natural language explanation is “During the last week, Device Sales increased by 356 units, ePhones, Simpson Phones, and Smart Home increased the most.”
- the semantic match is based on the Products that are the top contributors to the increase in Device Sales.
- the Device Sales dashboard 1602 further includes affordances 1628 for feedback. For example, when the thumbs up affordance is selected, the user is indicating that the insight 1623 is useful and/or relevant to the natural language query 1622, and when the thumbs down affordance is selected, the user is indicating that the insight 1623 is not useful and/or not relevant to the natural language query 1622. For example, in response to selection of the thumbs down affordance, insight 1623 is replaced with a different insight that semantically matches the natural language query 1622.
- FIG. 16 C transitions from FIG. 16 A in response to selection of the query input field 1630 (displayed in the user interface 1600 ), which is configured to receive natural language queries, structured queries, and/or other types of queries.
- the list of recent searches 1632 includes previous natural language queries and associated insights.
- the computer system retrieves the insight associated with the selected previous natural language query. In addition to retrieving the associated insight, the computer system may also update the insight with the most recent data in the data source.
- the list of recent searches 1632 is based on the selected data source. For example, when accessing a first data source and a second data source, only the recent search results of the currently active data source are displayed in the list of recent searches 1632. In this example, when accessing the first data source, only the recent searches of the first data source are displayed in the list of recent searches 1632, and when accessing the second data source, only the recent searches of the second data source are displayed in the list of recent searches 1632.
- FIG. 16 D transitions from FIG. 16 C in response to a natural language query input 1631 .
- the natural language query input 1631 is “Will we fulfill phone orders?”
- suggested natural language queries 1634 are generated.
- the suggested natural language queries 1634 may be based on semantically matching the natural language query with one or more predefined questions (e.g., parametrized questions) of the data source, one or more insights of the data source, and/or previous searches performed on the data source.
- suggested natural language queries 1634 include “What is the projected Inventory Fill Rate?”, “Is there seasonality in Inventory Fill Rate?”, and “Is the change in Pending Orders unexpected?”
- the suggested natural language queries 1634 are ordered based on the semantic similarity between the suggested natural language query (or associated insight) with the natural language query input 1631 .
- the suggested natural language queries 1634 are ordered based on the popularity of the search result (e.g., how many followers the search result has).
- the search results link to dashboards.
- a respective suggested natural language query 1634 is associated with a respective metric.
- the “What is the projected Inventory Fill Rate?” natural language query is associated with the “Inventory Fill Rate—North America, Phone” metric.
- the “Is there seasonality in Inventory Fill Rate?” natural language query is associated with the “Inventory Fill Rate—North America, Tablet” metric.
- the “Is the change in Pending Orders unexpected” natural language query is associated with the “Pending Orders—North America” metric.
- Additional search results are accessible via “View Search Results” affordance 1638 .
- FIG. 16 E transitions from FIG. 16 D in response to selection of the suggested natural language query “Inventory Fill Rate—North America, Phone.”
- the Inventory Fill Rate metric 1636 (e.g., 91%) is generated with the region filter 1634 c set to North America and the category filter 1634 set to Phone in response to the selection of the suggested natural language query “Inventory Fill Rate—North America, Phone.” Additionally, the Inventory Fill Rate metric is further filtered by the month-to-date filter 1634 a to include data within the current month to date (e.g., Sep. 1 to Sep. 12, 2023). In this example, the vs-previous-period filter is selected, and the metric 1636 is compared to the previous time interval (e.g., Aug. 1 to Aug. 12, 2023). The comparison 1640 indicates that the inventory fill rate is 4 percentage points lower compared to the inventory fill rate from Aug. 1 to Aug. 12, 2023, and the inventory fill rate is 1 percentage point lower than the expected range.
- the Device Sales dashboard 1602 further includes an insight 1643 , which includes a natural language explanation 1644 of the insight 1643 and a data visualization 1646 of the insight 1643 .
- the insight is that the inventory fill rate for September 1-September 12 is now 91% and has dropped below the expected range of 92% to 95%, and a new unfavorable trend has been detected for Inventory Fill Rate (it is trending down compared to the previous trend).
- the insight includes comparing the current Inventory Fill Rate to past Inventory Fill Rates to determine that the current Inventory Fill Rate is below the expected range.
- the insight includes comparing the trend of the Inventory Fill Rate to the previous trend (which was trending upwards).
- This information is generated as a natural language explanation 1644 .
- a data visualization 1646 is also generated to visualize that the Inventory Fill Rate (e.g., 91%) is below the expected range (e.g., 92% to 95%) for the period between September 1 and September 12.
- the Device Sales dashboard 1602 further includes affordances 1648 for feedback from the user. For example, when the thumbs up affordance is selected, the user is indicating that the insight 1643 is useful and/or relevant to the metric 1636, and when the thumbs down affordance is selected, the user is indicating that the insight 1643 is not useful and/or not relevant to the metric 1636. In some embodiments, in response to selection of the thumbs down affordance, insight 1643 is replaced with a different insight associated with the metric 1636.
- FIG. 16 F transitions from FIG. 16 A in response to selection of the “Ask Questions” affordance 1621.
- a popup 1650 is generated and/or displayed in the user interface 1600 and/or the Device Sales dashboard 1602 .
- the popup 1650 includes a query input field 1652 for inputting a natural language query (e.g., by the user) in addition to the recommended natural language queries 1620 .
- the popup 1650 includes additional recommended natural language queries (e.g., in addition to the recommended natural language queries 1620 ).
- the additional recommended natural language queries may update based on the current input in the query input field 1652 and/or a change in context, such as different selected data, a different metric, and/or a different insight.
- the user does not select any of the several recommended natural language queries that were generated and displayed for the input “What sold well?” and instead submits the input without modification.
- the computer system semantically matches the natural language query “What sold well?” to one or more predefined questions (associated with the generated insights) and/or generated insights and selects the generated insight that most closely matches the natural language query.
- the “Ask Questions” affordance 1621 is located proximate to the recommended natural language queries 1620 and/or insight 1613 . Location proximate to the insight improves the accessibility of inputting natural language queries by the user because the user does not need to navigate to another portion of the user interface, or another user interface, to input a natural language query. Thus, the user can more quickly access the desired information.
- the popup 1650 includes information 1656 .
- the information 1656 includes content to guide the user's exploration of the data, metric and/or insights.
- the information 1656 is “Try asking about trends, breakdowns, or contributions” to guide the user's exploration of Device Sales data.
- the information 1656 is based on the data, metric, and/or insight selected and/or filtered in the current view (e.g., Device Sales dashboard).
- FIG. 17 illustrates semantic comparisons between natural language queries 1702 and predefined questions (e.g., questions 1704, 1706, 1708, and 1710) and generated insights (e.g., insights 1714, 1716, 1718, and 1720), in accordance with some embodiments.
- the natural language query 1702 is associated with (or directed to) a metric. For example, as shown in FIG. 17 , the natural language query 1702 “What drove the increase in sales” is directed to the Sales metric 1703 .
- the computer system generates a sentence embedding of the text of the natural language query 1702 by pooling word embeddings of the sentence as a vector. For example, a BERT-like language model is used to embed each word of the natural language query 1702 .
- the embedded words (e.g., embedded text) of the natural language query 1702 are averaged via a mean pooling layer to generate the sentence embedding for the natural language query 1702.
- the computer system generates a sentence embedding of the text of the predefined questions by pooling word embeddings of the sentence as a vector (e.g., the same methodology as embedding the natural language query 1702 ).
- a BERT-like language model is used to embed each word of the predefined questions.
- the embedded words of the predefined questions are averaged via a mean pooling layer to generate the sentence embeddings for the predefined questions.
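As a concrete illustration of this mean-pooling approach, the sketch below derives a sentence embedding from a BERT-like encoder using the Hugging Face transformers library. The particular checkpoint named here is an assumption; any BERT-like model exposing per-token hidden states works the same way.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Checkpoint is an assumption, chosen only as a representative BERT-like model.
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

def sentence_embedding(text: str) -> torch.Tensor:
    """Embed each token with the BERT-like model, then average the token
    embeddings (mean pooling) into a single sentence vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        token_states = model(**inputs).last_hidden_state      # (1, tokens, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)              # ignore padding
    return (token_states * mask).sum(dim=1) / mask.sum(dim=1)  # (1, dim)

query_vec = sentence_embedding("What drove the increase in sales")
```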
- the insights associated with the predefined questions are generated in response to the natural language query 1702 .
- the insights are generated (or updated) after a natural language query 1702 is received.
- the insights and the predefined questions are associated with a metric. For example, as shown in FIG. 17 , the user is interested in the Sales metric 1703 , which includes associated predefined questions and associated generated insights.
- Example predefined questions and associated generated insights are listed in the table below.
- the computer system generates a sentence embedding of the text of the generated insights by pooling word embeddings of the sentence as a vector (e.g., the same methodology as embedding the natural language query 1702 ).
- a BERT-like language model is used to embed each word of the generated insights.
- the embedded words of the generated insights are averaged via a mean pooling layer to generate the sentence embeddings for the generated insights.
- semantically comparing the natural language query 1702 with the predefined questions and the generated insights includes calculating the cosine similarities between each pair of vectors to obtain match scores that proxy confidence that the insight answers the question (e.g., the natural language query).
- a visual representation of calculating the cosine similarities is illustrated by graph 1722 .
- in graph 1722, the vectors are 3-dimensional for illustration (the vectors of the embeddings of the natural language query 1702, the predefined questions, and the generated insights may have more than 3 dimensions).
- the similarity between two vectors corresponds to how close they are to pointing in the same direction (e.g., sentence A points in a direction similar to sentence C, whereas sentence B points in a direction different from sentences A and C).
- the comparison is performed between the natural language query 1702 and each predefined question and each generated insight separately. For example, a match score is calculated for each predefined question and each generated insight.
- the selected insights are based on the predefined question or generated insight with the highest match score (or a predefined number of top scores). Specifically, if a predefined question has the highest match score, the predefined question and the insight associated with that predefined question are provided to the user. In another example, if a generated insight has the highest match score, the generated insight and the predefined question associated with that generated insight are provided to the user.
- the comparison is performed between the natural language query 1702 and each set of predefined question and associated generated insight. For example, each set of predefined question and associated generated insight has a single match score that is an average of the match score for the respective predefined question and respective generated insight of the set.
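Building on the sentence_embedding sketch above (an assumption carried over from that sketch), the match-scoring step might look like the following. The example questions and insights echo FIG. 16 B, and the per-pair versus set-averaged scoring mirrors the two variants just described; the pairing of questions[k] with insights[k] is an illustrative simplification.

```python
import torch
import torch.nn.functional as F

def match_score(a: torch.Tensor, b: torch.Tensor) -> float:
    """Cosine similarity between two (1, dim) sentence embeddings."""
    return F.cosine_similarity(a, b).item()

query_vec = sentence_embedding("What drove the increase in sales")
questions = ["Which products drove this sudden increase",
             "How has the trend changed"]
insights = ["During the last week, Device Sales increased by 356 units.",
            "The Device Sales trend reversed in September."]

q_scores = [match_score(query_vec, sentence_embedding(q)) for q in questions]
i_scores = [match_score(query_vec, sentence_embedding(i)) for i in insights]

# Variant 1: score each question and each insight separately; the single
# highest-scoring item determines which question/insight pair is surfaced.
best_separate = max(range(len(questions)),
                    key=lambda k: max(q_scores[k], i_scores[k]))

# Variant 2: one score per question/insight set, the average of the two scores.
set_scores = [(q + i) / 2 for q, i in zip(q_scores, i_scores)]
best_set = max(range(len(questions)), key=set_scores.__getitem__)

print(questions[best_set], "->", insights[best_set])
```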
- FIG. 18 illustrates a process 1800 for semantically matching natural language queries with predefined questions (e.g., parameterized questions).
- Process 1800 is initiated by user 1802 at step 1812 by loading the metric detail page 1804 .
- insights are generated via the insights service 1806, and at step 1816 the generated insights are returned from the insights service 1806 to the metric detail page 1804.
- the generated insights are provided (e.g., displayed) to the user 1802 .
- the generated insights and the predefined questions associated with the metric and/or the generated insight are sent from the metric detail page 1804 to the embedding service 1808 .
- an embedding is retrieved from a cache 1810 or generated by embedding service 1808 .
- the embedding service attempts to retrieve an embedding associated with each item from the cache 1810 .
- an embedding is retrieved from the cache to the embedding service if an embedding is available for the respective item. If an embedding is not available for the respective item, at step 1830 , a new embedding is generated by the embedding service for the respective item. Additionally, the new embedding is stored, at step 1832 , in the cache 1810 .
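The cache-aside behavior described above (generate a new embedding at step 1830, store it at step 1832) might be sketched as follows; the class name and embed_fn parameter are illustrative assumptions.

```python
from typing import Callable, Dict

class EmbeddingCache:
    """Reuse a cached embedding when available; otherwise generate and store it."""
    def __init__(self, embed_fn: Callable[[str], object]):
        self.embed_fn = embed_fn       # e.g., sentence_embedding from above
        self.cache: Dict[str, object] = {}

    def get(self, item: str):
        if item in self.cache:         # embedding available: retrieve from cache
            return self.cache[item]
        vector = self.embed_fn(item)   # step 1830: generate a new embedding
        self.cache[item] = vector      # step 1832: store it for future requests
        return vector
```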
- the embeddings for the generated insights and/or predefined questions are sent from the embedding service to the metric detail page.
- a user 1802 submits an ad hoc query to the metric detail page.
- the metric detail page sends a query to the embedding service requesting the embeddings associated with the user's ad hoc query.
- the embedding service sends the embeddings associated with the user's ad hoc query to the metric detail page.
- the metric detail page then ranks questions based on similarity to the ad hoc query at step 1846 .
- the results of the ad hoc query (e.g., the predefined questions and generated insights that are semantically similar to the ad hoc query) are then provided (e.g., displayed) to the user 1802.
- FIG. 19 illustrates a process 1900 for automatically generating metric objects.
- a natural language query directed to a data source (e.g., a database) is received ( 1902 ).
- a user inputs a natural language query or selects a recommended natural language query.
- the natural language query may be inputted or selected from a user interface (e.g., user interface 1600 in FIGS. 16 A- 16 E, dashboard 1602 in FIGS. 16 A- 16 D, and/or dashboard 1632 in FIG. 16 E).
- automatic recommendations may be generated that are associated with (or complete) the natural language query (e.g., suggested natural language queries 1634 in FIG. 16 D ).
- the natural language query can include, in part or in whole, a structured query, wherein a natural language query includes questions in plain, everyday language (e.g., in a way similar to how the user may ask another person) and a structured query uses particular syntax and requires knowledge of the structure of the data store.
- one or more insights associated with the one or more predefined questions for the data source are generated ( 1904 ) based on one or more predefined types of analyses.
- a respective insight includes a natural language explanation (e.g., natural language explanation 1614 in FIG. 16 A , natural explanation 1624 in FIG. 16 B and/or natural language explanation 1644 in FIG. 16 E ) and/or a data visualization (e.g., data visualization 1616 in FIG. 16 A , data visualization 1626 in FIG. 16 B , and/or data visualization 1646 in FIG. 16 E ).
- the respective predefined question and associated respective generated insights of the one or more generated insights are selected ( 1906 ).
- the respective generated insight and associated respective predefined question of the one or more predefined questions are selected ( 1908 ).
- instructions for displaying, on a display communicatively connected to the computing device, the selected representative predefined question and associated respective generated insights and/or the selected representative generated insight and associated predefined question are generated ( 1910 ). Examples of the display of the selected representative predefined question and associated respective generated insights and/or the selected representative generated insight and associated predefined question are shown in FIGS. 16 A- 16 E .
- the selected representative predefined question and associated respective generated insights and/or the selected representative generated insight and associated predefined question are cached. Further, in accordance with a determination that the natural language query semantically matches the respective predefined question of the one or more predefined questions, the cached one or more insights (and/or predefined question) are retrieved.
- the generated one or more insights associated with one or more predefined questions for the data source are cached at the computing device and/or the data source.
- the one or more insights are generated based on one or more insight templates.
Abstract
A system receives a natural language query directed to a data source, and in response generates insights associated with predefined questions for the data source based on predefined types of analyses. In accordance with a determination that the natural language query semantically matches a respective predefined question, the system selects the respective predefined question and associated respective generated insights. In accordance with a determination that the natural language query semantically matches a respective generated insight, the system selects the respective generated insight and associated respective predefined question. The system generates instructions for displaying, on a display communicatively connected to the system, the selected representative predefined question and associated respective generated insights and/or the selected representative generated insight and associated predefined question.
Description
- This application is related to U.S. Utility patent application Ser. No. 18/429,132 (Attorney Docket No. 061127-5342-US) entitled “Generating and Validating Data Insights Using Machine Learning Models,” filed Jan. 31, 2024, which claims the benefit of U.S. Provisional Patent Application No. 63/537,808, entitled “Metric Layer Bootstrapping,” filed Sep. 11, 2023, each of which is incorporated by reference herein in its entirety. This application is also related to U.S. Utility patent application Ser. No. 18/429,072 (Attorney Docket No. 061127-5336-US) entitled “Automatically Generating Metric Objects Using a Machine Learning Model,” filed Jan. 31, 2024, which claims the benefit of U.S. Provisional Patent Application No. 63/537,808, entitled “Metric Layer Bootstrapping,” filed Sep. 11, 2023, each of which is incorporated by reference herein in its entirety. This application is also related to U.S. Provisional Patent Application No. 63/537,808, entitled “Metric Layer Bootstrapping,” filed Sep. 11, 2023, which is incorporated by reference herein in its entirety.
- The disclosed embodiments relate generally to data analytics and, more specifically, to systems and methods for semantically matching natural language queries with parameterized questions.
- Data analysis is the process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, drawing conclusions, and supporting decision-making. It involves the use of various techniques, methods, and tools to examine and interpret data, uncover patterns, and extract insights. The primary objective of data analysis is to gain a better understanding of the underlying trends, relationships, and characteristics within the data. Data analysis is widely used across various industries and domains, including business, finance, healthcare, science, and technology. It plays a crucial role in extracting meaningful information from large and complex datasets, helping organizations make informed decisions and gain a competitive advantage.
- There is an increasing demand for making business insights accessible to business users and other users (e.g., in sales, marketing, HR, finance, or others) without the need for data analysts or scientists to manually create KPIs, metrics, data visualizations, or other business insights. The consumers of business insights have the need to make data-driven decisions but typically rely on others to manually create and track metrics for a selected data source. For example, a data analyst manually selects or creates various metadata that is used to provide business context for a metric. This process can be time consuming and inefficient.
- Manual creation of metrics generally requires extensive knowledge of the syntax and structure of the data source. Inaccurate querying of the data source may result in incomplete or misleading raw or processed data. Accordingly, there is a need to provide users (such as users that are not data analysts or scientists) the ability to search and retrieve metrics and insights (e.g., processed data) from the data source while ensuring that those results are accurate and actionable. The above deficiencies and other problems associated with generating and retrieving metrics and insights without prerequisite knowledge to construct a structured search query are reduced or eliminated by the disclosed systems and methods.
- In accordance with some embodiments, a method is executed at a computer system having one or more processors and memory storing one or more programs configured for execution by the one or more processors. The method includes receiving a natural language query directed to a data source. In response to receiving the natural language query, generating, based on one or more predefined types of analyses, one or more insights associated with one or more predefined questions for the data source, where a respective insight includes a natural language explanation and/or a data visualization. The method further includes, in accordance with a determination that the natural language query semantically matches a respective predefined question of the one or more predefined questions, selecting the respective predefined question and associated respective generated insights of the one or more generated insights, and in accordance with a determination that the natural language query semantically matches a respective generated insight of the one or more generated insights, selecting the respective generated insight and associated respective predefined question of the one or more predefined questions. The method further includes generating instructions for displaying on a display communicatively connected to the computing device the selected representative predefined question and associated respective generated insights and/or the selected representative generated insight and associated predefined question.
- In accordance with some embodiments, the computer system includes one or more input devices, one or more processors, and memory storing one or more programs. The one or more programs are configured to be executed by the one or more processors and the one or more programs include instructions for performing or causing performance of the operations of any of the methods described herein. In accordance with some embodiments, a non-transitory computer readable storage medium has stored therein instructions, which, when executed by one or more processors cause a computer system to perform or cause performance of the operations of any of the methods described herein.
- The disclosed methods, systems, and databases provide semantically matching natural language queries with parameterized questions.
- The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
- For a better understanding of the aforementioned systems, methods, and graphical user interfaces, as well as additional systems, methods, and graphical user interfaces that provide data visualization analytics, reference should be made to the Description of Embodiments below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.
- FIG. 1 illustrates a schematic system diagram for automatically generating metrics using a machine learning model, in accordance with some embodiments.
- FIGS. 2-7 illustrate user interfaces for viewing and creating metrics, according to some embodiments.
- FIG. 8 illustrates a schematic system for generating metric objects using a machine learning model, in accordance with some embodiments.
- FIG. 9 illustrates a process for automatically generating metrics using an ML model, in accordance with some embodiments.
- FIG. 10 illustrates suggested metrics generated using a machine learning model, in accordance with some embodiments.
- FIG. 11 illustrates a user interface for viewing and creating metrics using machine learning models, in accordance with some embodiments.
- FIG. 12 is a flowchart diagram of a process for automatically generating metric objects, in accordance with some embodiments.
- FIG. 13 illustrates a schematic system for automatically generating insight summaries using a machine learning model, in accordance with some embodiments.
- FIG. 14 illustrates a summary of a bundle of insights for multiple metrics, in accordance with some embodiments.
- FIG. 15 is a flowchart diagram of a process for generating and validating data insights using machine learning models, in accordance with some embodiments.
- FIGS. 16A-16F illustrate a user interface for querying and viewing insights via natural language, in accordance with some embodiments.
- FIG. 17 illustrates semantic comparisons between a natural language query and predefined questions and generated insights, in accordance with some embodiments.
- FIG. 18 illustrates a process for semantically matching natural language queries with predefined questions, in accordance with some embodiments.
- FIG. 19 is a flowchart diagram of a process for semantically matching natural language queries with predefined questions and/or generated insights, in accordance with some embodiments.
- Reference will now be made to embodiments, examples of which are illustrated in the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that the present invention may be practiced without requiring these specific details.
- Metric definitions for metrics have typically been created manually by data analysts or other users with specialized knowledge in data analysis, data science, or other expert skills. In some embodiments, the described methods and systems provide a technique for automatically generating and recommending metric objects (e.g., respective metric definitions or metric data structures) using a machine learning model (e.g., a large language model).
- In some embodiments, a data analytics tool such as Tableau Pulse provides insights about data based on predefined metrics. After a metric is created, members of an organization can be added as followers of the metric. In some embodiments, such members can receive a regular email or a digest about metrics to which they are subscribed. Such emails or digests surface trends, outliers, and other changes, keeping followers up to date on relevant data. To learn more about the data, users can investigate a metric (e.g., on a system such as Tableau Cloud) and see how different factors contribute to changes in the data. Such insights into relevant data allow users to make data-driven decisions without requiring complex analysis and configuration.
- Metrics are analytical objects that can be interacted with and viewed in a user interface. Metric definitions have an underlying data structure that represents a respective metric. In some embodiments, Table 1 below illustrates an example of a metric definition.
TABLE 1
Metric Definition for Superstore Sales

  Definition field           Example value
  Metric name                Superstore Sales
  Measure                    Sales
  Aggregation                Sum (e.g., SUM of Sales)
  Time dimension             Order Date
  Adjustable metric filters  Region, Category
  Number format              Currency
  Favorability indicator     Favorable (e.g., Value is going up)

- In some embodiments, metrics (also referred to as metric objects) are created when additional data fields (e.g., business context data fields) associated with a metric are adjusted or configured. This occurs, for example, when respective time context options (e.g., time granularity) or filters are specified. In some embodiments, Tables 2 and 3 provide an example of the options configured for related metrics. These options are applied on top of the core value that is specified by a respective metric definition.
TABLE 2
Related metric for Superstore Sales - Technology

  Metric option     Example value
  Time granularity  Quarter to date
  Time comparison   Compared to previous year
  Filters           Category: Technology

TABLE 3
Related metric for Superstore Sales - Office Supplies

  Metric option     Example value
  Time granularity  Year to date
  Time comparison   Compared to previous year
  Filters           Category: Office Supplies

- A metric definition captures the core value that is being tracked. At a basic level, this value is an aggregate measure tracked based on a time dimension. The definition also specifies options such as the dimensions that viewers are able to filter by, the way the value is formatted, and the types of insights displayed. When a metric definition is created, the system (e.g., Tableau Pulse) automatically creates an initial related metric. The initial metric created for a definition has no filters applied. When users of an organization adjust the metric filters or time context in a new way, the system creates an additional related metric. For example, a member of a sales organization and/or other users of that organization may need to track metrics across different territories and product lines. In Tableau Pulse, a metric definition can be created that includes the core value of the sum of daily sales with adjustable metric filters for region and product line. Then, a user can create related metrics for each region and product line. Additionally, members of the organization can be added as followers to the related metrics to view where and what is being sold.
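To make the relationship between Tables 1-3 concrete, a metric definition and its related metrics might be modeled as follows. This is an illustrative Python sketch; the class and field names simply mirror the table rows and are not part of the disclosed embodiments.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class MetricDefinition:
    """Core value being tracked (mirrors Table 1)."""
    metric_name: str
    measure: str
    aggregation: str
    time_dimension: str
    adjustable_metric_filters: List[str]
    number_format: str
    favorability_indicator: str

@dataclass
class RelatedMetric:
    """Options applied on top of a definition (mirrors Tables 2 and 3)."""
    definition: MetricDefinition
    time_granularity: str
    time_comparison: str
    filters: Dict[str, str]

superstore = MetricDefinition(
    metric_name="Superstore Sales", measure="Sales", aggregation="Sum",
    time_dimension="Order Date",
    adjustable_metric_filters=["Region", "Category"],
    number_format="Currency",
    favorability_indicator="Favorable (value is going up)")

technology = RelatedMetric(superstore, "Quarter to date",
                           "Compared to previous year",
                           {"Category": "Technology"})
```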
- In some embodiments, members in an organization follow metrics, not metric definitions. By following individual metrics, users can get insights specific to the dimensions that are relevant for them. The metric definition allows for managing data for related metrics from a single parent definition. If a field in a data source changes, the metric definition can be updated to reflect this change, and all metrics related to that definition will automatically reflect the change (e.g., without the need to update each related metric individually).
- In some embodiments, a relationship 100 between a metric definition for a metric object labeled “Superstore Sales” described in Table 1 and the two metrics 102 and 104 is illustrated in FIG. 1. Portion 120 illustrates values of the data fields (i) measure, (ii) aggregation, and (iii) time dimension for the metric object labeled “Superstore Sales.” Portion 130 illustrates metrics 102 and 104. Metric 102 includes the sum of all sales Quarter to Date for the category “Technology,” where the value 106 of the aggregated measure is displayed in the metric 102 (e.g., $243.1k). Metric 104 includes the sum of all sales Year to Date for the category “Office Supplies,” where the value 108 of the aggregated measure is displayed in the metric 104 (e.g., $153.3k). Metrics 102 and 104 are related metrics created based on the “Superstore Sales” metric definition (e.g., with the options shown in Tables 2 and 3).
metrics -
- FIGS. 2-7 illustrate user interfaces for viewing and creating metrics, according to some embodiments. A manual process for creating metric definitions and/or respective metrics based on the metric definitions is illustrated in FIGS. 5-7.
- FIG. 2 illustrates a user interface 200 for viewing and creating metrics, according to some embodiments. User interface 200 includes control 202 “All Metrics” that, when selected, causes a computer system to display a list 220 of metrics (e.g., metrics to which a respective user has access) that are stored and available for viewing. For each metric displayed in the list of metrics 220, attributes (also referred to as data fields, for example, in the context of metric definitions) associated with each metric are displayed in the list of metrics 220. The attributes include a name 204 that corresponds to a label or name of each metric. For example, in the user interface 200, the metrics with names “Delivery Time” 204 a, “Resource Utilization” 204 b, and “ATV” 204 c are visible. The attributes further include a data source 206 from which the respective metrics are created. For example, the metric labeled “Delivery Time” 204 a is created for the data source “Shipping Record-Salesforce” 206 a; the metric labeled “Resource Utilization” 204 b is generated for the data source “HR Analytics” 206 b; and the metric labeled “ATV” 204 c is generated for the data source “Marketing Cloud Sales” 206 c. The attributes further include a time granularity 208 that corresponds to contextual time frames over which a respective measure associated with the respective metric is aggregated. For example, the metric labeled “Delivery Time” 204 a is aggregated for the period “Month to Date” 208 a; the metric labeled “Resource Utilization” 204 b is aggregated for the period “Day to Date” 208 b; and the metric labeled “ATV” 204 c is aggregated for the period “Week to date” 208 c. The attributes further include filter options 210 that specify filters that are used to select the records that are used in aggregating the respective measure associated with the respective metric. For example, the metric labeled “Delivery Time” 204 a is aggregated across “Product” and “Region” dimensions 210 a; the metric labeled “Resource Utilization” 204 b is aggregated across “Employee” and “Department” dimensions 210 b; and the metric labeled “ATV” 204 c is aggregated across “Category” and “Customer” dimensions 210 c. The attributes further include an author 212 of respective metrics (e.g., the user that generated the respective metric manually or automatically). The attributes further include a followers list 214 for respective metrics (e.g., users in an organization that have subscribed to or follow performance of the metric). The attributes further include an actions list 216, which specifies the actions that can be performed with the respective metric.
- Further, a user input 218 is detected selecting the “Create New” control 230 for creating new metrics. In response to selecting the control 230, a user interface 300 for selecting a data source is displayed, as illustrated in FIG. 3. Data sources 312 that are available for selection are listed in user interface 300. For each data source in the list of data sources 312, a name 302 and the number of potential metrics 304 are displayed in user interface 300. For example, for the data source 306 named “ESG at NTO,” the system determines that potentially 5 metrics can be generated. In some embodiments, a metrics service (e.g., the metrics service 812, described in relation to FIG. 8) determines the number of potential metrics 304 for each data source based on a number of data fields in the respective data source that correspond to measures.
- FIG. 3 further illustrates input 308 that selects the data source 306 named “ESG at NTO.” In response, the user interface 400 for creating a metric definition 402 is displayed in FIG. 4. In FIG. 4, the data source 306 named “ESG at NTO” is already pre-populated in the metric definition 402 in response to the user input 308. As illustrated in FIG. 4, the metric definition 402 includes a number of data fields 404-414. Some data fields, such as name 404, measure 408, and time dimension 410, correspond to core data fields, and other data fields, such as the description 406, number format 412, and metric filter 414, are optional or additional contextual data fields.
- In some embodiments, a name 404 is a label for the metric that is generated based on the created metric definition, such as “Delivery Time,” “Resource Utilization,” “Average of Partner Happiness,” “Sum of Emissions,” “Sum of Energy Use,” “Partner PayGap,” “Appliance Sales,” “Branch Revenue,” “Campaign ROI,” and/or another textual description of a respective metric. In some embodiments, a measure 408 corresponds to a data field in the data source, such as a column in a relational database table (e.g., Revenue, Expenses, or other measures, depending on the data source). For example, measures that can be selected are fetched from the selected data source 306. In some embodiments, fetched measures and associated aggregation types can be selected together. Examples of aggregation types include, but are not limited to, SUM, AVG, MAX, MIN, Median, Percentile, Standard Deviation, and COUNT. In some embodiments, a time dimension 410 corresponds to a data field in the data source that includes date and/or time (e.g., order date) by which the measure is aggregated.
- FIGS. 5-7 illustrate a user manually selecting respective values for the core data fields 404, 408, and 410 of the metric definition 402. Alternatively, a user can select a “Suggest Metrics” control 416 for automatically generating suggested metrics, and a metric service prompts a machine learning model to generate suggested or recommended metrics for the selected data source 306. This process for automatically suggesting metrics using a machine learning model is described in further detail in relation to FIGS. 8-9.
- FIG. 5 illustrates a menu 500 for selecting or searching for measures in the selected data source 306. In some embodiments, the menu 500 is displayed in response to a user input selecting a data field 408. In the menu 500, a number of measures are displayed, including a first measure 416 “Customer Satisfaction,” a second measure 420 “Partner Happiness,” a third measure 422 “Product Quality,” a fourth measure 424 “Reliability Perception,” a fifth measure 426 “Energy Use,” and a sixth measure 428 “Pay.” In FIG. 5, the user has selected the first measure 416 “Customer Satisfaction,” and the aggregate type 418 is pre-populated by default. In FIG. 5, the pre-populated aggregation type 418 is “Average.” Further, FIG. 5 illustrates user input 430 selecting the aggregate type 418 “Average.”
- FIG. 6 illustrates a transition from FIG. 5 in response to the user input 430 selecting the aggregate type 418 “Average,” and further input selecting the data field 410 that corresponds to a time dimension. In response to user input selecting the data field 410, the menu 600 for selecting or searching for dimensions in the selected data source 306 is displayed. In the menu 600, a number of dimensions are displayed, including a first dimension 602 “Survey Data,” a second dimension 604 “Transaction,” and a third dimension “Date” 606. In FIG. 6, the user has selected the first dimension 602 “Survey Data,” and the time granularity 608 is pre-populated by default. In FIG. 6, the pre-populated time granularity 608 is “Monthly.” Further, FIG. 6 illustrates user input 610 selecting the time granularity 608 “Monthly.”
FIG. 7 illustrates a metric "Customer Satisfaction Score" 700 generated based on the metric definition 402 created manually by the user, as illustrated in FIGS. 4-6. The metric "Customer Satisfaction Score" 700 illustrates average customer satisfaction based on survey data in a line chart 702. Further, a textual description 704 of the performance or trend of the average customer satisfaction is also included in the metric "Customer Satisfaction Score" 700. - In some embodiments,
FIG. 8 illustrates a system 800 for generating metric objects using a machine learning model, and FIG. 9 illustrates a process 900 for generating metric objects using a machine learning model. -
FIG. 8 illustrates a schematic system 800 for generating metric objects using a machine learning model. In some embodiments, a user 802, via a user interface 804, can initiate a process for creating one or more metrics. For example, the user 802 may select a control in user interface 804, and in response, the metric service 812 in the metrics layer 806 retrieves data from one or more data sources, such as data source 814a and data source 814b. In some embodiments, the metrics layer 806 accesses or retrieves data from a variety of data sources including, but not limited to, Comma-Separated Values (CSV) files, Excel Spreadsheets, relational databases (e.g., MySQL, PostgreSQL, Microsoft SQL Server, or Oracle Databases), cloud-based data sources (e.g., Google BigQuery, Amazon Redshift), NoSQL Databases (e.g., MongoDB), Web Data Connectors, and other multidimensional, relational, and/or hierarchical data sources. A user can select data retrieved from the data source 814a and the data source 814b. - In some embodiments, a
user 802 can manually create a metric definition (e.g., the metric definition 402 in FIG. 4) for a metric (also referred to as a metric object) (e.g., the "Customer Satisfaction Score" 700) by providing values for data fields (e.g., one or more of data fields 404-414) that are included in a metric definition. The manual process for creating a metric object is further described in relation to FIGS. 5-7. In some embodiments, the user 802, via the user interface 804, can request that metrics be generated automatically. In some embodiments, the process for automatically generating suggested metrics is further described in relation to FIG. 9. In some embodiments, the suggested metrics can be generated without the user 802 specifying any data field (or attribute or metadata) of the metric definition. In some embodiments, the user 802 can specify some data fields of the metric definition, and the remaining data fields can be automatically generated (e.g., by the metrics service 812). - In response to the
user 802's request, the metrics service 812 retrieves respective data fields from the selected data source. In some embodiments, the metric service 812 requests fetching of data fields (e.g., all data fields) from the selected data source, and determines a subset of data fields from the fetched data fields that correspond to measures. The metrics service 812 sends a prompt (e.g., a request), via application programming interface (API) 810, to a machine learning (ML) model 816 to generate a respective metric definition for each of the subset of measures in the fetched data fields. In some embodiments, the metrics service 812 (e.g., conceptually part of a metrics layer) is called or used by various analytical tools and applications (e.g., by Tableau Pulse). In some embodiments, metrics service 812 makes one prompt request per metric. - In some embodiments, the
ML model 816 is a large language model, such as a generative pre-trained transformer. In some embodiments, the ML model 816 is pre-trained on a data source that already includes metadata and/or semantics that have been pre-configured or predetermined by a user (e.g., a data analyst or another user who has domain knowledge). In some embodiments, examples of such semantics and/or metadata include, but are not limited to: columns that are labeled as measures; names or labels for the measures; usage of the measures across different workbooks, including labels or descriptions of those workbooks (e.g., the measure Sales has been used in a workbook described or labeled as Sales over time), pre-existing aggregations of the measures, calculations in which the measures were used, and/or breakdowns of the measures by different dimensions; formatting, styling, and/or visual encodings associated with the measures; and data visualizations in which the measures have been used. In some embodiments, such semantics and/or metadata in the data source provide domain-specific knowledge on which the ML model 816 is trained. In some embodiments, in addition to training the ML model 816, such semantics and/or metadata are provided to the ML model 816 from the selected data source in the prompt request (e.g., provided as input to the machine learning model). In some embodiments, the ML model 816 is further trained on textual corpuses produced by humans, such as publications and resources on the Internet. In some embodiments, training the ML model 816 on such textual corpuses is advantageous because the ML model 816 can determine additional semantic and business context that is not typically available in the selected data source. For example, the ML model 816 can determine whether a particular change of an aggregated measure over time is positive, neutral, or negative, thereby generating values for a data field of a metric definition that corresponds to a favorability indicator.
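The disclosure does not reproduce the literal prompt text, but a per-measure prompt that packages the kinds of semantics and metadata enumerated above might be assembled along the following lines; the dictionary keys and the requested JSON fields are illustrative assumptions:

import json

def build_metric_prompt(measure: str, source_metadata: dict) -> str:
    # Package the pre-configured semantics/metadata for one measure
    # (labels, workbook usage, prior aggregations, related dimensions).
    context = {
        "measure": measure,
        "label": source_metadata.get("labels", {}).get(measure),
        "workbook_usage": source_metadata.get("usage", {}).get(measure, []),
        "prior_aggregations": source_metadata.get("aggregations", {}).get(measure, []),
        "candidate_dimensions": source_metadata.get("dimensions", []),
    }
    return (
        "Given the following measure and its metadata, return a JSON metric "
        "definition with the fields: name, measure, aggregation, "
        "time_dimension, and favorability_indicator.\n"
        + json.dumps(context, indent=2)
    )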
- In some embodiments, the ML model 816 outputs a number of metric definitions for suggested metrics, and the generated metric definitions are returned to the metrics service 812 via the API 810. The metrics service 812, after transforming and/or de-duplicating the generated metric definitions, sends a final list of suggested metrics to user interface 804. In some embodiments, the user 802 may modify the suggested metrics, save them, and/or discard some of them. In some embodiments, the generated metrics (and respective metric definitions) are stored in a metrics database 808 and used by the metrics service 812. In some embodiments, metric definitions generated by ML model 816 are cached per data source to reduce the computational cost (e.g., the number of requests) of using the ML model 816. - In some embodiments,
metrics service 812 prompts the ML model 816 in response to user requests. In some embodiments, the metrics service 812 scans the available data sources for changes (e.g., added measures), and can automatically suggest metrics to the user 802 without additional user input requesting the suggested metrics. In some embodiments, the user 802 can export charts as metrics, thereby creating metric definitions without entering values for the data fields and without prompting the ML model 816. For example, a user may be viewing a line chart with measures and dimensions in a workbook, and the metrics service 812 may prompt the user to generate a metric from the line chart and the respective measure and dimension. In some embodiments, the ML model 816 infers or generates any data fields of the metric definition that remain uncompleted when the line chart is exported as a metric. -
FIG. 9 illustrates a process 900 for automatically generating metrics using a ML model. Process 900 is initiated by user 902 at step 914 by selecting or clicking on a control for creating metrics (e.g., user input 218 selecting control 230 for creating new metrics, as in FIG. 2). In response, at step 916, user interface 904 sends a request to retrieve or load available data sources 908. At step 920, available data sources (e.g., data sources 312 in FIG. 3) are loaded in user interface 904, and, at step 918, the loaded data sources are displayed to user 902 (e.g., in user interface 300 for selecting a data source, FIG. 3). At step 922, user 902 selects respective data sources from the displayed data sources (e.g., user input 308 in FIG. 3). Further, at step 923, user 902 selects a control that requests that suggested metrics be generated (e.g., control 416 in FIG. 4). In response, at step 924, a request is sent from the user interface 904 to metric service 906 to infer or determine metrics from the selected data source (e.g., data source 306 selected from data sources 312 in FIG. 3). In response to the request, at step 926, metric service 906 sends a request to the selected data source in data sources 908 to fetch the data fields in the data source. At step 928, the selected data source in data sources 908 fetches or sends the data fields (e.g., data field names) in the selected data source. At step 929, metric service 906 selects each measure in the fetched data fields (e.g., loops over a subset of data fields that correspond to measures in the fetched data fields) and, at step 930, sends a prompt for each measure (e.g., a separate prompt is optionally sent for each measure) in the fetched data fields to machine learning model 910 (e.g., a large language model). At step 932, machine learning model 910 responds and sends to metric service 906 generated metric definitions for each measure in the fetched data fields. At step 931, metric service 906 selects each metric definition (e.g., optionally one by one in a loop cycle) and performs post-processing operations. For example, at step 934, metric service 906 parses and transforms each metric definition. Additionally, at step 936, metric service 906 checks whether a respective metric definition in the generated metric definitions is a duplicate of another metric definition that is already stored in metrics database 912 (e.g., the check for duplicates can be performed using a hash function). At step 938, the metrics database 912 returns to metric service 906 a determination whether a duplicate exists for each respective metric definition. At step 940, metric service 906 removes any duplicates from the list of generated metric definitions. At step 942, metric service 906 returns a list of suggested metrics (e.g., a list of the generated metric definitions that are transformed and without duplicates) to user interface 904. At step 944, the list of suggested metrics is displayed in user interface 904 (e.g., available for viewing by user 902). For example, a list of suggested metrics 1002 is illustrated in FIG. 10. In some embodiments, at step 946, user 902 selects some or all of the suggested metrics for saving in the metrics database 912 (e.g., by checking or unchecking a selection box).
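Steps 929-940 can be summarized as a loop that issues one prompt per measure and then de-duplicates the returned definitions. The sketch below is an illustration under stated assumptions: prompt_model stands in for the call to machine learning model 910, stored_hashes stands in for hashes already held in metrics database 912, and the hash-based check mirrors the hash function mentioned at step 936:

import hashlib
import json

def definition_hash(defn: dict) -> str:
    # Stable hash of a metric definition, used for the duplicate check (step 936).
    canonical = json.dumps(defn, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def suggest_metrics(measures, prompt_model, stored_hashes):
    # Steps 929-940: one prompt per measure, then de-duplication.
    suggested = []
    seen = set(stored_hashes)
    for measure in measures:              # step 929: loop over measures
        defn = prompt_model(measure)      # steps 930-932: prompt and response
        h = definition_hash(defn)         # step 936: check for duplicates
        if h not in seen:                 # step 940: drop any duplicates
            seen.add(h)
            suggested.append(defn)
    return suggested                      # step 942: list of suggested metrics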
In some embodiments, at step 948, the metric service 906 receives a request from user interface 904 to generate and store metric definitions in the metrics database 912 for all metrics selected in step 946, and, in response, metric service 906 stores all metrics selected in step 946 in the metrics database 912. - In some embodiments, metrics that are based on metric definitions generated by a machine learning model can be tagged to indicate to a user that further review or validation is necessary. In some embodiments,
user 902 can edit, validate, or otherwise interact with the metrics based on metric definitions generated by the machine learning model. -
FIG. 10 illustrates suggested metrics 1002 generated using a machine learning model in accordance with process 900. For example, for the selected data source 306 named "ESG at NTO," the machine learning model generated and provided as output suggested metrics 1002 based on available measures in the selected data source 306. The list of suggested metrics 1002 includes metric 1004 "Partner Happiness Score," metric 1006 "Scope 1 Emissions," metric 1008 "Renewable Energy Usage," and metric 1010 "Partner Pay Gap." Each metric can be selected or unselected for storage in the metrics database 912 using respective controls in the list of suggested metrics 1002. For example, for metric 1004 "Partner Happiness Score," the aggregate measure 1004a is "Average of Partner Happiness" and the time dimension 1004b is "Survey Data"; for metric 1006 "Scope 1 Emissions," the aggregate measure 1006a is "Sum of Emissions" and the time dimension 1006b is "Date"; for metric 1008 "Renewable Energy Usage," the aggregate measure 1008a is "Sum of Energy Use" and the time dimension 1008b is "Date"; and for metric 1010 "Partner Pay Gap," the aggregate measure 1010a is "Maximum(Pay) − Minimum(Pay)" and the time dimension 1010b is "Review Data." - In some embodiments, the suggested
metrics 1002 are all saved in the metrics database 912 in response to user input 1014 selecting button 1012 for saving/storing selected metrics in the list of suggested metrics 1002. -
FIG. 11 illustrates user interface 200 for viewing and creating metrics, which displays newly created metrics generated using the machine learning model. After all suggested metrics 1002 are saved or stored, the newly generated metrics appear in user interface 200 for viewing and creating metrics. The newly created metrics are tagged as such so that the user can recognize which metrics have been recently added. For example, metric 1004 "Partner Happiness Score," metric 1006 "Scope 1 Emissions," metric 1008 "Renewable Energy Usage," and metric 1010 "Partner Pay Gap" are all tagged as new in user interface 200. - In some embodiments, various metrics can be built based on the same metric definitions. In some embodiments, a user can subscribe to or follow multiple metrics. In some embodiments, metric cards are displayed in a user interface (e.g., a user interface in a web browser application, a desktop application, or a mobile application) for metrics and insights. In some embodiments, metric cards include (i) a value of an aggregated measure of the respective metric; (ii) a data visualization (e.g., a line chart) that visualizes the respective aggregated metric; and (iii) an insight based on the aggregated metric that includes a textual description of the aggregated metric. In some embodiments, example metric cards (metric cards 1420, 1422, and 1424) are illustrated in
FIG. 14. - In some embodiments, a data model can represent relationships between the metrics. Examples of relationships between the metrics include, but are not limited to, a "leading indicator relationship" (e.g., one metric being a leading indicator for a change in another metric); a "lagging indicator relationship" (e.g., one metric being a lagging indicator for a change in another metric); a "positive influencer relationship" (e.g., one metric being a positive influencer for a change in another metric); a "negative influencer relationship" (e.g., one metric being a negative influencer for a change in another metric); and a "component of" relationship. In some embodiments, the machine learning model is trained on metrics and various relationships that exist between the metrics. In some embodiments, an output of the machine learning model includes a list of tuples, each corresponding to a relationship that likely holds between two of the metrics. In some embodiments, each tuple is in a format of (Metric, Relationship Type, Metric), e.g., (Open Pipeline, Leading Indicator, Revenue). In some embodiments, an insight generator module can generate insights based on relationships between the metrics (e.g., based on a data model of the relationships between metrics). In some embodiments, a data model of the relationships can be stored in a metrics database (e.g., metrics database 808) or a different database to which
metrics service 812 has access. - In some embodiments, the following description provides examples of different types of metric relationships. The format that is used to describe each type of relationship is: "Metric [relationship type] Target Measure or Dimension." (A code sketch following the list illustrates how such relationship tuples might be represented.)
- (1) “Metric [relates to] Time Dimension.” This relationship is used to generate time-based insights (e.g., jump, spike, change in trend, seasonality). For example, “Sales [relates to] Order Date Dimension.”
- (2) "Metric [relates to] Flat categorical dimension." This relationship is used when the dimension is a non-hierarchical (text-based/non-numeric) dimension. For example, "Hires [relates to] Gender."
- (3) "Metric [relates to] Hierarchical dimension (parent-child or level-based)." This relationship represents that a metric is related to a hierarchical dimension. In some embodiments, the hierarchy can be based on a parent-child relationship (e.g., Organization, Department, or Cost Center) or it can be level-based (e.g., Location or Product hierarchy). For example, "Hires [relates to] Departmental Hierarchy," or "Revenue [relates to] Product Hierarchy." Example insights (e.g., answers to business questions) that can be based on this relationship include (i) peer-level insights (e.g., the diversity ratio for your department is higher than your peers'), and (ii) parent-child level contribution insights (e.g., Revenue for the sporting goods product category was largely driven by the snowboards, e-bikes, and scooters products).
- (4) "Metric A [comparable to] Metric B (metric, benchmark, or goal)." This relationship is used to compare one metric to another metric, benchmark, or goal. Different insights can be based on this relationship, e.g., insights around variance, or progress to a goal or benchmark. This relationship can be used when new metrics or plans are loaded from an external system. Examples of this relationship include "Headcount [comparable to] Headcount Goal," or "Diversity Ratio [comparable to] EEOC." Example insights (e.g., answers to business questions) that can be based on this relationship include (i) metric progress-to-plan insights (e.g., Headcount is currently not tracking to the Headcount Plan); and (ii) benchmark or metric comparison insights (e.g., the diversity ratio for the Engineering department is 5 pp higher than the EEOC benchmark).
- (5) "Metric A [leading/lagging indicator to] Metric B." This relationship can be used when one metric is a leading or lagging indicator of another metric. This relationship can be used to create cross-metric insights that, for example, provide an indication (e.g., in advance) that a metric a user is subscribed to is likely going to be impacted. An example insight (e.g., an answer to a business question) that can be based on this relationship is a leading metric insight (e.g., the trend of Time Since Promotion is increasing and as a result may have an impact on Exit Rate).
- (6) "Metric A [parent stage to] Metric B." This relationship can be used to generate funnel insights. In some embodiments, modeling a funnel as a sequence of stages along with key metrics at each stage allows the detection of several funnel/pipeline related insights. An example of this relationship is: in a 5-stage sales funnel, #Prospects [parent stage to] #MQL [parent stage to] #SQL [parent stage to] #Opportunities [parent stage to] #Customers. Example insights (e.g., answers to business questions) that can be based on this relationship include (i) stage conversion questions (e.g., the MQL/SQL conversion ratio is 35% this week and trending down); (ii) stage entry and abandonment questions (e.g., 37% of SQLs entered the funnel at the SQL stage); and (iii) funnel forecast questions (e.g., based on the current recruiting pipeline, Hires is forecasted to be 245 in Q4).
- (7) “Metric A [included in Formula of] Metric B.” This relationship is used when one metric is referenced in the formula of another metric. For example, the metrics Sales and Expenses are included in the formula for calculating the metric Profit (e.g., Profit=Sales−Expenses), or the metrics Headcount, Hires, and Exits are included in the formula for calculating the metric Net Headcount.
- (8) "Metric A [parent to] Metric B." This relationship is used in cases where several SUM metrics are modeled together in a hierarchy that rolls up, as in financial cases (for example, the different metrics in a general ledger). This relationship can be used to generate contribution insights. An example that can be based on this relationship is "Total Cost of Workforce [parent of] Benefits | Salaries & Office Space metrics, and Office Overhead, which is a [parent of] Rent | Insurance | Equipment."
- (9) "Metric A [correlated positively/negatively to] Metric B." This relationship is used to represent a case where one metric is correlated to another metric, and also to represent how the two metrics are correlated (e.g., positively or negatively). In some embodiments, some correlation relationships are blacklisted; for example, correlations that are obvious (e.g., profits being correlated to revenue) are blacklisted, as such correlations provide minimal insight. Example insights include (i) higher app usage leads to more up-sells; (ii) a lead's score is not predictive of its pipeline value; and (iii) individuals who have managers that are highly rated take fewer sick days.
- (10) "Metric [relates to] Geo Hierarchy." This relationship represents a relationship between a metric and a geo (or location) hierarchy dimension. Business questions related to time or location are common. An example of this relationship is "Hires [relates to] Location Geo Hierarchy."
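As a concrete (and purely illustrative) rendering of the (Metric, Relationship Type, Metric) tuple format described above, the relationship types in the list could be represented as follows; the enum covers a representative subset only:

from enum import Enum
from typing import NamedTuple

class RelationshipType(Enum):
    RELATES_TO = "relates to"
    COMPARABLE_TO = "comparable to"
    LEADING_INDICATOR = "leading indicator to"
    LAGGING_INDICATOR = "lagging indicator to"
    PARENT_STAGE_TO = "parent stage to"
    INCLUDED_IN_FORMULA_OF = "included in formula of"
    PARENT_TO = "parent to"
    CORRELATED_TO = "correlated positively/negatively to"

class MetricRelationship(NamedTuple):
    source: str
    relation: RelationshipType
    target: str

# The example tuple given above: (Open Pipeline, Leading Indicator, Revenue)
rel = MetricRelationship("Open Pipeline", RelationshipType.LEADING_INDICATOR, "Revenue")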
- In some embodiments, relationships can be modeled manually. In some embodiments, relationships can be built by a user using templates (e.g., a package of relationships for a respective domain). In some embodiments, the user can map metrics and dimensions of a selected data source to metric definitions and templated relationships (e.g., in the package). In some embodiments, the relationships can be detected and suggested automatically by an insights service. For example, for a selected metric, the insights service inspects the data source and returns possible relationships of the selected metric to other measures and dimensions in the data source using statistics. In some embodiments, modeling metric relationships can be bootstrapped using a service that scrapes key measures, dimensions, and their relationships from existing views of data in existing dashboards, and imports them as modeled relationships in the metrics layer. In some embodiments, as part of the import workflow, a user can select/de-select different mappings and relationships.
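The disclosure leaves the statistics used for automatic relationship detection unspecified; the following sketch assumes simple pairwise correlation over per-metric time series, with a blacklist for obvious pairs as discussed for relationship type (9):

from itertools import combinations
from statistics import correlation  # Python 3.10+

def suggest_correlation_relationships(series_by_metric, threshold=0.8,
                                      blacklist=frozenset()):
    # series_by_metric maps a metric name to a list of aggregated values.
    suggestions = []
    for a, b in combinations(series_by_metric, 2):
        if frozenset((a, b)) in blacklist:   # skip obvious pairs, e.g., Profit/Revenue
            continue
        r = correlation(series_by_metric[a], series_by_metric[b])
        if abs(r) >= threshold:
            sign = "positively" if r > 0 else "negatively"
            suggestions.append((a, f"correlated {sign} to", b))
    return suggestions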
FIG. 12 illustrates a process 1200 for automatically generating metric objects. A plurality of data fields are obtained (1204) from a selected data source. A first subset of the plurality of data fields corresponds to a plurality of measures and a second subset of the plurality of data fields corresponds to a plurality of dimensions. In some embodiments, a user (e.g., an analytics professional or a business user) selects a data source. In some embodiments, a metrics service automatically scans for available data sources. In some embodiments, some data fields in the obtained plurality of data fields are measures and some are dimensions. - A machine learning model is prompted (1206) to generate a plurality of suggested metric objects. In some embodiments, a metric service may prompt (or send a request to) the machine learning model requesting that the machine learning model suggest metric objects. In some embodiments, the metric service may generate and send the prompt in response to a user input or an automatically generated request from a computer system. In some embodiments, the machine learning model is a large language model, such as a generative pre-trained transformer. In some embodiments, the machine learning model is trained on a data source that already includes metadata and/or semantics that have been pre-configured or predetermined by a user (e.g., a data analyst or another user who has domain knowledge). In some embodiments, examples of such semantics and/or metadata include, but are not limited to: columns that are labeled as measures; names or labels for the measures; usage of the measures across different workbooks, including labels or descriptions of those workbooks (e.g., the measure Sales has been used in a workbook described or labeled as Sales over time), pre-existing aggregations of the measures, calculations in which the measures were used, and/or breakdowns of the measures by different dimensions; formatting, styling, and/or visual encodings associated with the measures; and data visualizations in which the measures have been used. In some embodiments, such semantics and/or metadata in the data source provide domain-specific knowledge on which the machine learning model is trained. In some embodiments, in addition to training the machine learning model, such semantics and/or metadata are provided to the machine learning model from the selected data source in the prompt request (e.g., provided as input to the machine learning model).
- In response to prompting the machine learning model, a respective metric definition is generated (1208) for each measure in the plurality of measures, wherein each generated respective metric definition includes a plurality of data fields, including: (i) a name (e.g., a metric label or a textual description of a metric, such as "Delivery Time," "Resource Utilization," "Average of Partner Happiness," "Sum of Emissions," "Sum of Energy Use," "Partner PayGap," "Appliance Sales," "Branch Revenue," "Campaign ROI," and/or another textual description of a respective metric); (ii) a measure; (iii) a time dimension; and (iv) an aggregation type (e.g., SUM, AVG, MAX, MIN, Median, Percentile, Standard Deviation, COUNT). In some embodiments, the measure corresponds to a data field in the data source, such as a column in a relational database table (e.g., Revenue, Expenses, or other measures depending on the data source). For example, the measure already exists and is stored in the data source. In some embodiments, the measure may be a calculated measure. For example, a data field may be created using a calculation, optionally from other already existing or calculated measures (e.g., Revenue − Expenses to calculate a Profit measure). In some embodiments, a metric service sends a prompt to the machine learning model requesting a metric definition for each identified data field in the data source that corresponds to a measure. In some embodiments, the time dimension corresponds to a data field in the data source that includes date and/or time (e.g., order date) by which the measure is aggregated.
- In some embodiments, metric objects are defined by the respective metric definition. In some embodiments, metric objects (also referred to as metrics) are analytical objects that can be used in data analysis and/or business intelligence tools. In some embodiments, metrics can be used in informational snippets, referred to as insights, that provide contextual (and optionally personalized) information for a respective metric, including optionally information about performance of the metric in relation to other metrics and/or across different dimensions.
- In some embodiments, metric objects are entities that are subject to analysis, e.g., to gain insights about data and/or make informed decisions. In some embodiments, metric objects can be generated manually. For example, a user can select a measure, a time dimension, an aggregation type, a name, and/or other fields that are included in a definition of a metric. In some embodiments, some data fields of a metric definition can be manually generated while others can be bootstrapped or otherwise generated automatically by using the machine learning model. In some embodiments, a plurality of metrics are predefined (e.g., templates with preset metric definitions). In some embodiments, the metric definitions are stored in a database, and a metric service retrieves, changes, and/or adds to the stored metrics.
- In some embodiments, some of the plurality of data fields can be provided by a user and the remainder of the data fields can be generated or suggested by the machine learning model (e.g., on the fly). In some embodiments, the plurality of data fields that are generated and/or suggested by the machine learning model can be validated by a user or another machine learning model. In some embodiments, the machine learning model can generate personalized metric definitions based on metadata and/or usage of the measures by a respective user.
- In some embodiments, the plurality of data fields includes (1210) additional contextual fields, including one or more related dimensions. For example, the related dimensions are candidate dimensions by which the measure can be analyzed meaningfully (e.g., the measure Revenue can be meaningfully analyzed by the dimensions Region and Product, whereas analyzing Revenue by Order Id is not helpful). In some embodiments, the one or more related dimensions are predicted by the machine learning model to likely be useful for breaking down or filtering the metric. In some embodiments, a threshold number of dimensions can be included in a metric definition (e.g., no more than five dimensions may be returned by the machine learning model). In some embodiments, one or more related dimensions can be selected (e.g., inferred) and/or generated by the machine learning model or by a user. In some embodiments, a metric that has dimensions associated with it is referred to as a scoped metric.
- The plurality of data fields includes (1212) additional contextual fields, including time granularity. In some embodiments, different time granularities are appropriate for the respective aggregation of the measure associated with the generated metric. For example, the measure can be aggregated by an hour, a day, a week, a month, the last ten days, the last 30 days, the last 6 months, or any other time frame that is suitable for the respective measure aggregation. For example, sales may be aggregated for the day or, depending on the data in the data sources, sales should not be aggregated across a time frame of less than a week. In some embodiments, time granularity can be selected and/or generated by the machine learning model or by a user. In some embodiments, a metric that has time granularity associated with it is referred to as a scoped metric.
- In some embodiments, the plurality of data fields includes (1214) additional contextual fields, including a favorability indicator. In some embodiments, the machine learning model generates, infers, or suggests a favorability indicator related to performance of the respective measure. For example, if a value of the aggregated measure is going up (e.g., historically or over specified comparative time frames), the machine learning model can infer whether such a change is positive (e.g., good), negative (e.g., bad), or neutral, and whether the change is normal or unusual. The favorability indicator controls contextual information related to the metric, such as color (e.g., red to indicate negative change, green to indicate positive change, and a neutral color such as blue or grey to indicate a neutral change), and language that can be used to describe the metric in digests and/or insights (e.g., "Sales improved . . . " for a positive change vs. "Sales increased . . . " for a neutral change). In some embodiments, additional contextual fields that the machine learning model can generate or infer include a description of the metric in natural language (e.g., a sentence-long description of the metric in natural language, e.g., a description for a non-technical person) and/or formatting or appropriate styling for the measure.
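To illustrate how the favorability indicator could drive presentation, the mapping below follows the color and verb examples given above; the verb for a negative change is an assumption, since the text provides no example for that case:

FAVORABILITY_STYLE = {
    "positive": {"color": "green", "verb": "improved"},   # "Sales improved ..."
    "neutral":  {"color": "grey",  "verb": "increased"},  # "Sales increased ..."
    "negative": {"color": "red",   "verb": "declined"},   # assumed; no example given
}

def describe_change(metric_name: str, favorability: str) -> str:
    style = FAVORABILITY_STYLE[favorability]
    return f"{metric_name} {style['verb']} ..."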
- In some embodiments, the machine learning model is (1216) a generative artificial intelligence model. In some embodiments, the generative artificial intelligence model can generate a textual description for the metric (e.g., based on the respective metric, and data and other data fields in the data source).
- In some embodiments, the machine learning model that is used to generate the metric definition is a first machine learning model, and the metric definitions generated by the first machine learning model are validated (1218) using a second machine learning model.
- In some embodiments, one or more suggested metric objects are displayed (1220) in a user interface, where each suggested metric object is based on a respective metric definition generated by the machine learning model. For each of the one or more suggested metric objects, an option for selecting a respective suggested metric object to be saved in a metrics database that includes other metric objects is displayed in the user interface.
FIG. 13 illustrates a schematic system 1300 for automatically generating insight summaries using machine learning model 1340, in accordance with some embodiments. In some embodiments, insight generation platform 1310 is conceptually a layer on top of the metric layer 1308, and both have access (direct or indirect) to data source(s) 1324 (e.g., data sources 814a and 814b in FIG. 8). In some embodiments, metric layer 1308 corresponds to metric layer 806, the metrics service 1322 corresponds to metrics service 812, and metrics database 1320 corresponds to metric database 808, described in further detail with reference to FIG. 8. Metric bootstrapper 1365 is a module (e.g., a program) for generating metrics using machine learning model 1340 (e.g., in response to user input, or autonomously without additional user input) via application programming interface (API) 1360 (e.g., API 1360 corresponds to API 810 in FIG. 8). In some embodiments, metric bootstrapper 1365 automatically generates metrics using the machine learning model 1340 in accordance with process 900 (described in relation to FIG. 9) and process 1200 (described in relation to FIG. 12). - In some embodiments, metric definitions are created manually (e.g., by a business analyst) or automatically (e.g., by metric bootstrapper 1365) and stored in
metrics database 1320. An example metric definition for a sales metric has one or more values for each respective data field in the metric definition, such as: (i) Name: Sales; (ii) Measure: SUM(Order_Value); (iii) Definitional Filters: "where Order_Value > $100"; (iv) Time Dimension: Order_Closed_Date; (v) Dimensional Filter(s): Region, Product Category, and Sales Channel; and (vi) Time Granularities: weekly, monthly, quarterly. In some embodiments, the metric definitions can be scoped by a user by selecting values for some additional contextual fields. For example, for the sales metric described above, a user can specify additional filters and/or specific values for fields that have more than one value. For example, a user can filter the aggregated sales metric by Region (e.g., Region == "East"); specify that the time granularity is "weekly"; and specify a time comparison: "previous period." - In some embodiments,
insight generation platform 1310 includes insight service 1328 and insights database 1326. In some embodiments, insight service 1328 generates insights for selected metrics (e.g., a metric selected from the metrics database 1320). In some embodiments, insights are natural language expressions that explain or provide a description of changes (or other information) related to a selected metric that are occurring in a respective (e.g., selected) data source (e.g., performance of an aggregated metric for a specific time frame, and/or across different dimensions). In some embodiments, the insight type generator 1350 receives as input scoped metrics (e.g., the insight service 1328 provides the metrics scoped by the user to the insight type generator 1350). In some embodiments, the insight type generator 1350 generates one or more different types of insights for the scoped metric using one or more different insight type templates (e.g., stored in insights database 1326). For example, insight type generator 1350 can generate a Top Drivers insight for the Sales metric, where the Top Drivers insight provides information about which dimension is the main cause (e.g., or top driver) of the changes occurring for the aggregated measure in the Sales metric. To determine a top driver, the insight type generator 1350 calculates, for each dimension member, (i) the contribution amount for a previous time range; and (ii) the contribution amount for a starting time range. It then determines the amount of change driven/explained by calculating the difference in contribution. An example template is: "During the last [scope]: [metric_name] increased by [−/+percent]. [Dimension_member_list] increased the most," and an example insight based on that Top Drivers template is: "During the last period, Sales increased by $365K. ePhones, Simpson Phones, and Smart Home increased the most."
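A minimal sketch of the Top Drivers computation just described, assuming per-dimension-member contribution amounts have already been aggregated for the two time ranges; the dollar formatting follows the example insight rather than anything the disclosure specifies:

def top_drivers_insight(prev, curr, scope, metric_name, k=3):
    # prev/curr map dimension member -> contribution amount for the
    # previous and starting time ranges, respectively.
    members = set(prev) | set(curr)
    delta = {m: curr.get(m, 0) - prev.get(m, 0) for m in members}
    total = sum(delta.values())
    direction = "increased" if total >= 0 else "decreased"
    movers = sorted(members, key=lambda m: abs(delta[m]), reverse=True)[:k]
    return (f"During the last {scope}, {metric_name} {direction} by "
            f"${abs(total) / 1000:.0f}K. {', '.join(movers)} {direction} the most.")

prev = {"ePhones": 400_000, "Simpson Phones": 300_000, "Smart Home": 200_000}
curr = {"ePhones": 600_000, "Simpson Phones": 410_000, "Smart Home": 255_000}
print(top_drivers_insight(prev, curr, "period", "Sales"))
# During the last period, Sales increased by $365K. ePhones, Simpson Phones,
# Smart Home increased the most.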
- In some embodiments, the generated insights are stored in insights database 1326. In some embodiments, various types of insight templates are stored in an insights database. Example types of insights include a "Profit Changed" insight (e.g., with the respective associated metric being Profit), a "Revenue Changed" insight, a Top Drivers insight, and other types. Example insights that are generated based on the data in the data source and respective insight templates include, but are not limited to, "Gross margin is 14.4% greater than last month;" "Delivery Day increased by 5.6% compared to the same time last month;" and other natural language descriptions of the performance of a respective aggregated measure associated with a respective metric. - In some embodiments, a collection of insights, each generated for a metric of a set of selected metrics (e.g., metrics that the user is following), is stored using a data structure that is referred to as a bundle (e.g., bundles are optionally stored in insights database 1326). In some embodiments, a bundle is a collection of insights (e.g., optionally generated using templates) for multiple metrics to which a respective user is subscribed or which the user is following. In some embodiments, bundles are predefined or created dynamically. For example, a "Metric Change" bundle can contain a first insight about a metric that changed and a second insight that explains the change. Examples of bundles include, but are not limited to, "The week to date Appliance Sales has seen an unusual increase of +356 (+27%) over the last week and is now 17% above the normal range"; "The quarter to date Branch Revenue had an unusual increase of +1.2% over the equivalent quarter to date value a year ago, mainly attributed to Home Appliance"; and/or "The monthly Campaign ROI has been increasing at a rate of 1 percentage point per month for the past 6 months, in line with change in rate of increase on Social Media campaigns." In some embodiments, a bundle can include relationships between insights included in the bundle. For example, bundles can define or include semantic relationships between the insights included in the bundle. In some embodiments, the semantic relationships can be expressed with text. For example, a bundle may contain a "Profit Changed" insight and a "Revenue Changed" insight, with a "caused by" relationship between the two insights to capture a business insight, such as "Profit rose, driven by an increase in revenue." The templated output of a bundle can include that relationship, e.g., "Profit rose 15%. This was caused by: Revenue rose 18%."
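A bundle carrying a semantic relationship between its insights, and the templated output of the Profit/Revenue example above, might be structured as follows; this is a sketch, not a format the disclosure prescribes:

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class InsightBundle:
    insights: List[str]
    # (index_a, relationship, index_b), e.g., insight 0 is "caused by" insight 1
    relationships: List[Tuple[int, str, int]] = field(default_factory=list)

    def templated_output(self) -> str:
        parts = []
        for a, rel, b in self.relationships:
            if rel == "caused by":
                parts.append(f"{self.insights[a]} This was caused by: {self.insights[b]}")
        return " ".join(parts) if parts else " ".join(self.insights)

bundle = InsightBundle(
    insights=["Profit rose 15%.", "Revenue rose 18%."],
    relationships=[(0, "caused by", 1)],
)
print(bundle.templated_output())
# Profit rose 15%. This was caused by: Revenue rose 18%.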
- In some embodiments, a bundle of insights is generated by
insight type generator 1350 and stored in insights database 1326. In some embodiments, the bundle of insights is provided to insight summarization module 1330 to generate a summary of insights for all metrics that a user is following. The insight summarization module 1330 provides the bundle of insights as input to machine learning model 1340, optionally using application programming interface (API) 1360. In some embodiments, the machine learning model 1340 is used by metric bootstrapper 1365 to automatically generate metric definitions, and by insight summarization module 1330 to summarize all insights in a respective bundle of insights (e.g., for all metrics a user is following). In some embodiments, metric bootstrapper 1365 and insight summarization module 1330 use different machine learning models. In some embodiments, insight summarization module 1330 ranks respective insights included in the bundle of insights according to relevancy of respective metrics associated with the respective insights, and provides the ranked insights in the bundle as input to the machine learning model 1340. The insight summarization module 1330 generates a summary of the bundle of insights using the machine learning model 1340. In some embodiments, the insight summarization module 1330 validates the summary of the bundle of insights using regular expressions and heuristics, or using a second machine learning model. In some embodiments, the summary of the bundle of insights includes a natural language expression describing top insights included in the bundle for the metrics which the user is following. For example, a summary can be: "Sales is seeing unusual spike since the beginning of this week, while quarterly Regional Revenue and monthly Campaign ROI are steadily increasing. Additionally, 7 out of 12 other metrics have changed, 4 favorably and 3 unfavourably." - In some embodiments, validating accuracy of the first summary of the bundle of insights includes comparing a respective aggregated measure included in a respective insight in the bundle of insights to a respective aggregated measure included in the summary and determining whether there are any changes in the respective aggregated measure. For example, an insight summarization module determines whether an aggregated measure in the summary is the same as a respective aggregated measure in the original insight.
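The ranking-then-summarization flow of insight summarization module 1330 can be sketched as below; the relevancy scoring and the llm callable wrapping machine learning model 1340 are assumed stand-ins:

def summarize_bundle(insights, relevancy, llm):
    # Rank insights by the relevancy of their associated metrics (most
    # relevant first), then ask the model for a natural language summary.
    ranked = sorted(insights, key=relevancy, reverse=True)
    prompt = (
        "Summarize the following data insights in one short paragraph, "
        "most relevant first, preserving every figure exactly:\n"
        + "\n".join(f"- {insight}" for insight in ranked)
    )
    return llm(prompt)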
- In some embodiments, selectable business questions can be generated and displayed for a
user 1302 in user interface(s) 1304 via data experiences platform 1306. - In some embodiments, a summary of the insights is provided in user interface(s) 1304 via
data experiences platform 1306. -
FIG. 14 illustrates a user interface 1400 that includes a summary 1404 of insights related to metrics that a user is following, in accordance with some embodiments. The summary 1404 informs the user of the following: "Appliance Sales is seeing an unusual spike, while Branch Revenue and Campaign ROI are steadily increasing. Of the 12 metrics you are following, 2 is unusual." The top metrics in summary 1404 include "Appliance Sales" 1406, "Branch Revenue" 1408, and "Campaign ROI" 1410. In some embodiments, a user can hover over each of the underlying metrics to obtain additional information and/or insights about the respective metric (e.g., in a pop-up window). In some embodiments, below the summary 1404, a user can see metric cards 1420, 1422, and 1424 for each of the top metrics "Appliance Sales" 1406, "Branch Revenue" 1408, and "Campaign ROI" 1410. For example, metric card 1420 provides information related to the metric "Appliance Sales" 1406, including a data visualization 1420a; the calculated measure 1420d, which corresponds to the amount of appliance sales (e.g., 1,675 units); the amount of change 1420e compared to a previous period (e.g., +365 units (+27% compared to a previous period)); the time dimension and/or other selected filters 1420f (e.g., Week to date, Cambridge); and insight 1420b, which states "The week to date Appliance Sales has seen an unusual increase of +356 (+27%) over the last week and is now 17% above the normal range." Metric card 1420 further includes an indicator 1420c for whether the change is usual or unusual. The colors for the indicators are selected based on the favorability indicator in the metric definition for the metric "Appliance Sales" 1406. -
User interface 1400 further includes metric card 1422 for the metric "Branch Revenue" 1408, which provides data and/or information related to the metric "Branch Revenue" 1408, including a data visualization 1422a; the calculated measure 1422d, which corresponds to the amount of Branch Revenue (e.g., 10.7M pounds); the amount of change 1422e compared to a previous period (e.g., +0.7M pounds (+7% quarter to date)); the time dimension and/or other selected filters 1422f (e.g., Quarter to date, Cambridge); and insight 1422b, which states: "The quarter to date Branch Revenue had an unusual increase of +1.2% over the equivalent quarter to date value a year ago, mainly attributed to Home Appliance." -
User interface 1400 further includes metric card 1424 for the metric "Campaign ROI" 1410, which provides data and/or information related to the metric "Campaign ROI" 1410, including a data visualization 1424a; the calculated measure 1424d, which corresponds to the amount of Campaign ROI (e.g., 379%); the amount of change 1424e compared to a previous period (e.g., +1.1 percentage points since last month); the time dimension and/or other selected filters 1424f (e.g., Monthly, Cambridge); and insight 1424b, which states: "The monthly Campaign ROI has been increasing at a rate of 1 percentage point per month for the past 6 months, in line with change in rate of increase on Social Media campaigns." - In some embodiments,
summary 1404 includes summaries of insights (e.g., insights associated with indicators 1420c, 1422c, and 1424c, which are also color coded to identify the extent of unusual data). -
FIG. 15 is a flowchart diagram of a process 1500 for generating and validating data insights using machine learning models, in accordance with some embodiments. A bundle of insights generated based on insight templates is obtained (1504). In some embodiments, metrics (optionally including scoped metrics) are associated with respective insights. In some embodiments, an insight is generated for a respective aggregated measure in a respective metric. In some embodiments, for each metric, there is one insight displayed in a metric object (e.g., a metric card displayed in a user interface). In some embodiments, insights are natural language expressions that explain or provide descriptions of changes related to a selected metric that are occurring in a respective (e.g., selected) data source. In some embodiments, insights are created using templates. In some embodiments, various types of insight templates are stored in an insights database. Example types of insights include "Profit Changed" insights (e.g., with the respective associated metric being Profit) and "Revenue Changed" insights. Example insights that are generated based on the data in the data source and respective insight templates include, but are not limited to, "Gross margin is 14.4% greater than last month;" "Delivery Day increased by 5.6% compared to the same time last month;" and other natural language descriptions of the performance of a respective aggregated measure associated with a respective metric. In some embodiments, a collection of insights is stored (e.g., in an insights database) using a data structure that is referred to as a bundle. In some embodiments, a bundle is a collection of insights (e.g., optionally generated using templates) for multiple metrics to which a respective user is subscribed or which the user is following. In some embodiments, bundles are predefined or created dynamically. For example, a "Metric Change" bundle can contain a first insight about a metric that changed and a second insight that explains the change. Examples of bundles include, but are not limited to, "The week to date Appliance Sales has seen an unusual increase of +356 (+27%) over the last week and is now 17% above the normal range"; "The quarter to date Branch Revenue had an unusual increase of +1.2% over the equivalent quarter to date value a year ago, mainly attributed to Home Appliance"; and/or "The monthly Campaign ROI has been increasing at a rate of 1 percentage point per month for the past 6 months, in line with change in rate of increase on Social Media campaigns." In some embodiments, a bundle can be defined or generated "on the fly" or implicitly. For example, a bundle can be implicitly created as the user explores or interacts with the metric and associated insights. In some embodiments, a bundle includes relationships between insights included in the bundle. For example, bundles can define or include semantic relationships between the insights included in the bundle. In some embodiments, the semantic relationships can be expressed with text. For example, a bundle may contain a "Profit Changed" insight and a "Revenue Changed" insight, with a "caused by" relationship between the two insights to capture a business insight, such as "Profit rose, driven by an increase in revenue." The templated output of a bundle can include that relationship, such as "Profit rose 15%. This was caused by: Revenue rose 18%." - The bundle of insights is provided (1506) as input to a machine learning model.
In some embodiments, the machine learning model is a large language model. In some embodiments, the machine learning model is trained on insights generated using templates (e.g., created by an analytics professional or other users skilled in data analytics) for different data sources. In some embodiments, the machine learning model is a generative pre-trained transformer. In some embodiments, the machine learning model is trained on a data source that already includes metadata and/or semantics that have been pre-configured or predetermined by a user (e.g., a data analyst or another user who has domain knowledge). In some embodiments, examples of such semantics and/or metadata include, but are not limited to: columns that are labeled as measures; names or labels for the measures; usage of the measures across different workbooks, including labels or descriptions of those workbooks (e.g., the measure Sales has been used in a workbook described or labeled as Sales over time), pre-existing aggregations of the measures, calculations in which the measures were used, and/or breakdowns of the measures by different dimensions; formatting, styling, and/or visual encodings associated with the measures; and data visualizations in which the measures have been used. In some embodiments, such semantics and/or metadata in the data source provide domain-specific knowledge on which the machine learning model is trained.
- A summary of the bundle of insights is generated (1508) using the machine learning model (e.g., the
machine learning model 1340 in FIG. 13). In some embodiments, the bundle includes a single insight. In such a case, the summary of the bundle improves the templated language of the bundle expression using the machine learning model. - In some embodiments, the summary of the bundle (e.g.,
summary 1404 in FIG. 14) of insights corresponds to (1510) a natural language expression describing respective insights and relationships between the respective insights in the bundle. - In some embodiments, generating the summary of the bundle of insights includes (1512): generating a first summary of the bundle of insights using the machine learning model; validating accuracy of the first summary of the bundle of insights, including identifying one or more inaccuracies; and generating a second summary of the bundle of insights correcting the identified one or more inaccuracies.
- In some embodiments, validating accuracy of the first summary of the bundle of insights is performed (1514) using regular expressions and heuristics. In some embodiments, validating accuracy of the first summary of the bundle of insights includes comparing a respective aggregated measure included in a respective insight in the bundle of insights to a respective aggregated measure included in the summary and determining whether there are any changes in the respective aggregated measure. For example, an insight summarization model (e.g.,
insight summarization module 1330 in FIG. 13) determines whether an aggregated measure in the summary is the same as a respective aggregated measure in the original insight.
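A regular-expression check of the kind described might extract the numeric figures from the summary and verify that each one appears verbatim in some source insight; the number pattern below is an assumption for illustration:

import re

NUMBER = re.compile(r"[-+]?\$?\d[\d,]*(?:\.\d+)?%?")

def unsupported_figures(summary: str, insights: list) -> list:
    # Return figures in the summary that appear in no source insight.
    source_numbers = {n for text in insights for n in NUMBER.findall(text)}
    return [n for n in NUMBER.findall(summary) if n not in source_numbers]

print(unsupported_figures("Appliance Sales rose +37%",
                          ["Appliance Sales rose +356 (+27%)"]))
# ['+37%']  -- the summary changed +27% to +37%, so it fails validation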
- In some embodiments, the process 1500 further includes (1518): ranking respective insights in the bundle of insights according to relevancy of respective metrics associated with the respective insights; and providing the ranker insights in the bundle as input to the machine learning model includes providing the ranking of the respective insights in the bundle.
- In some embodiments, at least some metrics that are associated with respective insights in the bundle of insights have (1520) predetermined relationships to other metrics stored in a metrics database.
-
FIG. 16A illustrates a user interface 1600 for querying a data source, via natural language, and viewing results from the query (e.g., a dashboard that includes at least one metric and/or at least one insight). As shown in FIG. 16A, a user has selected the Device Sales dashboard 1602 (shown in user interface 1600), which is a dashboard that the user follows (e.g., receives updates associated with the dashboard). In some embodiments, the user can view the Device Sales dashboard via a natural language query (as discussed in more detail below with respect to FIGS. 16B-16E). - The
Device Sales dashboard 1602 includes a metric 1606 (e.g., 1,675 units) associated with a time interval 1608 (e.g., Sep. 10-Sep. 12, 2023). In some embodiments, a respective dashboard (e.g., metric detail page) includes more than one metric and/or includes one or more metrics within one or more time intervals. For example, a respective dashboard includes a metric associated with device sales and a metric associated with inventory fill rate. The type of metric and/or the time interval may be predetermined by the user, or may be automatically selected based on the data associated with device sales. - The metric 1606 can be filtered (or otherwise further processed) via one or
more filters 1604, including a week-to-date filter 1604a, a vs-previous-period filter 1604b, a region filter 1604c, and/or other filters 1604d. For example, as shown in FIG. 16A, the week-to-date filter is selected, and the metric 1606 is filtered to include data between Sep. 10 and Sep. 12, 2023. In another example, the vs-previous-period filter is selected, and the metric 1606 between Sep. 10 and Sep. 12, 2023 is compared to the previous time interval (e.g., Sep. 3 to Sep. 5, 2023). The comparison 1610 indicates that the number of units sold has increased by +356 units (+27%) when compared to the previous period. In yet another example, the region filter is selected, and the metric 1606 is filtered to the North American region. - The
Device Sales dashboard 1602 further includes an indicator 1612 of a trend (e.g., current trend, unusual change) associated with the metric 1606 and/or one or more filters 1604. For example, as shown in FIG. 16A, the indicator is "High" because the units sold increased by +356 units (+27%) compared to the previous period. In some embodiments, the indicator 1612 includes information associated with record-level outliers, top contributors, bottom contributors, concentrated contribution alert (risky monopoly), top drivers, and/or top detractors. The indicator 1612 may be based on the metric 1606 and the one or more filters 1604. - The
Device Sales dashboard 1602 further includes an insight 1613, which includes a natural language explanation 1614 of the insight 1613 and a data visualization 1616 of the insight 1613. For example, as shown in FIG. 16A, the insight is that the device sales count for September 10-September 12 is now 1,675 units and is above the expected range of 1.2K to 1.4K. In this example, the insight includes comparing the number of units sold to the number of units sold in the past to determine that the current number of sales is above the expected range based on previous sales. This information is generated as a natural language explanation 1614. A data visualization 1616 is also generated to visualize that the current number of sales (e.g., 1,675 units) is above the expected range (e.g., 1.2K to 1.4K) for the period between September 10 and September 12. - The
Device Sales dashboard 1602 further includes affordances 1618 for feedback from the user. For example, when the thumbs-up affordance is selected, the user is indicating that the insight 1613 is useful and/or relevant to the metric 1606, and when the thumbs-down affordance is selected, the user is indicating that the insight 1613 is not useful and/or relevant to the metric 1606. In some embodiments, in response to selection of the thumbs-down affordance, insight 1613 is replaced with a different insight associated with the metric 1606. - The
Device Sales dashboard 1602 further includes one or more recommended natural language queries 1620 (e.g., recommended questions), which are recommended based on at least the metric 1606 and/or the insight 1613. For example, because the metric 1606 is 1,675 units and the insight 1613 is that the number of units sold is above the expected range, the recommended natural language queries 1620 are "Which products drove this sudden increase" 1620a, "How has the trend changed" 1620b, and "What are the outliers" 1620c. These recommended natural language queries 1620 are related to the sale of units (e.g., type of products, and sales trends). - The
- The Device Sales dashboard 1602 further includes an "Ask Questions" affordance 1621, which can be selected to display a query input field for entering a natural language query in addition to the recommended natural language queries 1620. Furthermore, the Ask Questions affordance 1621 can be selected to view additional recommended natural language queries 1620. In some embodiments, a natural language query provided via the input field queries the data source based on the currently displayed metric 1606 and/or insight 1613. In some embodiments, a natural language query provided via the input field queries all data in the data source, independent of the currently displayed metric 1606 and/or the currently displayed insight 1613.
- FIG. 16B transitions from FIG. 16A in response to selection of the recommended natural language query 1620 a. For example, in response to the selection of the recommended natural language query 1620 a, natural language query 1622 is inputted, and one or more insights are generated based on one or more predefined types of analyses (e.g., record-level outliers, period-over-period change, top contributors, bottom contributors, concentrated contribution alert (risky monopoly), top drivers, top detractors, unusual change, current trend, and trend change alert). The one or more insights are generated (or updated) in response to a query (e.g., a natural language query, a structured query, and/or other types of queries), such that the one or more insights include real-time and up-to-date information. Stated differently, the one or more insights are generated or updated in real time (e.g., in response to the query) to reduce data staleness.
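- Generating insights from predefined types of analyses can be pictured as a dispatch table that is evaluated on demand when a query arrives. The following is a minimal sketch, offered for illustration only: the analysis names mirror the examples above, but the function names, the pandas data shape, and the wording of the generated text are assumptions rather than the implementation described in this disclosure.

```python
from typing import Callable

import pandas as pd

# Hypothetical analysis functions: each predefined type of analysis maps a
# metric's data (assumed sorted by period) to a natural-language insight.
def period_over_period(df: pd.DataFrame) -> str:
    current, previous = df["value"].iloc[-1], df["value"].iloc[-2]
    delta = current - previous
    return f"Metric changed by {delta:+.0f} units ({delta / previous:+.0%}) versus the previous period."

def top_contributors(df: pd.DataFrame) -> str:
    # Assumes a "category" column describing each record's contributor.
    top = df.groupby("category")["value"].sum().nlargest(3)
    return f"Top contributors are: {', '.join(top.index)}."

PREDEFINED_ANALYSES: dict[str, Callable[[pd.DataFrame], str]] = {
    "period over period change": period_over_period,
    "top contributors": top_contributors,
}

def generate_insights(df: pd.DataFrame) -> list[str]:
    # Running the analyses at query time keeps the insights current and
    # reduces data staleness, as described above.
    return [analysis(df) for analysis in PREDEFINED_ANALYSES.values()]
```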
- For example, in response to the selection (or input) of the natural language query 1622, one or more insights, including insight 1623, are generated. In this example, as shown in FIG. 16B, the insight 1623 includes a natural language explanation 1624 and a data visualization 1626. The natural language explanation 1624 includes information that, during the last week (e.g., the current period), Device Sales increased by 356 units, and ePhones, Simpson Phones, and Smart Home increased the most. In other words, the insight 1623 is that the ePhones, Simpson Phones, and Smart Home product categories contribute to the increase in Device Sales. The data visualization 1626 shows the same, with ePhones increasing by +200 units, Simpson Phones increasing by +72 units, Smart Home products increasing by +21 units, and the 21 remaining product categories, on average, increasing by +3 units as compared to the previous reporting period.
- The one or more recommended natural language queries 1620 displayed in the Device Sales dashboard 1602 are updated in response to selection of the recommended natural language query 1620 a. Based on the selection of the recommended natural language query 1620 a, the recommended natural language queries 1620 now further include the recommended natural language query "Which channels had low Device Sales" 1620 d. Generally, the recommended natural language queries 1620 are updated based on the query history, including selection of a recommended natural language query, input of a natural language query, or other types of queries. - In some embodiments, the one or more insights are associated with one or more predefined questions (e.g., parameterized questions) for the data source. For example, a respective insight is associated with a respective predefined question. The predefined question may include a parameterized representation of questions that are associated with the insight (e.g., questions a user may ask when attempting to gain information associated with the insight).
- The natural language query 1622 is semantically compared to respective natural language explanations of the one or more insights and/or respective predefined questions associated with the one or more insights. In some embodiments, semantic comparison (e.g., semantic matching) is performed by embedding the natural language query and the predefined questions and computing the mathematical similarity between the embedding of the natural language query and the embedding of each predefined question. The semantic comparison is discussed in more detail with respect to FIG. 18. In accordance with a determination that at least one insight (e.g., a respective natural language explanation and/or a predefined question) semantically matches the natural language query 1622, the semantically matching insight is selected.
- For example, the natural language explanation 1624 semantically matches the natural language query 1622 (input via selection of the recommended natural language query 1620 a). In this example, the natural language query 1622 is "Which Products drove this sudden increase," and the natural language explanation is "During the last week, Device Sales increased by 356 units; ePhones, Simpson Phones, and Smart Home increased the most." The semantic match is based on the Products that are the top contributors to the increase in Device Sales.
- The Device Sales dashboard 1602 further includes affordances 1628 for feedback. For example, when the thumbs up affordance is selected, the user is indicating that the insight 1623 is useful and/or relevant to the natural language query 1622, and when the thumbs down affordance is selected, the user is indicating that the insight 1623 is not useful and/or not relevant to the natural language query 1622. In some embodiments, in response to selection of the thumbs down affordance, the insight 1623 is replaced with a different insight that semantically matches the natural language query 1622.
- FIG. 16C transitions from FIG. 16A in response to selection of the query input field 1630 (displayed in the user interface 1600), which is configured to receive natural language queries, structured queries, and/or other types of queries. In response to selecting the query input field 1630, a list of recent searches 1632 is displayed in the user interface 1600. The list of recent searches 1632 includes previous natural language queries and associated insights. For example, the natural language query "How has Average Selling Price changed?" is associated with the insight "Average Selling Price—North America." In another example, the natural language query "What is the trend for Customer Acquisition Cost?" is associated with the insight "Customer Acquisition Cost—North America." In yet another example, the natural language query "How is Device Sales impacting Conversion Rate?" is associated with the insight "Conversion Rate—North America." In some embodiments, in response to selecting a recent search, the computer system retrieves the insight associated with the selected previous natural language query. In addition to retrieving the associated insight, the computer system may also update the insight with the most recent data in the data source.
- In some embodiments, the list of recent searches 1632 is based on the selected data source. For example, when accessing a first data source and a second data source, only the recent searches of the currently active data source are displayed in the list of recent searches 1632. In this example, when accessing the first data source, only the recent searches of the first data source are displayed in the list of recent searches 1632, and when accessing the second data source, only the recent searches of the second data source are displayed in the list of recent searches 1632.
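- Scoping the recent-searches list to the active data source amounts to keying the search history by data source. A minimal sketch follows, assuming an in-memory history; the class and method names are hypothetical illustrations, not elements of the disclosed system.

```python
from collections import defaultdict

class RecentSearches:
    """Per-data-source search history, mirroring the scoping described above."""

    def __init__(self, limit: int = 10):
        self._history: dict[str, list[str]] = defaultdict(list)
        self._limit = limit

    def record(self, data_source: str, query: str) -> None:
        entries = self._history[data_source]
        entries.insert(0, query)   # most recent first
        del entries[self._limit:]  # keep only the newest entries

    def for_active_source(self, data_source: str) -> list[str]:
        # Only the currently active data source's searches are shown.
        return list(self._history[data_source])

history = RecentSearches()
history.record("first_data_source", "How has Average Selling Price changed?")
history.record("second_data_source", "What is the trend for Customer Acquisition Cost?")
print(history.for_active_source("first_data_source"))
```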
- FIG. 16D transitions from FIG. 16C in response to a natural language query input 1631. As shown in FIG. 16D, the natural language query input 1631 is "Will we fulfill phone orders?" Based on the natural language query input 1631, suggested natural language queries 1634 are generated. The suggested natural language queries 1634 may be based on semantically matching the natural language query input 1631 with one or more predefined questions (e.g., parameterized questions) of the data source, one or more insights of the data source, and/or previous searches performed on the data source.
- For example, based on the natural language query input 1631, suggested natural language queries 1634 include "What is the projected Inventory Fill Rate?", "Is there seasonality in Inventory Fill Rate?", and "Is the change in Pending Orders unexpected?" The suggested natural language queries 1634 are ordered based on the semantic similarity between each suggested natural language query (or associated insight) and the natural language query input 1631. In some embodiments, the suggested natural language queries 1634 are ordered based on the popularity of the search result (e.g., how many followers the search result has). In some embodiments, the search results link to dashboards.
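- Ordering candidate suggestions first by semantic similarity and then by popularity can be expressed as a sort over (similarity, follower count) pairs. The sketch below is illustrative only; the candidate texts, similarity scores, and follower counts are invented inputs.

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    text: str          # candidate suggested query
    similarity: float  # semantic similarity to the user's input (e.g., a cosine score)
    followers: int     # popularity signal for the underlying search result

def rank_suggestions(candidates: list[Suggestion]) -> list[Suggestion]:
    # Primary key: semantic similarity; tiebreaker: popularity.
    return sorted(candidates, key=lambda s: (s.similarity, s.followers), reverse=True)

suggestions = rank_suggestions([
    Suggestion("What is the projected Inventory Fill Rate?", 0.81, 42),
    Suggestion("Is there seasonality in Inventory Fill Rate?", 0.74, 57),
    Suggestion("Is the change in Pending Orders unexpected?", 0.69, 12),
])
```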
- A respective suggested natural language query 1634 is associated with a respective metric. For example, as shown in FIG. 16D, the "What is the projected Inventory Fill Rate?" natural language query is associated with the "Inventory Fill Rate—North America, Phone" metric. In another example, the "Is there seasonality in Inventory Fill Rate?" natural language query is associated with the "Inventory Fill Rate—North America, Tablet" metric. In yet another example, the "Is the change in Pending Orders unexpected?" natural language query is associated with the "Pending Orders—North America" metric.
- Additional search results (e.g., search results other than the suggested natural language queries 1634 and associated metrics) are accessible via the "View Search Results" affordance 1638.
- FIG. 16E transitions from FIG. 16D in response to selection of the suggested natural language query associated with the "Inventory Fill Rate—North America, Phone" metric (e.g., "What is the projected Inventory Fill Rate?"). The Inventory Fill Rate metric 1636 (e.g., 91%) is generated with the region filter 1634 c set to North America and the category filter 1634 set to Phone in response to that selection. Additionally, the Inventory Fill Rate metric is further filtered by the month-to-date filter 1634 a to include data within the current month to date (e.g., Sep. 1 to Sep. 12, 2023). In this example, the vs-previous-period filter is selected, and the metric 1636 is compared to the previous time interval (e.g., Aug. 1 to Aug. 12, 2023). The comparison 1640 indicates that the inventory fill rate is 4 percentage points lower than the inventory fill rate from Aug. 1 to Aug. 12, 2023, and 1 percentage point lower than the expected range.
- The Device Sales dashboard 1602 further includes an insight 1643, which includes a natural language explanation 1644 of the insight 1643 and a data visualization 1646 of the insight 1643. For example, as shown in FIG. 16E, the insight is that the inventory fill rate for September 1-September 12 is now 91% and has dropped below the expected range of 92% to 95%, and a new unfavorable trend has been detected for Inventory Fill Rate (it is trending down compared to the previous trend). In this example, the insight includes comparing the current Inventory Fill Rate to past Inventory Fill Rates to determine that the current Inventory Fill Rate is below the expected range. Furthermore, in addition to the Inventory Fill Rate being lower than the expected range, the insight includes comparing the trend of the Inventory Fill Rate to the previous trend (which was trending upwards). This information is generated as a natural language explanation 1644. A data visualization 1646 is also generated to visualize that the Inventory Fill Rate (e.g., 91%) is below the expected range (e.g., 92% to 95%) for the period between September 1 and September 12.
- The Device Sales dashboard 1602 further includes affordances 1648 for feedback from the user. For example, when the thumbs up affordance is selected, the user is indicating that the insight 1643 is useful and/or relevant to the metric 1636, and when the thumbs down affordance is selected, the user is indicating that the insight 1643 is not useful and/or not relevant to the metric 1636. In some embodiments, in response to selection of the thumbs down affordance, the insight 1643 is replaced with a different insight associated with the metric 1636.
- FIG. 16F transitions from FIG. 16A in response to selection of the "Ask Questions" affordance 1621. In response to the selection, a popup 1650 is generated and/or displayed in the user interface 1600 and/or the Device Sales dashboard 1602. The popup 1650 includes a query input field 1652 for inputting a natural language query (e.g., by the user) in addition to the recommended natural language queries 1620. Furthermore, the popup 1650 includes additional recommended natural language queries (e.g., in addition to the recommended natural language queries 1620). The additional recommended natural language queries may update based on the current input in the query input field 1652 and/or a change in context, such as different selected data, a different metric, and/or a different insight.
- For example, as shown in FIG. 16F, when "What sold well?" is input into the query input field 1652 by the user, several recommended natural language queries are generated and displayed, including "Which Product Name had high Total Sales" 1654 a, "Which Product Type had high Total Sales" 1654 b, and "Which Product Name had low Total Sales." In this example, in response to selection of one of the several recommended natural language queries, an insight associated with the selected natural language query is retrieved and displayed in the Device Sales dashboard. - In another example, the user does not select one of the several recommended natural language queries that were generated and displayed for the input "What sold well?" and instead sends the input without modification. In response, the computer system semantically matches the natural language query "What sold well?" to one or more predefined questions (associated with the generated insights) and/or generated insights and selects the generated insight that most closely matches the natural language query.
- In some embodiments, the "Ask Questions" affordance 1621 is located proximate to the recommended natural language queries 1620 and/or the insight 1613. Locating the affordance proximate to the insight improves the accessibility of inputting natural language queries because the user does not need to navigate to another portion of the user interface, or to another user interface, to input a natural language query. Thus, the user can more quickly access the desired information.
- In some embodiments, the popup 1650 includes information 1656. The information 1656 includes content to guide the user's exploration of the data, metric, and/or insights. For example, the information 1656 is "Try asking about trends, breakdowns, or contributions" to guide the user's exploration of Device Sales data. In some embodiments, the information 1656 is based on the data, metric, and/or insight selected and/or filtered in the current view (e.g., the Device Sales dashboard).
- FIG. 17 illustrates semantic comparisons between natural language queries 1702 and predefined questions and their associated generated insights. A natural language query 1702 is associated with (or directed to) a metric. For example, as shown in FIG. 17, the natural language query 1702 "What drove the increase in sales" is directed to the Sales metric 1703.
- In some embodiments, the computer system generates a sentence embedding of the text of the natural language query 1702 by pooling the word embeddings of the sentence into a vector. For example, a BERT-like language model is used to embed each word of the natural language query 1702. The embedded words (e.g., embedded text) of the natural language query 1702 are averaged via a mean pooling layer to generate the sentence embedding for the natural language query 1702. - In some embodiments, the computer system generates a sentence embedding of the text of the predefined questions by pooling word embeddings of the sentence as a vector (e.g., the same methodology as embedding the natural language query 1702). In the same way, a BERT-like language model is used to embed each word of the predefined questions. The embedded words of the predefined questions are averaged via a mean pooling layer to generate the sentence embedding for the predefined questions.
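- A mean-pooled sentence embedding of this kind can be computed with an off-the-shelf BERT-style encoder. The sketch below uses the Hugging Face transformers library; the checkpoint name is an assumption for illustration (the disclosure only calls for a "BERT-like" model), and masking out padding tokens before averaging is a common implementation choice rather than a detail stated here.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical checkpoint; any BERT-like encoder would do.
MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)

def sentence_embedding(texts: list[str]) -> torch.Tensor:
    """Embed each word (token), then mean-pool into one vector per sentence."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = encoder(**batch).last_hidden_state  # (batch, tokens, dim)
    # Average only over real tokens, ignoring padding.
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

query_vec = sentence_embedding(["What drove the increase in sales"])
question_vecs = sentence_embedding(["Which Region increased the most?",
                                    "What is the trend?"])
```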
- In some embodiments, the insights associated with the predefined questions are generated in response to the natural language query 1702. Stated differently, the insights are generated (or updated) after a natural language query 1702 is received. Furthermore, the insights and the predefined questions are associated with a metric. For example, as shown in FIG. 17, the user is interested in the Sales metric 1703, which includes associated predefined questions and associated generated insights. Example predefined questions and associated generated insights are listed in the table below.
Predefined question | Generated insight
---|---
Which Region increased the most? | Compared to the last year, Sales increased by 113.3k. West, East, and South increased the most.
Which Category increased the most? | Compared to the last year, Sales increased by 113.3k. Office Supplies and Technology increased the most.
Which Product had high sales? | Top contributors by Product Name are: Canon imageCLASS 220 Advanced Copier with 35.7k (6.8%), Martin Yale Chadless Opener Electric Letter Opener with 11.8k (2.2%), and GBC DocuBind TL300 Electric Binding System with 10.9k (2.1%).
What is the trend? | Sales is trending up with high volatility. The trend first began 4 years ago on Jan. 1, 2020.
- In some embodiments, the computer system generates a sentence embedding of the text of the generated insights by pooling word embeddings of the sentence as a vector (e.g., the same methodology as embedding the natural language query 1702). In the same way as with embedding the natural language query 1702, a BERT-like language model is used to embed each word of the generated insights. The embedded words of the generated insights are averaged via a mean pooling layer to generate the sentence embedding for the generated insights.
- In some embodiments, semantically comparing the natural language query 1702 with the predefined questions and the generated insights is performed by calculating the cosine similarity between each pair of vectors to get match scores that serve as a proxy for confidence that the insight answers the question (e.g., the natural language query). A visual representation of calculating the cosine similarities is illustrated by graph 1722. In that illustration, the vectors are 3-dimensional (the vectors of the embeddings of the natural language query 1702, predefined questions, and generated insights may have more than 3 dimensions). As shown in graph 1722, the similarity between two vectors reflects how close they are to pointing in the same direction (e.g., sentence A points in a direction similar to sentence C, whereas sentence B points in a direction different from sentences A and C).
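- Cosine similarity reduces to a normalized dot product. The 3-dimensional vectors below are invented stand-ins for sentences A, B, and C in graph 1722, chosen only to show that vectors pointing in similar directions score near 1 while divergent vectors score lower.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    # cos(theta) = (u . v) / (|u| * |v|); ranges from -1 to 1.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-d embeddings standing in for sentences A, B, and C.
sentence_a = np.array([0.9, 0.1, 0.2])
sentence_b = np.array([-0.2, 0.8, -0.5])
sentence_c = np.array([0.8, 0.2, 0.3])

print(cosine_similarity(sentence_a, sentence_c))  # high: similar directions
print(cosine_similarity(sentence_a, sentence_b))  # low: different directions
```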
- In some embodiments, the comparison is performed between the natural language query 1702 and each predefined question and each generated insight separately. For example, a match score is calculated for each predefined question and each generated insight. In this example, the selected insights are based on the predefined question or generated insight with the highest score (or a predefined number of top scores). Specifically, if a predefined question has the highest match score, the predefined question and the insight associated with the predefined question are provided to the user. In another example, if a generated insight has the highest match score, the generated insight and the predefined question associated with the generated insight are provided to the user. In some embodiments, the comparison is performed between the natural language query 1702 and each set of predefined question and associated generated insight. For example, each set of predefined question and associated generated insight has a single match score that is an average of the match scores for the respective predefined question and the respective generated insight of the set.
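- Both selection strategies described above can be sketched over precomputed match scores. Everything in the snippet is illustrative: the question and insight texts echo the table above, and the scores are invented.

```python
# Hypothetical precomputed match scores (e.g., cosine similarities) for one query.
question_scores = {"Which Region increased the most?": 0.83,
                   "What is the trend?": 0.41}
insight_scores = {"Compared to the last year, Sales increased by 113.3k.": 0.78,
                  "Sales is trending up with high volatility.": 0.39}

def select_by_best_item() -> str:
    # Strategy 1: score questions and insights separately and take the global
    # maximum; the selected item's associated counterpart is returned with it.
    best_q = max(question_scores, key=question_scores.get)
    best_i = max(insight_scores, key=insight_scores.get)
    return best_q if question_scores[best_q] >= insight_scores[best_i] else best_i

def select_by_pair_average(pairs: list[tuple[str, str]]) -> tuple[str, str]:
    # Strategy 2: one score per (question, insight) set, averaging the two scores.
    return max(pairs, key=lambda p: (question_scores[p[0]] + insight_scores[p[1]]) / 2)

pairs = [("Which Region increased the most?",
          "Compared to the last year, Sales increased by 113.3k."),
         ("What is the trend?",
          "Sales is trending up with high volatility.")]
best_pair = select_by_pair_average(pairs)
```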
- FIG. 18 illustrates a process 1800 for semantically matching natural language queries with predefined questions (e.g., parameterized questions). Process 1800 is initiated by user 1802 at step 1812 by loading the metric detail page 1804. In response, at step 1814, insights are generated via the insights service 1806, and at step 1816 the generated insights are returned from the insights service 1806 to the metric detail page 1804. At step 1818, the generated insights are provided (e.g., displayed) to the user 1802. Further, at step 1820, the generated insights and the predefined questions associated with the metric and/or the generated insights are sent from the metric detail page 1804 to the embedding service 1808. At step 1822, for each item (e.g., for each generated insight and/or predefined question), an embedding is retrieved from a cache 1810 or generated by the embedding service 1808. At step 1824, the embedding service attempts to retrieve an embedding associated with each item from the cache 1810. At step 1826, for a respective item, an embedding is returned from the cache to the embedding service if an embedding is available for the respective item. If an embedding is not available for the respective item, at step 1830, a new embedding is generated by the embedding service for the respective item. Additionally, the new embedding is stored, at step 1832, in the cache 1810. At step 1834, successful caching of the new embedding is reported to the embedding service. At step 1836, the embeddings for the generated insights and/or predefined questions are sent from the embedding service to the metric detail page. At step 1838, the user 1802 submits an ad hoc query to the metric detail page. At step 1840, the metric detail page sends a request to the embedding service for the embedding associated with the user's ad hoc query. In response, at step 1844, the embedding service sends the embedding associated with the user's ad hoc query to the metric detail page. The metric detail page then ranks questions based on similarity to the ad hoc query at step 1846. At step 1850, the results of the ad hoc query (e.g., the predefined questions and generated insights that are semantically similar to the ad hoc query) are provided to the user.
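- The cache interaction in process 1800 (look up an embedding; on a miss, generate and store it) is a read-through cache. A minimal sketch follows, assuming an in-memory dictionary keyed by a hash of the item's text and a stand-in embedding function; a production embedding service would presumably use a persistent cache and the encoder discussed with respect to FIG. 17.

```python
import hashlib

import numpy as np

class EmbeddingService:
    """Read-through cache mirroring steps 1822-1834 of process 1800."""

    def __init__(self, embed_fn):
        self._embed = embed_fn  # e.g., the mean-pooling encoder sketched earlier
        self._cache: dict[str, np.ndarray] = {}

    def get_embedding(self, text: str) -> np.ndarray:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._cache:        # steps 1824-1826: cache hit, return embedding
            return self._cache[key]
        vector = self._embed(text)    # step 1830: generate a new embedding
        self._cache[key] = vector     # steps 1832-1834: store and confirm caching
        return vector

# Stand-in embedder for illustration only.
service = EmbeddingService(lambda text: np.random.default_rng(len(text)).random(384))
vec = service.get_embedding("Which products drove this sudden increase")
```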
- FIG. 19 illustrates a process 1900 for semantically matching natural language queries with parameterized questions. (A1) A natural language query directed to a data source (e.g., a database) is received (1902).
- In some embodiments, a user inputs a natural language query or selects a recommended natural language query. The natural language query may be inputted or selected from a user interface (e.g., user interface 1600 in FIGS. 16A-16E, dashboard 1602 in FIGS. 16A-16D, and/or dashboard 1632 in FIG. 16E). As the user inputs or selects the natural language query, automatic recommendations may be generated that are associated with (or complete) the natural language query (e.g., suggested natural language queries 1634 in FIG. 16D).
- In response to receiving the natural language query, one or more insights associated with the one or more predefined questions for the data source are generated (1904) based on one or more predefined types of analyses. Further, a respective insight includes a natural language explanation (e.g.,
- In response to receiving the natural language query, one or more insights associated with the one or more predefined questions for the data source are generated (1904) based on one or more predefined types of analyses. Further, a respective insight includes a natural language explanation (e.g., natural language explanation 1614 in FIG. 16A, natural language explanation 1624 in FIG. 16B, and/or natural language explanation 1644 in FIG. 16E) and/or a data visualization (e.g., data visualization 1616 in FIG. 16A, data visualization 1626 in FIG. 16B, and/or data visualization 1646 in FIG. 16E).
- In accordance with a determination that the natural language query semantically matches a respective predefined question of the one or more predefined questions (e.g., via comparison of the respective embeddings of the natural language query 1702 and the predefined questions and/or comparison of the respective embeddings of the natural language query and the generated insights in FIG. 17), the respective predefined question and the associated respective generated insights of the one or more generated insights are selected (1906).
- In accordance with a determination that the natural language query semantically matches a respective generated insight of the one or more generated insights (e.g., at step 1846 in FIG. 18), the respective generated insight and the associated respective predefined question of the one or more predefined questions are selected (1908). - Further, instructions for displaying, on a display communicatively connected to the computing device, the selected representative predefined question and associated respective generated insights and/or the selected representative generated insight and associated predefined question are generated (1910). Examples of the display of the selected representative predefined question and associated respective generated insights and/or the selected representative generated insight and associated predefined question are shown in
FIGS. 16A-16E. - (A2) In some embodiments, the selected representative predefined question and associated respective generated insights and/or the selected representative generated insight and associated predefined question are cached. Further, in accordance with a determination that the natural language query semantically matches the respective predefined question of the one or more predefined questions, the cached one or more insights (and/or predefined question) are retrieved.
- (A3) In some embodiments, the generated one or more insights associated with one or more predefined questions for the data source are cached at the computing device and/or the data source.
- (A4) In some embodiments, selecting both the respective predefined question and the respective generated insight is in accordance with a determination that the natural language query semantically matches both the respective predefined question and the respective generated insight.
- (A5) In some embodiments, the one or more insights are generated based on one or more insight templates.
- (A6) In some embodiments, based on the selected representative predefined question and associated respective generated insights and/or the selected representative generated insight and associated predefined question, one or more recommended natural language queries are generated.
- (A7) In some embodiments, in response to a selection of a recommended natural language query of the one or more recommended natural language queries, instructions for displaying on a display communicatively connected to the computing device a respective insight associated with the selected recommended natural language query are generated.
- The terminology used in the description of the various described embodiments herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in the description of the various described embodiments and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “includes,” “including,” “comprises,” and/or “comprising,” when used in this specification, specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof.
- The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.
Claims (20)
1. A method for analytics using natural language, performed at a computing device that includes one or more processors and memory storing one or more programs configured for execution by the one or more processors, the method comprising:
receiving a natural language query directed to a data source;
in response to receiving the natural language query, generating, based on one or more predefined types of analyses, one or more insights associated with one or more predefined questions for the data source, wherein a respective insight includes a natural language explanation and/or a data visualization;
in accordance with a determination that the natural language query semantically matches a respective predefined question of the one or more predefined questions, selecting the respective predefined question and associated respective generated insights of the one or more generated insights;
in accordance with a determination that the natural language query semantically matches a respective generated insight of the one or more generated insights, selecting the respective generated insight and associated respective predefined question of the one or more predefined questions; and
generating instructions for displaying on a display communicatively connected to the computing device the selected representative predefined question and associated respective generated insights and/or the selected representative generated insight and associated predefined question.
2. The method of claim 1 , further comprising:
caching the selected representative predefined question and associated respective generated insights and/or the selected representative generated insight and associated predefined question; and
in accordance with a determination that the natural language query semantically matches the respective predefined question of the one or more predefined questions, retrieving the cached one or more insights.
3. The method of claim 1, wherein the generated one or more insights associated with one or more predefined questions for the data source are cached at the computing device and/or the data source.
4. The method of claim 1, wherein selecting both the respective predefined question and the respective generated insight is in accordance with a determination that the natural language query semantically matches the respective predefined question and the respective generated insight.
5. The method of claim 1 , wherein the one or more insights are generated based on one or more insight templates.
6. The method of claim 1 , further comprising:
generating, based on the selected representative predefined question and associated respective generated insights and/or the selected representative generated insight and associated predefined question, one or more recommended natural language queries.
7. The method of claim 6 , further comprising:
in response to a selection of a recommended natural language query of the one or more recommended natural language queries, generating instructions for displaying on a display communicatively connected to the computing device a respective insight associated with the selected recommended natural language query.
8. A computer system having one or more processors and memory, wherein the memory stores one or more programs configured for execution by the one or more processors, and the one or more programs comprise instructions for:
receiving a natural language query directed to a data source in response to a user input of the natural language query;
in response to receiving the natural language query, generating, based on one or more predefined types of analyses, one or more insights associated with one or more predefined questions for the data source, wherein a respective insight includes a natural language explanation and/or a data visualization;
in accordance with a determination that the natural language query semantically matches a respective predefined question of the one or more predefined questions, selecting the respective predefined question and associated respective generated insights of the one or more generated insights;
in accordance with a determination that the natural language query semantically matches a respective generated insight of the one or more generated insights, selecting the respective generated insight and associated respective predefined question of the one or more predefined questions; and
generating instructions for displaying on a display communicatively connected to the computing device the selected representative predefined question and associated respective generated insights and/or the selected representative generated insight and associated predefined question.
9. The computer system of claim 8 , wherein the one or more programs further comprise instructions for:
receiving a user input to save the selected representative predefined question and associated respective generated insights and/or the selected representative generated insight and associated predefined question;
in response to the user input to save, caching the selected representative predefined question and associated respective generated insights and/or the selected representative generated insight and associated predefined question; and
in accordance with a determination that the natural language query semantically matches the respective predefined question of the one or more predefined questions, retrieving the cached one or more insights.
10. The computer system of claim 9, wherein the generated one or more insights associated with one or more predefined questions for the data source are cached at the computing device and/or the data source.
11. The computer system of claim 8, wherein selecting both the respective predefined question and the respective generated insight is in accordance with a determination that the natural language query semantically matches the respective predefined question and the respective generated insight.
12. The computer system of claim 8 , wherein the one or more insights are generated based on one or more insight templates.
13. The computer system of claim 8 , wherein the one or more programs further comprise instructions for:
generating, based on the selected representative predefined question and associated respective generated insights and/or the selected representative generated insight and associated predefined question, one or more recommended natural language queries.
14. The computer system of claim 13 , wherein the one or more programs further comprise instructions for:
receiving a user input selecting a recommended natural language query of the one or more recommended natural language queries; and
in response to the selection of a recommended natural language query, generating instructions for displaying on a display communicatively connected to the computing device a respective insight associated with the selected recommended natural language query.
15. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer system having one or more processors and memory, the one or more programs comprising instructions for:
receiving a natural language query directed to a data source;
in response to receiving the natural language query, generating, based on one or more predefined types of analyses, one or more insights associated with one or more predefined questions for the data source, wherein a respective insight includes a natural language explanation and/or a data visualization;
in accordance with a determination that the natural language query semantically matches a respective predefined question of the one or more predefined questions, selecting the respective predefined question and associated respective generated insights of the one or more generated insights;
in accordance with a determination that the natural language query semantically matches a respective generated insight of the one or more generated insights, selecting the respective generated insight and associated respective predefined question of the one or more predefined questions; and
generating instructions for displaying on a display communicatively connected to the computing device the selected representative predefined question and associated respective generated insights and/or the selected representative generated insight and associated predefined question.
16. The non-transitory computer readable storage medium of claim 15 , wherein the one or more programs further comprise instructions for:
caching the selected representative predefined question and associated respective generated insights and/or the selected representative generated insight and associated predefined question; and
in accordance with a determination that the natural language query semantically matches the respective predefined question of the one or more predefined questions, retrieving the cached one or more insights.
17. The non-transitory computer readable storage medium of claim 16, wherein the generated one or more insights associated with one or more predefined questions for the data source are cached at the computing device and/or the data source.
18. The non-transitory computer readable storage medium of claim 15, wherein selecting both the respective predefined question and the respective generated insight is in accordance with a determination that the natural language query semantically matches the respective predefined question and the respective generated insight.
19. The non-transitory computer readable storage medium of claim 15 , wherein the one or more programs further comprise instructions for:
generating, based on the selected representative predefined question and associated respective generated insights and/or the selected representative generated insight and associated predefined question, one or more recommended natural language queries.
20. The non-transitory computer readable storage medium of claim 19 , wherein the one or more programs further comprise instructions for:
in response to a selection of a recommended natural language query of the one or more recommended natural language queries, generating instructions for displaying on a display communicatively connected to the computing device a respective insight associated with the selected recommended natural language query.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/822,050 US20250139087A1 (en) | 2023-09-11 | 2024-08-30 | Semantically matching natural language queries with parameterized questions |
PCT/US2024/046262 WO2025059220A1 (en) | 2023-09-11 | 2024-09-11 | Semantically matching natural language queries with parameterized questions |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202363537808P | 2023-09-11 | 2023-09-11 | |
US18/429,072 US20250086430A1 (en) | 2023-09-11 | 2024-01-31 | Automatically generating metric objects using a machine learning model |
US18/429,132 US20250086385A1 (en) | 2023-09-11 | 2024-01-31 | Generating and validating data insights using machine learning models |
US18/822,050 US20250139087A1 (en) | 2023-09-11 | 2024-08-30 | Semantically matching natural language queries with parameterized questions |
Related Parent Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/429,132 Continuation-In-Part US20250086385A1 (en) | 2023-09-11 | 2024-01-31 | Generating and validating data insights using machine learning models |
US18/429,072 Continuation-In-Part US20250086430A1 (en) | 2023-09-11 | 2024-01-31 | Automatically generating metric objects using a machine learning model |
Publications (1)
Publication Number | Publication Date |
---|---|
US20250139087A1 true US20250139087A1 (en) | 2025-05-01 |
Family
ID=95485219
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/822,050 Pending US20250139087A1 (en) | 2023-09-11 | 2024-08-30 | Semantically matching natural language queries with parameterized questions |
Country Status (1)
Country | Link |
---|---|
US (1) | US20250139087A1 (en) |
-
2024
- 2024-08-30 US US18/822,050 patent/US20250139087A1/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12056120B2 (en) | Deriving metrics from queries | |
US11195050B2 (en) | Machine learning to generate and evaluate visualizations | |
US11526579B1 (en) | System and methods for performing automatic data aggregation | |
Power et al. | Defining business analytics: an empirical approach | |
Gensler et al. | Listen to your customers: Insights into brand image using online consumer-generated product reviews | |
US20210042866A1 (en) | Method and apparatus for the semi-autonomous management, analysis and distribution of intellectual property assets between various entities | |
US8458065B1 (en) | System and methods for content-based financial database indexing, searching, analysis, and processing | |
US12353477B2 (en) | Providing an object-based response to a natural language query | |
Priebe et al. | Business information modeling: A methodology for data-intensive projects, data science and big data governance | |
WO2021257610A1 (en) | Time series forecasting and visualization methods and systems | |
CN119202140B (en) | Data report generation method, electronic device, storage medium and computer program product | |
Nuseir | Designing business intelligence (BI) for production, distribution and customer services: a case study of a UAE-based organization | |
US11461337B2 (en) | Attribute annotation for relevance to investigative query response | |
Even et al. | Economics-driven data management: An application to the design of tabular data sets | |
US12361220B1 (en) | Customized integrated entity analysis using an artificial intelligence (AI) model | |
US20150142782A1 (en) | Method for associating metadata with images | |
CA3013346C (en) | Method and system for personalizing software based on real time tracking of voice-of-customer feedback | |
US20200160359A1 (en) | User-experience development system | |
US20240311406A1 (en) | Data extraction and analysis from unstructured documents | |
Gupta et al. | A Review of Data Warehousing and Business Intelligence in different perspective | |
US20250139087A1 (en) | Semantically matching natural language queries with parameterized questions | |
US20250086385A1 (en) | Generating and validating data insights using machine learning models | |
WO2025059220A1 (en) | Semantically matching natural language queries with parameterized questions | |
Stahl et al. | Marketplaces for data: An initial survey | |
Qiu et al. | Design of multi-mode e-commerce recommendation system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: SALESFORCE, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WANG, HON HING HING;DRAKE, JONATHAN ALDEN;LEWIS III, THOMAS JACKSON;AND OTHERS;SIGNING DATES FROM 20240906 TO 20240909;REEL/FRAME:068573/0273 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |