EP3025295A1 - System and method for discovering and exploring concepts - Google Patents
- Publication number
- EP3025295A1 EP14828714.7A
- Authority
- EP
- European Patent Office
- Prior art keywords
- sentences
- processor
- computing
- clusters
- interactions
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/01—Customer relationship services
Definitions
- aspects of the present invention relate to speech processing, indexing, searching, and analytics.
- aspects of the present invention relate to analyzing recorded and live information to categorize conversations and to identify concepts and trends.
- An organization's contact center typically receives a multitude of calls regarding a variety of issues.
- a sales department of a contact center may receive calls with questions about the feature sets and pricing of various products offered by the organization; a customer support department may receive calls regarding particular problems with using the products or the quality of the services being delivered; an accounts department may receive calls about changes in billing policy, incorrect charges, and other issues.
- conversations can be tagged or categorized based on whether they contain predefined keywords or phrases. For example, through the above-discussed manual (human) analysis of phrases that are either identified by a human listener or identified by a computer system using phrase recognition, one might infer that conversations with a call center that contain the phrases "I would like to speak to your manager" and "Can I talk to your supervisor?" lead to the escalation of the call to a higher-level representative. As such, any call containing these phrases would be categorized as containing an "escalation attempt."
- an organization can identify trends and infer conditions based on the number of such interactions falling into various categories. For example, a large number of interactions originating from a particular area and categorized as indicating a "service outage" or "poor network performance" could alert an internet service provider to take action to address system problems within that particular area.
- embodiments of the present invention are directed to addressing two issues related to discovery and exploration of data.
- One aspect of the present invention is directed to the automatic discovery and extraction of concepts from a set of documents without human assistance.
- embodiments of the present invention are directed to understanding why a certain document or phone call is classified into a predefined category or why the document or phone call contains a particular concept (which may be newly identified).
- embodiments of the present invention are directed to discovering the "root cause" of various "symptoms" reported by customers and inferring a predefined category or newly discovered concept "A" that is the root cause for the call to be categorized with a predefined category or newly discovered concept "B", so an organization or a user can resolve the underlying issue.
- a method for identifying concepts in a plurality of interactions includes: filtering, on a processor, the interactions based on intervals; creating, on the processor, a plurality of sentences from the filtered interactions; computing, on the processor, a saliency of each of the sentences; pruning away, on the processor, sentences with low saliency for generating a set of informative sentences; clustering, on the processor, the sentences of the set of informative sentences for generating a plurality of sentence clusters, each of the clusters corresponding to a concept of the concepts; computing, on the processor, a saliency of each of the clusters; and naming, on the processor, each of the clusters.
- the interactions may include an output of a voice recognition system.
- the method may further include filtering the output of the voice recognition system based on word confidence.
- the voice recognition system may be a large-vocabulary continuous speech recognition system.
- the intervals may be time intervals.
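The two filtering steps above (by time interval and by recognizer word confidence) could be sketched as follows. This is only an illustration under stated assumptions: the `Word` and `Interaction` types, the `started_at` field, and the `min_confidence` default are hypothetical names and values, not taken from the patent.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Word:
    text: str
    confidence: float  # assumed: recognizer confidence in [0, 1]


@dataclass
class Interaction:
    started_at: datetime
    words: list


def filter_interactions(interactions, start, end, min_confidence=0.5):
    """Keep interactions starting in [start, end) and drop low-confidence words."""
    kept = []
    for interaction in interactions:
        if start <= interaction.started_at < end:
            words = [w for w in interaction.words if w.confidence >= min_confidence]
            kept.append(Interaction(interaction.started_at, words))
    return kept


# Hypothetical sample data: one call inside the interval, one outside it.
calls = [
    Interaction(datetime(2014, 7, 1), [Word("bluetooth", 0.9), Word("uh", 0.2)]),
    Interaction(datetime(2014, 8, 1), [Word("billing", 0.8)]),
]
filtered = filter_interactions(calls, datetime(2014, 6, 1), datetime(2014, 7, 15))
```

Here only the first call survives the interval filter, and its low-confidence word is dropped before sentences are created.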
- the clustering the sentences may include: selecting a plurality of template sentences from the set of informative sentences, each of the template sentences corresponding to one of the clusters; computing a similarity of each sentence in the set of informative sentences to the template sentences; assigning each of the sentences of the set of informative sentences to a cluster of the clusters in accordance with the computed similarities with the template sentences; and removing clusters having fewer than a threshold number of assigned sentences.
- the clustering the sentences may further include, iteratively, selecting additional template sentences from sentences that were not assigned to a cluster and repeating the selecting the plurality of template sentences, the computing the similarity of each sentence to the template sentences, the assigning each of the sentences to the clusters, and the removing clusters having fewer than the threshold number of assigned sentences until all sentences are assigned or until an iteration limit is reached.
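A minimal sketch of the template-based clustering loop described above. The text does not specify the similarity measure or how templates are chosen, so this sketch assumes Jaccard word overlap and picks templates greedily from unassigned sentences that are dissimilar to existing templates. The function names and all default thresholds are hypothetical.

```python
def jaccard(a, b):
    """Assumed similarity measure: word-set overlap between two sentences."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_sentences(sentences, n_templates=2, min_similarity=0.3,
                      min_size=2, max_iters=5):
    clusters = {}    # template sentence -> member sentences
    tried = set()    # templates already used, so they are not picked again
    unassigned = list(sentences)
    for _ in range(max_iters):
        # Select new templates from the sentences not yet assigned.
        fresh = []
        for s in unassigned:
            if s in tried:
                continue
            if all(jaccard(s, t) < min_similarity for t in list(clusters) + fresh):
                fresh.append(s)
            if len(fresh) == n_templates:
                break
        if not fresh:
            break
        tried.update(fresh)
        for t in fresh:
            clusters[t] = []
        # Reassign every sentence to its most similar template.
        for members in clusters.values():
            members.clear()
        unassigned = []
        for s in sentences:
            best = max(clusters, key=lambda t: jaccard(s, t))
            if jaccard(s, best) >= min_similarity:
                clusters[best].append(s)
            else:
                unassigned.append(s)
        # Remove clusters below the size threshold; their members become unassigned.
        for t in [t for t, m in clusters.items() if len(m) < min_size]:
            unassigned.extend(clusters.pop(t))
    return clusters, unassigned


sentences = [
    "cancel my service plan",
    "i want to cancel my plan",
    "bluetooth pairing keeps failing",
    "bluetooth pairing is failing again",
    "thank you goodbye",
]
clusters, leftover = cluster_sentences(sentences)
```

With this sample, two clusters form (cancellation and Bluetooth pairing), and the sentence that matches nothing stays unassigned once its single-member cluster is removed.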
- the naming each of the clusters may include, for each cluster: computing a term frequency-inverse document frequency for each sentence in the cluster and naming the cluster with a sentence of the cluster having the highest term frequency-inverse document frequency.
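The naming step could be sketched as below. How a TF-IDF value is computed for a whole sentence is not spelled out in the text, so this sketch makes an assumption: each sentence is treated as a document, inverse document frequency is computed over all clustered sentences, and a sentence's score is the sum of its per-word TF-IDF values. The function name `name_clusters` is hypothetical, and sentences are assumed to be non-empty.

```python
import math
from collections import Counter


def name_clusters(clusters):
    """clusters: dict of cluster id -> list of non-empty sentences.

    Returns dict of cluster id -> the member sentence with the highest score.
    """
    all_sentences = [s for members in clusters.values() for s in members]
    n = len(all_sentences)
    # Document frequency: in how many sentences each word appears.
    df = Counter(w for s in all_sentences for w in set(s.split()))

    def score(sentence):
        words = sentence.split()
        tf = Counter(words)
        return sum((c / len(words)) * math.log(n / df[w]) for w, c in tf.items())

    return {cid: max(members, key=score) for cid, members in clusters.items()}


names = name_clusters({
    0: ["cancel my plan", "cancel my plan please today"],
    1: ["bluetooth pairing failed", "bluetooth failed"],
})
```

Under this scoring, sentences containing rarer words win, so the chosen name tends to be the most specific sentence in the cluster.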
- a method for exploring concepts automatically identified in a plurality of interactions includes: receiving, on a processor, a query comprising a concept; retrieving, by the processor, a cluster containing the concept; and displaying the retrieved cluster, wherein the concepts are automatically identified by: filtering, on the processor, the interactions based on intervals; creating, on the processor, a plurality of sentences from the filtered interactions; computing, on the processor, a saliency of each of the sentences; pruning away, on the processor, sentences with low saliency for generating a set of informative sentences; and clustering, on the processor, the sentences of the set of informative sentences for generating a plurality of sentence clusters, each of the clusters corresponding to a concept of the concepts.
- a computer system includes: a processor; and a memory, wherein the memory has stored thereon instructions that, when executed by the processor, cause the processor to identify concepts in a plurality of interactions by: filtering the interactions based on intervals; creating a plurality of sentences from the filtered interactions; computing a saliency of each of the sentences; pruning away sentences with low saliency for generating a set of informative sentences; clustering the sentences of the set of informative sentences for generating a plurality of sentence clusters, each of the clusters corresponding to a concept of the concepts; computing a saliency of each of the clusters; and naming each of the clusters.
- the interactions may include an output of a voice recognition system.
- the instructions may further include instructions for filtering the output of the voice recognition system based on word confidence.
- the voice recognition system may be a large-vocabulary continuous speech recognition system.
- the intervals may be time intervals.
- the clustering the sentences may include: selecting a plurality of template sentences from the set of informative sentences, each of the template sentences corresponding to one of the clusters; computing a similarity of each sentence in the set of informative sentences to the template sentences; assigning each of the sentences of the set of informative sentences to a cluster of the clusters in accordance with the computed similarities with the template sentences; and removing clusters having fewer than a threshold number of assigned sentences.
- the clustering the sentences may further include, iteratively, selecting additional template sentences from sentences that were not assigned to a cluster and repeating the selecting the plurality of template sentences, the computing the similarity of each sentence to the template sentences, the assigning each of the sentences to the clusters, and the removing clusters having fewer than the threshold number of assigned sentences until all sentences are assigned or until an iteration limit is reached.
- the naming each of the clusters may include, for each cluster: computing a term frequency-inverse document frequency for each sentence in the cluster and naming the cluster with a sentence of the cluster having the highest term frequency-inverse document frequency.
- a computer system includes: a processor; and a memory, wherein the memory has stored thereon instructions that, when executed by the processor, cause the processor to respond to requests for exploration of concepts by: receiving a query comprising a concept; retrieving a cluster containing the concept; and displaying the retrieved cluster, wherein the concepts are automatically identified by: filtering the interactions based on intervals; creating a plurality of sentences from the filtered interactions; computing a saliency of each of the sentences; pruning away sentences with low saliency for generating a set of informative sentences; and clustering the sentences of the set of informative sentences for generating a plurality of sentence clusters, each of the clusters corresponding to a concept of the concepts.
- a method for determining a cause of events detected in a plurality of interactions includes: identifying, on a processor, a plurality of elements in the interactions; detecting, on the processor, a plurality of sequences of elements in the interactions; mining, on the processor, the plurality of sequences for generating a set of supported patterns; computing, on the processor, association rules from the set of supported patterns; and returning the computed association rules.
- the elements may include defined topics and identified concepts.
- the identified concepts may be derived by: filtering, on the processor, the interactions based on intervals; creating, on the processor, a plurality of sentences from the filtered interactions; computing, on the processor, a saliency of each of the sentences; pruning away, on the processor, sentences with low saliency for generating a set of informative sentences; clustering, on the processor, the sentences of the set of informative sentences for generating a plurality of sentence clusters, each of the clusters corresponding to a concept of the concepts; computing, on the processor, a saliency of each of the clusters; and naming, on the processor, each of the clusters.
- the detecting the sequences of elements may include: sorting the identified elements in each interaction by timestamp within the interaction for generating the plurality of sequences; and condensing each of the sequences for removing repeated consecutive elements.
- the method may further include condensing, on the processor, the set of supported patterns for removing repeated elements in the sequences.
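The sequence-building and condensing steps above can be illustrated with a short sketch. The `(timestamp, label)` tuple layout and both function names are assumptions made for illustration. The second function reads "removing repeated elements" as keeping only the first occurrence of each element, which is one possible interpretation.

```python
from itertools import groupby


def to_sequence(elements):
    """elements: iterable of (timestamp, label) pairs from one interaction.

    Sorts by timestamp, then collapses runs of the same consecutive label.
    """
    ordered = [label for _, label in sorted(elements, key=lambda e: e[0])]
    return [label for label, _ in groupby(ordered)]


def condense_pattern(pattern):
    """Assumed reading: drop later repeats of an element, preserving order."""
    return list(dict.fromkeys(pattern))


sequence = to_sequence([
    (30, "Dispute"),
    (5, "Transfer"),
    (12, "Transfer"),
    (45, "Dissatisfaction"),
    (50, "Dissatisfaction"),
])
```

The sample interaction condenses to the sequence Transfer, Dispute, Dissatisfaction.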
- the computing the association rules from the set of condensed supported patterns may include: computing an association rule for each sequence pattern in the condensed supported patterns, the computing the association rule including, if the sequence pattern includes more than one element: dividing the sequence pattern into a first portion and a second portion, the second portion including the last element in the sequence pattern;
- generating a proposed association rule, the proposed association rule being a logical implication from the first portion to the second portion; computing a confidence of the proposed association rule; if the computed confidence is greater than a threshold confidence level, storing the proposed association rule and moving the last element in the first portion to the second portion and iteratively repeating generating the proposed association rule and computing the confidence; and if the computed confidence is less than a threshold confidence level, ending analysis of the sequence pattern and returning a previously stored proposed association rule as the association rule or returning no association rule if no such proposed association rules were stored.
- the method may further include computing a lift and a saliency of the association rule.
- the method may further include computing a lift and a saliency of the proposed association rule.
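The iterative rule derivation described above could be sketched as follows. The text does not give formulas here, so this sketch assumes the standard association-rule definitions: support is the fraction of sequences containing a pattern as an ordered subsequence, confidence is support(pattern) / support(first portion), and lift is confidence / support(second portion). Saliency is omitted because its definition is not given in this section. The function names and the 0.6 default threshold are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if pattern's elements appear in sequence in the same order."""
    remaining = iter(sequence)
    return all(element in remaining for element in pattern)


def support(pattern, sequences):
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


def derive_rule(pattern, sequences, min_confidence=0.6):
    """Return the last rule that passed the confidence threshold, or None."""
    if len(pattern) < 2:
        return None
    stored = None
    full_support = support(pattern, sequences)
    split = len(pattern) - 1  # second portion starts as the last element only
    while split >= 1:
        lhs, rhs = pattern[:split], pattern[split:]
        lhs_support = support(lhs, sequences)
        confidence = full_support / lhs_support if lhs_support else 0.0
        if confidence <= min_confidence:
            break  # stop and fall back to the previously stored rule, if any
        lift = confidence / support(rhs, sequences)
        stored = {"lhs": lhs, "rhs": rhs, "confidence": confidence, "lift": lift}
        split -= 1  # move the last first-portion element to the second portion
    return stored


sequences = [
    ["Transfer", "Dispute", "Dissatisfaction"],
    ["Transfer", "Dispute", "Dissatisfaction"],
    ["Transfer", "Billing"],
    ["Dispute", "Dissatisfaction"],
    ["Billing"],
]
rule = derive_rule(["Transfer", "Dispute", "Dissatisfaction"], sequences, min_confidence=0.7)
```

Because shrinking the first portion can only raise its support, confidence never increases from one iteration to the next, so the loop can stop at the first proposal that fails the threshold.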
- a method for determining a root cause of an event detected in a plurality of interactions includes: receiving, on a processor, a query for the root cause of the event; searching, on the processor, a plurality of association rules, each of the association rules including one or more first portion elements and one or more second portion elements, each of the association rules being a logical implication from the first portion to the second portion; and returning, from the processor, one or more association rules matching the query, the second portion elements of each of the matching association rules including the event.
- Each of the association rules may have a corresponding confidence value
- the query may include a confidence threshold
- the returning the one or more association rules matching the query may further include returning association rules matching the query that have confidence values exceeding the confidence threshold.
- Each of the association rules may have a corresponding lift value and a corresponding saliency value
- the query may include at least one of a lift threshold and a saliency threshold
- the returning the one or more association rules matching the query may further include returning association rules matching the query that have lift values or saliency values exceeding the lift threshold or the saliency threshold.
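A root-cause query over stored rules might look like the sketch below. The `Rule` type, the function name, and the numeric values in the sample data are all made up for illustration. The sketch assumes that only the thresholds supplied in the query are applied and that results are ordered by confidence, neither of which is stated in the text.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    lhs: tuple         # first portion elements (candidate causes)
    rhs: tuple         # second portion elements (events)
    confidence: float
    lift: float
    saliency: float


def find_root_causes(rules, event, min_confidence=None, min_lift=None,
                     min_saliency=None):
    """Return rules whose second portion includes the event and pass the thresholds."""
    def passes(value, threshold):
        return threshold is None or value > threshold

    matches = [
        r for r in rules
        if event in r.rhs
        and passes(r.confidence, min_confidence)
        and passes(r.lift, min_lift)
        and passes(r.saliency, min_saliency)
    ]
    return sorted(matches, key=lambda r: r.confidence, reverse=True)


# Made-up values, for illustration only.
rules = [
    Rule(("Transfer", "Dispute"), ("Dissatisfaction",), 0.80, 1.6, 0.5),
    Rule(("No Payment", "On Hold"), ("Dissatisfaction",), 0.55, 1.2, 0.3),
    Rule(("Billing",), ("Payment Inquiry",), 0.90, 2.0, 0.7),
]
found = find_root_causes(rules, "Dissatisfaction", min_confidence=0.6)
```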
- a computer system includes: a processor; and a memory, wherein the memory has stored thereon instructions that, when executed by the processor, cause the processor to compute association rules between events detected in a plurality of interactions by: identifying a plurality of elements in the interactions; detecting a plurality of sequences of elements in the interactions; mining the plurality of sequences for generating a set of supported patterns; computing association rules from the set of supported patterns; and returning the computed association rules.
- the elements may include defined topics and identified concepts.
- the identified concepts may be derived by: filtering, on the processor, the interactions based on intervals; creating, on the processor, a plurality of sentences from the filtered interactions;
- the processor may be further configured to compute association rules between events detected in a plurality of interactions by: sorting the identified elements in each interaction by timestamp within the interaction for generating the plurality of sequences; and condensing each of the sequences for removing repeated consecutive elements.
- the processor may be further configured to compute association rules between events detected in a plurality of interactions by condensing the set of supported patterns for removing repeated elements in the sequences.
- the computing the association rules from the set of condensed supported patterns may include: computing an association rule for each sequence pattern in the condensed supported patterns, the computing the association rule including, if the sequence pattern includes more than one element: dividing the sequence pattern into a first portion and a second portion, the second portion including the last element in the sequence pattern;
- generating a proposed association rule, the proposed association rule being a logical implication from the first portion to the second portion; computing a confidence of the proposed association rule; if the computed confidence is greater than a threshold confidence level, storing the proposed association rule and moving the last element in the first portion to the second portion and iteratively repeating generating the proposed association rule and computing the confidence; and if the computed confidence is less than a threshold confidence level, ending analysis of the sequence pattern and returning a previously stored proposed association rule as the association rule or returning no association rule if no such proposed association rules were stored.
- the computing the association rule may further include computing a lift and a saliency of the association rule.
- the computing the association rule may further include computing a lift and a saliency of the proposed association rule.
- a computer system includes: a processor; and a memory, wherein the memory has stored thereon instructions that, when executed by the processor, cause the processor to respond to a query for the root cause of an event by: receiving the query for the root cause of the event; searching a plurality of association rules, each of the association rules including one or more first portion elements and one or more second portion elements, each of the association rules being a logical implication from the first portion to the second portion; and returning one or more association rules matching the query, the second portion elements of each of the matching association rules including the event.
- FIG. 1 is a schematic block diagram of a system supporting a contact center that is configured to provide access to searchable transcripts to customer service agents according to one exemplary embodiment of the invention.
- FIG. 2A is a block diagram of a computing device according to an embodiment of the present invention.
- FIG. 2B is a block diagram of a computing device according to an embodiment of the present invention.
- FIG. 2C is a block diagram of a computing device according to an embodiment of the present invention.
- FIG. 2D is a block diagram of a computing device according to an embodiment of the present invention.
- FIG. 2E is a block diagram of a network environment including several computing devices according to an embodiment of the present invention.
- FIG. 3 is a screenshot of a category distribution report according to one embodiment of the present invention.
- FIG. 4 is a screenshot illustrating an interface for customizing and defining predefined categories according to one embodiment of the present invention.
- FIG. 5 is a screenshot illustrating an interface for exploring relationships between topics in a plurality of interactions according to one embodiment of the present invention.
- FIG. 6 illustrates a user interface for exploring clustering of key terms according to one embodiment of the present invention.
- FIG. 7 is a flowchart illustrating a method for detecting topics within interactions according to one embodiment of the present invention.
- FIG. 8 is a flowchart illustrating a method for clustering sentences according to one embodiment of the present invention.
- FIG. 9 is a screenshot listing deduced association rules between causes and events along with support, confidence, lift, and saliency levels for each of the derived inference rules according to one embodiment of the present invention.
- FIG. 10 is a flowchart illustrating a method for determining causes of events according to one embodiment of the present invention.
- FIG. 11 is a flowchart illustrating a method for generating association rules according to one embodiment of the present invention.
- FIG. 12 is an illustration of an output of the method for determining causes of events according to one embodiment of the present invention.
- various applications and aspects of the present invention may be implemented in software, firmware, hardware, and combinations thereof.
- the software may operate on a general purpose computing device such as a server, a desktop computer, a tablet computer, a smartphone, or a personal digital assistant.
- a general purpose computer includes a general purpose processor and memory.
- Some embodiments of the present invention will be described in the context of a contact center. However, embodiments of the present invention are not limited thereto and may also be used under other conditions involving searching recorded audio, such as in computer-based education systems, voice messaging systems, medical transcripts, or any speech corpora from any source.
- aspects of embodiments of the present invention are directed to a system and method for automatically inferring and deducing topics of discussion (or "concepts") from a body of recorded or live interactions (or conversations). These interactions may include, for example, telephone conversations, text-based chat sessions, email conversation threads, and the like.
- the inferring of these concepts does not require manual categorization by a human and can be performed by the system (or the "analytics system") according to embodiments of the present invention. Therefore, new, previously unidentified topics of conversation can quickly be identified and brought to the attention of an organization without performing a manual analysis of conversation logs.
- Bluetooth® connectivity and there were no predefined categories in the interactions analytics system to match the phrases “Bluetooth connection” or “Bluetooth pairing” to issues with Bluetooth® connections. In conventional systems, this category might go undetected until those phrases were manually added to the analytics system.
- embodiments of the present invention are directed to a system and method for identifying salient phrases, generating new categories (or "concepts") based on these identified phrases, and categorizing interactions based on these automatically identified categories. As a result, embodiments of the present invention can be used to alert
- FIG. 3 is a screenshot of a portion of a category distribution report showing exemplary categories "New Customer," "Emergency," "Identification," "Billing," and "Payment Inquiry" along with the number of interactions categorized into each of these categories and the percentages of all calls that involve these categories. Note that the percentages add up to more than 100% because any given interaction may be assigned to multiple categories or not assigned to any category. Viewing this category distribution report, an organization can assess the most frequently discussed topics.
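A small sketch of how such a distribution could be computed, showing why the percentages need not sum to 100%. The function name and the sample data are hypothetical and do not reproduce the figures in FIG. 3.

```python
from collections import Counter


def category_distribution(interactions):
    """interactions: list of category lists, one list per interaction.

    Returns dict of category -> (count, percentage of all interactions).
    """
    counts = Counter(c for categories in interactions for c in set(categories))
    total = len(interactions)
    return {c: (n, 100.0 * n / total) for c, n in counts.items()}


# Hypothetical sample: one multi-category call, one uncategorized call.
report = category_distribution([
    ["Billing", "Payment Inquiry"],
    ["Billing"],
    [],
    ["New Customer", "Billing"],
])
total_percent = sum(pct for _, pct in report.values())
```

In this sample the percentages sum to 125%, because some calls carry more than one category while each percentage is taken over all four calls.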
- Another aspect of embodiments of the present invention is directed to systems and methods for automatically determining possible root causes of events and concepts within a conversation. For example, an internet service provider may be alerted to a large number of requests to cancel service plans. Using embodiments of the present invention, the
- FIG. 9 is a screenshot listing deduced association rules between causes (labeled “Left Hand Side”) and events (labeled “Right Hand Side”) along with support, confidence, lift, and saliency levels for each of the derived inference rules according to one embodiment of the present invention.
- a user can search for rules relating to a particular topic (e.g., customer "Dissatisfaction") by selecting the topic from the "Target Topic" dropdown box, in which case rules containing "Dissatisfaction" on the "Right Hand Side" will be shown.
- the rules "Transfer + Dispute -> Dissatisfaction" and "No Payment + On Hold -> Dissatisfaction" would be shown among the rules involving the
- embodiments of the present invention are directed to systems and methods for providing timely summary of trends in topics of discussion in a collection of interactions and systems and methods for determining root causes of predefined and inferred topics of discussion based on, for example, correlations with particular other topics within the same interaction.
- the above-described systems and methods are used in the context of a contact center and are used to monitor and infer topics of conversation during interactions between customers and an organization and to analyze and determine root causes of events for display to members of the organization.
- FIG. 1 is a schematic block diagram of a system supporting a contact center that is configured to provide customer availability information to customer service agents according to one exemplary embodiment of the invention.
- the contact center may be an in-house facility to a business or corporation for serving the enterprise in performing the functions of sales and service relative to the products and services available through the enterprise.
- the contact center may be a third-party service provider.
- the contact center may be hosted in equipment dedicated to the enterprise or third-party service provider, and/or hosted in a remote computing environment such as, for example, a private or public cloud environment with infrastructure for supporting multiple contact centers for multiple enterprises.
- the contact center includes resources (e.g. personnel, computers, and telecommunication equipment) to enable delivery of services via telephone or other communication mechanisms.
- Such services may vary depending on the type of contact center, and may range from customer service to help desk, emergency response, telemarketing, order taking, and the like.
- Each of the end user devices 10 may be a communication device conventional in the art, such as, for example, a telephone, wireless phone, smart phone, personal computer, electronic tablet, and/or the like. Users operating the end user devices 10 may initiate, manage, and respond to telephone calls, emails, chats, text messaging, web-browsing sessions, and other multi-media transactions.
- Inbound and outbound calls from and to the end user devices 10 may traverse a telephone, cellular, and/or data communication network 14 depending on the type of device that is being used.
- the communications network 14 may include a private or public switched telephone network (PSTN), local area network (LAN), private wide area network (WAN), and/or public wide area network such as, for example, the Internet.
- the communications network 14 may also include a wireless carrier network including a code division multiple access (CDMA) network, global system for mobile communications (GSM) network, and/or any 3G or 4G network conventional in the art.
- the contact center includes a switch/media gateway 12 coupled to the communications network 14 for receiving and transmitting calls between end users and the contact center.
- the switch/media gateway 12 may include a telephony switch configured to function as a central switch for agent level routing within the center.
- the switch 12 may include an automatic call distributor, a private branch exchange (PBX), an IP-based software switch, and/or any other switch configured to receive Internet-sourced calls and/or telephone network-sourced calls.
- the switch is coupled to a call server 18 which may, for example, serve as an adapter or interface between the switch and the remainder of the routing, monitoring, and other call-handling systems of the contact center.
- the contact center may also include a multimedia/social media server for engaging in media interactions other than voice interactions with the end user devices 10 and/or web servers 32.
- the media interactions may be related, for example, to email, vmail (voice mail through email), chat, video, text-messaging, web, social media, screen-sharing, and the like.
- the web servers 32 may include, for example, social interaction site hosts for a variety of known social interaction sites to which an end user may subscribe, such as, for example, Facebook, Twitter, and the like.
- the web servers may also provide web pages for the enterprise that is being supported by the contact center. End users may browse the web pages and get information about the enterprise's products and services.
- the web pages may also provide a mechanism for contacting the contact center, via, for example, web chat, voice call, email, web real time communication (WebRTC), or the like.
- the switch is coupled to an interactive voice response (IVR) server 34.
- IVR server 34 is configured, for example, with an IVR script for querying customers on their needs. For example, a contact center for a bank may tell callers, via the IVR script, to "press 1" if they wish to get an account balance. If this is the case, through continued interaction with the IVR, customers may complete service without needing to speak with an agent.
- the call is forwarded to the call server 18 which interacts with a routing server 20 for finding an appropriate agent for processing the call.
- the call server 18 may be configured to process PSTN calls, VoIP calls, and the like.
- the call server 18 may include a session initiation protocol (SIP) server for processing SIP calls.
- the call server may place the call in, for example, a call queue.
- the call queue may be implemented via any data structure conventional in the art, such as, for example, a linked list, array, and/or the like.
- the data structure may be maintained, for example, in buffer memory provided by the call server 18.
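As one illustration of such a data structure, a first-in, first-out call queue could be kept in memory as below. The `CallQueue` class and its method names are hypothetical. Python's `collections.deque` is used here as a stand-in for the linked list or array mentioned in the text.

```python
from collections import deque


class CallQueue:
    """In-memory FIFO queue of waiting calls (illustrative only)."""

    def __init__(self):
        self._calls = deque()

    def enqueue(self, call_id):
        self._calls.append(call_id)

    def next_call(self):
        """Return the longest-waiting call, or None if the queue is empty."""
        return self._calls.popleft() if self._calls else None

    def __len__(self):
        return len(self._calls)


queue = CallQueue()
queue.enqueue("call-1")
queue.enqueue("call-2")
first = queue.next_call()
```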
- each agent device 38 may include a telephone adapted for regular telephone calls, VoIP calls, and the like.
- the agent device 38 may also include a computer for communicating with one or more servers of the contact center and performing data processing associated with contact center operations, and for interfacing with customers via a variety of communication mechanisms such as chat, instant messaging, voice calls, and the like.
- the selection of an appropriate agent for routing an inbound call may be based, for example, on a routing strategy employed by the routing server 20, and further based on information about agent availability, skills, and other routing parameters provided, for example, by a statistics server 22.
- the statistics server 22 includes a customer availability aggregation (CAA) module 36 for monitoring availability of end users on different communication channels and providing such information to, for example, the routing server 20, agent devices 38a-38c, and/or other contact center applications and devices.
- the CAA module may also be deployed in a separate application server.
- the aggregation module 36 may be a software module implemented via computer program instructions which are stored in memory of the statistics server 22 (or some other server), and which program instructions are executed by a processor.
- a person of skill in the art should recognize that the aggregation module 36 may also be implemented via firmware (e.g. an application-specific integrated circuit), hardware, or a combination of software, firmware, and hardware.
- the aggregation module 36 is configured to receive customer availability information from other devices in the contact center, such as, for example, the multimedia/social media server 24.
- the multimedia/social media server 24 may be configured to detect user presence on different websites including social media sites, and provide such information to the aggregation module 36.
- the multimedia/social media server 24 may also be configured to monitor and track interactions on those websites.
- the multimedia/social media server 24 may also be configured to provide, to an end user, a mobile application 40 for downloading onto the end user device 10.
- the mobile application 40 may provide user configurable settings that indicate, for example, whether the user is available, not available, or availability is unknown, for purposes of being contacted by a contact center agent.
- the multimedia/social media server 24 may monitor the status settings and send updates to the aggregation module each time the status information changes.
- the contact center may also include a reporting server 28 configured to generate reports from data aggregated by the statistics server 22.
- reports may include near realtime reports or historical reports concerning the state of resources, such as, for example, average waiting time, abandonment rate, agent occupancy, and the like.
- the reports may be generated automatically or in response to specific requests from a requestor (e.g.
- agent/administrator contact center application, and/or the like.
- the routing server 20 is enhanced with functionality for managing back-office/offline activities that are assigned to the agents. Such activities may include, for example, responding to emails, responding to letters, attending training seminars, or any other activity that does not entail real time communication with a customer.
- an activity may be pushed to the agent, or may appear in the agent's workbin 26a-26c (collectively referenced as 26) as a task to be completed by the agent.
- the agent's workbin may be implemented via any data structure conventional in the art, such as, for example, a linked list, array, and/or the like.
- the workbin may be maintained, for example, in buffer memory of each agent device 38.
- the contact center also includes one or more mass storage devices 30 for storing different databases relating to agent data (e.g. agent profiles, schedules, etc.), customer data (e.g. customer profiles), interaction data (e.g. details of each interaction with a customer, including reason for the interaction, disposition data, time on hold, handle time, etc.), and the like.
- some of the data may be provided by a third party database such as, for example, a third party customer relations management (CRM) database.
- the mass storage device may take the form of a hard disk or disk array as is conventional in the art.
- the contact center 102 also includes a call recording server 40 for recording the audio of calls conducted through the contact center 102, a call recording storage server 42 for storing the recorded audio, a speech analytics server 44 configured to process and analyze audio collected from the contact center 102, and a speech index database 46 for providing an index of the analyzed audio.
- the speech analytics server 44 may be coupled to (or may include) an analytics server 45 including a topic detecting module 45a, a root cause mining module 45b, and a user interface module 45c.
- the analytics server 45 may be configured to provide the automatic detection of topics from interactions recorded by the call recording server 40 and stored on the call recording storage server 42.
- the analytics server 45 may also access data stored on, for example, the multimedia/social media server 24 in order to process interactions from various chat, social media, email, and other non-voice interactions.
- the various servers of FIG. 1 may each include one or more processors executing computer program instructions and interacting with other system components for performing the various functionalities described herein.
- the computer program instructions are stored in a memory implemented using a standard memory device, such as, for example, a random access memory (RAM).
- the computer program instructions may also be stored in other non-transitory computer readable media such as, for example, a CD-ROM, flash drive, or the like.
- although the functionality of each of the servers is described as being provided by the particular server, a person of skill in the art should recognize that the functionality of various servers may be combined or integrated into a single server, or the functionality of a particular server may be distributed across one or more other servers without departing from the scope of the embodiments of the present invention.
- Each of the various servers in the contact center may be a process or thread, running on one or more processors, in one or more computing devices 500 (e.g., FIG. 2A, FIG. 2B), executing computer program instructions and interacting with other system components for performing the various functionalities described herein.
- the computer program instructions are stored in a memory which may be implemented in a computing device using a standard memory device, such as, for example, a random access memory (RAM).
- the computer program instructions may also be stored in other non-transitory computer readable media such as, for example, a CD-ROM, flash drive, or the like.
- a computing device may be implemented via firmware (e.g.
- FIG. 2A and FIG. 2B depict block diagrams of a computing device 500 as may be employed in exemplary embodiments of the present invention. As shown in FIG. 2A and FIG.
- each computing device 500 includes a central processing unit 521, and a main memory unit 522.
- a computing device 500 may include a storage device 528, a removable media interface 516, a network interface 518, an input/output (I/O) controller 523, one or more display devices 530c, a keyboard 530a and a pointing device 530b, such as a mouse.
- the storage device 528 may include, without limitation, storage for an operating system and software.
- each computing device 500 may also include additional optional elements, such as a memory port 503, a bridge 570, one or more additional input/output devices 530d, 530e and a cache memory 540 in communication with the central processing unit 521.
- Input/output devices e.g., 530a, 530b, 530d, and 530e, may be referred to herein using reference numeral 530.
- the central processing unit 521 is any logic circuitry that responds to and processes instructions fetched from the main memory unit 522. It may be implemented, for example, in an integrated circuit, in the form of a microprocessor, microcontroller, or graphics processing unit (GPU), or in a field-programmable gate array (FPGA) or
- Main memory unit 522 may be one or more memory chips capable of storing data and allowing any storage location to be directly accessed by the central processing unit 521.
- the central processing unit 521 communicates with main memory 522 via a system bus 550.
- FIG. 2B depicts an embodiment of a computing device 500 in which the central processing unit 521 communicates directly with main memory 522 via a memory port 503.
- FIG. 2B depicts an embodiment in which the central processing unit 521 communicates directly with cache memory 540 via a secondary bus, sometimes referred to as a backside bus.
- the central processing unit 521 communicates with cache memory 540 using the system bus 550.
- Cache memory 540 typically has a faster response time than main memory 522.
- the central processing unit 521 communicates with various I/O devices 530 via a local system bus 550.
- Various buses may be used as a local system bus 550, including a Video Electronics
- FIG. 2B depicts an embodiment of a computer 500 in which the central processing unit 521 communicates directly with I/O device 530e.
- FIG. 2B also depicts an embodiment in which local busses and direct communication are mixed: the central processing unit 521 communicates with I/O device 530d using a local system bus 550 while communicating with I/O device 530e directly.
- I/O devices 530 may be present in the computing device 500.
- Input devices include one or more keyboards 530a, mice, trackpads, trackballs, microphones, and drawing tablets.
- Output devices include video display devices 530c, speakers, and printers.
- An I/O controller 523 may control the I/O devices.
- the I/O controller may control one or more I/O devices such as a keyboard 530a and a pointing device 530b, e.g., a mouse or optical pen.
- the computing device 500 may support one or more removable media interfaces 516, such as a floppy disk drive, a CD-ROM drive, a DVD-ROM drive, tape drives of various formats, a USB port, a Secure Digital or COMPACT FLASH™ memory card port, or any other device suitable for reading data from read-only media, or for reading data from, or writing data to, read-write media.
- An I/O device 530 may be a bridge between the system bus 550 and a removable media interface 516.
- the removable media interface 516 may for example be used for installing software and programs.
- the computing device 500 may further comprise a storage device 528, such as one or more hard disk drives or hard disk drive arrays, for storing an operating system and other related software, and for storing application software programs.
- a removable media interface 516 may also be used as the storage device.
- the operating system and the software may be run from a bootable medium, for example, a bootable CD.
- the computing device 500 may comprise or be connected to multiple display devices 530c, which each may be of the same or different type and/or form.
- any of the I/O devices 530 and/or the I/O controller 523 may comprise any type and/or form of suitable hardware, software, or combination of hardware and software to support, enable or provide for the connection to, and use of, multiple display devices 530c by the computing device 500.
- the computing device 500 may include any type and/or form of video adapter, video card, driver, and/or library to interface, communicate, connect or otherwise use the display devices 530c.
- a video adapter may comprise multiple connectors to interface to multiple display devices 530c.
- the computing device 500 may include multiple video adapters, with each video adapter connected to one or more of the display devices 530c. In some embodiments, any portion of the operating system of the computing device 500 may be configured for using multiple display devices 530c. In other embodiments, one or more of the display devices 530c may be provided by one or more other computing devices, connected, for example, to the computing device 500 via a network. These embodiments may include any type of software designed and constructed to use the display device of another computing device as a second display device 530c for the computing device 500.
- a computing device 500 may be configured to have multiple display devices 530c.
- a computing device 500 of the sort depicted in FIG. 2A and FIG. 2B may operate under the control of an operating system, which controls scheduling of tasks and access to system resources.
- the computing device 500 may be running any operating system, any embedded operating system, any real-time operating system, any open source operating system, any proprietary operating system, any operating systems for mobile computing devices, or any other operating system capable of running on the computing device and performing the operations described herein.
- the computing device 500 may be any workstation, desktop computer, laptop or notebook computer, server machine, handheld computer, mobile telephone or other portable telecommunication device, media playing device, gaming system, mobile computing device, or any other type and/or form of computing, telecommunications or media device that is capable of communication and that has sufficient processor power and memory capacity to perform the operations described herein.
- the computing device 500 may be a virtualized computing device and the virtualized computing device may be running in a networked or cloud based environment.
- the computing device 500 may have different processors, operating systems, and input devices consistent with the device.
- the computing device 500 is a mobile device, such as a
- the computing device 500 comprises a combination of devices, such as a mobile phone combined with a digital audio player or portable media player.
- the central processing unit 521 may comprise multiple processors P1, P2, P3, P4, and may provide functionality for simultaneous execution of instructions or for simultaneous execution of one instruction on more than one piece of data.
- the computing device 500 may comprise a parallel processor with one or more cores.
- the computing device 500 is a shared memory parallel device, with multiple processors and/or multiple processor cores, accessing all available memory as a single global address space.
- the computing device 500 is a distributed memory parallel device with multiple processors each accessing local memory only.
- the computing device 500 has both some memory which is shared and some memory which may only be accessed by particular processors or subsets of processors.
- the central processing unit 521 comprises a multicore microprocessor, which combines two or more independent processors into a single package, e.g., into a single integrated circuit (IC).
- the computing device 500 includes at least one central processing unit 521 and at least one graphics processing unit 521'.
- a central processing unit 521 provides single instruction, multiple data (SIMD) functionality, e.g., execution of a single instruction simultaneously on multiple pieces of data.
- several processors in the central processing unit 521 may provide functionality for execution of multiple instructions simultaneously on multiple pieces of data (MIMD).
- the central processing unit 521 may use any combination of SIMD and MIMD cores in a single device.
- a computing device may be one of a plurality of machines connected by a network, or it may comprise a plurality of machines so connected.
- FIG. 2E shows an exemplary network environment.
- the network environment comprises one or more local machines 502a, 502b (also generally referred to as local machine(s) 502, client(s) 502, client node(s) 502, client machine(s) 502, client computer(s) 502, client device(s) 502, endpoint(s) 502, or endpoint node(s) 502) in communication with one or more remote machines 506a, 506b, 506c (also generally referred to as server machine(s) 506 or remote machine(s) 506) via one or more networks 504.
- a local machine 502 has the capacity to function as both a client node seeking access to resources provided by a server machine and as a server machine providing access to hosted resources for other clients 502a, 502b.
- the network 504 may be a local-area network (LAN), e.g., a private network such as a company Intranet, a metropolitan area network (MAN), or a wide area network (WAN), such as the Internet, or another public network, or a combination thereof.
- the computing device 500 may include a network interface 518 to interface to the network 504 through a variety of connections including, but not limited to, standard telephone lines, local-area network (LAN), or wide area network (WAN) links, broadband connections, wireless connections, or a combination of any or all of the above. Connections may be established using a variety of communication protocols.
- the computing device 500 communicates with other computing devices 500 via any type and/or form of gateway or tunneling protocol such as Secure Socket Layer (SSL) or Transport Layer Security (TLS).
- the network interface 518 may comprise a built-in network adapter, such as a network interface card, suitable for interfacing the computing device 500 to any type of network capable of communication and performing the operations described herein.
- An I/O device 530 may be a bridge between the system bus 550 and an external communication bus.
- exploration and discovery technologies are directed toward discovering interesting phenomena without user input; in other words, they identify information that is relevant to the user without the user explicitly looking for this information.
- Categorization technologies are focused on classifying documents (e.g., text, audio, and video) into predefined categories such as "all the calls in which a customer has asked to speak to a supervisor.”
- FIG. 3 is a screenshot of a category distribution report according to one embodiment of the present invention.
- the voice calls, customer-agent phone conversations (or interactions) that have occurred in the last 7 days have been classified into categories (e.g., predefined categories) that represent the set of known reasons for calls.
- conversations may be aggregated over different time periods (e.g., over the past day, over the past hour, over the past month, since a particular date, or between two arbitrary dates).
- the interactions may be limited to particular communication channels, such as one or more of telephone, email, chat, and social media, limited to interactions from particular contact centers, or limited to interactions from particular departments (e.g., sales or customer support).
- FIG. 4 is a screenshot illustrating an interface for customizing and defining predefined categories according to one embodiment of the present invention.
- Each predefined category can be defined as some Boolean expression of topics where each topic may be defined as a union of phrases or words, thereby producing a set of categorizing rules used to classify the interactions.
- FIG. 4 illustrates the definition of the "Repeat Call or Contact” category, which is defined by interactions having 'Found topic "Repeat Calls" at least once with Very-Low strictness OR Found topic “Repeat Contacts” at least once with Very-Low strictness' .
- the "Repeat Calls" and “Repeat Contact” topics may be triggered, for example, by detecting particular triggering events such as a record of multiple calls from a particular phone number or by identifying particular phrases in the interaction such as "thanks for calling again".
- the analytics server 45 can generate the category distribution report by counting the number of interactions within a given time period that fall within each category.
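- The rule-based classification and counting described above can be illustrated with a minimal Python sketch. The topic phrases, the `TOPICS` and `CATEGORIES` tables, and the function names are hypothetical; plain substring matching stands in for the phrase detection and "strictness" settings of FIG. 4, which are not modeled here.

```python
from collections import Counter

# Hypothetical topics: each topic is a union of trigger phrases.
TOPICS = {
    "Repeat Calls": {"thanks for calling again", "called before"},
    "Repeat Contacts": {"contacted you already"},
    "Escalation": {"speak to a supervisor"},
}

# Hypothetical categories: each category is a Boolean expression over topics.
CATEGORIES = {
    "Repeat Call or Contact": lambda found: "Repeat Calls" in found or "Repeat Contacts" in found,
    "Escalation": lambda found: "Escalation" in found,
}

def found_topics(text):
    """Return the names of all topics with at least one phrase in the text."""
    lowered = text.lower()
    return {name for name, phrases in TOPICS.items()
            if any(phrase in lowered for phrase in phrases)}

def category_distribution(interactions):
    """Count how many interactions fall within each category."""
    counts = Counter()
    for text in interactions:
        found = found_topics(text)
        for category, rule in CATEGORIES.items():
            if rule(found):
                counts[category] += 1
    return counts

calls = [
    "Hello, thanks for calling again, how can I help?",
    "I want to speak to a supervisor, I contacted you already.",
    "What is my account balance?",
]
distribution = category_distribution(calls)
```

An interaction may match more than one category, so the counts need not sum to the number of interactions.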
- an analytics server 45 provides a user with the ability to view or "explore” related words, as illustrated, for example, in FIG. 5.
- a user can start from a single word and explore the co-occurrence of the starting word with other words in various conversations. For instance, FIG. 5 depicts the relationship or co-occurrence of the word "credit” with other words in the set of relevant calls.
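- One simple way to compute such co-occurrence is sketched below. The function name and the whitespace tokenization are assumptions made for illustration; the text does not specify how co-occurrence is measured.

```python
from collections import Counter

def cooccurrence(start_word, conversations):
    """For each other word, count the conversations that contain it together with start_word."""
    counts = Counter()
    for text in conversations:
        words = set(text.lower().split())
        if start_word in words:
            # Count each co-occurring word once per conversation.
            counts.update(words - {start_word})
    return counts

conversations = [
    "my credit card was declined",
    "please raise my credit limit",
    "the card arrived late",
]
related = cooccurrence("credit", conversations)
```

Words from conversations that never mention the starting word (such as "late" here) receive no count.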
- the analytics server 45 and the topic detecting module 45a provide a user interface through user interface module 45c for a user to select a category from a set of categories (e.g., a set of predefined categories) and a group of calls that were classified into this category in a certain interval (e.g., last 7 days) and request information regarding why the group of calls were classified into the given category.
- the analytics server 45 and the user interface module 45c can present to the user, words and phrases that often occur and that are informative around occurrences of this category (e.g., words that appear with higher frequency than their occurrence in general within this category), thus characterizing the cause, or what is caused by, this category.
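- A minimal sketch of one way to find words that are over-represented in a category follows. The relative-frequency ratio and the `min_ratio` cutoff are assumptions; the text only states that informative words appear with higher frequency within the category than in general.

```python
from collections import Counter

def informative_words(category_texts, all_texts, min_ratio=1.5):
    """Return words whose relative frequency in the category exceeds their
    relative frequency overall by at least min_ratio.

    Assumes category_texts is a subset of all_texts, so every category
    word also has a nonzero overall count.
    """
    in_category = Counter(w for t in category_texts for w in t.lower().split())
    overall = Counter(w for t in all_texts for w in t.lower().split())
    category_total = sum(in_category.values())
    overall_total = sum(overall.values())
    scores = {}
    for word, count in in_category.items():
        ratio = (count / category_total) / (overall[word] / overall_total)
        if ratio >= min_ratio:
            scores[word] = ratio
    return scores

all_texts = [
    "the bill is wrong",
    "the bill is too high",
    "the phone is great",
    "the service is great",
]
billing_texts = all_texts[:2]
scores = informative_words(billing_texts, all_texts)
```

Common words such as "the" occur at roughly the same rate everywhere and are filtered out, while "bill" is kept.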
- FIG. 6 illustrates a user interface 6 for exploring clustering of key terms (or "buzz around categories") according to one embodiment of the present invention.
- a time range of documents may be specified in field 61.
- the language of the interactions (or documents) and general field e.g., business category
- the types of interactions e.g., audio from telephone calls, chats, email, and social media interactions
- checkboxes 63
- As shown in FIG. 6, various defined categories 67 of calls, such as "Account Balance," "Billing Issue," "Escalation," "Dissatisfaction," and "Positive Feedback" are listed, with each row showing the percentage of interactions classified into the given category, the total number of interactions matching that category, and the percentage change from a prior period (e.g., if the percentages currently shown are for the past week, the percentage change may be shown in comparison to, for example, two weeks ago or, as another example, the same week last year).
- pane 65 of the user interface
- As seen in FIG. 6, phrases from the interactions that are common to interactions within the selected category are shown in pane 65.
- the size of the words may correspond to the frequency with which the phrases appear in the interaction.
- the user interface may also show a graph showing the number of interactions in this category over time.
- FIG. 7 is a flowchart of a process executed by the analytics server 45 and the topic detecting module 45a for extracting concepts from interactions (e.g., text and text transcriptions of audio) according to one embodiment of the present invention.
- sentences that are semantically related are grouped together (or clustered) as conveying the same idea.
- Clustering is a machine learning technique that can be used to take sentences as input and to cluster the sentences together when the important portions of the sentences appear to be similar or the same.
- Each one of these clusters is a concept as mentioned above.
- the text that appears around the location of a phrase associated with a category for example, 30 seconds before and after the phrase
- concepts or phrases are extracted from interactions by supplying the entire body of interactions (or the entire body of text) to the system (e.g., the analytics server 45 as shown in FIG. 1, which may be a computer system 500 as shown in FIG. 2A, including the topic detecting module 45a as shown in FIG. 1) configured to perform the categorization.
- sentences are created out of the filtered text.
- "sentence" refers to a block of consecutive words in the text, and this block of words does not necessarily correspond to a particular grammatical or orthographical unit (e.g., a complete grammatical sentence or a sequence of words beginning with a capital letter and ending in a period). For example, in one embodiment, n-grams with overlaps
- n may be 4, which means that every 4 consecutive words form a "sentence."
- n can be any other whole number greater than 1.
- n-grams are merely one way to create sentences from words, and in other embodiments, other methods of forming "sentences" or blocks of consecutive words can be applied.
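- Forming overlapping n-gram "sentences" can be sketched as follows. The function name `make_sentences` is hypothetical, and the input is assumed to be an already filtered and tokenized list of words.

```python
def make_sentences(words, n=4):
    """Return every run of n consecutive words (overlapping n-grams)."""
    if n < 2:
        raise ValueError("n must be a whole number greater than 1")
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

sentences = make_sentences("i want to cancel my account".split(), n=4)
```

With n = 4, a six-word text yields three overlapping "sentences", and a text shorter than n yields none.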
- the saliency of a sentence is computed from the saliency of the words in the sentence.
- the inverse document frequency (IDF) of a word is used to measure the saliency of word w, and the saliency of the sentence is given by the square of the sum of the IDFs of each of the words in the sentence:
- N is the total number of documents in the collection and DF(w) is the number of documents in which the word w appears.
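- The saliency computation can be sketched in Python as below. The formula itself is not reproduced in this text, so the sketch assumes the standard form IDF(w) = log(N / DF(w)) and takes the square of the sum of the word IDFs, as the description states. The function names, and the choice to give unseen words an IDF of zero, are assumptions.

```python
import math

def idf(word, documents):
    """Assumed standard IDF: log(N / DF(w)), with 0.0 for words in no document."""
    n_docs = len(documents)
    df = sum(1 for doc in documents if word in doc)
    return math.log(n_docs / df) if df else 0.0

def sentence_saliency(sentence, documents):
    """Square of the sum of the IDFs of the words in the sentence."""
    return sum(idf(word, documents) for word in sentence) ** 2

# Each document is represented as the set of words it contains.
documents = [
    {"cancel", "my", "account"},
    {"my", "bill"},
    {"my", "account", "balance"},
]
saliency = sentence_saliency(("cancel", "my", "account"), documents)
```

A word that appears in every document, such as "my" here, has an IDF of zero and contributes nothing to the saliency of a sentence.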
- the sentences are pruned by sorting the sentences by saliency and discarding the sentences with low saliency relative to the top ones. For example, in one embodiment, sentences with less than 5% of the saliency of the top ones are removed from consideration.
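- The pruning step can be sketched as follows, assuming that "the top ones" is read as the single highest saliency value; the text leaves the exact reference point open. The function name and the (sentence, saliency) pair representation are hypothetical.

```python
def prune_by_saliency(scored, fraction=0.05):
    """Keep (sentence, saliency) pairs whose saliency is at least
    `fraction` of the highest saliency, sorted from most to least salient."""
    if not scored:
        return []
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    cutoff = fraction * ranked[0][1]
    return [item for item in ranked if item[1] >= cutoff]

kept = prune_by_saliency([("b", 40.0), ("c", 4.0), ("a", 100.0)])
```

Here the cutoff is 5% of 100.0, so the sentence with saliency 4.0 is discarded.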
- the sentences are clustered to group together similar sentences that differ from one another only by less-salient words. The similarity of sentences can be measured based on various text mining measures, and is described in more detail below.
- the saliency of each cluster is computed based on text mining measures.
- the saliency of a cluster is constructed from a weighted sum of the saliencies of the sentences of the cluster:
- the clusters are named: for each cluster, a sequence of words that is both informative and frequent in the cluster is selected.
- N is the n-gram level
- $\mathrm{TFIDF}(NG) = \mathrm{IDF}(NG) \cdot \mathrm{TF}(NG)$
- the IDF of an n-gram is defined above, and the TF of the n-gram is the number of times this n-gram appears in the cluster.
- the cluster can be named by the n-gram of the cluster having the largest
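- Cluster naming can be sketched as below, assuming the score of an n-gram is the product of its TF within the cluster and its IDF (the usual TF-IDF form; the formula is not clearly legible in this text). The function name, the precomputed `ngram_idf` table, and the example values are hypothetical.

```python
from collections import Counter

def name_cluster(cluster_ngrams, ngram_idf):
    """Return the n-gram with the largest TF * IDF score in the cluster.

    cluster_ngrams: list of n-grams (with repeats) occurring in the cluster.
    ngram_idf: mapping from n-gram to a precomputed IDF value.
    """
    tf = Counter(cluster_ngrams)
    return max(tf, key=lambda ng: tf[ng] * ngram_idf.get(ng, 0.0))

cluster = ["i would like"] * 3 + ["cancel my account"] * 2 + ["want to cancel"]
# Illustrative IDF values only.
ngram_idf = {"i would like": 0.1, "cancel my account": 2.0, "want to cancel": 1.5}
name = name_cluster(cluster, ngram_idf)
```

The most frequent n-gram ("i would like") is not chosen because its low IDF marks it as uninformative.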
- a measure of similarity between n-grams is formulated in which two n-grams are more similar if they contain the same important words, and less similar if the important words are not shared between them.
- the following similarity measure Sim exhibits these characteristics:
- $sent_1$ and $sent_2$ are sentences to be compared in similarity.
- FIG. 8 is a flowchart illustrating a method executed by the analytics server 45 and the topic detecting module 45a for clustering sentences according to one embodiment of the present invention.
- sentences are randomly selected to serve as centers (templates) for the clusters.
- the centers are not numeric vectors (as would be typical), but instead are sequences of words, and the clustering process is based on words that occur in both sentences, in a way that somewhat resembles the approach of, e.g., H. Ye and S. Young, "A Clustering Approach to Semantic Decoding," ICSLP 2006, Pittsburgh, PA (2006), the entire disclosure of which is incorporated herein by reference.
- each of the sentences involved in the clustering process is compared to each of the cluster centers (templates) using a similarity formula such as the sentence similarity function Sim described above.
- each sentence is assigned to the cluster that has highest similarity to it, provided that this similarity is also greater than a threshold similarity (e.g., a predefined threshold).
- in operation 204, if all of the sentences have been tried as templates or a certain number of iterations has been reached, then the ending conditions have been satisfied, and the clustering process ends. If the ending conditions are not satisfied, then the process returns to operation 200 and selects additional sentences to serve as templates, where the additional sentences are randomly selected from the set of sentences that have not yet been tried as templates.
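- The template-based clustering loop can be sketched as follows. This is an illustration under stated assumptions, not the claimed method: the `sim` function is a stand-in (IDF weight of shared words divided by IDF weight of all words), because the Sim formula is not reproduced in this text; sentences that have already joined a cluster are assumed not to be reused as templates; and the parameter names and values are hypothetical.

```python
import random

def sim(a, b, idf):
    """Stand-in similarity: shared important words raise the score."""
    shared = sum(idf.get(w, 0.0) for w in set(a) & set(b))
    total = sum(idf.get(w, 0.0) for w in set(a) | set(b))
    return shared / total if total else 0.0

def cluster_sentences(sentences, idf, threshold=0.5, per_round=2, max_rounds=10, seed=0):
    """Map each clustered sentence to its template (cluster center)."""
    rng = random.Random(seed)
    templates, assignment = [], {}
    for _ in range(max_rounds):  # ending condition: iteration limit
        # Assumption: clustered sentences are not tried as new templates.
        untried = [s for s in sentences if s not in templates and s not in assignment]
        if not untried:  # ending condition: nothing left to try
            break
        # Randomly select additional sentences to serve as templates.
        templates.extend(rng.sample(untried, min(per_round, len(untried))))
        # Compare every sentence to every template and assign it to the
        # most similar one, provided the similarity exceeds the threshold.
        assignment = {}
        for s in sentences:
            best = max(templates, key=lambda t: sim(s, t, idf))
            if sim(s, best, idf) > threshold:
                assignment[s] = best
    return assignment

sentences = [
    ("cancel", "my", "account"),
    ("cancel", "the", "account"),
    ("pay", "my", "bill"),
    ("pay", "the", "bill"),
]
idf = {"cancel": 2.0, "account": 2.0, "pay": 2.0, "bill": 2.0, "my": 0.1, "the": 0.1}
clusters = cluster_sentences(sentences, idf)
```

Because the templates are word sequences rather than numeric vectors, no centroid averaging is needed; sentences that differ only in low-IDF words such as "my" and "the" score close to 1.0 against each other.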
- embodiments of the present invention can automatically identify and assign names to new topics of conversation based on detecting related phrases and label (or tag or classify) interactions as involving these topics without manual entry of trigger phrases by a user.
- a root cause mining procedure can be employed on the concepts and/or categories assigned in an earlier processing stage to infer association rules (for example, logical implication) between them.
- an association rule between categories and/or concepts A, B, and C such as A, B → C, can indicate that the root cause of C is having A and B occurring in the same call or document prior to C.
- A, B, and C can be predefined categories or newly discovered concepts, or a mixture thereof.
- FIG. 9 is a screenshot illustrating a user interface for querying and viewing inferred association rules. The association rules are shown along with support and confidence levels of each of them, along with lift and saliency measures.
- the two approaches described above are combined: First, the user can select a category (e.g., category "C”) and then choose to "search” for relationships to other categories.
- the root cause mining module 45b of the analytics server 45 then performs mining of association rules of categories that cause category C, i.e., rules having C on their right hand side (RHS) (for example B → C). For deeper investigation, the user can then choose to see the "buzz around relation."
- the root cause mining module 45b can then extract concepts only from the set of interactions having this relation, in a similar way to "Buzz around categories" as shown in FIG. 5, which is a screenshot illustrating an interface for exploring relationships between topics in a plurality of interactions according to one embodiment of the present invention.
- the root cause mining module 45b can be configured and constrained to look only at a part of an interaction that starts just before category B and ends just after category C.
- a "term" is a part of a "topic" and a "topic" is a part of a "category."
- the systems and methods described herein can be applied to any of these levels (term/phrase, topic, or category).
- the term "I want to speak to a supervisor" can be part of an "Escalation" topic, which can be part of a category "the customer asks to escalate more than once in the same call".
- Embodiments of the present invention can be applied to any and all of these levels and, for the sake of convenience, are described herein with respect to topics. However, embodiments of the present invention are not limited to use with topics and may be used at other levels, such as terms, phrases, and categories.
- FIG. 9 is a screenshot listing deduced association rules between causes and events along with support, confidence, lift, and saliency levels for each of the derived inference rules according to one embodiment of the present invention.
- Support, confidence, lift, and saliency are computed metrics that may be viewed by a user to evaluate the quality of each of the derived inference rules and are described in more detail below.
- FIG. 9 shows association rules between topics, but association rules between terms/phrases or categories can be generated and used in substantially the same way.
- a user can search for rules relating to a particular topic (e.g., customer "Dissatisfaction") by selecting the topic from the "Target Topic" dropdown box, in which case rules containing "Dissatisfaction" on the "Right Hand Side" will be shown.
- a user can also restrict the results to rules matching particular criteria (e.g., minimum lift and/or minimum confidence). In the example shown in FIG. 9, only rules satisfying a minimum confidence of 70 were returned and, if the "Dissatisfaction" topic were selected under "Target Topic," only the rules "Transfer + Dispute → Dissatisfaction" and "No Payment + On Hold → Dissatisfaction" would be shown among the rules involving the "Dissatisfaction" concept.
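The filtering behavior described for FIG. 9 can be sketched as a simple query over a list of rules. The `Rule` class, the `find_rules` function, and all confidence and lift values below are hypothetical illustrations; only the topic names come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    lhs: tuple          # conditions (left hand side)
    rhs: tuple          # consequences (right hand side)
    confidence: float   # expressed on a 0-100 scale, as in the FIG. 9 example
    lift: float


def find_rules(rules, target_topic=None, min_confidence=0.0, min_lift=0.0):
    """Return rules whose RHS contains target_topic and that meet the minimums."""
    return [
        r for r in rules
        if (target_topic is None or target_topic in r.rhs)
        and r.confidence >= min_confidence
        and r.lift >= min_lift
    ]


# Made-up metric values, for illustration only.
rules = [
    Rule(("Transfer", "Dispute"), ("Dissatisfaction",), 82.0, 2.1),
    Rule(("No Payment", "On Hold"), ("Dissatisfaction",), 75.0, 1.8),
    Rule(("On Hold",), ("Transfer",), 64.0, 1.2),
]
matches = find_rules(rules, target_topic="Dissatisfaction", min_confidence=70)
```

With these sample values, `matches` holds the two rules that end in "Dissatisfaction"; the third rule is excluded both by its RHS and by its confidence.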
- embodiments of the present invention can be used to mine correlations and causal relationships between predefined topics or categories, between discovered concepts, or both.
- the mined objects, whether they are instances of predefined topics or instances of discovered concepts, will be referred to as "elements."
- FIG. 10 is a flowchart illustrating a process executed by the analytics server 45 and the root cause mining module 45b for determining causes of events according to one embodiment of the present invention.
- the set of documents to be analyzed can be the entire set of interactions or any subset of documents defined by the application or selected by the user (e.g., only calls, only chats, all interactions other than sales calls, etc.).
- elements below a certain confidence value are filtered out in operation 300 and a sequence of element IDs, sorted by ascending element start time, is created in operations 301 and 302.
- the start time of the element is the recognition start time of the term contained in the topic.
- the element start time is the recognition start time of the first word in the sentence contained in the cluster.
- the set of sequences can then be condensed by eliminating occurrences of consecutive similar topics in operation 303.
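Operations 300 through 303 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the dictionary keys `id`, `confidence`, and `start`, the function name `build_sequence`, and the reading of "consecutive similar topics" as consecutive identical element IDs are all assumptions.

```python
def build_sequence(elements, min_confidence):
    """Turn the detected elements of one document into a condensed ID sequence."""
    # Operation 300: drop elements below the confidence threshold.
    kept = [e for e in elements if e["confidence"] >= min_confidence]
    # Operations 301-302: order the survivors by ascending start time.
    kept.sort(key=lambda e: e["start"])
    # Operation 303: collapse runs of the same element ID into one occurrence.
    sequence = []
    for e in kept:
        if not sequence or sequence[-1] != e["id"]:
            sequence.append(e["id"])
    return sequence


# Made-up detections for a single interaction.
elements = [
    {"id": "hold", "confidence": 0.9, "start": 12.0},
    {"id": "greeting", "confidence": 0.8, "start": 1.0},
    {"id": "hold", "confidence": 0.7, "start": 15.0},
    {"id": "noise", "confidence": 0.2, "start": 5.0},
    {"id": "transfer", "confidence": 0.95, "start": 20.0},
]
sequence = build_sequence(elements, min_confidence=0.5)
```

Here the low-confidence "noise" element is filtered out and the two consecutive "hold" detections collapse into one.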
- the resulting set is then mined in operation 304 using an algorithm for mining frequent sequences such as PrefixSpan (see Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu M-C, Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach, IEEE Trans. Knowl. Data Eng. 16: 1424-1440 (2004)), resulting in a set F of patterns with support greater than a minimum support threshold s.
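A compact version of pattern-growth mining in the style of PrefixSpan is sketched below for sequences of single elements. It is a simplified illustration, not the published algorithm in full and not the patent's code. One assumption to note: support here counts the number of sequences (documents) that contain a pattern, which is the usual PrefixSpan definition; the function name `prefixspan` and the sample data are hypothetical.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern: support} for every pattern with support > min_support."""
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, position after the prefix match).
        occurrences = defaultdict(list)
        for seq_idx, start in projected:
            seen = set()
            seq = sequences[seq_idx]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:  # count each item once per sequence
                    seen.add(item)
                    occurrences[item].append((seq_idx, pos + 1))
        for item, next_projected in occurrences.items():
            if len(next_projected) > min_support:
                pattern = prefix + (item,)
                results[pattern] = len(next_projected)
                mine(pattern, next_projected)  # grow the pattern further

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Made-up element sequences, one per document.
sequences = [("a", "d", "b"), ("a", "d", "a", "b"), ("d", "b"), ("a", "c")]
patterns = prefixspan(sequences, min_support=1)
```

Each recursive call only scans the suffixes that follow the current prefix, which is the projection idea that distinguishes pattern growth from candidate generation.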
- the set of patterns F is condensed in operation 305 by deleting sequences with repetitions (even non-consecutive ones).
- the pattern (a, d, a, b) is not much more interesting than the pattern (a, d, b).
- the pattern (a, d, a, b) would be deleted.
- deleting patterns with repetitions can ensure that association rules derived from the remaining patterns do not contain the same element in both the conditions (LHS) and the consequences (RHS).
- the pattern (a, d, a, b) can be removed from the set of patterns F without losing information regarding the relationship between elements {a, d, b} because both patterns (a, d, b) and (d, a, b) remain in the set F after condensing the set F.
- sup(p) is the support function, which is defined as the number of times pattern p appears across all documents.
- this procedure can be performed without updating the supports of (a, d, b) and (d, a, b) because every instance of (a, d, a, b) is already counted for the patterns (a, d, b) and (d, a, b).
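The condensing step of operation 305 amounts to a single filter, sketched below. The function name `drop_repeating` and the support values are hypothetical; the supports of the surviving patterns are left untouched, as the text describes.

```python
def drop_repeating(patterns):
    """Remove every pattern that contains the same element more than once.

    patterns maps a pattern (a tuple of element IDs) to its support.
    """
    return {p: s for p, s in patterns.items() if len(set(p)) == len(p)}


# Made-up supports for illustration.
F = {("a", "d", "a", "b"): 2, ("a", "d", "b"): 3, ("d", "a", "b"): 2}
condensed = drop_repeating(F)
```

Comparing the length of a pattern with the size of its element set catches non-consecutive repetitions as well as consecutive ones.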
- association rules for the patterns in F are computed in operation 306.
- FIG. 11 is a flowchart illustrating, in more detail, the operation of generating association rules 306 of the process shown in FIG. 10 according to one embodiment of the present invention.
- the association rules can be computed by the analytics server 45 and the topic detection module 45a according to the following method:
- An inference rule from the first portion p1 to the second portion p2 (p1 → p2, or LHS → RHS) is generated and stored.
- a confidence of the inference rule is then computed in operation 403, where confidence is computed as:
- the confidence of a rule represents the probability of having its RHS in the documents, given that they contain its LHS. In other words, given the conditions (LHS), what is the probability
- the computed confidence is compared with a threshold in operation 404 and, if the confidence is above a given threshold, then the rule is added to the results set in operation
- the last element of p1 is moved to the start of p2. If this has not caused p1 to be empty (as checked in operation 406), a new inference rule is generated with the modified first and second portions p1 and p2 and the confidence of the new inference rule is computed in operation 403.
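The loop over operations 403 through 406 can be sketched as below. Two points are assumptions, because the corresponding text did not survive: the initial split takes p1 as everything but the last element and p2 as the last element, and confidence is computed as sup(p) / sup(p1), which matches the description of confidence as the probability of the RHS given the LHS. The function name `generate_rules` and the sample supports are hypothetical.

```python
def generate_rules(pattern, support, min_confidence):
    """Derive rules p1 -> p2 from one pattern by shifting elements from p1 to p2.

    support maps patterns (tuples) to their support and must contain
    the pattern itself and each of its prefixes.
    """
    rules = []
    p1, p2 = pattern[:-1], pattern[-1:]   # assumed initial split
    while p1:                             # operation 406: stop when p1 is empty
        confidence = support[pattern] / support[p1]   # operation 403 (assumed form)
        if confidence > min_confidence:               # operation 404
            rules.append((p1, p2, confidence))
        # Move the last element of p1 to the start of p2.
        p1, p2 = p1[:-1], p1[-1:] + p2
    return rules


# Made-up supports for illustration.
support = {("a",): 4, ("a", "d"): 2, ("a", "d", "b"): 2}
found = generate_rules(("a", "d", "b"), support, min_confidence=0.6)
```

With these sample supports, the rule (a, d) → (b) has confidence 1.0 and is kept, while (a) → (d, b) has confidence 0.5 and is dropped.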
- lift and saliency can also be computed for the rules, either in operation 403 or after the result set is returned in operation 409.
- the properties lift and saliency can be computed as follows:
- N is the total number of patterns in the collection and DF(e) is the number of patterns in which the element e appears.
- Lift is a measure of probabilistic dependence of the RHS on the LHS. Lower lift indicates that the events (that is, the LHS and RHS) are more independent; for example, having lift ≈ 1. Higher lift indicates that they are more dependent on each other; for example, having lift > 1.
- Saliency is a measure of the amount of information contained in the pattern, given the whole set of patterns. It can be viewed as the relative importance or prominence of the pattern among all of the patterns that appear in the set. Higher saliency indicates that the pattern is more interesting.
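The formulas for lift and saliency did not survive in this text, so the sketch below uses stand-ins and should be read as an assumption, not as the patent's definitions. Lift is computed in the standard association-rule form, confidence divided by the baseline probability of the RHS, which yields 1 for independent events. Saliency is computed as a mean inverse document frequency over the elements of a pattern, a guess suggested only by the definitions of N and DF(e) above. The names `lift`, `saliency`, and `n_docs` are hypothetical.

```python
import math


def lift(sup_rule, sup_lhs, sup_rhs, n_docs):
    """Standard lift: P(RHS | LHS) / P(RHS). A value near 1 means independence."""
    confidence = sup_rule / sup_lhs
    return confidence / (sup_rhs / n_docs)


def saliency(pattern, patterns):
    """Assumed IDF-style saliency: mean of log(N / DF(e)) over elements e.

    pattern must be drawn from patterns, so that every DF(e) is at least 1.
    """
    n = len(patterns)

    def idf(element):
        df = sum(1 for p in patterns if element in p)
        return math.log(n / df)

    return sum(idf(e) for e in pattern) / len(pattern)


# Made-up pattern collection for illustration.
collection = [("a",), ("a", "b"), ("c",)]
rare = saliency(("c",), collection)     # "c" appears in 1 of 3 patterns
common = saliency(("a",), collection)   # "a" appears in 2 of 3 patterns
```

Under this assumed form, a pattern built from rarer elements scores higher, which is in line with the statement that higher saliency marks a more interesting pattern.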
- Rules in the result set can also be sorted by confidence, lift, and saliency.
- FIG. 12 is an illustration of an output of a root cause mining process as shown, for example, in FIGS. 10 and 11, according to one embodiment of the present invention. As shown in FIG. 12, each of the ovals represents a particular detected element or mined object, where the elements include:
- each path of arrows from left to right represents a different sequence of elements and darker or more strongly bolded arrows represent patterns that are better in terms of higher lift. As such, paths with more strongly bolded arrows indicate likely pathways between events that lead toward events on the right.
- the user can supply a query to the analytics server 45 through the user interface module 45c (see, e.g., FIG. 9) to search for the "purchase process event" in the derived rules, and the analytics server 45 can search the data structure illustrated in FIG. 12 to identify a set of events that generally lead to the desired event. As seen in FIG.
- Embodiments of the invention can be practiced as methods or systems.
- Computer devices or systems including, for example, a microprocessor, memory, a network communications device, and a mass storage device can be used to execute the processes described above in an automated or semi-automated fashion.
- the above processes can be coded as computer executable code and processed by the computer device or system.
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/952,470 US10061822B2 (en) | 2013-07-26 | 2013-07-26 | System and method for discovering and exploring concepts and root causes of events |
| US13/952,459 US9971764B2 (en) | 2013-07-26 | 2013-07-26 | System and method for discovering and exploring concepts |
| PCT/US2014/048089 WO2015013554A1 (en) | 2013-07-26 | 2014-07-24 | System and method for discovering and exploring concepts |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| EP3025295A1 | 2016-06-01 |
| EP3025295A4 | 2016-07-20 |
Family
ID=52393853
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP14828714.7A Withdrawn EP3025295A4 (en) | 2013-07-26 | 2014-07-24 | SYSTEM AND METHOD FOR DISCOVERING AND EXPLORING CONCEPTS |
Country Status (4)
| Country | Link |
|---|---|
| EP (1) | EP3025295A4 (en) |
| KR (1) | KR102111831B1 (en) |
| CN (1) | CN105745679B (en) |
| WO (1) | WO2015013554A1 (en) |
-
2014
- 2014-07-24 EP EP14828714.7A patent/EP3025295A4/en not_active Withdrawn
- 2014-07-24 KR KR1020167005393A patent/KR102111831B1/en active Active
- 2014-07-24 WO PCT/US2014/048089 patent/WO2015013554A1/en not_active Ceased
- 2014-07-24 CN CN201480053132.9A patent/CN105745679B/en active Active
Also Published As
| Publication number | Publication date |
|---|---|
| EP3025295A4 (en) | 2016-07-20 |
| KR102111831B1 (en) | 2020-05-15 |
| WO2015013554A1 (en) | 2015-01-29 |
| CN105745679A (en) | 2016-07-06 |
| CN105745679B (en) | 2020-01-14 |
| KR20160039273A (en) | 2016-04-08 |
Legal Events
| Code | Title | Description |
|---|---|---|
| PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| 17P | Request for examination filed | Effective date: 20160226 |
| AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| AX | Request for extension of the European patent | Extension state: BA ME |
| A4 | Supplementary search report drawn up and despatched | Effective date: 20160620 |
| RIC1 | Information provided on IPC code assigned before grant | Ipc: G06Q 30/02 20120101AFI20160614BHEP; Ipc: G06Q 30/06 20120101ALI20160614BHEP; Ipc: G06F 17/27 20060101ALI20160614BHEP; Ipc: G06F 17/28 20060101ALI20160614BHEP; Ipc: G06Q 30/00 20120101ALI20160614BHEP; Ipc: G06F 17/30 20060101ALI20160614BHEP |
| DAX | Request for extension of the European patent (deleted) | |
| STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |
| 17Q | First examination report despatched | Effective date: 20191104 |
| STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN |
| 18W | Application withdrawn | Effective date: 20220831 |