HK1162783A - Conversation support - Google Patents
Description
Background
During a communication (e.g., a telephone call, chat session, etc.) with another party, one party may describe an event that he or she attended, such as a recent vacation, sporting event, or meeting. However, describing an event verbally or via text often leaves the other party without the details needed to understand what is being described, or simply provides the other party with an ambiguous impression of the event.
Drawings
FIG. 1 illustrates an exemplary network in which systems and methods described herein may be implemented;
FIG. 2 illustrates an exemplary configuration of the user device or network device of FIG. 1;
FIG. 3 illustrates an exemplary configuration of logic components implemented in the device of FIG. 2;
FIG. 4 illustrates an exemplary structure of a database stored in one of the devices of FIG. 1;
FIG. 5 is a flow diagram illustrating exemplary processing by the various devices illustrated in FIG. 1; and
FIG. 6 is a flow diagram illustrating exemplary processing associated with retrieving stored information associated with a conversation or communication session.
Detailed Description
The following detailed description refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements. Also, the following detailed description does not limit the invention.
Embodiments described herein relate to monitoring conversations between parties or other communications involving one or more parties. In one exemplary embodiment, while a party is communicating, items related to the context of the communication may be retrieved. For example, when one party is describing an event that he or she recently attended, content associated with the event, such as digital pictures, audio files, data files, etc., may be automatically retrieved and provided to another party in real time during the communication session. In some embodiments, the party describing the event may choose whether to send all or part of the retrieved content to the other party. In another exemplary embodiment, contextually relevant content can be provided to a user interacting with an automated system, such as a voicemail system, an Interactive Voice Response (IVR) system, or the like.
FIG. 1 is a block diagram of an exemplary network 100 in which systems and methods described herein may be implemented. Network 100 may include user devices 110, 120, and 130, network device 140, and network 150.
Each of user devices 110-130 may comprise any device or combination of devices capable of transmitting voice signals and/or data to a network, such as network 150. In one embodiment, user devices 110-130 may comprise any type of communication device, such as a Plain Old Telephone System (POTS) phone, a voice over Internet protocol (VoIP) phone (e.g., a Session Initiation Protocol (SIP) phone), a wireless or cellular telephone device (e.g., a Personal Communication System (PCS) terminal that may combine a cellular radiotelephone with data processing and data communication capabilities, a Personal Digital Assistant (PDA) that may include a radiotelephone, etc.), and so forth. In another embodiment, user devices 110-130 may comprise any type of computer device or system, such as a Personal Computer (PC), a laptop, a PDA, or a wireless or cellular telephone, that may communicate via telephone calls, teleconferencing (e.g., video teleconferencing), and/or text-based messaging (e.g., text messaging, instant messaging, email, etc.). User devices 110-130 may connect to network 150 via any conventional technique, such as a wired, wireless, or optical connection.
Network device 140 may include one or more computing devices, such as one or more servers, computers, etc., for receiving information from other devices in network 100. For example, as will be described in detail below, network device 140 may identify information from conversations between various parties associated with user devices 110-130.
The network 150 may include one or more wired and/or wireless networks capable of receiving and transmitting data, voice, and/or video signals, including multimedia signals containing voice, data, and video information. For example, the network 150 may include one or more Public Switched Telephone Networks (PSTN) or other types of switched networks. Network 150 may also include one or more wireless networks and may include a plurality of transmission towers for receiving and forwarding wireless signals toward an intended destination. Network 150 may further include one or more packet-switched networks, such as an Internet Protocol (IP) based network, a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), an intranet, the internet, or another type of network capable of transmitting data.
For simplicity, the exemplary configuration illustrated in FIG. 1 is provided. It should be understood that a typical network may include more or fewer devices than illustrated in FIG. 1. For example, network 100 may include additional elements, such as switches, gateways, routers, etc., that facilitate routing traffic, such as telephone calls, from user devices 110-130 to their respective destinations in network 100. Additionally, although user devices 110-130 and network device 140 are illustrated in FIG. 1 as separate devices, in other implementations, the functions performed by two or more of these devices may be performed by a single device or platform. For example, in some implementations, functions described as being performed by network device 140 may be performed by one of user devices 110-130.
FIG. 2 illustrates an exemplary configuration of user device 110. User devices 120 and 130 and network device 140 may be configured in a similar manner. Referring to FIG. 2, user device 110 may include a bus 210, a processor 220, a memory 230, an input device 240, an output device 250, a power supply 260, and a communication interface 270. Bus 210 may include a path that allows communication among the elements of user device 110.
Processor 220 may include one or more processors, microprocessors, or processing logic that may interpret and execute instructions. Memory 230 may include a Random Access Memory (RAM) or another type of dynamic storage device that may store information and instructions for execution by processor 220. Memory 230 may also include a Read Only Memory (ROM) device or another type of static storage device that may store static information and instructions for use by processor 220. Memory 230 may further include a solid state drive (SSD). Memory 230 may also include a magnetic and/or optical recording medium and its corresponding drive.
Input device 240 may include mechanisms that allow a user to input information to user device 110, such as a keyboard, keypad, mouse, pen, microphone, touch screen, voice recognition and/or biometric mechanisms, and so forth. Output device 250 may include mechanisms to output information to the user including a display, a printer, speakers, and the like. Power supply 260 may include a battery or other power source for providing power to user device 110.
Communication interface 270 may include any transceiver-like mechanism that user device 110 may use to communicate with other devices (e.g., user devices 120 and 130 or network device 140) and/or systems. For example, communication interface 270 may include mechanisms for communicating via network 150, which may include a wireless network. In these embodiments, communication interface 270 may include one or more Radio Frequency (RF) transmitters, receivers, and/or transceivers and one or more antennas for transmitting and receiving RF data via network 150. Communication interface 270 may also include a modem or an Ethernet interface to a LAN. Alternatively, communication interface 270 may include other mechanisms for communicating via a network, such as network 150.
User device 110 may perform processing associated with conducting a communication session. For example, user device 110 may perform processing associated with placing and receiving telephone calls; sending and receiving electronic mail (email) messages, text messages, instant messages (IMs), mobile IMs (MIMs), and Short Message Service (SMS) messages; conducting teleconferences; receiving voicemail messages; interacting with IVR systems; and so on. As will be described in detail below, user device 110 may also perform processing associated with monitoring communications and identifying relevant content associated with the communication/conversation. User device 110 may perform these operations in response to processor 220 executing sequences of instructions contained in a computer-readable medium, such as memory 230. A computer-readable medium may be defined as a physical or logical memory device. The software instructions may be read into memory 230 from another computer-readable medium (e.g., a Hard Disk Drive (HDD), SSD, etc.) or from another device via communication interface 270. Alternatively, hardwired circuitry may be used in place of or in combination with software instructions to implement processes consistent with the embodiments described herein. Thus, embodiments described herein are not limited to any specific combination of hardware circuitry and software.
FIG. 3 is an exemplary functional block diagram of components implemented in user device 110 of FIG. 2. In an exemplary embodiment, all or a portion of the components illustrated in FIG. 3 may be stored in memory 230. For example, referring to FIG. 3, memory 230 may include conversation monitoring program 300, content storage 370, and content indexing logic 380. In addition, the various logical components illustrated in FIG. 3 may be implemented by processor 220 executing one or more programs stored in memory 230.
Conversation monitoring program 300 may include a software program executed by processor 220 that monitors conversations or communications involving a user of user device 110, such as telephone calls, text-based communication sessions, voicemail messages, video teleconferences, and the like. In an exemplary embodiment, conversation monitoring program 300 may include speech recognition logic 310, capture logic 320, rules database 330, content retrieval logic 340, and output control logic 350. Conversation monitoring program 300 and its various logical components are shown in FIG. 3 as being included in user device 110. In alternative embodiments, these components or portions of these components may be located externally with respect to user device 110. For example, in some implementations, one or more of the components of conversation monitoring program 300 may be located in network device 140 or executed by network device 140.
The speech recognition logic 310 may include logic to perform speech recognition on speech data provided by one or more parties or automated systems during a conversation. For example, speech recognition logic 310 may convert speech data from a party participating in a telephone conversation or video teleconference, such as the parties at user devices 110 and 120, into text corresponding to the speech data. The capture logic 320 may then extract information from the conversation, as described below.
Capture logic 320 may interact with other logic components of conversation monitoring program 300 to identify certain portions of a conversation between parties. For example, capture logic 320 may interact with rules database 330 to identify words and/or phrases that most likely correspond to relevant information, such as events, topics, locations, people, places, dates/times, and so forth. As one example, the rules database 330 may include rules that instruct the capture logic 320 to extract certain event-related words. For example, words/phrases related to an event, such as "vacation," "meeting," "concert," "basketball game," "party," etc., may be extracted from the conversation.
Capture logic 320 may also capture topic-related information associated with the conversation. For example, rules database 330 may indicate that words/phrases frequently spoken during a communication session generally refer to the general subject matter of the conversation. For example, a word or phrase may correspond to a topic of the conversation if a party in the conversation mentions it more than a predetermined number of times (e.g., two or more). As an example, assume that the parties at user devices 110 and 120 are conversing and mention "server upgrade" several times. In this case, capture logic 320 may extract the phrase "server upgrade" from the conversation. As another example, assume that a party uses the phrase "birthday party" multiple times during a conversation. In this case, capture logic 320 may capture the phrase "birthday party."
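The frequency rule described above can be sketched as follows. This is an illustrative example only; the function name, stopword list, and thresholds are assumptions and not part of the described embodiments.

```python
from collections import Counter
import re

def capture_topics(transcript, min_mentions=2, max_phrase_len=2):
    """Return candidate topic phrases mentioned at least `min_mentions` times.

    Sketches the rule that a word/phrase spoken more than a predetermined
    number of times likely corresponds to the topic of the conversation.
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter()
    # Count every word and every two-word phrase in the transcript.
    for n in range(1, max_phrase_len + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    # Drop very common function words so candidate topics are meaningful.
    stopwords = {"the", "a", "an", "and", "we", "to", "is", "it", "i", "you"}
    return {p for p, c in counts.items()
            if c >= min_mentions and not set(p.split()) & stopwords}
```

A real implementation would operate on the output of speech recognition logic 310; here a plain text transcript stands in for it.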
Rules database 330 may also include, or be associated with, one or more databases that include names of cities, states, and countries, as well as names of places, such as retail establishments (e.g., restaurants, shopping centers, stores, etc.), schools, parks, and so forth. In such a case, capture logic 320 may compare names spoken during the conversation, or entered as text, with the names in rules database 330 and capture the word or phrase corresponding to the location or place.
Rules database 330 may also include rules for identifying names, such as the names of people, and words indicating relationships, such as "mom," "brother," "son," and the like. In such implementations, capture logic 320 may capture these name- and/or relationship-related terms.
Rules database 330 may also store rules indicating that a number immediately followed by a word generally corresponds to an address. For example, the phrase "one two three Avenue" includes a number (i.e., 123) followed by a word (i.e., Avenue). In this case, capture logic 320 may identify the phrase as corresponding to the address "123 Avenue." Rules database 330 may also store rules indicating that one or more words preceding any of the words "street," "avenue," etc. generally correspond to an address. In this case, capture logic 320 may capture those words and the one or more words that precede them.
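The number-followed-by-street-word rule might be sketched with a regular expression such as the one below. The pattern and the list of street-type words are illustrative assumptions, not a definitive implementation of rules database 330.

```python
import re

# Hypothetical address rule: a number, optionally a capitalized street
# name, then a street-type word ("Street", "Avenue", etc.).
ADDRESS_RULE = re.compile(
    r"\b(\d+)\s+((?:[A-Z][a-z]+\s+)?(?:Street|Avenue|Road|Boulevard))",
    re.IGNORECASE,
)

def capture_addresses(text):
    """Return candidate addresses such as '123 Main Street' found in text."""
    return [" ".join(m.groups()) for m in ADDRESS_RULE.finditer(text)]
```

In practice the rule would run over the recognized text produced by speech recognition logic 310, where "one two three Avenue" would first be normalized to "123 Avenue."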
Capture logic 320 may also capture or extract other information from the conversation. For example, rules database 330 may include rules that instruct capture logic 320 to capture information such as telephone numbers, IP addresses, and other contact-related information of the parties in the conversation. In this case, rules database 330 may indicate that seven or more digits spoken consecutively or entered as a text string correspond to a telephone number. In some cases, rules database 330 may indicate that 10 or more digits spoken in succession, with the digit 1 preceding the 10 digits, or a string of digits following the digits 011, may represent a long-distance or foreign telephone number. Rules database 330 may further indicate that phrases ending with "dot com" refer to web addresses. Similarly, a first input string or letter/number sequence followed by a second input string or letter/number sequence, separated by the word "at" or the symbol "@," may be identified as an email address. In such a case, capture logic 320 may capture a telephone number, a web address, and/or an email address.
Rules database 330 may further include rules designed to capture words that are frequently used to elicit information from users. For example, rules database 330 may include rules indicating that one or more words in a phrase ending with the word "number" are to be captured. In this case, capture logic 320 may capture words/phrases such as "account number," "serial number," "social security number," "phone number," and the like. Rules database 330 may also include rules indicating that phrases ending with the words "password," "identifier," or "ID" are to be captured by capture logic 320. In this case, capture logic 320 may capture words/phrases such as "account password," "service identifier," "account identifier," "login ID," and the like. In each case, capture logic 320 may capture the desired information based on various rules and/or databases stored, for example, in rules database 330.
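The contact-related and elicitation rules above might be sketched as simple patterns. The exact expressions below are illustrative assumptions and deliberately simplified (e.g., no international-number or spelled-out-digit handling):

```python
import re

# Hypothetical patterns: seven or more consecutive digits form a
# candidate phone number; "name@domain.tld" strings form a candidate
# email address; phrases ending in "number", "password", "identifier",
# or "ID" are treated as eliciting information.
PHONE_RULE = re.compile(r"\b\d{7,}\b")
EMAIL_RULE = re.compile(r"\b[\w.]+@[\w.]+\.\w+\b")
ELICIT_RULE = re.compile(r"\b(\w+\s+(?:number|password|identifier|ID))\b")

def capture_contact_info(text):
    """Return candidate phone numbers, emails, and eliciting phrases."""
    return {
        "phones": PHONE_RULE.findall(text),
        "emails": EMAIL_RULE.findall(text),
        "elicited": ELICIT_RULE.findall(text),
    }
```

A production rules database would be far richer; this only demonstrates the shape of the rule-matching step.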
In some implementations, the rules database 330 may be designed to capture words/phrases associated with various queries. For example, as will be described in detail below, rules database 330 may include rules designed to capture address-related queries, contact-related queries, math-related queries, and the like.
Content retrieval logic 340 may include logic that uses information (e.g., words or phrases) identified by capture logic 320 to retrieve information from content storage 370. For example, as will be described in detail below, content retrieval logic 340 may use the words/phrases identified by capture logic 320 to search content storage 370 and identify content that may be relevant to the conversation or communication.
Content storage 370 may include one or more memories, such as an HDD, SSD, RAM, ROM, or another memory, that stores content and/or metadata or tag data associated with content stored on user device 110. For example, in one embodiment, content storage 370 may include text files, audio/music files, image/video files, multimedia files, and the like. In some implementations, all or a portion of the files stored in content storage 370 may include metadata, tag data, filename information, or other information associated with the content and/or information associated with identifying the content. As will be described in detail below, the metadata, tag data, and/or name data may facilitate retrieval of the content at a later time.
Content indexing logic 380 may include logic that indexes the content stored in content storage 370 based on one or more parameters associated with the content. Content indexing logic 380 may continuously or periodically index the content in content storage 370 as the content is stored on user device 110. In one embodiment, content indexing logic 380 may include software instructions executed by processor 220 that index information stored in content storage 370 and populate entries of a database based on content and/or metadata, tag data, or other information associated with content stored on user device 110.
For example, FIG. 4 illustrates an exemplary database 400 populated by content indexing logic 380. Database 400 may be stored in memory 230 and may include entries 405-1 through 405-N (referred to herein as entries 405). Each entry 405 may be associated with a certain type of content included in content storage 370, such as text files, audio/music files, image/video files, multimedia files, and so forth. Database 400 may include an event/title field 410, a party field 420, a subject field 430, a location field 440, a date/time field 450, and other fields 460.
As described above, metadata may be included with various files in content storage 370. For example, the metadata may include title information, location information, date/time information, and the like. As one example, metadata for a music file stored in content storage 370 may include the name of the song. Content indexing logic 380 may extract metadata from the files stored in content storage 370 and store the appropriate information, along with a link to or the location of the actual content associated with the metadata, in fields 410-460 of the various entries 405 in database 400. The link or location information facilitates retrieval of the content at a later time.
In other cases, a user associated with user device 110 may add tag information to various files, such as image files, text files, music files, and so forth. As one example, the user may add tag data such as "Grand Canyon vacation" to an image or video file stored in content storage 370. In this case, content indexing logic 380 may identify the tag information and store the appropriate tag information in fields 410-460 of the various entries 405 of database 400, along with a link or information identifying the actual content/file and/or its location in content storage 370.
In still other cases, various files may include title or name information that identifies the file. As one example, a text file may include a name or title (e.g., "server upgrade presentation"). In this case, content indexing logic 380 may identify the title information and store the appropriate information in fields 410-460 of the various entries 405 of database 400.
Referring back to FIG. 4, event/title field 410 may store information identifying the event or title associated with a file. As one example, a picture or image file may include a tag or name of "Grand Canyon vacation." As illustrated in FIG. 4, content indexing logic 380 may store the tag/name "Grand Canyon vacation" in field 410 of entry 405-1. As another example, a file may include the title "server upgrade presentation." As illustrated in FIG. 4, content indexing logic 380 may store the title "server upgrade presentation" or "server upgrade" in event/title field 410 of entry 405-2, along with a link or information identifying the actual content/file and/or its location in content storage 370.
Party field 420 may store information identifying one or more parties associated with a file. For example, a data file transferred from Heath Smith to the user of user device 110 and stored in content storage 370 may include the name "Heath Smith" in the metadata field of the file. As illustrated in FIG. 4, content indexing logic 380 may store this information in party field 420 of entry 405-3. As another example, the Grand Canyon vacation pictures described above may include tag information added by the user after taking the pictures, identifying the people in a particular picture. Content indexing logic 380 may access the tag information and store the people's names (e.g., Bill, Mary, Robert) in party field 420 of entry 405-1, which also stores the title "Grand Canyon vacation" in field 410.
Subject field 430 may store information associated with the subject of various content in content storage 370. For example, a text file may include several occurrences of the same word. In this case, content indexing logic 380 may scan the file and store the words/phrases that appear in the file at least a predetermined number of times (e.g., two or more times) in subject field 430 of an entry 405 of database 400. As one example, a data file stored in content storage 370 that is associated with a new project on which the user of user device 110 is working may include the phrase "cluster phone" several times. In this case, as illustrated in FIG. 4, content indexing logic 380 may identify the phrase "cluster phone" and store it in subject field 430 of entry 405-4.
Location field 440 may store information associated with the location of a particular file. For example, a geotag included with a picture may identify the location where the picture was taken. Content indexing logic 380 may access the geotag and store the location information from the geotag in location field 440 of an entry 405 in database 400. As another example, a file including a Grand Canyon vacation picture may include the locations "Grand Canyon" and "Skywalk." As illustrated in FIG. 4, content indexing logic 380 may store these locations in location field 440 of entry 405-1.
Date/time field 450 may store information such as the date and time a file was received, the date and/or time an image file (e.g., a picture) was taken, the date and/or time the file was updated, and so on. For example, a particular text file, such as the file associated with the server upgrade, may include the time at which the file was last updated. As illustrated in FIG. 4, content indexing logic 380 may store the date and time information in date/time field 450 of entry 405-2.
Other fields 460 may include other information associated with files stored in content storage 370. For example, other fields 460 may include links or information identifying the location of the actual content corresponding to each entry 405 in database 400. As one example, other field 460 of entry 405-1 may include the location of the Grand Canyon vacation pictures stored in content storage 370 of user device 110. That is, in this example, the Grand Canyon vacation pictures may be stored on the C drive of user device 110, under a folder named "pictures" and in a subfolder named "vacation." Other fields 460 may also include any other information that may assist in retrieving potentially relevant content from content storage 370. For example, other fields 460 may include any context-related information associated with files or other information stored in content storage 370 that may facilitate later retrieval of the content.
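The indexing behavior of content indexing logic 380 and the layout of database 400 described above can be sketched as follows. The metadata keys, file path, and dictionary representation are hypothetical illustrations; the patent does not prescribe a data format.

```python
# Field names mirror fields 410-460 of FIG. 4; the metadata layout
# (keys such as "title" and "geotag") is assumed for illustration.
def index_file(metadata, location):
    """Build one database-400-style entry from a file's metadata/tags."""
    return {
        "event_title": metadata.get("title", ""),    # field 410
        "parties": metadata.get("people", []),       # field 420
        "subject": metadata.get("subject", ""),      # field 430
        "location": metadata.get("geotag", ""),      # field 440
        "date_time": metadata.get("timestamp", ""),  # field 450
        "link": location,                            # other field 460
    }

# Example entry 405-1: a tagged vacation picture and its stored location.
database_400 = [
    index_file(
        {"title": "Grand Canyon vacation",
         "people": ["Bill", "Mary", "Robert"],
         "geotag": "Grand Canyon"},
        r"C:\pictures\vacation\img001.jpg",
    )
]
```

The stored "link" plays the role of the location information in other field 460, allowing later retrieval of the actual file.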
As described above, capture logic 320 may extract information from a communication session between parties at user devices 110 and 120. The content retrieval logic 340 may then compare the captured information to information stored in the database 400 to identify relevant content.
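A minimal sketch of this comparison step follows, assuming database 400 entries are represented as simple dictionaries whose values include the indexed fields and a link to the content; the representation and matching strategy are illustrative assumptions.

```python
# Sketch of content retrieval logic 340: compare terms captured from the
# conversation against indexed entries and return links to matching content.
def retrieve_content(captured_terms, entries):
    matches = []
    for entry in entries:
        # Flatten the entry's field values into one searchable string.
        haystack = " ".join(str(v).lower() for v in entry.values() if v)
        if any(term.lower() in haystack for term in captured_terms):
            matches.append(entry["link"])
    return matches
```

A real implementation would likely use an inverted index rather than a linear scan, but the contract is the same: captured words/phrases in, links to relevant content out.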
Referring back to FIG. 3, output control logic 350 may include logic that outputs the information retrieved from content storage 370 by content retrieval logic 340 to a user of user device 110 via output device 250, such as a display (e.g., an LCD display). Output control logic 350 may also facilitate the transfer of retrieved information to other devices, such as user device 120. For example, output control logic 350 may allow a user to confirm whether various content displayed to the user should be delivered to other parties via, for example, network 150.
As described above, conversation monitoring program 300 may retrieve information from content store 370 during a conversation or other communication. As will be described in detail below, the conversation monitoring program 300 may also provide the retrieved information to the user to allow the user to communicate the retrieved information to other parties in real-time or near real-time.
FIG. 5 is a flow diagram illustrating exemplary processing associated with identifying portions of a conversation between parties in network 100 and retrieving relevant content based on the identified portions. The process may begin with the user of user device 110 initiating a communication with another party, such as placing a telephone call or sending a text-based message, or receiving a communication from another party. For example, assume that the party at user device 110 places a telephone call to the party at user device 120 and establishes a voice-based communication session with the party at user device 120. Assume further that a conversation ensues (act 510).
When the parties at user devices 110 and 120 are talking to each other, conversation monitoring program 300 may identify portions of the conversation (act 520). For example, speech recognition logic 310 may convert speech from the parties at user devices 110 and 120 into corresponding text. Capture logic 320 may then identify portions of text using the rules stored in rules database 330 (act 520).
For example, as discussed above with respect to FIG. 3, capture logic 320 may identify words and/or phrases that most likely correspond to the event the parties are discussing. As an example, assume that the party Joe at user device 110 says the following to the party Bill at user device 120: "We really enjoyed visiting the Grand Canyon. We stepped onto the Skywalk and could see the whole Grand Canyon. It was amazing."
In this case, speech recognition logic 310 may convert Joe's and Bill's speech input into text and forward the text to capture logic 320. Capture logic 320 may identify the phrase "Grand Canyon" as corresponding to a keyword, such as a topic, event, or location associated with the conversation. For example, since the phrase "Grand Canyon" is mentioned more than once, capture logic 320 may identify the phrase as corresponding to the topic of the conversation. Capture logic 320 may also access a database of locations stored in rules database 330 and identify the phrase "Grand Canyon" as corresponding to a location. Other words in the conversation may also be identified as relating or corresponding to information that satisfies one or more rules stored in rules database 330. For example, the word "Skywalk" may be identified as corresponding to a location.
Content retrieval logic 340 may access database 400 and/or content storage 370 to identify content that may be relevant to the conversation (act 530). For example, continuing the example above, content retrieval logic 340 may access database 400 and search for information that matches the terms "Grand Canyon" or "Skywalk." In this case, assume that user Joe of user device 110 has stored pictures of his vacation to the Grand Canyon, and that the pictures are tagged with the name "Grand Canyon vacation." Further assume that the name "Grand Canyon vacation" has been indexed by content indexing logic 380 and stored in field 410 of entry 405-1 in database 400, as illustrated in FIG. 4. Assume further that the location of the actual pictures within user device 110 is stored in other field 460 of entry 405-1, as illustrated in FIG. 4.
Content retrieval logic 340 may then retrieve the Grand Canyon vacation pictures stored in content storage 370 (act 530). That is, content retrieval logic 340 may use the location information in other field 460 (or a link included in field 460) to retrieve the actual vacation pictures/files. Output control logic 350 may then output the retrieved pictures to the user of user device 110 via output device 250, such as a display (act 530). The user of user device 110 (i.e., Joe in this example) may then determine whether he wants to transmit the pictures to the party at user device 120 (act 540).
For example, in one embodiment, the user of user device 110 may select a "select all" or "transfer all" button that indicates that all pictures that have been retrieved are to be transferred to user device 120. In other cases, the user may select a single picture to transmit. In either case, output control logic 350 may transmit the selected picture/photograph to user device 120 via network 150 (act 550). In still other cases, output control logic 350 may automatically transmit the retrieved pictures to user device 120 without the user of user device 110 having to select any pictures and/or without the user of user device 110 having to manually generate a message (e.g., a text message, an IM, or an email message) to a party at user device 120. In each case, the retrieved content may be delivered to the other party in the conversation while the conversation is occurring. This allows both parties to have a more complete understanding of what is being discussed in the conversation.
As described above, conversation monitoring program 300 may identify portions of an audio-based conversation, such as a telephone call, retrieve contextually-relevant content, and provide the content to other parties "on-the-fly" or in real-time during the conversation. In other cases, conversation monitoring program 300 can identify portions of a text-based conversation, such as an IM-based communication session, and provide contextually-relevant content to parties in a similar manner as the conversation is occurring.
Conversation monitoring program 300 can also retrieve and/or communicate other types of context-related information. For example, assume that the users of user devices 110 and 120 are discussing a new song by a particular artist. Capture logic 320 may identify the title of the song and/or the name of the artist mentioned by the parties at user devices 110 and 120, and content retrieval logic 340 may search database 400 for information that matches the captured song title and/or artist name. Assume that content retrieval logic 340 identifies information stored in database 400 that matches the song title and/or artist name and retrieves an audio file from content storage 370. Output control logic 350 may then automatically play the audio file or display a link to the song to the user of user device 120 so that the user can select the song and play it. In other cases, output control logic 350 may transmit the audio file retrieved for the song to user device 120.
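The matching step described above can be sketched in code. This is an illustrative sketch only: captured terms (e.g., a song title or artist name) are compared against database entries, and the file location stored in a matching entry is used to retrieve content. The entry layout, field names, and example data are assumptions for the example, not details from the patent.

```python
# Hypothetical sketch of capture logic 320 and content retrieval logic 340
# cooperating: captured terms are matched against database entries, and the
# location stored in the matching entry points at the actual content.

DATABASE = [
    {"subject": "grand canyon vacation", "location": "/photos/canyon/"},
    {"subject": "new song by the example band", "location": "/music/song.mp3"},
]

def capture_terms(utterance, keywords):
    """Return the keywords that appear in the spoken/typed utterance."""
    text = utterance.lower()
    return [k for k in keywords if k in text]

def retrieve_matching_locations(terms):
    """Search each database entry's subject field for the captured terms."""
    matches = []
    for entry in DATABASE:
        if any(term in entry["subject"] for term in terms):
            matches.append(entry["location"])
    return matches

terms = capture_terms("Have you heard the new song by The Example Band?",
                      ["example band", "grand canyon"])
print(retrieve_matching_locations(terms))  # ['/music/song.mp3']
```

The retrieved location could then be handed to the output step (displaying the content locally or transmitting it to the other party), as the description above explains.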
In some implementations, when the song in question is not stored in content memory 370, output control logic 350 can automatically output a link to an online music store from which the user can purchase and download the song in question. That is, output control logic 350 may perform an online search for a music store that sells music for download and may automatically display a link to the online music store on output device 250 of user device 110. In this manner, conversation monitoring program 300 may automatically perform the desired function (i.e., the search function in this example) when the information of interest is not readily available on user device 110.
As described above, conversation monitoring program 300 may access local information stored on user device 110 to enhance conversations. In some embodiments, conversation monitoring program 300 may also access or utilize information accessible within a network to perform various functions. For example, assume that the parties at user devices 110 and 120 are discussing a particular television program. Speech recognition logic 310 may perform speech recognition for the conversation, and capture logic 320 may use the rules stored in rules database 330 to identify the name of the television program in question. Content retrieval logic 340 may receive the name of the television program from capture logic 320. Content retrieval logic 340 may then access a locally stored television guide, or remotely access a television guide via network 150, to identify the time at which the program will be broadcast. Output control logic 350 may output a message to the user of user device 110 indicating the time at which the program will be played and may also inquire whether the user would like to record the television program on, for example, a digital video recorder. If the user responds with a positive indication, output control logic 350 may signal the digital video recorder, or a set-top box associated with the digital video recorder, via network 150 to record the program.
As yet another example, a user at user device 110 may be engaged in a telephone conversation with a colleague at user device 120. The user at user device 110 may mention the name of another colleague. In this case, speech recognition logic 310 may perform speech recognition and capture logic 320 may capture the name of the colleague. Content retrieval logic 340 may access a contact list/address book stored on user device 110 to attempt to retrieve the phone number and/or other contact information associated with the colleague mentioned during the conversation. Alternatively, or if the contact information for the colleague is not found locally on user device 110, content retrieval logic 340 may access a corporate database listing employee names and contact information (e.g., phone numbers, addresses, etc.) via network 150 in an attempt to retrieve phone numbers or other contact information associated with the colleague mentioned during the conversation. In each case, if the content retrieval logic 340 identifies appropriate contact information, the output control logic 350 may output a message to the user of the user device 110 via the output device 250 displaying the contact information for the colleague. In a similar manner, if a phone number, email address, or other identifier is mentioned during the conversation, the content retrieval logic 340 may attempt to identify additional contact information (e.g., name, work place, etc.) and provide that information to the user of the user device 110 while the conversation is occurring.
In some embodiments, when the telephone number or the name of the party is mentioned, output control logic 350 may also query whether the user at user device 110 wants to establish a communication link. For example, as described above, if a name of a colleague is mentioned during the conversation, and the content retrieval logic 340 identifies a telephone number associated with the name, the output control logic 350 may ask the user at the user device 110 whether the user wants to establish a telephone link with the identified colleague. If the user responds with a positive indication, the output control logic 350 may automatically initiate the establishment of a telephone call to the other party (e.g., dialing the telephone number). In a similar manner, if a telephone number is identified during the conversation, the output control logic 350 may ask the user at the user device 110 whether the user wants to establish a telephone link to the identified telephone number. If so, the output control logic 350 may initiate a call (e.g., dial the telephone number). In each case, the conversation monitoring program 300 may access locally stored information or access information available via a network, such as the network 150, to provide information to the user and/or initiate additional communications.
In still other implementations, names and/or phone numbers captured by capture logic 320 during a conversation may be added to a contact list/address book application stored on user device 110. In such an embodiment, conversation monitoring program 300 may access the locally stored contact list/address book and determine whether the name and/or telephone number is listed in the contact list/address book. If the name and/or telephone number is not listed, output control logic 350 may store the name and/or telephone number in the contact list/address book and may also prompt the user to provide additional information for entry into the contact list/address book. Alternatively, addresses or phone numbers captured during a communication session may be mapped to or cross-referenced with individuals or businesses. In this case, output control logic 350 may add the address and/or telephone number information to an existing contact in the contact list/address book or create a new entry in the contact list/address book. In still other cases, a captured address may correspond to a meeting location. In this case, output control logic 350 may transmit the captured address to a calendar/meeting application stored on user device 110. The calendar/meeting application may then prompt the user to determine whether to store the address in a particular entry associated with a future meeting.
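The add-or-merge behavior described above can be illustrated with a short sketch. The address-book structure, function name, and example data below are assumptions made for the example; a real implementation would integrate with the device's contact application.

```python
# Illustrative sketch of the contact-list handling described above: a
# captured name/number is added only if it is not already listed, and a
# new number for a known name is merged into the existing entry.

def update_address_book(book, name, phone=None):
    """Add or merge a captured contact; return True if the book changed."""
    entry = book.get(name)
    if entry is None:
        book[name] = {"phone": phone}          # create a new entry
        return True
    if phone and entry.get("phone") != phone:
        entry["phone"] = phone                 # cross-reference to existing contact
        return True
    return False                               # already listed; nothing to do

book = {"Mary": {"phone": "555-0100"}}
update_address_book(book, "Bill", "555-0199")  # Bill captured during the call
print(sorted(book))  # ['Bill', 'Mary']
```

A real implementation could also prompt the user before writing, as the description above notes.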
As described above, capture logic 320 may capture various types of information during a communication session and provide additional information that may be useful during the communication session. In still other cases, capture logic 320 may capture various types of queries from one or more users during a communication session. For example, assume that the user at user device 120 is discussing simple mathematical calculations with the user of user device 110. As some simple examples, assume that a question such as "What is 15% of $175?" or "What is 13 times 17?" is asked, or that some other mathematical equation is provided. Capture logic 320 may use the rules stored in rules database 330 to determine that a math-based question or equation was verbally expressed. Content retrieval logic 340 may access a calculator program stored on user device 110 and perform the identified calculation. Output control logic 350 may then output the calculated answer on output device 250. In this manner, the user at user device 110 may quickly provide an answer to the user at user device 120 without having to manually perform the calculation.
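A minimal sketch of this math-query handling follows. Here a regular expression stands in for the rules stored in rules database 330, and the arithmetic stands in for the calculator program; the specific phrasings recognized are assumptions for the example.

```python
import re

# Hypothetical rules for detecting simple spoken calculations such as
# "15% of $175" or "13 times 17" (standing in for rules database 330).
PERCENT_RULE = re.compile(r"(\d+(?:\.\d+)?)\s*%\s*of\s*\$?\s*(\d+(?:\.\d+)?)")
TIMES_RULE = re.compile(r"(\d+(?:\.\d+)?)\s+times\s+(\d+(?:\.\d+)?)")

def answer_math_query(utterance):
    """Return the computed answer, or None if no math rule matches."""
    m = PERCENT_RULE.search(utterance)
    if m:
        return float(m.group(1)) / 100.0 * float(m.group(2))
    m = TIMES_RULE.search(utterance)
    if m:
        return float(m.group(1)) * float(m.group(2))
    return None

print(answer_math_query("What is 15% of $175?"))  # 26.25
print(answer_math_query("What is 13 times 17?"))  # 221.0
```

When no rule matches, the function returns None, which corresponds to the capture logic simply taking no action for that utterance.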
As another example, assume that the user at user device 120 asks the user at user device 110: "What is Mary's phone number?". In this case, capture logic 320 may use the rules stored in rules database 330 to determine that a query requesting Mary's telephone number was verbally expressed. Content retrieval logic 340 may automatically access a contact list/address book stored on user device 110 and identify Mary's phone number. Alternatively, content retrieval logic 340 may access a database (e.g., a corporate database) via network 150 to identify Mary's phone number. In either case, if Mary's phone number is identified, output control logic 350 may output Mary's phone number for display on output device 250. In this manner, the user at user device 110 may quickly provide an answer to the user at user device 120 without having to manually look up Mary's contact information.
As described above, conversation monitoring program 300 can interact with one or more parties during a communication session and share contextually-relevant content with the party with whom the conversation is being conducted. In another exemplary embodiment, assume that user devices 110 and 120 each include video teleconferencing apparatus. During a typical video teleconference involving two sites, one or more cameras at each site may provide a live video feed to the other site so that the parties at each site can see what is happening at the other site. Typically, the cameras at each site are focused on a whiteboard and/or the teleconference participants so that the parties at each site can communicate via text and audio, as well as see each other and display various items over the video teleconference link.
Assume that a party at user device 110 participates in a video teleconference with one or more parties at user device 120. Assume further that the parties at user devices 110 and 120 mention a particular document or presentation associated with a project on which the parties are collaborating, and that the document/presentation is mentioned multiple times. Capture logic 320 at user device 110 may capture the name of the project or document referenced by the parties, and content retrieval logic 340 may search database 400 and/or content store 370 for files matching the captured name. Assume that the captured name is "cluster phone" and, as illustrated in FIG. 4, entry 405-4 in database 400 includes the term "cluster phone" in subject field 430. In this case, content retrieval logic 340 may retrieve the appropriate file from content store 370. Output control logic 350 may then display the file on output device 250 of user device 110. Output control logic 350 may also transmit the file to user device 120 via the video teleconferencing link. In one case, the parties at user device 120 may also view the file displayed on output device 250 of user device 110 via the teleconferencing link.
For example, a document displayed at user device 110 may be displayed to a party in the teleconference at user device 120 via a split-screen type window, where one portion of the split screen displays the file that the parties are discussing, and another portion of the split screen displays the party at user device 110 with whom the teleconference is occurring. In this manner, information may be displayed or transmitted via the video teleconference window to enhance or augment a conventional video teleconference. Other types of information, such as contact information, follow-up items, etc., may also be communicated between the parties in this manner.
In still other embodiments, the conversation monitoring program 300 may interact with automated systems such as voice mail systems, IVR systems, and the like. For example, assume that user device 110 includes a voicemail system and that a party associated with user device 110 accesses the voicemail system and plays a voicemail, such as a video voicemail, left by a party associated with user device 120. While the voicemail is playing, the capture logic 320 may identify keywords/words that may be related to the voicemail.
For example, capture logic 320 may identify the subject of a voicemail, such as a voicemail in which a caller leaves a message requesting information that the caller does not have. As one example, a caller at user device 120 may leave a message such as: "Hello Joe, it's Bill. I'm working on the server upgrade now. I want to know if you have the slide presentation for the server upgrade?". In this example, capture logic 320 may recognize the phrase "server upgrade" as corresponding to the subject matter of the voicemail because the phrase was spoken more than once. Content retrieval logic 340 may access database 400 and search for the phrase. Assume that the phrase "server upgrade" is stored in event/title field 410 of entry 405-2, as illustrated in FIG. 4. Content retrieval logic 340 may then retrieve the identified content from content storage 370. That is, content retrieval logic 340 may use the location or link information stored in other field 460 that identifies the location of the actual file stored on user device 110 and retrieve the slide presentation file for the server upgrade.
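The repeated-phrase heuristic described above can be sketched as follows. Candidate phrases here are word bigrams with common filler words filtered out; real capture logic would apply richer rules from rules database 330, and the stopword list is an assumption made for the example.

```python
# Hedged sketch of subject identification: a phrase spoken more than
# once in a voicemail transcript is treated as the likely subject.

STOPWORDS = {"the", "a", "an", "for", "on", "to", "do", "you", "have",
             "now", "hello", "i'm", "it's", "is", "of"}

def identify_subject(transcript, min_occurrences=2):
    """Return the most repeated two-word phrase, or None."""
    cleaned = transcript.lower()
    for ch in ".,?!":
        cleaned = cleaned.replace(ch, "")
    words = cleaned.split()
    counts = {}
    for pair in zip(words, words[1:]):
        if pair[0] in STOPWORDS or pair[1] in STOPWORDS:
            continue                      # skip filler-word phrases
        phrase = " ".join(pair)
        counts[phrase] = counts.get(phrase, 0) + 1
    best = max(counts, key=counts.get) if counts else None
    return best if best and counts[best] >= min_occurrences else None

msg = ("Hello Joe, it's Bill. I'm working on the server upgrade now. "
       "Do you have the slide presentation for the server upgrade?")
print(identify_subject(msg))  # server upgrade
```

The identified subject could then drive the database search described above (e.g., matching "server upgrade" against event/title field 410).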
Output control logic 350 may then allow the user of user device 110 to transfer the retrieved file to user device 120. For example, output control logic 350 may display a prompt such as "Do you want to send this file to Bill?". The user at user device 110 may select "yes," and output control logic 350 may automatically transfer the desired file to user device 120. In this manner, the user of user device 110 may transfer the retrieved file to user device 120 without the user at user device 110 having to call back the party at user device 120 (i.e., Bill in this example).
In some implementations, content retrieval logic 340 may have reviewed the voicemail and identified or tagged certain keywords that may require action before the voicemail is played by the user of user device 110. For example, in this case, when the user plays back the voicemail (e.g., a video voicemail), the phrase "server upgrade" may be flagged and visually highlighted for the user (e.g., via an output device 250 such as an LCD display). For example, the phrase "server upgrade" may be displayed on a display device along with a hyperlink that, when selected, automatically retrieves the relevant file or allows the user to select a document to send to the caller who left the voicemail message.
Additionally, in some implementations, user device 110 may send an alert message to the user of user device 110 when the voicemail is received. In such implementations, the alert may include the keywords/terms identified by capture logic 320. This may allow the user to quickly identify the information being requested by the other party.
Still further, in some implementations, words or phrases in the voicemail identified by content retrieval logic 340 may be highlighted for the user to enable other types of actions. For example, a telephone number identified by content retrieval logic 340 may be highlighted along with a prompt such as "Do you want to save this telephone number in your address book?". The user may select "yes," and output control logic 350 may automatically save the phone number in an address book on user device 110.
As yet another example, in some implementations, words or phrases in the voicemail that are recognized by content retrieval logic 340 may be used to trigger searches, provide links, and/or perform other functions based on the particular recognized words and/or phrases. For example, assume that a caller at user device 120 leaves a voicemail message for his friend at user device 110 asking for the address of a particular restaurant where they should meet for dinner, such as: "What is the address of Tony's Pizza?". In this case, content retrieval logic 340 may identify the word "address" and the name "Tony's Pizza". If content store 370 does not include information matching one or more of these terms, content retrieval logic 340 may automatically perform an Internet search, or a local search of user device 110, for the address of Tony's Pizza. The address information may then be highlighted for the party at user device 110 prior to playing the voicemail message. In this manner, conversation monitoring program 300 can perform various functions based on the context of the communication received by user device 110.
As discussed briefly above, conversation monitoring program 300 may also interact with other automated systems, such as IVR systems. For example, in one embodiment, assume that a user of user device 110 is interacting with an IVR system that requests various types of information from the user (e.g., a serial number for a product, a customer service identifier, a customer account number, etc.). As an example, assume that the user at user device 110 calls an IVR system associated with the service support department of a computer vendor. Assume further that the IVR system provides the following automated message to the user of user device 110: "Please provide the customer service account number for your computer." In this case, capture logic 320 may capture the phrase "customer service account." Content retrieval logic 340 may use the phrase to search database 400 and/or content store 370. Assume that content retrieval logic 340 finds a match in entry 405 in database 400 and retrieves a service agreement document that includes the requested customer service account number. Output control logic 350 may display the customer service account number on output device 250 of user device 110. In this manner, the user of user device 110 may quickly provide the account number to the IVR system.
In another embodiment, user device 110 may display the customer service account number and send the account number to the IVR system, as opposed to the user providing the account number verbally or entering it via a keypad. This may help eliminate problems associated with the voice recognition system at the IVR system (e.g., misrecognizing verbally provided information). In a similar manner, other information may be provided to the IVR system via a data link to avoid voice recognition errors. For example, output device 250 of user device 110 may be a display device, such as a touch screen. When interacting with the IVR system, output control logic 350 may display context items, such as a "yes" box, a "no" box, or other items associated with the context of the interaction with the IVR system. Rather than providing the response verbally or entering the response by pressing a combination of letters and numbers on a keypad, the user of user device 110 may then simply press the appropriate displayed item, and output control logic 350 may communicate the corresponding response to the IVR system.
In each case, conversation monitoring program 300 selectively identifies information from the conversation and automatically retrieves or provides information that may be relevant to the current conversation/communication. This may provide the user with the ability to easily retrieve contextually relevant information when interacting with other people and/or automated systems.
Conversation monitoring program 300 may also facilitate retrieval of information such as a name, a phone number, a portion of a conversation, or other information that may be exchanged during a conversation. For example, as will be described in detail below, after a communication session has occurred, a user may access information associated with various communication sessions to retrieve information of interest.
FIG. 6 illustrates exemplary processing associated with retrieving information captured during an earlier conversation. Processing may begin with a party at user device 110 accessing conversation monitoring program 300. Conversation monitoring program 300 may include a search option that allows a user to search through information associated with earlier conversations. For example, as described above, speech recognition logic 310 may convert audio input provided by parties participating in a conversation, teleconference, or the like into text. In some implementations, speech recognition logic 310 may store an entire copy of the communication session/conversation in content storage 370. Alternatively, another memory may be used to store the copy of the communication session.
Assume that a user at user device 110 accesses the search option and enters a search input. For example, assume that the user of user device 110 knows that he/she talked to multiple people during a conference call on January 7, 2009. Assume further that the user wants to retrieve information associated with content discussed during the teleconference by Susan, one of the parties in the teleconference. In this example, the user may enter the name Susan as a search input. Alternatively, the user may enter the date of the teleconference (i.e., January 7, 2009 in this example) or other information as the search input.
Content retrieval logic 340 may receive the search input (act 610). Content retrieval logic 340 may then search a conversation memory associated with conversation monitoring program 300 and identify one or more entries corresponding to the search input (act 620). Output control logic 350 may then display information associated with the one or more conversations identified based on the search input (act 620).
For example, if more than one conversation matches the search term, output control logic 350 may display a snippet or summary portion of each identified conversation that corresponds to the search input (act 620). The snippet may include the names of the parties associated with the conversation, the date/time of the conversation, and the like.
Assume that the user selects a snippet or information associated with the teleconference of interest (act 630). Output control logic 350 may then provide a complete copy of the selected conversation (i.e., the teleconference in this example) (act 640). Alternatively, output control logic 350 may output a modified version of the selected conversation. That is, output control logic 350 may display the portion of the conversation relating to the entered search term. As one example, output control logic 350 may display the portion of the transcript associated with the search term of interest (i.e., Susan in this example). In each case, output control logic 350 may provide information that allows a user of user device 110 to quickly identify information of interest. In this manner, conversation monitoring program 300 can facilitate later retrieval of information from an earlier conversation. This may allow the user to retrieve information of interest, such as a portion of the conversation that includes a telephone number, an email address, a follow-up action, and so forth.
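The search-and-snippet flow above can be sketched in code. The record layout, field names, and example transcripts below are assumptions for the example; a real implementation would search the conversation memory populated by the speech recognition logic.

```python
# Illustrative sketch of acts 610-640: stored conversation records are
# searched by a term (a participant name or a date), matches are
# summarized as snippets, and only the transcript lines mentioning the
# term can be shown for a selected conversation.

CONVERSATIONS = [
    {"date": "2009-01-07", "parties": ["Joe", "Susan", "Pat"],
     "transcript": ["Joe: welcome everyone", "Susan: the budget is $5,000",
                    "Pat: thanks, talk soon"]},
    {"date": "2009-02-03", "parties": ["Joe", "Bill"],
     "transcript": ["Bill: call me at 555-0199"]},
]

def search_conversations(term):
    """Return snippets (date + parties) for conversations matching the term."""
    term = term.lower()
    hits = []
    for conv in CONVERSATIONS:
        in_parties = any(term == p.lower() for p in conv["parties"])
        if in_parties or term in conv["date"]:
            hits.append({"date": conv["date"], "parties": conv["parties"]})
    return hits

def matching_lines(conv, term):
    """Return only the transcript lines mentioning the term."""
    term = term.lower()
    return [line for line in conv["transcript"] if term in line.lower()]

print(search_conversations("Susan"))  # one snippet, dated 2009-01-07
print(matching_lines(CONVERSATIONS[0], "susan"))
```

Returning only the matching lines corresponds to the "modified version" of the conversation described above, while the full record corresponds to the complete copy.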
In some cases, the user of user device 110 may communicate information associated with the conversation to other parties (act 640). For example, the user of user device 110 may transmit all or part of the copy of the teleconference to a party at user device 120.
As described above, the user device 110 may store and execute the conversation monitoring program 300 to identify and retrieve context-related information in real-time or near real-time as a conversation is occurring. This information may then be provided to the other party with whom the conversation is being conducted while the conversation is still occurring. This may allow a more full, interactive conversation to occur.
In some implementations, network device 140, rather than a user device (e.g., user device 110), may store and execute conversation monitoring program 300. In such an implementation, database 400 may be stored on network device 140, and network device 140 may search database 400 to identify relevant content. Network device 140 may then access content store 370 on user device 110, or signal user device 110 to retrieve the relevant content and, optionally, to communicate the relevant content to another party participating in the conversation. Additionally, in some implementations, network device 140 may store conversation-related information for parties participating in a conversation.
For example, as described above, voice recognition logic 310 may generate a copy of an earlier conversation and store it in the conversation memory. In such an embodiment, network device 140 may act as a server that stores copies of conversations for a large number of parties. In this case, the user at user device 110 may log into network device 140 using a username/password (or other suitable access control mechanism) to search for and/or retrieve his/her conversation-related information.
Additionally, in such embodiments, network device 140 may include a gateway or proxy device located between the parties participating in the conversation. In this case, the conversation data (e.g., audio or text-based) may be captured and analyzed as it is passed between the parties. Alternatively, one or more of user devices 110-130 may transmit the conversation data to network device 140 for capture/analysis in real-time or at a time subsequent to the conversation.
Embodiments described herein provide for identifying portions of a conversation and retrieving information based on content discussed during the conversation. For example, items related to the context of the conversation and items that may be related to the conversation may be retrieved. The retrieved items or information may then be displayed to the user and may also be transmitted to other parties participating in the conversation. This may also allow the parties involved in the conversation to interact more quickly and more fully during the communication session. Additionally, in some cases, the retrieved information may be used to initiate further communications.
The foregoing description of exemplary embodiments provides illustration and description, but is not intended to be exhaustive or to limit the embodiments to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of the embodiments.
For example, features are described above with respect to identifying various types of information from a conversation and retrieving related information. In other implementations, other types of information may be identified or captured during the conversation and retrieved or transmitted in other manners.
Additionally, features are described above as relating to content indexing logic 380 indexing various content stored in content memory 370. In other implementations, content retrieval logic 340 may search the files in content storage 370 directly for context-related information. In such a case, content indexing logic 380 may not be needed.
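The two retrieval strategies just contrasted, searching a prebuilt index versus scanning file contents directly, can be sketched as follows. The file names and contents are illustrative stand-ins for content storage 370, and the single-word index is a simplification of what real indexing logic would build.

```python
# Sketch of the two strategies: an inverted index built ahead of time
# (content indexing logic 380) vs. a direct scan of file contents.

FILES = {
    "canyon_trip.txt": "grand canyon vacation pictures from july",
    "server_plan.txt": "slide presentation for the server upgrade",
}

def build_index(files):
    """Map each word to the set of files containing it."""
    index = {}
    for name, text in files.items():
        for word in text.split():
            index.setdefault(word, set()).add(name)
    return index

def indexed_search(index, word):
    """Fast lookup against the prebuilt index."""
    return sorted(index.get(word, set()))

def direct_search(files, word):
    """Fallback when no index exists: scan every file's contents."""
    return sorted(name for name, text in files.items() if word in text.split())

index = build_index(FILES)
print(indexed_search(index, "upgrade"))  # ['server_plan.txt']
print(direct_search(FILES, "canyon"))    # ['canyon_trip.txt']
```

Both return the same results; the trade-off is up-front indexing cost versus per-query scan cost, which is why the indexing logic is optional in some implementations.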
Further, in some implementations, conversation monitoring program 300 may alert parties participating in a conversation that portions of the conversation are being captured and/or stored for later retrieval. For example, audio or text alerts may be provided to the parties to the conversation before the conversation monitoring program 300 identifies and stores portions of the conversation.
Additionally, while series of acts have been described with regard to fig. 5 and 6, the order of the acts may be varied in other implementations. Further, non-dependent acts may be performed in parallel.
It will be apparent that various of the features described above may be implemented in many different forms of software, firmware, and hardware in the implementations illustrated in the figures. The actual software code or specialized control hardware used to implement the various features is not limiting. Thus, the operation and behavior of the features were described without reference to the specific software code — it being understood that one of ordinary skill in the art would be able to design software and control hardware to implement the various features based on the description herein.
Furthermore, certain portions of the invention may be implemented as "logic" that performs one or more functions. The logic may include hardware, such as one or more processors, microprocessors, application specific integrated circuits, field programmable gate arrays, or other processing logic; software; or a combination of hardware and software.
In the foregoing specification, various preferred embodiments have been described with reference to the accompanying drawings. It will, however, be evident that various modifications and changes may be made thereto, and additional embodiments may be implemented, without departing from the broader scope of the invention as set forth in the claims that follow. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
No element, act, or instruction used in the description of the present application should be construed as critical or essential to the invention unless explicitly described as such. Also, as used herein, the article "a" is intended to include one or more items. Further, the phrase "based on" is intended to mean "based, at least in part, on" unless explicitly stated otherwise.
Claims (20)
1. A device, comprising:
a communication interface configured to send and receive communications associated with a communication session;
a memory configured to store a file;
a display; and
logic, the logic configured to:
identifying at least one word or phrase from the communication session,
retrieving from the memory at least one file associated with the at least one word or phrase, and
outputting the at least one file to the display.
2. The device of claim 1, wherein the logic is further configured to:
transmitting, during the communication session and via the communication interface, the at least one file to a second device associated with a party participating in the communication session.
3. The device of claim 1, wherein the communication session comprises a telephone conversation and the logic is further configured to:
automatically transmitting the at least one file to at least one other device participating in the telephone conversation during the telephone conversation.
4. The device of claim 1, wherein when identifying at least one word or phrase from the communication session, the logic is configured to: identifying at least one of a topic, an event, or a request for information associated with a topic of the communication session.
5. The device of claim 1, wherein the logic is further configured to:
generating an index associated with information included in the file,
searching the index for the at least one word or phrase, and
When retrieving the at least one file, the logic is configured to:
retrieving the at least one file based on results of the search.
6. The device of claim 1, wherein the communication session comprises an audio conversation between a user of the device and a first party, and wherein the logic is further configured to:
performing voice recognition associated with voice input provided by the user of the device and the first party.
7. The device of claim 1, wherein the communication session comprises a video teleconference involving the device and a second device, the device and the second device coupled via a video teleconference link, wherein the logic is further configured to:
perform at least one of the following actions: display the at least one file, or transmit the at least one file to the second device via the video teleconferencing link.
8. The device of claim 1, wherein the communication session comprises a voicemail message left by a first party for a user of the device, and wherein the logic is further configured to:
transmit the retrieved at least one file to a second device associated with the first party via the communication interface.
9. The device of claim 1, wherein the communication session comprises communication with an interactive voice response system, wherein the logic is further configured to:
identify a request for information from the interactive voice response system, and wherein, when retrieving the at least one file, the logic is configured to:
automatically retrieve information from the memory in response to the request, and
output the retrieved information to the display.
10. A computer-readable medium having stored thereon sequences of instructions, which when executed by at least one processor, cause the at least one processor to:
monitor a communication session comprising at least a first party;
identify at least one word or phrase from the communication session;
retrieve information related to the at least one word or phrase from a first device associated with the first party; and
transmit the retrieved information to a second device via a network.
11. The computer-readable medium of claim 10, further comprising instructions for causing the at least one processor to:
display the retrieved information to the first party prior to transmitting the retrieved information to the second device; and
receive an indication from the first party to transmit the retrieved information.
12. The computer-readable medium of claim 10, wherein, when retrieving information, the instructions cause the at least one processor to:
retrieve at least one of a text file, an audio file, an image file, or a video file, and wherein, when transmitting the retrieved information, the instructions cause the at least one processor to:
transmit the retrieved file to the second device during the communication session.
13. The computer-readable medium of claim 10, further comprising instructions for causing the at least one processor to:
identify a second word or phrase from the communication session;
retrieve information related to the second word or phrase; and
transmit the retrieved information related to the second word or phrase to the second device during the communication session.
14. The computer-readable medium of claim 10, wherein the communication session comprises a telephone conversation, the instructions further causing the at least one processor to:
perform speech recognition on audio information provided during the telephone conversation.
15. The computer-readable medium of claim 10, wherein the communication session comprises a text-based communication session or a multimedia-based communication session.
16. The computer-readable medium of claim 10, wherein the communication session comprises a voicemail message left by a second party associated with the second device, the computer-readable medium further comprising instructions for causing the at least one processor to:
play the voicemail message; and
identify a request associated with the voicemail message, wherein, when retrieving information related to the at least one word or phrase, the instructions cause the at least one processor to:
retrieve information based on the request.
17. The computer-readable medium of claim 10, further comprising instructions for causing the at least one processor to:
index information stored in a memory of the first device; and
search the indexed information for the at least one word or phrase, wherein, when retrieving information related to the at least one word or phrase, the instructions cause the at least one processor to:
retrieve information based on results of the search.
18. A method, comprising:
receiving, at a first device associated with a first party, a communication from a second device, the communication associated with a communication session comprising at least one of an audio conversation, a text-based conversation, or a multimedia conversation;
identifying at least one word or phrase from the communication session;
retrieving information associated with the at least one word or phrase from a memory included in the first device; and
outputting the retrieved information to a display associated with the first device.
19. The method of claim 18, further comprising at least one of:
transmitting the retrieved information to the second device via a network during the communication session, or
providing the retrieved information to a party associated with the second device during the communication session.
20. The method of claim 18, wherein the communication session comprises a voicemail message or a communication with an automated system.
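The index-search-retrieve flow recited in claims 1, 5, and 17 can be sketched in code. The following is an illustrative model only, not part of the patent text; all names here (`ConversationSupport`, `add_file`, `files_for`, `_words`) are invented for illustration, and the five-letter keyword filter is an arbitrary stand-in for real keyword spotting.

```python
# Illustrative sketch of claims 1, 5, and 17: index stored files, identify
# words from a communication session, and retrieve matching files.
from collections import defaultdict
import re

def _words(text):
    # Crude keyword spotting: keep only words of five or more letters,
    # which skips most stopwords ("the", "from", "our", ...).
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 5]

class ConversationSupport:
    def __init__(self):
        # Inverted index: word -> set of stored file names containing it
        # (claim 5: "an index associated with information included in the file").
        self.index = defaultdict(set)

    def add_file(self, name, text):
        # Index each keyword of the file's associated text.
        for word in _words(text):
            self.index[word].add(name)

    def files_for(self, utterance):
        # Identify words from the communication session and search the
        # index (claims 1 and 5); return every file matching any word.
        matches = set()
        for word in _words(utterance):
            matches |= self.index.get(word, set())
        return sorted(matches)

support = ConversationSupport()
support.add_file("vacation.jpg", "photos from our Hawaii vacation beach trip")
support.add_file("meeting.txt", "notes from the quarterly budget meeting")

# A phrase identified from the conversation retrieves the associated file
# for display or transmission (claims 1-3).
print(support.files_for("we just got back from vacation in Hawaii"))
# -> ['vacation.jpg']
```

In a real system the keyword filter would be replaced by speech recognition and topic identification (claims 6 and 14), and retrieval could happen automatically during the call or only after the user confirms transmission (claim 11).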
Applications Claiming Priority (1)
| Application Number | Priority Date |
|---|---|
| US12/412,555 | 2009-03-27 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK1162783A (en) | 2012-08-31 |
Similar Documents
| Publication | Title |
|---|---|
| US8537980B2 (en) | Conversation support |
| US11349991B2 (en) | Systems and methods to present voice message information to a user of a computing device |
| US10182154B2 (en) | Method and apparatus for using a search engine advantageously within a contact center system |
| US8972506B2 (en) | Conversation mapping |
| JP5063218B2 (en) | Internet protocol telephony architecture including information storage and retrieval system to track familiarity |
| US8223932B2 (en) | Appending content to a telephone communication |
| US8688092B1 (en) | Methods and systems for managing telecommunications and for translating voice messages to text messages |
| US20120030232A1 (en) | System and method for communicating tags for a media event using multiple media types |
| CN102158614A | Context sensitive, cloud-based telephony |
| US9036795B2 (en) | System and method for generating and facilitating comment on audio content |
| EP2680256A1 (en) | System and method to analyze voice communications |
| HK1162783A (en) | Conversation support |
| EP1944703A2 (en) | Communication information searching |
| HK1192399A (en) | Systems and methods to present voice message information to a user of a computing device |