US20140278403A1 - Systems and methods for interactive synthetic character dialogue
- Publication number
- US20140278403A1 (U.S. application Ser. No. 13/829,925)
- Authority
- US
- United States
- Prior art keywords
- user
- character
- speech
- synthetic
- computer system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transforming into visible information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/008—Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Description
- Various of the disclosed embodiments concern systems and methods for conversation-based human-computer interactions.
- Human-computer interaction (HCI) involves the interaction between humans and computers, focusing on the intersection of computer science, cognitive science, interface design, and many other fields. Artificial intelligence (AI) is another developing discipline which includes adaptive behaviors allowing computer systems to respond organically to a user's input. While AI may be used to augment HCI, possibly by providing a synthetic character for interacting with the user, the interaction may seem stale and artificial to the user if the AI is unconvincing. This is particularly true where the AI fails to account for contextual factors regarding the interaction and where the AI fails to maintain a “life-like” persona when interacting with the user. Conversation, though an excellent method for human-human interaction, may be especially problematic for an AI system because of conversation's contextual and inherently ambiguous character. Even children, who may more readily embrace inanimate characters as animate entities, can recognize when a conversational AI has become disassociated from the HCI context. Teaching and engaging children through HCIs would be highly desirable, but must overcome the obstacle of lifeless and contextually unaware AI behaviors.
- Accordingly, there exists a need for systems and methods to provide effective HCI interactions to users, particularly younger users, that accommodate the challenges of conversational dialogue.
- Certain embodiments contemplate a method for engaging a user in conversation with a synthetic character, the method comprising: receiving an audio input from a user, the audio input comprising speech; acquiring a textual description of the speech; determining a responsive audio output based upon the textual description; and causing a synthetic character to speak using the determined responsive audio output.
- the method further comprises receiving a plurality of audio inputs comprising speech from a user, the plurality of audio inputs associated with a plurality of spoken outputs from one or more synthetic characters.
- the plurality of audio inputs comprise answers to questions posed by one or more synthetic characters.
- the plurality of audio inputs comprise a narration of text and the plurality of spoken outputs from one or more synthetic characters comprise ad-libbing or commentary to the narration.
- the plurality of audio inputs comprise statements in a dialogue regarding a topic.
- acquiring a textual description of the speech comprises transmitting the audio input to a dedicated speech processing service.
- receiving an audio input comprises determining whether to perform one of “Automatic-Voice-Activity-Detection”, “Hold-to-Talk”, “Tap-to-Talk”, or “Tap-to-Talk-With-Silence-Detection” operations.
- the method further comprises modifying an icon to reflect the determined audio input operation.
- determining a responsive audio output comprises determining user personalization metadata.
- the method further comprises acquiring phoneme animation metadata associated with the responsive audio output for the purpose of animating some of the character's facial features.
- the method further comprises reviewing a plurality of responses from the user and performing more inter-character dialogue rather than user-character dialogue based on the review.
- the method further comprises associating prioritization metadata with each potential response for the synthetic character and using these prioritization metadata to cause one possible response to be output before other responses.
- causing a synthetic character to speak using the determined responsive audio output comprises causing the synthetic character to propose taking a picture using a user device.
- the method further comprises: causing a picture to be taken of a user, using a user device; and sending the picture to one or more users of a social network.
- Certain embodiments contemplate a method for visually engaging a user in conversation with a synthetic character comprising: retrieving a plurality of components associated with an interactive scene, the interactive scene selected by a user; configuring at least one of the plurality of components to represent a synthetic character in the scene; and transmitting at least some of the plurality of components to a user device.
- the method further comprises retrieving personalization metadata associated with a user and modifying at least one of the plurality of components based on the personalization metadata.
- retrieving a plurality of components comprises retrieving a plurality of speech waveforms from a database.
- Certain embodiments contemplate a computer system for engaging a user in conversation with a synthetic character, the system comprising: a display; a processor; a communication port; a memory containing instructions, wherein the instructions are configured to cause the processor to: receive an audio input from a user, the audio input comprising speech; acquire a textual description of the speech; determine a responsive audio output based upon the textual description; and cause a synthetic character to speak using the determined responsive audio output.
- receiving an audio input comprises determining whether to perform one of “Automatic-Voice-Activity-Detection”, “Hold-to-Talk”, “Tap-to-Talk”, or “Tap-to-Talk-With-Silence-Detection” operations.
- the instructions are further configured to cause the processor to modify an icon to reflect the determined operation.
- to determine a responsive audio output comprises determining user personalization metadata.
- the instructions are further configured to cause the processor to acquire phoneme metadata associated with the responsive audio output for the purpose of animating some of the character's facial features.
- the instructions are further configured to cause the processor to review a plurality of responses from the user and perform more inter-character dialogue rather than user-character dialogue based on the review. In some embodiments, the instructions are further configured to cause the processor to associate prioritization metadata with each potential response for the synthetic character and use these prioritization metadata to cause one possible response to be output before other responses. In some embodiments, causing a synthetic character to speak using the determined responsive audio output comprises causing the synthetic character to propose taking a picture using a user device.
- Certain embodiments contemplate a computer system for engaging a user in conversation with a synthetic character, the computer system comprising: means for receiving an audio input from a user, the audio input comprising speech; means for determining a description of the speech; means for determining a responsive audio output based upon the description; and means for causing a synthetic character to speak using the determined responsive audio output.
- the audio input receiving means comprises one of a microphone, a packet reception module, a WiFi receiver, a cellular network receiver, an Ethernet connection, a radio receiver, a local area connection, or an interface to a transportable memory storage device.
- the speech description determining means comprises one of a connection to a dedicated speech processing server, a natural language processing program, a speech recognition system, a Hidden Markov Model, or a Bayesian Classifier.
- the responsive audio output determination means comprises one of an Artificial Intelligence engine, a Machine Learning classifier, a decision tree, a state transition diagram, a Markov Model, or a Bayesian Classifier.
- the synthetic character speech means comprises one of a speaker, a connection to a speaker on a mobile device, a WiFi transmitter in communication with a user device, a packet transmission module, a cellular network transmitter in communication with a user device, an Ethernet connection in communication with a user device, a radio transmitter in communication with a user device, or a local area connection in communication with a user device.
- One or more embodiments of the present disclosure are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements.
- FIG. 1 illustrates a block diagram of various components in a system as may be implemented in certain embodiments.
- FIG. 2 illustrates a topological relationship between a plurality of interactive scenes in a virtual environment as may be used in certain embodiments.
- FIG. 3 illustrates an example screenshot of a graphical user interface (GUI) of a main scene in a virtual environment as may be implemented in certain embodiments.
- FIG. 4 illustrates an example screenshot of a “fireside chat scene” GUI in a virtual environment as may be implemented in certain embodiments.
- FIG. 5 illustrates an example screenshot of a “versus scene” GUI in a virtual environment as may be implemented in certain embodiments.
- FIG. 6 illustrates an example screenshot of a “game show scene” GUI in a virtual environment as may be implemented in certain embodiments.
- FIG. 7 illustrates an example screenshot of a “story telling scene” GUI in a virtual environment as may be implemented in certain embodiments.
- FIG. 8 is a flowchart depicting certain steps in a user interaction process with the virtual environment as may be implemented in certain embodiments.
- FIG. 9 is a flowchart depicting certain steps in a component-based content management and delivery process as may be implemented in certain embodiments.
- FIG. 10 illustrates an example screenshot of a GUI for a component creation and management system as may be implemented in certain embodiments.
- FIG. 11 is a flowchart depicting certain steps in a dynamic AI conversation management process as may be implemented in certain embodiments.
- FIG. 12 is a flowchart depicting certain steps in a frustration management process as may be implemented in certain embodiments.
- FIG. 13 is a flowchart depicting certain steps in a speech reception process as may be implemented in certain embodiments.
- FIG. 14 illustrates an example screenshot of a social asset sharing GUI as may be implemented in certain embodiments.
- FIG. 15 illustrates an example screenshot of a message drafting tool in the social asset sharing GUI of FIG. 14 as may be implemented in certain embodiments.
- FIG. 16 is a flowchart depicting certain steps in a social image capture process as may be implemented in certain embodiments.
- FIG. 17 is a block diagram of components in a computer system which may be used to implement certain of the disclosed embodiments.
- The following description and drawings are illustrative and are not to be construed as limiting. Numerous specific details are described to provide a thorough understanding of the disclosure. However, in certain instances, well-known details are not described in order to avoid obscuring the description. References to one or an embodiment in the present disclosure can be, but not necessarily are, references to the same embodiment; and such references mean at least one of the embodiments.
- Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not other embodiments.
- The terms used in this specification generally have their ordinary meanings in the art, within the context of the disclosure, and in the specific context where each term is used. Certain terms that are used to describe the disclosure are discussed below, or elsewhere in the specification, to provide additional guidance to the practitioner regarding the description of the disclosure. For convenience, certain terms may be highlighted, for example using italics and/or quotation marks. The use of highlighting has no influence on the scope and meaning of a term; the scope and meaning of a term is the same, in the same context, whether or not it is highlighted. It will be appreciated that the same thing can be said in more than one way.
- Consequently, alternative language and synonyms may be used for any one or more of the terms discussed herein, and no special significance is to be placed upon whether or not a term is elaborated or discussed herein. Synonyms for certain terms are provided. A recital of one or more synonyms does not exclude the use of other synonyms. The use of examples anywhere in this specification, including examples of any term discussed herein, is illustrative only, and is not intended to further limit the scope and meaning of the disclosure or of any exemplified term. Likewise, the disclosure is not limited to various embodiments given in this specification.
- Without intent to further limit the scope of the disclosure, examples of instruments, apparatus, methods and their related results according to the embodiments of the present disclosure are given below. Note that titles or subtitles may be used in the examples for convenience of a reader, which in no way should limit the scope of the disclosure. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. In the case of conflict, the present document, including definitions, will control.
- the system includes a plurality of interactive scenes in a virtual environment.
- a user may access each scene and engage in conversation with a synthetic character regarding an activity associated with that active scene.
- a central server may house a plurality of waveforms associated with the synthetic character's speech, and may dynamically deliver the waveforms to a user device in conjunction with the operation of an artificial intelligence.
- speech is generated with text-to-speech utilities when the waveform from the server is unavailable or inefficient to retrieve.
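- The bullets above describe the basic dialogue loop: user speech is transcribed, a response is selected, and the character speaks using either a server-hosted waveform or a local text-to-speech fallback. The Python sketch below is an editorial illustration of that flow, not code from the disclosure; the function names (transcribe, select_response, fetch_waveform, synthesize_speech) are hypothetical stand-ins.

```python
# Editorial sketch of the dialogue loop described above; all functions are
# illustrative stubs, not APIs from the disclosure.
from typing import Optional

def transcribe(audio_bytes: bytes) -> str:
    """Stand-in for the dedicated speech-processing service."""
    return "tell me a story"                 # pretend transcription

def select_response(text: str) -> str:
    """Stand-in for the AI engine that chooses a responsive line."""
    return "Once upon a time..."             # pretend AI-selected response

def fetch_waveform(response_text: str) -> Optional[bytes]:
    """Try to retrieve a pre-recorded waveform from the server; return None
    when it is unavailable or inefficient to retrieve."""
    return None                              # simulate a miss

def synthesize_speech(response_text: str) -> bytes:
    """Local text-to-speech fallback."""
    return b"tts-generated-audio"

def handle_user_utterance(audio_bytes: bytes) -> bytes:
    text = transcribe(audio_bytes)           # audio input -> textual description
    response = select_response(text)         # textual description -> responsive output
    waveform = fetch_waveform(response)      # prefer the server-hosted waveform
    if waveform is None:
        waveform = synthesize_speech(response)   # fall back to TTS
    return waveform                          # the synthetic character speaks this audio

if __name__ == "__main__":
    print(handle_user_utterance(b"raw-microphone-audio"))
```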
- FIG. 1 illustrates a block diagram of various components in a system as may be implemented in certain embodiments.
- a host server system 101 may perform various of the disclosed features and may be in communication with user devices 110 a-b via networks 108 a-b.
- networks 108 a - b are the same network and may be any commonly known network, such as the Internet, a Local Area Network (LAN), a local WiFi ad-hoc network, etc.
- the networks include transmissions from cellular towers 107 a - b and the user devices 110 a - b .
- Users 112 a - b may interact with a local application on their respective devices using a user interface 109 a - b .
- the user may be in communication with server 101 via the local application.
- the local application may be a stand-alone software program, or may present information from server 101 with minimal specialized local processing, for example, as an internet browser.
- the server 101 may include a plurality of software, firmware, and/or hardware modules to implement various of the disclosed processes.
- the server may include a plurality of system tools 102 , such as dynamic libraries, to perform various functions.
- a database to store metadata 103 may be included as well as databases for storing speech data 104 and animation data 105 .
- the server 101 may also include a cache 106 to facilitate more efficient response times to asset requests from user devices 110 a - b.
- server 101 may host a service that provides assets to user devices 110 a - b so that the devices may generate synthetic characters for interaction with a user in a virtual environment.
- the operation of the virtual environment may be distributed between the user devices 110 a - b and the server 101 in some embodiments.
- the virtual environment and/or AI logic may be run on the server 101 and the user devices may request only enough information to display the results.
- the virtual environment and/or AI may run predominately on the user devices 110 a-b and communicate with the server only aperiodically to acquire new assets.
- FIG. 2 illustrates a topological relationship between a plurality of interactive scenes in a virtual environment as may be used in certain embodiments.
- the scenes may comprise “rooms” in a house, or different “games” in a game show.
- Each interactive scene may present a unique context and may contain some elements common to the other scenes and some elements which are unique.
- a user may transition from some scenes without restriction, as in the case of transitions 202 c - e .
- transitions may be unidirectional, such as the transition 202 b from scene A 201 a to scene B 201 b and the transition 202 a from scene C 201 c to scene A 201 a .
- the user transitions between scenes by oral commands or orally indicated agreement with synthetic character propositions.
- the user may be required to return to the main scene 201 d following an interaction, so that the conversation Al logic may be reinitialized and configured for a new scene.
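- One plausible way to encode the scene topology of FIG. 2, with its mix of unrestricted, unidirectional, and forced-return transitions, is a small directed graph. The sketch below is an assumption for illustration; the scene names loosely mirror FIG. 2 and the data structure is not taken from the disclosure.

```python
# Hypothetical encoding of the FIG. 2 scene topology as a directed graph;
# an edge A -> B means the user may transition from scene A to scene B.
TRANSITIONS = {
    "main":    {"scene_a", "scene_b", "scene_c"},  # unrestricted transitions from main
    "scene_a": {"scene_b", "main"},                # A -> B is one-way
    "scene_b": {"main"},                           # B returns only to main
    "scene_c": {"scene_a", "main"},                # C -> A is one-way
}

def can_transition(current: str, target: str) -> bool:
    return target in TRANSITIONS.get(current, set())

def next_scene(current: str, requested: str) -> str:
    """Honor an allowed request; otherwise route back to the main scene so the
    conversation AI can be reinitialized and configured for a new scene."""
    return requested if can_transition(current, requested) else "main"

print(next_scene("scene_b", "scene_a"))   # -> "main"
print(next_scene("main", "scene_c"))      # -> "scene_c"
```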
- FIG. 3 illustrates an example screenshot of a graphical user interface (GUI) 300 of a main scene in a virtual environment as may be implemented in certain embodiments.
- the GUI may appear on an interface 109 a - b , such as on a display screen of a mobile phone, or on a touch screen of a mobile phone or of a tablet device.
- the GUI 300 may include a first 301 a and second 301 b depiction of a synthetic character, a menu bar 302 having a user graphic 304 a , a separate static or real-time user video 304 b , and a speech interface 303 .
- Menu 302 may depict common elements across all the scenes of the virtual environment, to provide visual and functional continuity to the user.
- Speech interface 303 may be used to respond to inquiries from synthetic characters 301 a - b .
- the user may touch the interface 303 to activate a microphone to receive their response.
- the interface 303 may illuminate or otherwise indicate an active state when the user selects some other input device.
- the interface 303 may illuminate automatically when recording is initiated by the system.
- real-time user video 304 b depicts a real-time, or near real-time, image of a user as they use a user device, possibly acquired using a camera in communication with the user device.
- the depiction of the user may be modified by the system, for example, by overlaying facial hair, wigs, hats, earrings, etc. onto the real-time video image.
- the overlay may be generated in response to the activities occurring in the virtual environment and/or by conversation with the synthetic characters.
- where the interaction involves role-playing, such as including the user in a pirate adventure, the user's image may be overlaid with a pirate hat, skull and bones, or similar asset germane to the interaction.
- user graphic 304 a is a static image of the user.
- the system may take an image of the user and archive the image as a “standard” or “default” image to be presented as user graphic 304 a .
- the user may elect to have their image with an overlaid graphic replace the user graphic 304 a .
- the user may replace user graphic 304 a at their own initiative.
- the interaction may include a suggestion or an invitation by one or more of the synthetic characters for the user to activate the taking of their picture by the user device, or for the system to automatically take the user's picture.
- a synthetic character may comment on the user's appearance and offer to capture the user's image using a camera located on the user device. If the user responds in the affirmative, the system may then capture the image and archive the image or use the image to replace user graphic 304 a , either permanently or for some portion of the piracy interaction.
- the same or corresponding graphics may be overlaid upon the synthetic characters' images.
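- Overlaying a role-playing asset such as a pirate hat onto the user's image amounts to alpha-compositing a transparent graphic onto a video frame. The sketch below assumes the Pillow imaging library; the file names and paste offset are invented, and a production system would place the overlay using face tracking.

```python
# Sketch of compositing a transparent asset (e.g., a pirate hat) onto a
# captured user frame using Pillow. File names and the paste offset are
# hypothetical.
from PIL import Image

def overlay_asset(frame_path, asset_path, position=(40, 0)):
    frame = Image.open(frame_path).convert("RGBA")
    asset = Image.open(asset_path).convert("RGBA")
    composed = frame.copy()
    # The asset's own alpha channel serves as the paste mask, so transparent
    # pixels leave the underlying video frame visible.
    composed.paste(asset, position, mask=asset)
    return composed

# Example usage (paths are hypothetical):
# overlay_asset("user_frame.png", "pirate_hat.png").save("preview.png")
```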
- synthetic characters 301 a - b may perform a variety of animations, both to indicate that they are speaking as well as to interact with other elements of the scene.
- FIG. 4 illustrates an example screenshot of a “fireside chat scene” GUI 400 in a virtual environment as may be implemented in certain embodiments.
- Elements in the background 403 may indicate to the user which scene the user is currently in.
- an image of the user 401, possibly a real-time image acquired using a camera on the user's device, may be used.
- a synthetic character, such as synthetic character 301 b, may pose questions to the user throughout an interaction and the user may respond using speech interface 303.
- a text box 402 may be used to indicate the topic and nature of the conversation (e.g., “school”).
- FIG. 5 illustrates an example screenshot of a “versus scene” GUI 500 in a virtual environment as may be implemented in certain embodiments.
- the system may still pose questions (possibly with the voice of a synthetic character) and receive responses and statements from the user.
- a scrolling header 504 a may be used to indicate contextual information relevant to the conversation.
- the user may be depicted in element 501.
- Text boxes 502 a - b may be used to indicate questions posed by the system and possible answer responses that may be given, or are expected to be given, by the user.
- FIG. 6 illustrates an example screenshot of a “game show scene” GUI in a virtual environment as may be implemented in certain embodiments.
- synthetic character 301 b may conduct a game show wherein the user is a contestant.
- the synthetic character 301 b may pose questions to the user. Expected answers may be presented in text boxes 602 a - c .
- a synthetic character 301 c may be a different synthetic character from character 301 b or may be a separately animated instantiation of the same character.
- Synthetic character 301 c may be used to pose questions to the user.
- a title screen 603 may be used to indicate the nature of the contest.
- the user's image may be displayed in real-time or near real-time in region 601 .
- FIG. 7 illustrates an example screenshot of a “story telling scene” GUI 700 in a virtual environment as may be implemented in certain embodiments.
- the GUI 700 may be divided into a text region 701 and a graphic region 702 .
- the synthetic characters 301 a - b may narrate and/or role-play portions of a story as each region 701 , 702 is updated.
- the characters 301 a - b may engage in dialogue with one another and may periodically converse with the user, possibly as part of a role-playing process wherein the user assumes a role in the story.
- the user reads the text in region 701 , and the characters 301 a - b ad-lib or comment upon portions of the story or upon the user's reading.
- FIG. 8 is a flowchart depicting certain steps in a user interaction process with the virtual environment as may be implemented in certain embodiments.
- the system may present the user with a main scene, such as the scene depicted in FIG. 3.
- the system may receive a user selection for an interactive scene (such as an oral selection).
- the input may comprise a touch or swipe action relative to a graphical icon, but in other instances the input may be an oral response by the user, such as a response to an inquiry from a synthetic character.
- the system may present the user with the selected interactive scene.
- the system may engage the user in a dialogue sequence based on criteria.
- the criteria may include previous conversations with the user and a database of statistics generated based on social information or past interactions with the user.
- the system may determine whether the user wishes to repeat an activity associated with the selected scene. For example, a synthetic character may inquire as to the user's preferences. If the user elects, perhaps orally or via tactile input, to pursue the same activity, the system may repeat the activity using the same criteria as previously, or at step 806 may modify the criteria to reflect the previous conversation history.
- the system can determine whether the user wishes to quit at step 807, again possibly via interaction with a synthetic character. If the user does not wish to quit, the system can again determine which interactive scene the user wishes to enter at step 802. Before or after entering the main scene at step 802 the system may also modify criteria based on previous conversations and the user's personal characteristics. In some embodiments, the user transitions between scenes using a map interface.
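- Read as a whole, FIG. 8 describes a loop: present the main scene, let the user choose an activity, run a dialogue based on criteria, then repeat the activity, choose a new scene, or quit. The sketch below is one hedged reading of that loop; the helper callables are hypothetical and would be supplied by the application.

```python
# Hedged rendering of the FIG. 8 loop; the four callables are hypothetical
# hooks the application would provide.
def run_session(get_scene_choice, run_dialogue, wants_repeat, wants_quit):
    criteria = {}                                     # dialogue criteria (step 804)
    while True:
        scene = get_scene_choice()                    # steps 801-802: main scene, selection
        while True:
            history = run_dialogue(scene, criteria)   # steps 803-804: scene + dialogue
            if not wants_repeat(scene):               # step 805: repeat the activity?
                break
            criteria = {**criteria, "history": history}   # step 806: fold in past conversation
        if wants_quit():                              # step 807: quit?
            return
        # Otherwise loop back to scene selection, optionally with criteria
        # modified based on previous conversations and user characteristics.
```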
- Criteria may also be derived from analytics.
- the system logs statistics for all major events that occur during a dialogue session. These statistics may be logged to the server and can be aggregated to provide analytics for how users interact with the service at scale. This can be used to drive updates to the content or changes to the priorities of content. For example, analytics can tell that users prefer one activity over another, allowing more engaging content to be surfaced more quickly for future users. In some embodiments, this re-prioritizing of content can happen automatically based upon data logged from users at scale.
- the writing team can gain insights into topics that require more writing because they occur frequently.
- some content may play out to be funnier than other content.
- the system may want to use the “best” content early on in order to grab the user's interest and attention.
- the AI, or the designers, may accordingly tag content with High, Medium, or Low priorities.
- the AI engine may prefer to deliver content that is marked with higher priority than other content in some embodiments.
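- As a rough illustration of priority-tagged content selection and analytics-driven re-prioritization (the High/Medium/Low tags follow the text above; the item structure, engagement score, and threshold are assumptions):

```python
# Illustrative priority-based selection; the High/Medium/Low tags follow the
# text, everything else (item structure, scores, threshold) is assumed.
import random

PRIORITY_RANK = {"High": 0, "Medium": 1, "Low": 2}

content = [
    {"id": "joke_017", "priority": "Low"},
    {"id": "school_question_002", "priority": "High"},
    {"id": "pirate_bit_005", "priority": "Medium"},
]

def pick_content(items):
    """Prefer higher-priority items; break ties randomly so sessions vary."""
    best = min(PRIORITY_RANK[i["priority"]] for i in items)
    return random.choice([i for i in items if PRIORITY_RANK[i["priority"]] == best])

def promote_if_engaging(item, engagement_score, threshold=0.8):
    """Analytics-driven re-prioritization: bump content users respond well to."""
    if engagement_score > threshold:
        item["priority"] = "High"

print(pick_content(content)["id"])   # -> "school_question_002"
```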
- components may include:
  - An Image: an image layer with possible alpha transparency.
  - A User Video Feed: displays the output of the device's camera, in some embodiments with face tracking to keep the camera trained on the user.
  - A Character Animation: displays an animated virtual character using either 3D geometry or 2D images.
  - A Text Viewer: displays status text or an overview of the last question from the virtual character.
  - A Progressive Text Reveal: used to reveal words as the virtual character speaks them.
  - An Image-based Animation: displays image-based affine animations such as flashing lights, moving pictures, or transitions between components.
- the system may determine which components are relevant to the interactive experience.
- Server 101 may then provide the user device 110 a - b with the components, or a portion of the predicted components, to be cached locally for use during the interaction.
- the server 101 may determine which components to send to the user device 110 a - b .
- the user device may determine which components to request from the server. In each instance, in some embodiments the AI engine will only have those components transmitted which are not already locally cached on the user device 110 a-b.
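- Because only components missing from the device's local cache need to be transmitted, the decision reduces to a set difference between the components predicted for the interaction and the device's cache manifest. A minimal sketch, with hypothetical component identifiers:

```python
# Sketch of delta delivery: transmit only components the device has not
# already cached. Component identifiers are hypothetical.
def components_to_send(predicted, device_cache):
    """Components needed for the upcoming interaction but absent locally."""
    return set(predicted) - set(device_cache)

predicted_components = {"char_anim_301b", "waveform_greeting_07", "bg_fireside", "text_viewer"}
cached_on_device = {"bg_fireside", "text_viewer"}

print(components_to_send(predicted_components, cached_on_device))
# -> {'char_anim_301b', 'waveform_greeting_07'} (order may vary)
```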
- the system may retrieve user characteristics, possibly from a database in communication with server 101 or a user device.
- the system may retrieve components associated with the interactive scene.
- the system may determine component personalization metadata. For example, the system may determine behavioral and conversational parameters of the synthetic characters, or may determine the images to be associated with certain components, possibly using criteria as described above.
- the system may initiate an interactive session 905 .
- the system may log interaction statistics.
- the system can report the interaction statistics.
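- Logging and reporting interaction statistics, and aggregating them across sessions as described in the analytics discussion above, might look like the following sketch; the event names and fields are assumptions.

```python
# Minimal sketch of interaction-event logging and aggregation across
# sessions; event names and fields are assumptions.
import time
from collections import Counter

event_log = []                                    # stand-in for a server-side store

def log_event(session_id, event, **fields):
    event_log.append({"session": session_id, "event": event, "time": time.time(), **fields})

def aggregate(events):
    """Count events by type across sessions, e.g., to see which activities
    users engage with most and re-prioritize content accordingly."""
    return Counter(e["event"] for e in events)

log_event("s1", "scene_entered", scene="game_show")
log_event("s1", "question_answered", correct=True)
log_event("s2", "scene_entered", scene="fireside_chat")
print(aggregate(event_log))
```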
- FIG. 10 illustrates an example screenshot of a GUI 1000 for a component creation and management system as may be implemented in certain embodiments.
- a designer may create a list of categories 1002 , some of which may be common to a plurality of scenes, while others, such as “fireside chats” 1004 are unique to a particular scene.
- a designer may specify components 1003 and conversation elements 1005 , as well as the interaction between the two.
- the designer may indicate relations between the conversation elements and the components and may indicate what preferential order components should be selected, transmitted, prioritized, and interacted with.
- FIG. 11 is a flowchart depicting certain steps in a dynamic AI conversation management process as may be implemented in certain embodiments.
- the system can predict possible conversation paths that may occur between a user and one or more synthetic characters, or between the synthetic characters where their conversations are nondeterministic.
- the system may retrieve N speech waveforms from a database and cache them either locally at server system 101 or at user device 110 a - b .
- the system can retrieve metadata corresponding to the N speech waveforms from a database and cache them either locally at server system 101 or at user device 110 a - b .
- the system may notify an AI engine of the speech waveforms and animation metadata cached locally and may animate synthetic characters using the animation metadata.
- the AI engine may anticipate network latency and/or resource availability in the selection of content to be provided to a user.
- the animation may be driven by phoneme metadata associated with the waveform. For example, timestamps may be used to correlate certain animations, such as jaw and lip movements, with the corresponding points of the waveform. In this manner, the synthetic character's animations may dynamically adapt to the waveforms selected by the system.
- this “phoneme metadata” may comprise offsets to be blended with the existing synthetic character animations.
- the phoneme metadata may be automatically created during the asset creation process or it may be explicitly generated by an animator or audio engineer. Where the waveforms are generated by a text-to-speech program, the system may concatenate elements from a suite of phoneme animation metadata to produce the phoneme animation metadata associated with the generated waveform.
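- The phoneme animation metadata described above can be thought of as timestamped mouth poses aligned with the selected waveform. The sketch below shows one assumed structure and lookup; the pose names and times are invented for illustration.

```python
# Assumed structure for phoneme animation metadata: (start_time_s, mouth_pose)
# pairs aligned with a speech waveform. Pose names and times are invented.
import bisect

phoneme_track = [
    (0.00, "closed"),
    (0.12, "open_wide"),   # e.g., an "ah" sound
    (0.30, "rounded"),     # e.g., an "oo" sound
    (0.48, "closed"),
]

def mouth_pose_at(t):
    """Return the mouth pose active at playback time t so jaw and lip
    animation stays synchronized with the selected waveform."""
    times = [start for start, _ in phoneme_track]
    idx = max(bisect.bisect_right(times, t) - 1, 0)
    return phoneme_track[idx][1]

for t in (0.05, 0.20, 0.50):
    print(t, mouth_pose_at(t))
```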
- FIG. 12 is a flowchart depicting certain steps in a frustration management process as may be implemented in certain embodiments.
- the system monitors a conversation log.
- the system may monitor a preexisting record of conversations.
- the system may monitor an ongoing log of a current conversation. As part of the monitoring, the system may identify responses from a user as indicative of frustration and may tag the response accordingly.
- the system may determine if frustration-tagged responses exceed a threshold or if the responses otherwise meet a criterion for assessing the user's frustration level. Where the user's responses indicate frustration, the system may proceed to step 1203 and notify the AI engine regarding the user's frustration. In response, at step 1204, the AI engine may adjust the interaction parameters between the synthetic characters to help alleviate the frustration. For example, rather than engage the user as often in responses, the characters may be more likely to interact with one another or to automatically direct the flow of the interaction to a situation determined to be more conducive to engaging the user.
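- The frustration check reduces to tagging responses and comparing the tagged fraction of recent responses against a threshold before notifying the AI engine. One assumed implementation, with an illustrative keyword list and threshold:

```python
# Assumed frustration check: tag responses that look frustrated and flag the
# AI engine when the tagged fraction of recent responses crosses a threshold.
FRUSTRATION_MARKERS = {"stop", "i don't know", "this is boring", "you're not listening"}

def is_frustrated(response_text):
    text = response_text.lower()
    return any(marker in text for marker in FRUSTRATION_MARKERS)

def frustration_exceeded(recent_responses, threshold=0.5):
    """True when at least `threshold` of the recent responses are tagged."""
    if not recent_responses:
        return False
    tagged = sum(is_frustrated(r) for r in recent_responses)
    return tagged / len(recent_responses) >= threshold

log = ["this is boring", "stop", "tell me about dinosaurs"]
if frustration_exceeded(log):
    # In the full system this notification (step 1203) would lead the AI engine
    # to favor inter-character dialogue over direct questions (step 1204).
    print("notify AI engine: user appears frustrated")
```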
- FIG. 13 is a flowchart depicting certain steps in a speech reception process 1300 as may be implemented in certain embodiments.
- the system may determine a character of an expected response by the user.
- the character of the response may be determined based on the immediately preceding statements and inquiries of the synthetic characters.
- the system can determine if “Hold-to-Talk” functionality is suitable. If so, the system may present a “Hold-to-Talk” icon at step 1305 , and perform a “Hold-to-Talk” operation at step 1306 .
- the “Hold-to-Talk” icon may appear as a modification of, or icon in proximity to, speech interface 303 . In some embodiments, no icon is present (e.g., step 1305 is skipped) and the system performs “Hold-to-Talk” operation at step 1306 using the existing icon(s).
- the “Hold-to-Talk” operation may include a process whereby recording at the user device's microphone is disabled when the synthetic characters are initially waiting for a response.
- recording at the user device's microphone may be enabled and the user may respond to the conversation involving the synthetic characters.
- the user may continue to hold (e.g. physically touching or otherwise providing tactile input) the icon until they are done providing their response and may then release the icon to complete the recording.
- in a “Tap-to-Talk” operation, the user may select an icon to enable recording at the user device's microphone and may then respond to the conversation involving the synthetic characters. Following completion of their response, the user may again select the icon, perhaps the same icon as initially selected, to complete the recording and, in some embodiments, to disable the microphone.
- the system can determine if “Tap-to-Talk-With-Silence-Detection” functionality is suitable. If so, the system may present a “Tap-to-Talk-With-Silence-Detection” icon at step 1309, and perform a “Tap-to-Talk-With-Silence-Detection” operation at step 1310.
- the “Tap-to-Talk-With-Silence-Detection” icon may appear as a modification of, or icon in proximity to, speech interface 303 .
- no icon is present (e.g., step 1309 is skipped) and the system performs “Tap-to-Talk-With-Silence-Detection” operation at step 1310 using the existing icon(s).
- the “Tap-to-Talk-With-Silence-Detection” operation may include a process whereby recording at the user device's microphone is disabled when the characters initially wait for a response from the user.
- an icon such as speech interface 303
- recording at the user device's microphone may be enabled and the user may respond to the conversation involving the synthetic characters. Following completion of their response, the user may fall silent, without actively disabling the microphone.
- the system may detect the subsequent silence and stop the recording after some threshold period of time has passed. In some embodiments, silence may be detected by measuring the energy of the recording's frequency spectrum.
- the system may perform an “Automatic-Voice-Activity-Detection” operation.
- during an “Automatic-Voice-Activity-Detection” operation, the system may activate a microphone 1311, if not already activated, on the user device.
- the system may then analyze the power and frequency of the recorded audio to determine if speech is present at step 1312 . If speech is not present over some threshold period of time, the system may conclude the recording.
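- Both the silence-detection and automatic voice-activity checks can be implemented by measuring the energy of recent audio frames, as the text suggests. The sketch below uses an energy threshold and a trailing-silence count; the frame length, threshold, and timeout values are illustrative assumptions.

```python
# Energy-based silence detection over audio frames, one way to implement the
# silence and voice-activity checks above. Frame length, threshold, and the
# trailing-silence timeout are illustrative values.
def frame_energy(samples):
    """Mean squared amplitude of one frame of PCM samples."""
    return sum(s * s for s in samples) / max(len(samples), 1)

def recording_should_stop(frames, energy_threshold=1e-4, silent_frames_needed=30):
    """Stop once the trailing run of low-energy frames is long enough
    (roughly 0.6 s if each frame covers ~20 ms)."""
    trailing_silence = 0
    for frame in reversed(frames):
        if frame_energy(frame) < energy_threshold:
            trailing_silence += 1
        else:
            break
    return trailing_silence >= silent_frames_needed

loud = [0.2, -0.3, 0.25] * 100         # fake speech frame
quiet = [0.001, -0.002, 0.0015] * 100  # fake silence frame
print(recording_should_stop([loud] + [quiet] * 35))   # -> True
```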
- FIG. 14 illustrates an example screenshot of a social asset sharing GUI as may be implemented in certain embodiments.
- a reviewer such as the user or a relation of the user, may be presented with a series of images 1401 captured during various interactions with the synthetic characters. For example, some of the images may have been voluntarily requested by the user and may depict various asset overlays to the user's image, such as hat and/or facial hair.
- the plurality of images 1401 may also include images automatically taken of the user at various moments in various interactions.
- Gallery controls 1402 and 1403 may be used to select from different collections of images, possibly images organized by different scenarios engaged with the user.
- FIG. 15 illustrates an example screenshot 1500 of a message drafting tool in the social asset sharing GUI of FIG. 14 as may be implemented in certain embodiments.
- the system may present a pop-up display 1501 .
- the display 1501 may include an enlarged version 1502 of the selected image and a region 1503 for accepting text input.
- An input 1505 for selecting one or more message mediums, such as Facebook, MySpace, Twitter, etc. may also be provided.
- the user may insert commentary text in the region 1503 .
- using sharing icon 1504, the user may share the image and commentary text with a community specified by input 1505.
- the message drafting tool is used by a parent of the child user.
- FIG. 16 is a flowchart depicting certain steps in a social image capture process as may be implemented in certain embodiments.
- the system may determine that image capture is relevant to a conversation. For example, following initiation of a roleplaying sequence which involves overlaying certain assets on the user's image 304 b (or at image 401 , 501 , etc.) the system may be keyed to encourage the user to have their image, with the asset overlaid, captured. Following the overlaying of the asset on to the user image at step 1602 the system may propose that the user engage in an image capture at step 1603 . The proposal may be made by one of the synthetic characters in the virtual environment.
- the system may capture an image of the user at step 1605 .
- the system may then store the image at step 1606 and present the captured image for review at step 1607 .
- the image may be presented for review by the user, or by another individual, such as the user's mother or other family member. If the image is accepted for sharing during the review at step 1608 the system may transmit the captured image for sharing at step 1609 to a selected social network.
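- The FIG. 16 flow can be summarized as a gated pipeline: propose the photo, and only capture, store, review, and share when the user and a reviewer agree. The sketch below is a hypothetical rendering of that gating; the prompt text and callables are invented.

```python
# Hypothetical rendering of the FIG. 16 gating: each step proceeds only if
# the user (and later a reviewer, e.g., a parent) agrees.
def social_capture_flow(capture_image, user_agrees, reviewer_approves, share):
    proposal = "You look great in that pirate hat! Want me to take a picture?"
    if not user_agrees(proposal):                 # steps 1603-1604
        return None
    image = capture_image()                       # step 1605
    record = {"image": image, "shared": False}    # step 1606: store the image
    if reviewer_approves(image):                  # steps 1607-1608: review
        share(image)                              # step 1609: send to social network
        record["shared"] = True
    return record

# Example usage with trivial stand-ins:
result = social_capture_flow(
    capture_image=lambda: b"jpeg-bytes",
    user_agrees=lambda prompt: True,
    reviewer_approves=lambda img: True,
    share=lambda img: print("posted to the selected social network"),
)
print(result["shared"])   # -> True
```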
- FIG. 17 is an example of a computer system 1700 with which various embodiments may be utilized. Various of the disclosed features may be located on computer system 1700.
- the computer system includes a bus 1705 , at least one processor 1710 , at least one communication port 1715 , a main memory 1720 , a removable storage media 1725 , a read only memory 1730 , and a mass storage 1735 .
- Processor(s) 1710 can be any known processor, such as, but not limited to, an Intel® Itanium® or Itanium 2® processor(s), or AMD® Opteron® or Athlon MP® processor(s), or Motorola® lines of processors.
- Communication port(s) 1715 can be any of an RS-232 port for use with a modem based dialup connection, a 10/100 Ethernet port, or a Gigabit port using copper or fiber.
- Communication port(s) 1715 may be chosen depending on a network such as a Local Area Network (LAN), Wide Area Network (WAN), or any network to which the computer system 1700 connects.
- Main memory 1720 can be Random Access Memory (RAM), or any other dynamic storage device(s) commonly known in the art.
- Read only memory 1730 can be any static storage device(s) such as Programmable Read Only Memory (PROM) chips for storing static information such as instructions for processor 1710 .
- Mass storage 1735 can be used to store information and instructions.
- hard disks such as the Adaptec® family of SCSI drives, an optical disc, an array of disks such as RAID (e.g., the Adaptec® family of RAID drives), or any other mass storage devices may be used.
- Bus 1705 communicatively couples processor(s) 1710 with the other memory, storage and communication blocks.
- Bus 1705 can be a PCI/PCI-X or SCSI based system bus depending on the storage devices used.
- Removable storage media 1725 can be any kind of external hard-drives, floppy drives, IOMEGA® Zip Drives, Compact Disc-Read Only Memory (CD-ROM), Compact Disc-Re-Writable (CD-RW), Digital Video Disk-Read Only Memory (DVD-ROM).
- While the computer-readable medium is shown in an embodiment to be a single medium, the term “computer-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that stores the one or more sets of instructions.
- the term “computer-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the computer and that cause the computer to perform any one or more of the methodologies of the presently disclosed technique and innovation.
- the computer may be, but is not limited to, a server computer, a client computer, a personal computer (PC), a tablet PC, a laptop computer, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, an iPhone®, an iPad®, a processor, a telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
Abstract
Description
- Various of the disclosed embodiments concern systems and methods for conversation-based human-computer interactions.
- Human computer interaction (HCl) involves the interaction between humans and computers, focusing on the intersection of computer science, cognitive science, interface design, and many other fields. Artificial intelligence (Al) is another developing discipline which includes adaptive behaviors allowing computer systems to respond organically to a user's input. While Al may be used to augment HCl, possibly by providing a synthetic character for interacting with the user, the interaction may seem stale and artificial to the user if the Al is unconvincing. This is particularly true where the Al fails to account for contextual factors regarding the interaction and where the Al fails to maintain a “life-like” persona when interacting with the user. Conversation, though an excellent method for human-human interaction, may be especially problematic for an Al system because of conversation's contextual and inherently ambiguous character. Even children, who may more readily embrace inanimate characters as animate entities, can recognize when a conversational Al has become disassociated from the HCl context. Teaching and engaging children through Has would be highly desirable, but must overcome the obstacle of lifeless and contextually ignorant Al behaviors.
- Accordingly, there exists a need for systems and methods to provide effective HCl interactions to users, particularly younger users, that accommodate the challenges of conversational dialogue.
- Certain embodiments contemplate a method for engaging a user in conversation with a synthetic character, the method comprising: receiving an audio input from a user, the audio input comprising speech; acquiring a textual description of the speech; determining a responsive audio output based upon the textual description; and causing a synthetic character to speak using the determined responsive audio output.
- In some embodiments, the method further comprises receiving a plurality of audio inputs comprising speech from a user, the plurality of audio inputs associated with a plurality of spoken outputs from one or more synthetic characters. In some embodiments, the plurality of audio inputs comprise answers to questions posed by one or more synthetic characters. In some embodiments, the plurality of audio inputs comprise a narration of text and the plurality of spoken outputs from one or more synthetic characters comprise ad-libbing or commentary to the narration. In some embodiments, the plurality of audio inputs comprise statements in a dialogue regarding a topic. In some embodiments, acquiring a textual description of the speech comprises transmitting the audio input to a dedicated speech processing service. In some embodiments, receiving an audio input comprises determining whether to perform one of “Automatic-Voice-Activity-Detection”, “Hold-to-Talk”, “Tap-to-Talk”, or “Tap-to-Talk-With-Silence-Detection” operations. In some embodiments, the method further comprises modifying an icon to reflect the determined audio input operation. In some embodiments, the method further comprises modifying an icon to reflect the determined audio input operation. In some embodiments, determining a responsive audio output comprises determining user personalization metadata. In some embodiments, the method further comprises acquiring phoneme animation metadata associated with the responsive audio output for the purpose of animating some of the character's facial features. In some embodiments, the method further comprises modifying an icon to reflect the determined audio input operation. reviewing a plurality of responses from the user and performing more inter-character dialogue rather than user-character dialogue based on the review. In some embodiments, the method further comprises associating prioritization metadata with each potential response for the synthetic character and using these prioritization metadata to cause one possible response to be output before other responses. In some embodiments, causing a synthetic character to speak using the determined responsive audio output comprises causing the synthetic character to propose taking a picture using a user device. In some embodiments, the method further comprises: causing a picture to be taken of a user, using a user device; and sending the picture to one or more users of a social network.
- Certain embodiments contemplate a method for visually engaging a user in conversation with a synthetic character comprising: retrieving a plurality of components associated with an interactive scene, the interactive scene selected by a user; configuring at least one of the plurality of components to represent a synthetic character in the scene; and transmitting at least some of the plurality of components to a user device.
- In some embodiments, the method further comprises retrieving personalization metadata associated with a user and modifying at least one of the plurality of components based on the personalization metadata. In some embodiments, retrieving a plurality of components comprises retrieving a plurality of speech waveforms from a database.
- Certain embodiments contemplate a computer system for engaging a user in conversation with a synthetic character, the system comprising: a display; a processor; a communication port; a memory containing instructions, wherein the instructions are configured to cause the processor to: receive an audio input from a user, the audio input comprising speech; acquire a textual description of the speech; determine a responsive audio output based upon the textual description; and cause a synthetic character to speak using the determined responsive audio output.
- In some embodiments receiving an audio input comprises determining whether to perform one of “Automatic-Voice-Activity-Detection”, “Hold-to-Talk”, “Tap-to-Talk”, or “Tap-to-Talk-With-Silence-Detection” operations. In some embodiments, the instructions are further configured to cause the processor to modify an icon to reflect the determined operation. In some embodiments, to determine a responsive audio output comprises determining user personalization metadata. In some embodiments, the instructions are further configured to cause the processor to acquire phoneme metadata associated with the responsive audio output for the purpose of animating some of the character's facial features. In some embodiments, the instructions are further configured to cause the processor to review a plurality of responses from the user and perform more inter-character dialogue rather than user-character dialogue based on the review. In some embodiments, the instructions are further configured to cause the processor to associate prioritization metadata with each potential response for the synthetic character and use these prioritization metadata to cause one possible response to be output before other responses. In some embodiments, causing a synthetic character to speak using the determined responsive audio output comprises causing the synthetic character to propose taking a picture using a user device.
- Certain embodiments contemplate a computer system for engaging a user in conversation with a synthetic character, the computer system comprising: means for receiving an audio input from a user, the audio input comprising speech; means for determining a description of the speech; means for determining a responsive audio output based upon the description; and means for causing a synthetic character to speak using the determined responsive audio output.
- In some embodiments, the audio input receiving means comprises one of a microphone, a packet reception module, a WiFi receiver, a cellular network receiver, an Ethernet connection, a radio receiver, a local area connection, or an interface to a transportable memory storage device. In some embodiments, the speech description determining means comprises one of a connection to a dedicated speech processing server, a natural language processing program, a speech recognition system, a Hidden Markov Model, or a Bayesian Classifier. In some embodiments, the responsive audio output determination means comprises one of an Artificial Intelligence engine, a Machine Learning classifier, a decision tree, a state transition diagram, a Markov Model, or a Bayesian Classifier. In some embodiments, the synthetic character speech means comprises one of a speaker, a connection to a speaker on a mobile device, a WiFi transmitter in communication with a user device, a packet transmission module, a cellular network transmitter in communication with a user device, an Ethernet connection in communication with a user device, a radio transmitter in communication with a user device, or a local area connection in communication with a user device.
- One or more embodiments of the present disclosure are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements.
-
FIG. 1 illustrates a block diagram of various components in a system as may be implemented in certain embodiments. -
FIG. 2 illustrates a topological relationship between a plurality of interactive scenes in a virtual environment as may be used in certain embodiments. -
FIG. 3 illustrates an example screenshot of a graphical user interface (GUI) of a main scene in a virtual environment as may be implemented in certain embodiments. -
FIG. 4 illustrates an example screenshot of a “fireside chat scene” GUI in a virtual environment as may be implemented in certain embodiments. -
FIG. 5 illustrates an example screenshot of a “versus scene” GUI in a virtual environment as may be implemented in certain embodiments. -
FIG. 6 illustrates an example screenshot of a “game show scene” GUI in a virtual environment as may be implemented in certain embodiments. -
FIG. 7 illustrates an example screenshot of a “story telling scene” GUI in a virtual environment as may be implemented in certain embodiments. -
FIG. 8 is a flowchart depicting certain steps in a user interaction process with the virtual environment as may be implemented in certain embodiments. -
FIG. 9 is a flowchart depicting certain steps in a component-based content management and delivery process as may be implemented in certain embodiments. -
FIG. 10 illustrates an example screenshot of a GUI for a component creation and management system as may be implemented in certain embodiments. -
FIG. 11 is a flowchart depicting certain steps in a dynamic Al conversation management process as may be implemented in certain embodiments. -
FIG. 12 is a flowchart depicting certain steps in a frustration management process as may be implemented in certain embodiments. -
FIG. 13 is a flowchart depicting certain steps in a speech reception process as may be implemented in certain embodiments. -
FIG. 14 illustrates an example screenshot of a social asset sharing GUI as may be implemented in certain embodiments -
FIG. 15 illustrates an example screenshot of message drafting tool in the social asset sharing GUI ofFIG. 14 as may be implemented in certain embodiments. -
FIG. 16 is a flowchart depicting certain steps in a social image capture process as may be implemented in certain embodiments. -
FIG. 17 is a block diagram of components in a computer system which may be used to implement certain of the disclosed embodiments. - The following description and drawings are illustrative and are not to be construed as limiting. Numerous specific details are described to provide a thorough understanding of the disclosure. However, in certain instances, well-known details are not described in order to avoid obscuring the description. References to one or an embodiment in the present disclosure can be, but not necessarily are, references to the same embodiment; and, such references mean at least one of the embodiments.
- Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not other embodiments.
- The terms used in this specification generally have their ordinary meanings in the art, within the context of the disclosure, and in the specific context where each term is used. Certain terms that are used to describe the disclosure are discussed below, or elsewhere in the specification, to provide additional guidance to the practitioner regarding the description of the disclosure. For convenience, certain terms may be highlighted, for example using italics and/or quotation marks. The use of highlighting has no influence on the scope and meaning of a term; the scope and meaning of a term is the same, in the same context, whether or not it is highlighted. It will be appreciated that the same thing can be said in more than one way.
- Consequently, alternative language and synonyms may be used for any one or more of the terms discussed herein, nor is any special significance to be placed upon whether or not a term is elaborated or discussed herein. Synonyms for certain terms are provided. A recital of one or more synonyms does not exclude the use of other synonyms. The use of examples anywhere in this specification including examples of any term discussed herein is illustrative only, and is not intended to further limit the scope and meaning of the disclosure or of any exemplified term. Likewise, the disclosure is not limited to various embodiments given in this specification.
- Without intent to further limit the scope of the disclosure, examples of instruments, apparatus, methods and their related results according to the embodiments of the present disclosure are given below. Note that titles or subtitles may be used in the examples for convenience of a reader, which in no way should limit the scope of the disclosure. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. In the case of conflict, the present document, including definitions will control.
- Certain of the disclosed embodiments concern systems and methods for conversation-based human-computer interactions. In some embodiments, the system includes a plurality of interactive scenes in a virtual environment. A user may access each scene and engage in conversation with a synthetic character regarding an activity associated with that active scene. In certain embodiments, a central server may house a plurality of waveforms associated with the synthetic character's speech, and may dynamically deliver the waveforms to a user device in conjunction with the operation of an artificial intelligence. In some embodiments, speech is generated with text-to-speech utilities when the waveform from the server is unavailable or inefficient to retrieve.
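As a rough illustration of the waveform-with-fallback behavior described above, the sketch below prefers a prerecorded waveform from the central server and falls back to local text-to-speech when the server is unavailable or too slow to be worth waiting for. The endpoint path, function names, and the `tts_fallback` callable are assumptions for the example, not components named in the disclosure.

```python
import urllib.error
import urllib.request

def get_character_speech(line_id, text, server_url, tts_fallback, timeout=0.5):
    """Prefer a prerecorded waveform from the central server; fall back to TTS.

    `server_url`, the /waveforms path, and `tts_fallback` are illustrative
    assumptions. `tts_fallback` is any callable mapping text -> audio bytes.
    """
    try:
        with urllib.request.urlopen(f"{server_url}/waveforms/{line_id}",
                                    timeout=timeout) as resp:
            return resp.read(), "server_waveform"
    except (urllib.error.URLError, TimeoutError):
        # Server unavailable or too slow to be worthwhile: synthesize locally.
        return tts_fallback(text), "text_to_speech"
```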
-
FIG. 1 illustrates a block diagram of various components in a system as may be implemented in certain embodiments. In some embodiments, a host server system 101 may perform various of the disclosed features and may be in communication with user devices 110 a-b via networks 108 a-b. In some embodiments, networks 108 a-b are the same network and may be any commonly known network, such as the Internet, a Local Area Network (LAN), a local WiFi ad-hoc network, etc. In some embodiments, the networks include transmissions from cellular towers 107 a-b and the user devices 110 a-b. Users 112 a-b may interact with a local application on their respective devices using a user interface 109 a-b. In some embodiments, the user may be in communication with server 101 via the local application. The local application may be a stand-alone software program, or may present information from server 101 with minimal specialized local processing, for example, as an internet browser. - The
server 101 may include a plurality of software, firmware, and/or hardware modules to implement various of the disclosed processes. For example, the server may include a plurality of system tools 102, such as dynamic libraries, to perform various functions. A database to store metadata 103 may be included, as well as databases for storing speech data 104 and animation data 105. In some embodiments, the server 101 may also include a cache 106 to facilitate more efficient response times to asset requests from user devices 110 a-b. - In certain embodiments,
server 101 may host a service that provides assets to user devices 110 a-b so that the devices may generate synthetic characters for interaction with a user in a virtual environment. The operation of the virtual environment may be distributed between the user devices 110 a-b and the server 101 in some embodiments. For example, in some embodiments the virtual environment and/or AI logic may be run on the server 101 and the user devices may request only enough information to display the results. In other embodiments, the virtual environment and/or AI may run predominantly on the user devices 110 a-b and communicate with the server only aperiodically to acquire new assets. -
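A minimal client-side sketch of this split is shown below, assuming hypothetical `/assets` and `/dialogue` endpoints on server 101; it is an illustration of the two modes (server-run AI versus device-run AI with aperiodic asset fetches), not the actual implementation.

```python
import json
import urllib.request

class VirtualEnvironmentClient:
    """Illustrative client: either asks the server to run the dialogue AI,
    or runs it locally and only fetches missing assets as needed."""

    def __init__(self, server_url, run_ai_locally=True):
        self.server_url = server_url          # address of a server such as server 101
        self.run_ai_locally = run_ai_locally
        self.asset_cache = {}                 # asset id -> downloaded asset bytes

    def fetch_asset(self, asset_id):
        if asset_id not in self.asset_cache:  # contact the server only on a cache miss
            url = f"{self.server_url}/assets/{asset_id}"   # assumed endpoint
            with urllib.request.urlopen(url) as resp:
                self.asset_cache[asset_id] = resp.read()
        return self.asset_cache[asset_id]

    def next_turn(self, utterance):
        if not self.run_ai_locally:
            # Thin client: the server runs the AI and returns display-ready results.
            body = json.dumps({"utterance": utterance}).encode()
            req = urllib.request.Request(f"{self.server_url}/dialogue", data=body,
                                         headers={"Content-Type": "application/json"})
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
        # Local AI: pick a response locally, then pull its waveform if not cached.
        response_id = "greeting_generic"      # placeholder for on-device AI logic
        return {"response_id": response_id, "waveform": self.fetch_asset(response_id)}
```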
FIG. 2 illustrates a topological relationship between a plurality of interactive scenes in a virtual environment as may be used in certain embodiments. In this example, there are three interactive scenes A, B, C, 201 a-c and a Main Scene 201 d from which a user may begin an interactive session. In some embodiments, the scenes may comprise “rooms” in a house, or different “games” in a game show. Each interactive scene may present a unique context and may contain some elements common to the other scenes and some elements which are unique. A user may transition from some scenes without restriction, as in the case of transitions 202 c-e. Some transitions, however, may be unidirectional, such as the transition 202 b from scene A 201 a to scene B 201 b and the transition 202 a from scene C 201 c to scene A 201 a. In some embodiments the user transitions between scenes by oral commands or orally indicated agreement with synthetic character propositions. - In some embodiments, the user may be required to return to the
main scene 201 d following an interaction, so that the conversation AI logic may be reinitialized and configured for a new scene. -
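The scene topology of FIG. 2 can be modeled as a small directed graph; the sketch below is illustrative, with made-up scene names, and simply falls back to the main scene so the conversation AI can be reinitialized when a requested transition is not permitted.

```python
# A minimal sketch of the scene topology of FIG. 2 as a directed graph.
# Scene names and the transition table are illustrative, not taken from the patent.
SCENE_TRANSITIONS = {
    "main":    {"scene_a", "scene_b", "scene_c"},  # main scene reaches all scenes
    "scene_a": {"scene_b", "main"},                # A -> B allowed (cf. transition 202b)
    "scene_b": {"main"},                           # B cannot return directly to A
    "scene_c": {"scene_a", "main"},                # C -> A allowed (cf. transition 202a)
}

def can_transition(current, requested):
    """Return True if the requested scene is reachable from the current scene."""
    return requested in SCENE_TRANSITIONS.get(current, set())

def transition(current, requested):
    """Move to the requested scene, or fall back to the main scene so the
    conversation AI can be reinitialized, mirroring the behavior described above."""
    if can_transition(current, requested):
        return requested
    return "main"

assert transition("scene_a", "scene_b") == "scene_b"   # unidirectional edge exists
assert transition("scene_b", "scene_a") == "main"      # reverse edge does not
```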
FIG. 3 illustrates an example screenshot of a graphical user interface (GUI) 300 of a main scene in a virtual environment as may be implemented in certain embodiments. In some embodiments, the GUI may appear on an interface 109 a-b, such as on a display screen of a mobile phone, or on a touch screen of a mobile phone or of a tablet device. As illustrated in this example, the GUI 300 may include a first 301 a and second 301 b depiction of a synthetic character, a menu bar 302 having a user graphic 304 a, a separate static or real-time user video 304 b, and a speech interface 303. -
Menu 302 may depict common elements across all the scenes of the virtual environment, to provide visual and functional continuity to the user. Speech interface 303 may be used to respond to inquiries from synthetic characters 301 a-b. For example, in some embodiments the user may touch the interface 303 to activate a microphone to receive their response. In other embodiments the interface 303 may illuminate or otherwise indicate an active state when the user selects some other input device. In some embodiments, the interface 303 may illuminate automatically when recording is initiated by the system. - In some embodiments, real-
time user video 304 b depicts a real-time, or near real-time, image of a user as they use a user device, possibly acquired using a camera in communication with the user device. As indicated inFIG. 3 , the depiction of the user may be modified by the system, for example, by overlaying facial hair, wigs, hats, earrings, etc. onto the real-time video image. The overlay may be generated in response to the activities occurring in the virtual environment and/or by conversation with the synthetic characters. For example, where the interaction involves role-playing, such as including the user in a pirate adventure, the user's image may be overlaid with a pirate hat, skull and bones, or similar asset germane to the interaction. In some embodiments, user graphic 304 a is a static image of the user. During application setup, the system may take an image of the user and archive the image as a “standard” or “default” image to be presented as user graphic 304 a. However, as described in greater detail herein, in some embodiments the user may elect to have their image with an overlaid graphic replace the user graphic 304 a. In some embodiments, the user may replace user graphic 304 a at their own initiative. - In some embodiments, the interaction may include a suggestion or an invitation by one or more of the synthetic characters for the user to activate the taking of their picture by the user device, or for the system to automatically take the user's picture. For example, upon initiating the piracy interaction and after first presenting the user with the pirate hat, a synthetic character may comment on the user's appearance and offer to capture the user's image using a camera located on the user device. If the user responds in the affirmative, the system may then capture the image and archive the image or use the image to replace user graphic 304 a, either permanently or for some portion of the piracy interaction. In some embodiments, the same or corresponding graphics may be overlaid upon the synthetic characters' images.
- As described in greater detail herein, synthetic characters 301 a-b may perform a variety of animations, both to indicate that they are speaking as well as to interact with other elements of the scene.
-
FIG. 4 illustrates an example screenshot of a “fireside chat scene” GUI 400 in a virtual environment as may be implemented in certain embodiments. Elements in the background 403 may indicate to the user which scene the user is currently in. In this example, an image of the user 401, possibly a real-time image acquired using a camera on the user's device, may be used. A synthetic character, such as synthetic character 301 b, may pose questions to the user throughout an interaction and the user may respond using speech interface 303. A text box 402 may be used to indicate the topic and nature of the conversation (e.g., “school”). -
FIG. 5 illustrates an example screenshot of a “versus scene” GUI 500 in a virtual environment as may be implemented in certain embodiments. In this example, even though a synthetic character is not visible in the GUI 500, the system may still pose questions (possibly with the voice of a synthetic character) and receive responses and statements from the user. In this scene, a scrolling header 504 a may be used to indicate contextual information relevant to the conversation. In this example, the user, depicted in element 501, is engaged in a battle of wits with a pirate, depicted in opponent image 503. Text boxes 502 a-b may be used to indicate questions posed by the system and possible answer responses that may be given, or are expected to be given, by the user. -
FIG. 6 illustrates an example screenshot of a “game show scene” GUI in a virtual environment as may be implemented in certain embodiments. In this scene, synthetic character 301 b may conduct a game show wherein the user is a contestant. The synthetic character 301 b may pose questions to the user. Expected answers may be presented in text boxes 602 a-c. A synthetic character 301 c may be a different synthetic character from character 301 b or may be a separately animated instantiation of the same character. Synthetic character 301 c may be used to pose questions to the user. A title screen 603 may be used to indicate the nature of the contest. The user's image may be displayed in real-time or near real-time in region 601. -
FIG. 7 illustrates an example screenshot of a “story telling scene” GUI 700 in a virtual environment as may be implemented in certain embodiments. In this scene, the GUI 700 may be divided into a text region 701 and a graphic region 702. The synthetic characters 301 a-b may narrate and/or role-play portions of a story as each region is updated. The user may read portions of the text in region 701, and the characters 301 a-b may ad-lib or comment upon portions of the story or upon the user's reading. -
FIG. 8 is a flowchart depicting certain steps in a user interaction process with the virtual environment as may be implemented in certain embodiments. At step 801 the system may present the user with a main scene, such as the scene depicted in FIG. 3. At step 802, the system may receive a user selection for an interactive scene (such as an oral selection). In some instances, the input may comprise a touch or swipe action relative to a graphical icon, but in other instances the input may be an oral response by the user, such as a response to an inquiry from a synthetic character. At step 803, the system may present the user with the selected interactive scene. - At
step 804, the system may engage the user in a dialogue sequence based on criteria. The criteria may include previous conversations with the user and a database of statistics generated based on social information or past interactions with the user. At step 805, the system may determine whether the user wishes to repeat an activity associated with the selected scene. For example, a synthetic character may inquire as to the user's preferences. If the user elects, perhaps orally or via tactile input, to pursue the same activity, the system may repeat the activity using the same criteria as previously, or at step 806 may modify the criteria to reflect the previous conversation history. - Alternatively, if the user does not wish to repeat the activity the system can determine whether the user wishes to quit at
step 807, again possibly via interaction with a synthetic character. If the user does not wish to quit, the system can again determine which interactive scene the user wishes to enter at step 802. Before or after entering the main scene at step 802, the system may also modify criteria based on previous conversations and the user's personal characteristics. In some embodiments, the user transitions between scenes using a map interface.
- In some embodiments, content can be tagged so that it will only be used when certain criteria are met. This may allow the system to serve content that is customized for the user; a sketch of such criteria matching follows these paragraphs. Example fields for criteria may include the following: Repeat—an alternative response to use when the character is repeating something; Once Only—use this response only one time, e.g., never repeat it; Age—use the response only if the user's age falls within a specified range; Gender—use the response only if the user's gender is male or female; Day—use the response only if the current day matches the specified day; Time—use the response only if the current time falls within the time range; Last Activity—use the response if the previous activity matches a specific activity; Minutes Played—use a response if the user has exceeded the given number of minutes of play; Region—use the response if the user is located in a given geographic region; Last Played—use the response if the user has not used the service for a given number of days; etc. Responses used by synthetic characters can be timestamped and recorded by the system so that the AI engine will avoid giving repetitive responses in the future. Users may be associated with user accounts to facilitate storage of their personal information.
- Criteria may also be derived from analytics. In some embodiments, the system logs statistics for all major events that occur during a dialogue session. These statistics may be logged to the server and can be aggregated to provide analytics for how users interact with the service at scale. This can be used to drive updates to the content or changes to the priorities of content. For example, analytics can indicate that users prefer one activity over another, allowing more engaging content to be surfaced more quickly for future users. In some embodiments, this re-prioritizing of content can happen automatically based upon data logged from users at scale.
- Additionally, through analysis of past conversations, the writing team can gain insights into topics that require more writing because they occur frequently. Naturally, some content may play out to be funnier than other content. The system may want to use the “best” content early on in order to grab the user's interest and attention. The AI, or the designers, may accordingly tag content with High, Medium, or Low priorities. The AI engine may prefer to deliver content that is marked with higher priority than other content in some embodiments. -
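A minimal sketch of how such tag criteria and priorities might be evaluated follows. The field names track the list above, but the data structures, function names, and priority weighting are assumptions rather than the described system's implementation.

```python
from datetime import datetime

def matches_criteria(tags, user, now=None, history=None):
    """Return True if a piece of tagged content may be served to this user.
    `tags`, `user`, and `history` are illustrative dictionaries."""
    now = now or datetime.now()
    history = history or {}
    if tags.get("once_only") and tags.get("response_id") in history.get("used", set()):
        return False
    if "age" in tags:                                  # tags["age"] is a (low, high) range
        low, high = tags["age"]
        if not (low <= user.get("age", 0) <= high):
            return False
    if "gender" in tags and user.get("gender") != tags["gender"]:
        return False
    if "day" in tags and now.strftime("%A") != tags["day"]:
        return False
    if "time" in tags:                                 # (start_hour, end_hour)
        start, end = tags["time"]
        if not (start <= now.hour < end):
            return False
    if "last_activity" in tags and history.get("last_activity") != tags["last_activity"]:
        return False
    if "minutes_played" in tags and history.get("minutes_played", 0) < tags["minutes_played"]:
        return False
    if "region" in tags and user.get("region") != tags["region"]:
        return False
    if "last_played_days" in tags and history.get("days_since_last_play", 0) < tags["last_played_days"]:
        return False
    return True

def pick_response(candidates, user, history):
    """Prefer higher-priority content among the responses whose tags match."""
    priority_rank = {"High": 0, "Medium": 1, "Low": 2}
    eligible = [c for c in candidates if matches_criteria(c["tags"], user, history=history)]
    eligible.sort(key=lambda c: priority_rank.get(c.get("priority", "Medium"), 1))
    return eligible[0] if eligible else None
```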
FIG. 9 is a flowchart depicting certain steps in a component-based content management and delivery process 900 as may be implemented in certain embodiments. In each of the example scenes of FIGS. 3-7, a variety of elements, such as the text boxes, title screen 603, and user images, may be presented. - Upon, or before, entering a scene the system may determine which components are relevant to the interactive experience.
Server 101 may then provide the user device 110 a-b with the components, or a portion of the predicted components, to be cached locally for use during the interaction. Where the AI engine operates on server 101, the server 101 may determine which components to send to the user device 110 a-b. In embodiments where the AI engine operates on the user device 110 a-b, the user device may determine which components to request from the server. In each instance, in some embodiments the AI engine will only have components transmitted which are not already locally cached on the user device 110 a-b. - With reference to the
process 900, at step 901 the system may retrieve user characteristics, possibly from a database in communication with server 101 or a user device. At step 902 the system may retrieve components associated with the interactive scene. At step 903 the system may determine component personalization metadata. For example, the system may determine behavioral and conversational parameters of the synthetic characters, or may determine the images to be associated with certain components, possibly using criteria as described above. - At
step 905 the system may initiate an interactive session 905. During the interactive session, at step 906 the system may log interaction statistics. During the interactive session at step 907, or following the interactive session's conclusion 908, at step 909, the system can report the interaction statistics. -
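The cache-aware component delivery and the statistics logging described in this process can be pictured with the short sketch below; the in-memory structures and the `upload` callable are assumptions chosen for illustration.

```python
def components_to_transmit(predicted_components, locally_cached_ids):
    """Mirror of the rule above: only send components the device has not cached."""
    return [c for c in predicted_components if c["id"] not in locally_cached_ids]

class SessionLogger:
    """Accumulates interaction statistics during a session and reports them after."""
    def __init__(self):
        self.events = []

    def log(self, event, **details):
        self.events.append({"event": event, **details})

    def report(self, upload):
        # `upload` is any callable that ships the aggregate statistics to the server.
        upload({"event_count": len(self.events), "events": self.events})

# Example use: prepare a scene, run a session, then report the statistics.
cached = {"menu_bar", "speech_interface"}
predicted = [{"id": "menu_bar"}, {"id": "pirate_hat_overlay"}, {"id": "fireside_text_box"}]
to_send = components_to_transmit(predicted, cached)   # only the two uncached components

logger = SessionLogger()
logger.log("scene_entered", scene="fireside_chat")
logger.log("user_response", frustrated=False)
logger.report(upload=print)
```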
FIG. 10 illustrates an example screenshot of a GUI 1000 for a component creation and management system as may be implemented in certain embodiments. In this example interface, a designer may create a list of categories 1002, some of which may be common to a plurality of scenes, while others, such as “fireside chats” 1004, are unique to a particular scene. Within each category, a designer may specify components 1003 and conversation elements 1005, as well as the interaction between the two. In some embodiments, the designer may indicate relations between the conversation elements and the components and may indicate in what preferential order components should be selected, transmitted, prioritized, and interacted with. Various tools 1001 may be used to edit and design the conversation and component interactions, which may have elements common to a text editing or word processing software (e.g., spell checking, text formatting, etc.). Using GUI 1000, a designer may direct conversation interactions via component selection. For example, by specifying components for the answers 602 a-c the system can increase the probability that a user will respond with one of these words. -
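One way to picture the authoring data such a tool produces is sketched below; the class and field names are assumptions chosen for illustration, not the format used by the described system.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    component_id: str
    priority: int = 0          # lower number = selected/transmitted earlier

@dataclass
class ConversationElement:
    prompt: str                                            # line spoken by a synthetic character
    expected_answers: list = field(default_factory=list)   # components such as answers 602a-c

@dataclass
class Category:
    name: str                                              # e.g. "fireside chats"
    components: list = field(default_factory=list)
    elements: list = field(default_factory=list)

# A designer-authored category: the expected answers bias what the user says next.
game_show = Category(
    name="game show",
    components=[Component("answer_parrot", 0), Component("answer_treasure", 1)],
    elements=[ConversationElement(
        prompt="What does every pirate want?",
        expected_answers=["answer_parrot", "answer_treasure"])],
)
```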
FIG. 11 is a flowchart depicting certain steps in a dynamic AI conversation management process as may be implemented in certain embodiments. At step 1101, the system can predict possible conversation paths that may occur between a user and one or more synthetic characters, or between the synthetic characters where their conversations are nondeterministic. At step 1102, the system may retrieve N speech waveforms from a database and cache them either locally at server system 101 or at user device 110 a-b. At step 1103, the system can retrieve metadata corresponding to the N speech waveforms from a database and cache them either locally at server system 101 or at user device 110 a-b. At step 1104, the system may notify an AI engine of the speech waveforms and animation metadata cached locally and may animate synthetic characters using the animation metadata. In this manner, the AI engine may anticipate network latency and/or resource availability in the selection of content to be provided to a user.
- In some embodiments the animation may be driven by phoneme metadata associated with the waveform. For example, timestamps may be used to correlate certain animations, such as jaw and lip movements, with the corresponding points of the waveform. In this manner, the synthetic character's animations may dynamically adapt to the waveforms selected by the system. In some embodiments, this “phoneme metadata” may comprise offsets to be blended with the existing synthetic character animations. The phoneme metadata may be automatically created during the asset creation process or it may be explicitly generated by an animator or audio engineer. Where the waveforms are generated by a text-to-speech program, the system may concatenate elements from a suite of phoneme animation metadata to produce the phoneme animation metadata associated with the generated waveform. -
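The waveform prefetching and phoneme-driven animation described above can be combined in one illustrative sketch. The structures below are assumptions; in particular, the phoneme timeline format and the pose-offset lookup are simplified stand-ins for whatever asset pipeline a real implementation would use.

```python
from dataclasses import dataclass

@dataclass
class SpeechAsset:
    waveform: bytes
    # phoneme metadata: (start_seconds, end_seconds, jaw/lip pose offset) triples
    phonemes: list

class SpeechCache:
    """Prefetch waveforms and their phoneme metadata for predicted dialogue lines."""
    def __init__(self, fetch_asset):
        self.fetch_asset = fetch_asset      # callable: line_id -> SpeechAsset
        self.cache = {}

    def prefetch(self, predicted_line_ids, n=10):
        for line_id in predicted_line_ids[:n]:
            if line_id not in self.cache:
                self.cache[line_id] = self.fetch_asset(line_id)
        return set(self.cache)              # "notify" the AI engine of what is local

    def pose_offset_at(self, line_id, t):
        """Animation offset to blend into the character rig at playback time t."""
        for start, end, offset in self.cache[line_id].phonemes:
            if start <= t < end:
                return offset
        return 0.0                          # mouth at rest between phonemes
```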
FIG. 12 is a flowchart depicting certain steps in a frustration management process as may be implemented in certain embodiments. At step 1201 the system monitors a conversation log. In some embodiments the system may monitor a preexisting record of conversations. In some embodiments the system may monitor an ongoing log of a current conversation. As part of the monitoring, the system may identify responses from a user as indicative of frustration and may tag the response accordingly. - At
step 1202, the system may determine if frustration-tagged responses exceed a threshold or if the responses otherwise meet a criterion for assessing the user's frustration level. Where the user's responses indicate frustration, the system may proceed to step 1203, and notify the AI engine regarding the user's frustration. In response, at step 1204, the AI engine may adjust the interaction parameters between the synthetic characters to help alleviate the frustration. For example, rather than engage the user as often in responses, the characters may be more likely to interact with one another or to automatically direct the flow of the interaction to a situation determined to be more conducive to engaging the user. -
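A compact sketch of the frustration check is given below; the marker phrases, window size, threshold, and the `ai_engine.set_parameters` call are illustrative assumptions rather than details taken from the description.

```python
FRUSTRATION_MARKERS = {"no", "stop", "i don't know", "this is boring"}  # assumed heuristic

def tag_frustration(utterance):
    """Tag a user response as frustrated if it contains an assumed marker phrase."""
    text = utterance.lower()
    return any(marker in text for marker in FRUSTRATION_MARKERS)

def frustration_exceeded(conversation_log, window=5, threshold=3):
    """True if too many of the last `window` responses were tagged as frustrated."""
    recent = conversation_log[-window:]
    return sum(1 for utterance in recent if tag_frustration(utterance)) >= threshold

def adjust_interaction(ai_engine, conversation_log):
    # Notify the AI engine so characters talk more to each other and ask the user less.
    if frustration_exceeded(conversation_log):
        ai_engine.set_parameters(user_prompt_rate="low", character_banter="high")
```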
FIG. 13 is a flowchart depicting certain steps in a speech reception process 1300 as may be implemented in certain embodiments. At step 1301, the system may determine a character of an expected response by the user. In some embodiments, the character of the response may be determined based on the immediately preceding statements and inquiries of the synthetic characters. - At
step 1302, the system can determine if “Hold-to-Talk” functionality is suitable. If so, the system may present a “Hold-to-Talk” icon at step 1305, and perform a “Hold-to-Talk” operation at step 1306. The “Hold-to-Talk” icon may appear as a modification of, or icon in proximity to, speech interface 303. In some embodiments, no icon is present (e.g., step 1305 is skipped) and the system performs the “Hold-to-Talk” operation at step 1306 using the existing icon(s). The “Hold-to-Talk” operation may include a process whereby recording at the user device's microphone is disabled when the synthetic characters are initially waiting for a response. Upon selecting an icon, such as speech interface 303, recording at the user device's microphone may be enabled and the user may respond to the conversation involving the synthetic characters. The user may continue to hold (e.g., physically touching or otherwise providing tactile input) the icon until they are done providing their response and may then release the icon to complete the recording. - At
step 1303, the system can determine if “Tap-to-Talk” functionality is suitable. If so, the system may present a “Tap-to-Talk” icon at step 1307, and perform a “Tap-to-Talk” operation at step 1308. The “Tap-to-Talk” icon may appear as a modification of, or icon in proximity to, speech interface 303. In some embodiments, no icon is present (e.g., step 1307 is skipped) and the system performs the “Tap-to-Talk” operation at step 1308 using the existing icon(s). The “Tap-to-Talk” operation may include a process whereby recording at the user device's microphone is disabled when the synthetic characters initially wait for a response. Upon selecting an icon, such as speech interface 303, recording at the user device's microphone may be enabled and the user may respond to the conversation involving the synthetic characters. Following completion of their response, the user may again select the icon, perhaps the same icon as initially selected, to complete the recording and, in some embodiments, to disable the microphone. - At
step 1304, the system can determine if “Tap-to-Talk-With-Silence-Detection” functionality is suitable. If so, the system may present a “Tap-to-Talk-With-Silence-Detection” icon at step 1309, and perform a “Tap-to-Talk-With-Silence-Detection” operation at step 1310. The “Tap-to-Talk-With-Silence-Detection” icon may appear as a modification of, or icon in proximity to, speech interface 303. In some embodiments, no icon is present (e.g., step 1309 is skipped) and the system performs the “Tap-to-Talk-With-Silence-Detection” operation at step 1310 using the existing icon(s). The “Tap-to-Talk-With-Silence-Detection” operation may include a process whereby recording at the user device's microphone is disabled when the characters initially wait for a response from the user. Upon selecting an icon, such as speech interface 303, recording at the user device's microphone may be enabled and the user may respond to the conversation involving the synthetic characters. Following completion of their response, the user may fall silent, without actively disabling the microphone. The system may detect the subsequent silence and stop the recording after some threshold period of time has passed. In some embodiments, silence may be detected by measuring the energy of the recording's frequency spectrum. - If the system does not determine that any of “Hold-to-Talk”, “Tap-to-Talk”, or “Tap-to-Talk-With-Silence-Detection” is suitable, the system may perform an “Automatic-Voice-Activity-Detection” operation. During “Automatic-Voice-Activity-Detection” the system may activate a
microphone 1311, if not already activated, on the user device. The system may then analyze the power and frequency of the recorded audio to determine if speech is present at step 1312. If speech is not present over some threshold period of time, the system may conclude the recording. -
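An energy-based silence check of the kind described above might look like the sketch below; the frame handling, threshold value, and use of NumPy are assumptions for the example and not parameters from the disclosure.

```python
import numpy as np

def speech_present(frame, energy_threshold=1e-4):
    """Treat spectral energy above an assumed floor as speech; below it, as silence."""
    spectrum = np.fft.rfft(frame)                      # frame: 1-D array of float samples
    band_energy = float(np.sum(np.abs(spectrum) ** 2)) / len(frame)
    return band_energy > energy_threshold

def record_until_silence(frames, max_silent_frames=30):
    """Stop after a run of silent frames, mirroring steps 1311-1312 above."""
    captured, silent_run = [], 0
    for frame in frames:                               # frames: iterable of sample arrays
        captured.append(frame)
        if speech_present(frame):
            silent_run = 0
        else:
            silent_run += 1
            if silent_run >= max_silent_frames:
                break
    return np.concatenate(captured) if captured else np.array([])
```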
FIG. 14 illustrates an example screenshot of a social asset sharing GUI as may be implemented in certain embodiments. In these embodiments, a reviewer, such as the user or a relation of the user, may be presented with a series of images 1401 captured during various interactions with the synthetic characters. For example, some of the images may have been voluntarily requested by the user and may depict various asset overlays to the user's image, such as a hat and/or facial hair. In some embodiments, the plurality of images 1401 may also include images automatically taken of the user at various moments in various interactions. Gallery controls 1402 and 1403 may be used to select from different collections of images, possibly images organized by different scenarios engaged with the user. -
FIG. 15 illustrates an example screenshot 1500 of a message drafting tool in the social asset sharing GUI of FIG. 14 as may be implemented in certain embodiments. Following selection of an image to share, the system may present a pop-up display 1501. The display 1501 may include an enlarged version 1502 of the selected image and a region 1503 for accepting text input. An input 1505 for selecting one or more message mediums, such as Facebook, MySpace, Twitter, etc., may also be provided. The user may insert commentary text in the region 1503. By selecting sharing icon 1504, the user may share the image and commentary text with a community specified by input 1505. In some embodiments the message drafting tool is used by a parent of the child user. -
FIG. 16 is a flowchart depicting certain steps in a social image capture process as may be implemented in certain embodiments. At step 1601, the system may determine that image capture is relevant to a conversation. For example, following initiation of a roleplaying sequence which involves overlaying certain assets on the user's image 304 b (or another image of the user), at step 1602 the system may propose that the user engage in an image capture at step 1603. The proposal may be made by one of the synthetic characters in the virtual environment. If the user agrees, possibly via an oral response, at step 1604, the system may capture an image of the user at step 1605. The system may then store the image at step 1606 and present the captured image for review at step 1607. The image may be presented for review by the user, or by another individual, such as the user's mother or other family member. If the image is accepted for sharing during the review at step 1608 the system may transmit the captured image for sharing at step 1609 to a selected social network. - Various embodiments include various steps and operations, which have been described above. A variety of these steps and operations may be performed by hardware components or may be embodied in machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor programmed with the instructions to perform the steps. Alternatively, the steps may be performed by a combination of hardware, software, and/or firmware. As such,
FIG. 17 is an example of a computer system 1700 with which various embodiments may be utilized. Various of the disclosed features may be located on computer system 1700. According to the present example, the computer system includes a bus 1705, at least one processor 1710, at least one communication port 1715, a main memory 1720, a removable storage media 1725, a read only memory 1730, and a mass storage 1735. -
Itanium 2® processor(s), or AMD® Opteron® or Athlon MP® processor(s), or Motorola® lines of processors. Communication port(s) 1715 can be any of an RS-232 port for use with a modem based dialup connection, a 10/100 Ethernet port, or a Gigabit port using copper or fiber. Communication port(s) 1715 may be chosen depending on a network such a Local Area Network (LAN), Wide Area Network (WAN), or any network to which thecomputer system 1700 connects. -
Main memory 1720 can be Random Access Memory (RAM), or any other dynamic storage device(s) commonly known in the art. Read only memory 1730 can be any static storage device(s) such as Programmable Read Only Memory (PROM) chips for storing static information such as instructions for processor 1710. -
Mass storage 1735 can be used to store information and instructions. For example, hard disks such as the Adaptec® family of SCSI drives, an optical disc, an array of disks such as RAID, such as the Adaptec family of RAID drives, or any other mass storage devices may be used. -
Bus 1705 communicatively couples processor(s) 1710 with the other memory, storage and communication blocks. Bus 1705 can be a PCI/PCI-X or SCSI based system bus depending on the storage devices used. -
Removable storage media 1725 can be any kind of external hard-drives, floppy drives, IOMEGA® Zip Drives, Compact Disc-Read Only Memory (CD-ROM), Compact Disc-Re-Writable (CD-RW), Digital Video Disk-Read Only Memory (DVD-ROM). - The components described above are meant to exemplify some types of possibilities. In no way should the aforementioned examples limit the scope of the invention, as they are only exemplary embodiments.
- While detailed descriptions of one or more embodiments of the invention have been given above, various alternatives, modifications, and equivalents will be apparent to those skilled in the art without varying from the spirit of the invention. For example, while the embodiments described above refer to particular features, the scope of this invention also includes embodiments having different combinations of features and embodiments that do not include all of the described features. Accordingly, the scope of the present invention is intended to embrace all such alternatives, modifications, and variations. Therefore, the above description should not be taken as limiting the scope of the invention.
- While the computer-readable medium is shown in an embodiment to be a single medium, the term “computer-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that stores the one or more sets of instructions. The term “computer-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the computer and that cause the computer to perform any one or more of the methodologies of the presently disclosed technique and innovation.
- The computer may be, but is not limited to, a server computer, a client computer, a personal computer (PC), a tablet PC, a laptop computer, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, an iPhone®, an iPad®, a processor, a telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
- In general, the routines executed to implement the embodiments of the disclosure may be implemented as part of an operating system or a specific application, component, program, object, module or sequence of instructions referred to as “programs.” The programs typically comprise one or more instructions set at various times in various memory and storage devices in a computer, and that, when read and executed by one or more processing units or processors in a computer, cause the computer to perform operations to execute elements involving the various aspects of the disclosure.
- Moreover, while embodiments have been described in the context of fully functioning computers and computer systems, various embodiments are capable of being distributed as a program product in a variety of forms, and that the disclosure applies equally regardless of the particular type of computer-readable medium used to actually effect the distribution.
- Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense, as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to.” As used herein, the terms “connected,” “coupled,” or any variant thereof, means any connection or coupling, either direct or indirect, between two or more elements; the coupling of connection between the elements can be physical, logical, or a combination thereof. Additionally, the words “herein,” “above,” “below,” and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of this application. Where the context permits, words in the above Detailed Description using the singular or plural number may also include the plural or singular number respectively. The word “or,” in reference to a list of two or more items, covers all the following interpretations of the word: any of the items in the list, all of the items in the list, and any combination of the items in the list.
- The above detailed description of embodiments of the disclosure is not intended to be exhaustive or to limit the teachings to the precise form disclosed above. While specific embodiments of, and examples for the disclosure, are described above for illustrative purposes, various equivalent modifications are possible within the scope of the disclosure, as those skilled in the relevant art will recognize. For example, while processes or blocks are presented in a given order, alternative embodiments may perform routines having steps, or employ systems having blocks, in a different order, and some processes or blocks may be deleted, moved, added, subdivided, combined, and/or modified to provide alternative or subcombinations. Each of these processes or blocks may be implemented in a variety of different ways. Also, while processes or blocks are at times shown as being performed in series, these processes or blocks may instead be performed in parallel, or may be performed at different times. Further any specific numbers noted herein are only examples: alternative implementations may employ differing values or ranges.
- The teaching of the disclosure provided herein can be applied to other systems, not necessarily the system described above. The elements and acts of the various embodiments described above can be combined to provide further embodiments.
- Any patents and applications and other references noted above, including any that may be listed in accompanying filing papers, are incorporated herein by reference. Aspects of the disclosure can be modified, if necessary, to employ the systems, functions, and concepts of the various references described above to provide yet further embodiments of the disclosure.
- These and other changes can be made to the disclosure in light of the above Detailed Description. While the above description describes certain embodiments of the disclosure, and describes the best mode contemplated, no matter how detailed the above appears in text, the teachings can be practiced in many ways. Details of the system may vary considerably in its implementation details, while still being encompassed by the subject matter disclosed herein. As noted above, particular terminology used when describing certain features or aspects of the disclosure should not be taken to imply that the terminology is being redefined herein to be restricted to any specific characteristics, features, or aspects of the disclosure with which that terminology is associated. In general, the terms used in the following claims should not be construed to limit the disclosure to the specific embodiments disclosed in the specification, unless the above Detailed Description section explicitly defines such terms. Accordingly, the actual scope of the disclosure encompasses not only the disclosed embodiments, but also all equivalent ways of practicing or implementing the disclosure under the claims.
Claims (30)
Priority Applications (10)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/829,925 US20140278403A1 (en) | 2013-03-14 | 2013-03-14 | Systems and methods for interactive synthetic character dialogue |
MX2015013070A MX2015013070A (en) | 2013-03-14 | 2014-03-07 | Systems and methods for interactive synthetic character dialogue. |
KR1020157029066A KR20160011620A (en) | 2013-03-14 | 2014-03-07 | Systems and methods for interactive synthetic character dialogue |
PCT/US2014/021650 WO2014159037A1 (en) | 2013-03-14 | 2014-03-07 | Systems and methods for interactive synthetic character dialogue |
EP14775160.6A EP2973550A4 (en) | 2013-03-14 | 2014-03-07 | Systems and methods for interactive synthetic character dialogue |
BR112015024561A BR112015024561A2 (en) | 2013-03-14 | 2014-03-07 | systems and methods for interactive dialogue of synthetic characteristics. |
AU2014241373A AU2014241373A1 (en) | 2013-03-14 | 2014-03-07 | Systems and methods for interactive synthetic character dialogue |
CN201480022536.1A CN105144286A (en) | 2013-03-14 | 2014-03-07 | Systems and methods for interactive synthetic character dialogue |
CA2906320A CA2906320A1 (en) | 2013-03-14 | 2014-03-07 | Systems and methods for interactive synthetic character dialogue |
SG11201507641WA SG11201507641WA (en) | 2013-03-14 | 2014-03-07 | Systems and methods for interactive synthetic character dialogue |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/829,925 US20140278403A1 (en) | 2013-03-14 | 2013-03-14 | Systems and methods for interactive synthetic character dialogue |
Publications (1)
Publication Number | Publication Date |
---|---|
US20140278403A1 true US20140278403A1 (en) | 2014-09-18 |
Family
ID=51531821
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/829,925 Abandoned US20140278403A1 (en) | 2013-03-14 | 2013-03-14 | Systems and methods for interactive synthetic character dialogue |
Country Status (10)
Country | Link |
---|---|
US (1) | US20140278403A1 (en) |
EP (1) | EP2973550A4 (en) |
KR (1) | KR20160011620A (en) |
CN (1) | CN105144286A (en) |
AU (1) | AU2014241373A1 (en) |
BR (1) | BR112015024561A2 (en) |
CA (1) | CA2906320A1 (en) |
MX (1) | MX2015013070A (en) |
SG (1) | SG11201507641WA (en) |
WO (1) | WO2014159037A1 (en) |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130297317A1 (en) * | 2012-04-16 | 2013-11-07 | Htc Corporation | Method for offering suggestion during conversation, electronic device using the same, and non-transitory storage medium |
US20150104080A1 (en) * | 2013-10-10 | 2015-04-16 | Elwha Llc | Methods, systems, and devices for obscuring entities depicted in captured images |
CN105740948A (en) * | 2016-02-04 | 2016-07-06 | 北京光年无限科技有限公司 | Intelligent robot-oriented interaction method and device |
US9799036B2 (en) | 2013-10-10 | 2017-10-24 | Elwha Llc | Devices, methods, and systems for managing representations of entities through use of privacy indicators |
US9965837B1 (en) | 2015-12-03 | 2018-05-08 | Quasar Blu, LLC | Systems and methods for three dimensional environmental modeling |
US10013564B2 (en) | 2013-10-10 | 2018-07-03 | Elwha Llc | Methods, systems, and devices for handling image capture devices and captured images |
US10102543B2 (en) | 2013-10-10 | 2018-10-16 | Elwha Llc | Methods, systems, and devices for handling inserted data into captured images |
US10185841B2 (en) | 2013-10-10 | 2019-01-22 | Elwha Llc | Devices, methods, and systems for managing representations of entities through use of privacy beacons |
US10311877B2 (en) | 2016-07-04 | 2019-06-04 | Kt Corporation | Performing tasks and returning audio and visual answers based on voice command |
WO2019161216A1 (en) | 2018-02-15 | 2019-08-22 | DMAI, Inc. | System and method for prediction based preemptive generation of dialogue content |
CN110196927A (en) * | 2019-05-09 | 2019-09-03 | 大众问问(北京)信息科技有限公司 | It is a kind of to take turns interactive method, device and equipment more |
US10540973B2 (en) | 2017-06-27 | 2020-01-21 | Samsung Electronics Co., Ltd. | Electronic device for performing operation corresponding to voice input |
CN110730953A (en) * | 2017-10-03 | 2020-01-24 | 谷歌有限责任公司 | Customizing interactive dialog applications based on creator-provided content |
US10565790B2 (en) | 2016-11-11 | 2020-02-18 | Magic Leap, Inc. | Periocular and audio synthesis of a full face image |
WO2020060151A1 (en) * | 2018-09-19 | 2020-03-26 | Samsung Electronics Co., Ltd. | System and method for providing voice assistant service |
US10607328B2 (en) | 2015-12-03 | 2020-03-31 | Quasar Blu, LLC | Systems and methods for three-dimensional environmental modeling of a particular location such as a commercial or residential property |
US10650816B2 (en) | 2017-01-16 | 2020-05-12 | Kt Corporation | Performing tasks and returning audio and visual feedbacks based on voice command |
US10681489B2 (en) * | 2015-09-16 | 2020-06-09 | Magic Leap, Inc. | Head pose mixing of audio files |
CN111274910A (en) * | 2020-01-16 | 2020-06-12 | 腾讯科技(深圳)有限公司 | Scene interaction method, device and electronic device |
USD888765S1 (en) * | 2018-06-05 | 2020-06-30 | Ernieapp Ltd. | Display screen or portion thereof with graphical user interface |
US10726836B2 (en) * | 2016-08-12 | 2020-07-28 | Kt Corporation | Providing audio and video feedback with character based on voice command |
JP2020522920A (en) * | 2017-06-06 | 2020-07-30 | インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation | Edge caching for cognitive applications |
CN111801730A (en) * | 2017-12-29 | 2020-10-20 | 得麦股份有限公司 | System and method for artificial intelligence driven automated companion |
US10834290B2 (en) | 2013-10-10 | 2020-11-10 | Elwha Llc | Methods, systems, and devices for delivering image data from captured images to devices |
CN112309403A (en) * | 2020-03-05 | 2021-02-02 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating information |
US11068043B2 (en) | 2017-07-21 | 2021-07-20 | Pearson Education, Inc. | Systems and methods for virtual reality-based grouping evaluation |
US11087445B2 (en) | 2015-12-03 | 2021-08-10 | Quasar Blu, LLC | Systems and methods for three-dimensional environmental modeling of a particular location such as a commercial or residential property |
US20210375023A1 (en) * | 2020-06-01 | 2021-12-02 | Nvidia Corporation | Content animation using one or more neural networks |
US11347051B2 (en) | 2018-03-16 | 2022-05-31 | Magic Leap, Inc. | Facial expressions from eye-tracking cameras |
US11354841B2 (en) * | 2019-12-26 | 2022-06-07 | Zhejiang University | Speech-driven facial animation generation method |
US11398041B2 (en) * | 2015-09-10 | 2022-07-26 | Sony Corporation | Image processing apparatus and method |
CN115240684A (en) * | 2022-06-30 | 2022-10-25 | 青牛智胜(深圳)科技有限公司 | Role recognition method and system for double-person conversation voice information |
US11699353B2 (en) | 2019-07-10 | 2023-07-11 | Tomestic Fund L.L.C. | System and method of enhancement of physical, audio, and electronic media |
US20240394077A1 (en) * | 2023-05-23 | 2024-11-28 | Hia Technologies, Inc. | Digital Character Interactions with Media Items in a Conversational Session |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105719670B (en) * | 2016-01-15 | 2018-02-06 | 北京光年无限科技有限公司 | A kind of audio-frequency processing method and device towards intelligent robot |
CN105763420B (en) * | 2016-02-04 | 2019-02-05 | 厦门幻世网络科技有限公司 | A kind of method and device of automatic information reply |
CN105893771A (en) * | 2016-04-15 | 2016-08-24 | 北京搜狗科技发展有限公司 | Information service method and device and device used for information services |
JP6753707B2 (en) * | 2016-06-16 | 2020-09-09 | 株式会社オルツ | Artificial intelligence system that supports communication |
WO2018016095A1 (en) | 2016-07-19 | 2018-01-25 | Gatebox株式会社 | Image display device, topic selection method, topic selection program, image display method and image display program |
CN106297782A (en) * | 2016-07-28 | 2017-01-04 | 北京智能管家科技有限公司 | A kind of man-machine interaction method and system |
KR101889278B1 (en) * | 2017-01-16 | 2018-08-21 | 주식회사 케이티 | Public device and method for providing service in response to voice command, and public device for providing moving character in response to voice command |
CN106528137A (en) * | 2016-10-11 | 2017-03-22 | 深圳市天易联科技有限公司 | Method and apparatus for conversation with virtual role |
CN107066444B (en) * | 2017-03-27 | 2020-11-03 | 上海奔影网络科技有限公司 | Corpus generation method and apparatus based on multi-round interaction |
CN107330961A (en) * | 2017-07-10 | 2017-11-07 | 湖北燿影科技有限公司 | A kind of audio-visual conversion method of word and system |
CN107564510A (en) * | 2017-08-23 | 2018-01-09 | 百度在线网络技术(北京)有限公司 | A kind of voice virtual role management method, device, server and storage medium |
CN109427334A (en) * | 2017-09-01 | 2019-03-05 | 王阅 | A kind of man-machine interaction method and system based on artificial intelligence |
CN112334973B (en) * | 2018-07-19 | 2024-04-26 | 杜比国际公司 | Method and system for creating object-based audio content |
CN111190530A (en) * | 2018-11-15 | 2020-05-22 | 青岛海信移动通信技术股份有限公司 | Human-computer interaction method based on virtual character in mobile terminal and mobile terminal |
CN109448472A (en) * | 2018-12-19 | 2019-03-08 | 商丘师范学院 | A kind of tourism English simulation shows explanation platform |
CN109712627A (en) * | 2019-03-07 | 2019-05-03 | 深圳欧博思智能科技有限公司 | It is a kind of using speech trigger virtual actor's facial expression and the voice system of mouth shape cartoon |
CN110035325A (en) * | 2019-04-19 | 2019-07-19 | 广州虎牙信息科技有限公司 | Barrage answering method, barrage return mechanism and live streaming equipment |
KR102096598B1 (en) * | 2019-05-02 | 2020-04-03 | 넷마블 주식회사 | Method to create animation |
CN110648672A (en) * | 2019-09-05 | 2020-01-03 | 深圳追一科技有限公司 | Character image generation method, interaction method, device and terminal equipment |
CN111785104B (en) * | 2020-07-16 | 2022-03-04 | 北京字节跳动网络技术有限公司 | Information processing method and device and electronic equipment |
CN112991081A (en) * | 2021-05-17 | 2021-06-18 | 北京清奇科技有限公司 | Social contact method and system for option interaction |
CN118873937A (en) * | 2021-06-25 | 2024-11-01 | 网易(杭州)网络有限公司 | Display control method, device, electronic device and readable storage medium in game |
CN116453549B (en) * | 2023-05-05 | 2024-07-02 | 武汉嫦娥投资合伙企业(有限合伙) | AI dialogue method based on virtual digital character and online virtual digital system |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5930752A (en) * | 1995-09-14 | 1999-07-27 | Fujitsu Ltd. | Audio interactive system |
US20030041032A1 (en) * | 2000-03-31 | 2003-02-27 | Daniel Ballin | Systems for supply of information, services or products |
US20060069546A1 (en) * | 2002-11-22 | 2006-03-30 | Rosser Roy J | Autonomous response engine |
US20060155765A1 (en) * | 2004-12-01 | 2006-07-13 | Takeuchi Johane | Chat information service system |
US20060241936A1 (en) * | 2005-04-22 | 2006-10-26 | Fujitsu Limited | Pronunciation specifying apparatus, pronunciation specifying method and recording medium |
US20090013255A1 (en) * | 2006-12-30 | 2009-01-08 | Matthew John Yuschik | Method and System for Supporting Graphical User Interfaces |
US20110016004A1 (en) * | 2000-11-03 | 2011-01-20 | Zoesis, Inc., A Delaware Corporation | Interactive character system |
US8214216B2 (en) * | 2003-06-05 | 2012-07-03 | Kabushiki Kaisha Kenwood | Speech synthesis for synthesizing missing parts |
US8295468B2 (en) * | 2008-08-29 | 2012-10-23 | International Business Machines Corporation | Optimized method to select and retrieve a contact center transaction from a set of transactions stored in a queuing mechanism |
US8433573B2 (en) * | 2007-03-20 | 2013-04-30 | Fujitsu Limited | Prosody modification device, prosody modification method, and recording medium storing prosody modification program |
US20140032471A1 (en) * | 2012-07-25 | 2014-01-30 | Toytalk, Inc. | Artificial intelligence script tool |
US8719200B2 (en) * | 2006-06-29 | 2014-05-06 | Mycybertwin Group Pty Ltd | Cyberpersonalities in artificial reality |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6526395B1 (en) * | 1999-12-31 | 2003-02-25 | Intel Corporation | Application of personality models and interaction with synthetic characters in a computing system |
US20020055844A1 (en) * | 2000-02-25 | 2002-05-09 | L'esperance Lauren | Speech user interface for portable personal devices |
US20040121812A1 (en) * | 2002-12-20 | 2004-06-24 | Doran Patrick J. | Method of performing speech recognition in a mobile title line communication device |
JP2005157494A (en) * | 2003-11-20 | 2005-06-16 | Aruze Corp | Conversation control device and conversation control method |
WO2006083020A1 (en) * | 2005-02-04 | 2006-08-10 | Hitachi, Ltd. | Audio recognition system for generating response audio by using audio data extracted |
US7697827B2 (en) * | 2005-10-17 | 2010-04-13 | Konicek Jeffrey C | User-friendlier interfaces for a camera |
US8766983B2 (en) * | 2006-05-07 | 2014-07-01 | Sony Computer Entertainment Inc. | Methods and systems for processing an interchange of real time effects during video communication |
US8924261B2 (en) * | 2009-10-30 | 2014-12-30 | Etsy, Inc. | Method for performing interactive online shopping |
US8949346B2 (en) * | 2010-02-25 | 2015-02-03 | Cisco Technology, Inc. | System and method for providing a two-tiered virtual communications architecture in a network environment |
US20120204120A1 (en) * | 2011-02-08 | 2012-08-09 | Lefar Marc P | Systems and methods for conducting and replaying virtual meetings |
US20130031476A1 (en) * | 2011-07-25 | 2013-01-31 | Coin Emmett | Voice activated virtual assistant |
-
2013
- 2013-03-14 US US13/829,925 patent/US20140278403A1/en not_active Abandoned
-
2014
- 2014-03-07 WO PCT/US2014/021650 patent/WO2014159037A1/en active Application Filing
- 2014-03-07 SG SG11201507641WA patent/SG11201507641WA/en unknown
- 2014-03-07 BR BR112015024561A patent/BR112015024561A2/en not_active Application Discontinuation
- 2014-03-07 KR KR1020157029066A patent/KR20160011620A/en not_active Withdrawn
- 2014-03-07 AU AU2014241373A patent/AU2014241373A1/en not_active Abandoned
- 2014-03-07 CN CN201480022536.1A patent/CN105144286A/en active Pending
- 2014-03-07 CA CA2906320A patent/CA2906320A1/en not_active Abandoned
- 2014-03-07 EP EP14775160.6A patent/EP2973550A4/en not_active Withdrawn
- 2014-03-07 MX MX2015013070A patent/MX2015013070A/en unknown
Cited By (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9685160B2 (en) * | 2012-04-16 | 2017-06-20 | Htc Corporation | Method for offering suggestion during conversation, electronic device using the same, and non-transitory storage medium |
US20130297317A1 (en) * | 2012-04-16 | 2013-11-07 | Htc Corporation | Method for offering suggestion during conversation, electronic device using the same, and non-transitory storage medium |
US10834290B2 (en) | 2013-10-10 | 2020-11-10 | Elwha Llc | Methods, systems, and devices for delivering image data from captured images to devices |
US10289863B2 (en) | 2013-10-10 | 2019-05-14 | Elwha Llc | Devices, methods, and systems for managing representations of entities through use of privacy beacons |
US9799036B2 (en) | 2013-10-10 | 2017-10-24 | Elwha Llc | Devices, methods, and systems for managing representations of entities through use of privacy indicators |
US10346624B2 (en) * | 2013-10-10 | 2019-07-09 | Elwha Llc | Methods, systems, and devices for obscuring entities depicted in captured images |
US10013564B2 (en) | 2013-10-10 | 2018-07-03 | Elwha Llc | Methods, systems, and devices for handling image capture devices and captured images |
US10102543B2 (en) | 2013-10-10 | 2018-10-16 | Elwha Llc | Methods, systems, and devices for handling inserted data into captured images |
US10185841B2 (en) | 2013-10-10 | 2019-01-22 | Elwha Llc | Devices, methods, and systems for managing representations of entities through use of privacy beacons |
US20150104080A1 (en) * | 2013-10-10 | 2015-04-16 | Elwha Llc | Methods, systems, and devices for obscuring entities depicted in captured images |
US11398041B2 (en) * | 2015-09-10 | 2022-07-26 | Sony Corporation | Image processing apparatus and method |
US11039267B2 (en) | 2015-09-16 | 2021-06-15 | Magic Leap, Inc. | Head pose mixing of audio files |
US11438724B2 (en) | 2015-09-16 | 2022-09-06 | Magic Leap, Inc. | Head pose mixing of audio files |
US11778412B2 (en) | 2015-09-16 | 2023-10-03 | Magic Leap, Inc. | Head pose mixing of audio files |
US12185086B2 (en) | 2015-09-16 | 2024-12-31 | Magic Leap, Inc. | Head pose mixing of audio files |
US10681489B2 (en) * | 2015-09-16 | 2020-06-09 | Magic Leap, Inc. | Head pose mixing of audio files |
US10607328B2 (en) | 2015-12-03 | 2020-03-31 | Quasar Blu, LLC | Systems and methods for three-dimensional environmental modeling of a particular location such as a commercial or residential property |
US10339644B2 (en) | 2015-12-03 | 2019-07-02 | Quasar Blu, LLC | Systems and methods for three dimensional environmental modeling |
US9965837B1 (en) | 2015-12-03 | 2018-05-08 | Quasar Blu, LLC | Systems and methods for three dimensional environmental modeling |
US11087445B2 (en) | 2015-12-03 | 2021-08-10 | Quasar Blu, LLC | Systems and methods for three-dimensional environmental modeling of a particular location such as a commercial or residential property |
US11798148B2 (en) | 2015-12-03 | 2023-10-24 | Echosense, Llc | Systems and methods for three-dimensional environmental modeling of a particular location such as a commercial or residential property |
US12299843B2 (en) | 2015-12-03 | 2025-05-13 | Echosense, Llc | Systems and methods for three-dimensional environmental modeling of a particular location such as a commercial or residential property |
CN105740948A (en) * | 2016-02-04 | 2016-07-06 | 北京光年无限科技有限公司 | Intelligent robot-oriented interaction method and device |
US10311877B2 (en) | 2016-07-04 | 2019-06-04 | Kt Corporation | Performing tasks and returning audio and visual answers based on voice command |
US10726836B2 (en) * | 2016-08-12 | 2020-07-28 | Kt Corporation | Providing audio and video feedback with character based on voice command |
US10565790B2 (en) | 2016-11-11 | 2020-02-18 | Magic Leap, Inc. | Periocular and audio synthesis of a full face image |
US11200736B2 (en) | 2016-11-11 | 2021-12-14 | Magic Leap, Inc. | Periocular and audio synthesis of a full face image |
US11636652B2 (en) | 2016-11-11 | 2023-04-25 | Magic Leap, Inc. | Periocular and audio synthesis of a full face image |
US10650816B2 (en) | 2017-01-16 | 2020-05-12 | Kt Corporation | Performing tasks and returning audio and visual feedbacks based on voice command |
JP2020522920A (en) * | 2017-06-06 | 2020-07-30 | International Business Machines Corporation | Edge caching for cognitive applications
US11283894B2 (en) | 2017-06-06 | 2022-03-22 | International Business Machines Corporation | Edge caching for cognitive applications |
JP7067845B2 (en) | 2017-06-06 | 2022-05-16 | International Business Machines Corporation | Edge caching for cognitive applications
US10540973B2 (en) | 2017-06-27 | 2020-01-21 | Samsung Electronics Co., Ltd. | Electronic device for performing operation corresponding to voice input |
US11068043B2 (en) | 2017-07-21 | 2021-07-20 | Pearson Education, Inc. | Systems and methods for virtual reality-based grouping evaluation |
CN110730953A (en) * | 2017-10-03 | 2020-01-24 | 谷歌有限责任公司 | Customizing interactive dialog applications based on creator-provided content |
CN111801730A (en) * | 2017-12-29 | 2020-10-20 | 得麦股份有限公司 | System and method for artificial intelligence driven automated companion |
WO2019161216A1 (en) | 2018-02-15 | 2019-08-22 | DMAI, Inc. | System and method for prediction based preemptive generation of dialogue content |
EP3753014A4 (en) * | 2018-02-15 | 2021-11-17 | DMAI, Inc. | System and method for prediction based preemptive generation of dialogue content |
CN112204654A (en) * | 2018-02-15 | 2021-01-08 | DMAI, Inc. | System and method for prediction-based proactive dialog content generation
US11598957B2 (en) | 2018-03-16 | 2023-03-07 | Magic Leap, Inc. | Facial expressions from eye-tracking cameras |
US11347051B2 (en) | 2018-03-16 | 2022-05-31 | Magic Leap, Inc. | Facial expressions from eye-tracking cameras |
USD888765S1 (en) * | 2018-06-05 | 2020-06-30 | Ernieapp Ltd. | Display screen or portion thereof with graphical user interface |
US11848012B2 (en) | 2018-09-19 | 2023-12-19 | Samsung Electronics Co., Ltd. | System and method for providing voice assistant service |
WO2020060151A1 (en) * | 2018-09-19 | 2020-03-26 | Samsung Electronics Co., Ltd. | System and method for providing voice assistant service |
CN110196927A (en) * | 2019-05-09 | 2019-09-03 | Mobvoi Information Technology (Beijing) Co., Ltd. | Multi-turn interaction method, apparatus, and device
US11699353B2 (en) | 2019-07-10 | 2023-07-11 | Tomestic Fund L.L.C. | System and method of enhancement of physical, audio, and electronic media |
US11354841B2 (en) * | 2019-12-26 | 2022-06-07 | Zhejiang University | Speech-driven facial animation generation method |
CN111274910A (en) * | 2020-01-16 | 2020-06-12 | Tencent Technology (Shenzhen) Co., Ltd. | Scene interaction method, apparatus, and electronic device
CN112309403A (en) * | 2020-03-05 | 2021-02-02 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating information |
US20210375023A1 (en) * | 2020-06-01 | 2021-12-02 | Nvidia Corporation | Content animation using one or more neural networks |
CN115240684A (en) * | 2022-06-30 | 2022-10-25 | Qingniu Zhisheng (Shenzhen) Technology Co., Ltd. | Role recognition method and system for two-party dialogue speech
US20240394077A1 (en) * | 2023-05-23 | 2024-11-28 | Hia Technologies, Inc. | Digital Character Interactions with Media Items in a Conversational Session |
Also Published As
Publication number | Publication date |
---|---|
BR112015024561A2 (en) | 2017-07-18 |
WO2014159037A1 (en) | 2014-10-02 |
AU2014241373A1 (en) | 2015-10-08 |
CA2906320A1 (en) | 2014-10-02 |
CN105144286A (en) | 2015-12-09 |
EP2973550A4 (en) | 2016-10-19 |
MX2015013070A (en) | 2016-05-10 |
KR20160011620A (en) | 2016-02-01 |
SG11201507641WA (en) | 2015-10-29 |
EP2973550A1 (en) | 2016-01-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20140278403A1 (en) | Systems and methods for interactive synthetic character dialogue | |
US11501480B2 (en) | Multi-modal model for dynamically responsive virtual characters | |
Ben-Youssef et al. | UE-HRI: a new dataset for the study of user engagement in spontaneous human-robot interactions | |
JP7069778B2 (en) | Methods, systems and programs for content curation in video-based communications | |
US20240290342A1 (en) | Automated Conversation Content Items from Natural Language | |
EP4027614A1 (en) | Automated messaging reply-to | |
US11148296B2 (en) | Engaging in human-based social interaction for performing tasks using a persistent companion device | |
US20150243279A1 (en) | Systems and methods for recommending responses | |
US20140036022A1 (en) | Providing a conversational video experience | |
US20200357382A1 (en) | Oral, facial and gesture communication devices and computing architecture for interacting with digital media content | |
EP3164806A1 (en) | Systems and methods for assessing, verifying and adjusting the affective state of a user | |
WO2022229834A1 (en) | Artificial intelligence (ai) based automated conversation assistance system and method thereof | |
US12395369B2 (en) | Systems and methods for decentralized generation of a summary of a virtual meeting | |
US20240291892A1 (en) | Systems and methods for recommending interactive sessions based on social inclusivity | |
US20250071157A1 (en) | Intelligent reporting within online communities | |
US12033258B1 (en) | Automated conversation content items from natural language | |
EP4605855A1 (en) | Virtual ai assistant for virtual meetings | |
US20240303891A1 (en) | Multi-modal model for dynamically responsive virtual characters | |
CN113301352A (en) | Automatic chat during video playback | |
US20240256711A1 (en) | User Scene With Privacy Preserving Component Replacements | |
CN114449297B (en) | Multimedia information processing method, computing device and storage medium | |
CN115767194A (en) | Live streaming method, device, and terminal for a virtual digital object
WO2013181633A1 (en) | Providing a conversational video experience
EP4395242A1 (en) | Artificial intelligence social facilitator engine | |
JP7698810B1 (en) | Information processing device, information processing method, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: TOYTALK, INC., CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: JACOB, OREN M.; REDDY, MARTIN; IVES, LUCAS R.A.; AND OTHERS; REEL/FRAME: 030477/0172; Effective date: 20130417 |
| AS | Assignment | Owner name: PULLSTRING, INC., CALIFORNIA; Free format text: CHANGE OF NAME; ASSIGNOR: TOYTALK, INC.; REEL/FRAME: 038589/0639; Effective date: 20160407 |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION |
| AS | Assignment | Owner name: CHATTERBOX CAPITAL LLC, CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: PULLSTRING, INC.; REEL/FRAME: 050670/0006; Effective date: 20190628 |