WO2018186821A1 - Displaying visual cues on speakers - Google Patents
- Publication number
- WO2018186821A1 (PCT/US2017/025695)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- visual cue
- display
- speaker
- processor
- voice command
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
In example implementations, a method for displaying a visual cue and an apparatus for performing the same are provided. The method is performed by a processor of a speaker. The method includes receiving a voice command. The voice command is transmitted to a linked host computer that executes a digital voice controlled assistant. A visual cue that is generated by the digital voice controlled assistant in response to the voice command is received by the processor of the speaker. The visual cue is converted in accordance with a display of the speaker and the visual cue that is converted is displayed on the display.
Description
DISPLAYING VISUAL CUES ON SPEAKERS
BACKGROUND
[0001] Technology-based, or smart, devices are being used within the home. The smart devices may include a smartphone, a tablet computer, a desktop computer, or a smart television that can perform different tasks using voice control. For example, a user can speak to the smart device to perform a task. The smart device may be centrally located within a user's home. The user may speak to the smart device to activate a voice control as described above. By speaking to the smart device, the user may obtain certain information or perform a task without having to grab the device out of his or her pocket or look at a display on the device.
[0002] The tasks may include personal assistant type functions to check "to-do" items and appointments on a calendar, obtain travel times, check the weather, obtain the latest news, and the like. Other tasks may include turning lights on in the house, adjusting a thermostat, and the like.
BRIEF DESCRIPTION OF THE DRAWINGS
[0003] FIG. 1 is a block diagram of an example system of the present disclosure;
[0004] FIG. 2 is a block diagram of an example speaker device of the present disclosure;
[0005] FIG. 3 is a flow diagram of an example method for displaying visual cues on a speaker; and
[0006] FIG. 4 is an example non-transitory computer readable medium
storing instructions executed by a processor of the present disclosure.
DETAILED DESCRIPTION
[0007] The present disclosure relates to a speaker for displaying visual cues from a voice controlled digital assistant and methods for performing the same. As discussed above, users can speak to a smart device to perform a task. The smart device may be centrally located within a user's home. The user may speak to the smart device to activate a voice control as described above. By speaking to the smart device, the user may obtain certain information or perform a task without having to grab the device out of his or her pocket or look at a display on the device.
[0008] The tasks may include personal assistant type functions to check "to-do" items and appointments on a calendar, obtain travel times, check the weather, obtain the latest news, and the like. Other tasks may include turning lights on in the house, adjusting a thermostat, and the like.
[0009] The present disclosure provides a speaker that is modified with a display. The speaker may be communicatively connected to a host device that executes the voice controlled digital assistant. The speaker may allow the features of the voice controlled digital assistant to be extended beyond a range of the host device.
[0010] For example, the speaker may be located in a bedroom upstairs where the host device may be in another room downstairs. The speaker may allow the user to leverage the capabilities of the voice controlled digital assistant from various different remote locations.
[0011] In addition, the display may allow the user to see a visual cue that is displayed by the voice controlled digital assistant. For example, the visual cue may indicate to a user that the voice controlled digital assistant was successfully activated and is ready to receive a voice command from the user. Without the visual cue, it may be difficult for the user to determine whether the voice controlled digital assistant was successfully activated by a voice command received by the speaker.
[0012] FIG. 1 illustrates a block diagram of a system 100 of the present
disclosure. In one implementation, the system 100 may include a speaker 102 and a host computer 106. The speaker 102 may be communicatively coupled to the host computer 106 via a two-way communications channel that is established over a communications network 110. The communications network 110 may be either a wired or wireless Internet protocol (IP) network. In some implementations, the communications network 110 may be a local area network (LAN) within a home or building (e.g., a Wi-Fi network within the home or the building).
[0013] In one example, the host computer 106 may include a digital voice controlled assistant 108 that is executed by the host computer 106. For example, the digital voice controlled assistant 108 may provide various information or execute tasks using voice commands from a user. For example, the digital voice controlled assistant 108 may provide information such as appointments on a user's calendar, answers to questions, travel information, weather, traffic updates, and the like. The digital voice controlled assistant 108 may also perform tasks such as drafting a message based on dictation, operating wireless devices connected to the host computer 106, initiating phone calls, and the like.
[0014] In one example, the speaker 102 may be located remotely from the host computer 106. For example, "remotely" may be defined as being at a distance that is greater than an audible range of a microphone of the host computer 106. In other words, the speaker 102 may be located where a voice of a user cannot be heard by a microphone of the host computer 106 that may be used to activate the digital voice controlled assistant 108.
[0015] Although a single speaker 102 is illustrated in FIG. 1, it should be noted that a plurality of speakers 102 may be deployed at different locations within a house or a building. As a result, the speaker 102 may allow a user to leverage the capabilities of the digital voice controlled assistant 108 throughout a house or a building, even though the user may be outside of a range of audible detection of the host computer 106.
[0016] In one example, the speaker 102 may be modified to include a display 104. The display 104 may display a visual cue that may be generated by the
digital voice controlled assistant 108. The visual cue may provide visual confirmation that the digital voice controlled assistant 108 was successfully activated and is awaiting a voice input from the user. For example, the visual cue may be a waving line, a moving circle, text that corresponds to an audible response from the digital voice controlled assistant 108, a new pop-up dialogue box, and the like.
[0017] The visual cue that is displayed on the display 104 may be dependent on the capabilities or specifications of the display 104. For example, the speaker 102 may identify a minimum amount of information that can be used to convey the visual cue. The speaker 102 may then display the minimum amount of information on the display 104. For example, the display 104 may be much smaller than a monitor, or a display, that is used with the host computer 106. As a result, the display 104 may not be large enough to display all of the visual cue. Thus, the minimum amount of information identified by the speaker 102 may be shown on the display 104.
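As a rough illustration of the selection described in paragraph [0017], the following Python sketch picks the smallest representation of a cue that a given display can convey. The `DisplaySpec` and `VisualCue` structures, their field names, and the 320-pixel threshold are assumptions for illustration only; they are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DisplaySpec:
    """Assumed description of the speaker's display capabilities."""
    width: int               # pixels, or characters for a text-only display
    height: int
    supports_graphics: bool

@dataclass
class VisualCue:
    """Assumed structure of the cue received from the host computer 106."""
    text: Optional[str]          # text of the audible response, if any
    icon: Optional[bytes]        # small icon associated with the assistant
    full_image: Optional[bytes]  # entire graphical user interface image

def minimum_representation(cue: VisualCue, display: DisplaySpec):
    """Pick the smallest piece of the cue that still conveys it."""
    # A text-only display can only show the textual part of the cue.
    if not display.supports_graphics:
        return ("text", cue.text or "assistant is ready")
    # A small graphical panel may fit the assistant's icon, but not the whole
    # interface image rendered by the host computer.
    if cue.icon is not None and display.width < 320:
        return ("icon", cue.icon)
    # Otherwise fall back to the full image (cropped or scaled elsewhere).
    return ("image", cue.full_image)
```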
[0018] In addition, the visual cue that is shown on the display 104 may depend on a type of display 104 that is deployed. For example, the display 104 may be a text display (e.g., a liquid crystal display (LCD), or a light emitting diode (LED) array that has a scrolling text display). For the text display, the speaker 102 may convert visual cues into text and display the text on the display 104.
[0019] In another example, the display 104 may be a red, green, blue (RGB) display. For example, the display 104 may be a color LCD or LED display. In this example, the speaker 102 may format the visual cue to be displayed on the display 104. For example, the formatting may include reducing, or cropping, a size of the visual cue from how the visual cue is displayed on a monitor of the host computer 106. In another example, the formatting may include capturing a portion of the visual cue that is displayed on the host computer 106. In other words, a subset or sub-image of the entire visual cue that is displayed on the host computer 106 may be displayed on the display 104.
[0020] In another example, the display 104 may be an e-ink display that is black and white. In this example, the speaker 102 may format the visual cue to
convert a color image into a grayscale or halftone image that can be displayed on the display 104. The speaker 102 may remove some graphical images if the e-ink display is incapable of rendering them.
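A minimal sketch of the color-to-black-and-white conversion described in paragraph [0020], assuming the Pillow imaging library is available on the speaker and that the cue arrives as PNG-encoded bytes; the default panel size is also an assumption.

```python
from io import BytesIO
from PIL import Image  # Pillow; assumed to be available on the speaker

def format_for_eink(cue_png: bytes, width: int = 250, height: int = 122) -> Image.Image:
    """Convert a color visual cue into a black-and-white image for an e-ink panel."""
    img = Image.open(BytesIO(cue_png)).convert("L")   # color -> grayscale
    img.thumbnail((width, height))                    # fit the small panel
    # Mode "1" applies Floyd-Steinberg dithering by default, giving the
    # halftone-style rendering described above.
    return img.convert("1")
```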
[0021] FIG. 2 illustrates a block diagram of the speaker 102. In one implementation, the speaker 102 may include a processor 202, a microphone 204, a communications device 206 and the display 104. It should be noted that the speaker 102 has been simplified for ease of explanation. For example, the speaker 102 may include additional components not shown, such as an audio speaker or audio output device, interfaces for different output connections, and the like.
[0022] In one example, the microphone 204 may receive a voice command from a user. The voice command may be captured and stored on a computer readable storage medium of the speaker 102 such that the voice command may be processed by the processor 202.
[0023] The communications device 206 may be a wired network adapter that uses an Ethernet cable or a wireless network adapter that can communicate over a Wi-Fi network. The communications device 206 may establish a two-way communication path to the host computer 106. The two-way communication path may be used to transmit the voice command that is received to the host computer 106. The two-way communication path may also be used to receive a visual cue from the host computer 106 that is generated in response to the voice command by the digital voice controlled assistant 108 executed by the host computer 106.
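One possible shape for the two-way communication path of paragraph [0023] is sketched below using a plain TCP socket with length-prefixed messages. The host address, port, and framing protocol are assumptions; the disclosure does not specify a transport.

```python
import socket
import struct

HOST_ADDR = ("192.168.1.10", 5000)   # assumed address/port of the host computer

def open_path(addr=HOST_ADDR) -> socket.socket:
    """Establish the two-way communication path during set-up."""
    return socket.create_connection(addr)

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the path."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("host closed the communication path")
        buf += chunk
    return buf

def send_command_receive_cue(sock: socket.socket, voice_cmd: bytes) -> bytes:
    """Send the recorded voice command, then wait for the visual cue bytes."""
    sock.sendall(struct.pack("!I", len(voice_cmd)) + voice_cmd)   # length-prefixed audio
    (cue_len,) = struct.unpack("!I", _recv_exact(sock, 4))
    return _recv_exact(sock, cue_len)
```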
[0024] The processor 202 may be in communication with the microphone 204, the communications device 206 and the display 104. The processor 202 may include a graphical processing unit (GPU) or graphical processing capabilities that may be used to convert or format the visual cues received from the host computer 106 into a converted visual cue for display on the display 104. In one example, the GPU and the processor 202 may be separate devices in the speaker 102.
[0025] As noted above, the display 104 may be any type of display. For
example, the display 104 may be a text display that displays the converted visual cue as text. The display 104 may be a graphical display that displays the converted visual cue as a sub-image. The sub-image may be a portion of the entire graphical user interface image generated by the host computer 106 that includes the visual cue. For example, if the visual cue is in the bottom left-hand corner of the entire graphical user interface image generated by the host computer 106, the sub-image may be a cropped portion of the bottom left-hand corner of the entire graphical user interface image that includes the visual cue.
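A short sketch of the cropping described in paragraph [0025], assuming Pillow and PNG-encoded interface images; the default crop rectangle (a bottom left-hand region) is purely illustrative.

```python
from io import BytesIO
from PIL import Image  # Pillow; assumed to be available on the speaker

def crop_cue_region(gui_png: bytes, region=(0, 600, 320, 768)) -> Image.Image:
    """Crop the portion of the host's interface image that contains the cue.

    region is (left, upper, right, lower) in pixels; the default picks a
    bottom left-hand area and is only an illustrative assumption.
    """
    full_image = Image.open(BytesIO(gui_png))
    return full_image.crop(region)
```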
[0026] The display 104 may be a graphical display that displays the converted visual cue as a minimum amount of the visual cue generated by the host computer 106 that conveys the visual cue. For example, the visual cue generated by the host computer 106 may include a flashing background, with an animated character and text that corresponds to an audible response. The minimum amount of information to convey the visual cue may be the text of the audible response. In another example, the minimum amount of information to convey the visual cue may be an animated icon associated with the digital voice controlled assistant 108. As a result, the converted visual cue may be text and the text may be displayed on the display 104.
[0027] As a result, the speaker 102 may be used to extend the range of the digital voice controlled assistant 108 throughout a house or a building. The user may provide a voice command to the speaker 102. The speaker 102 may transmit the voice command over the communication network 110 to the host computer 106. The host computer 106 may receive the voice command and activate the digital voice controlled assistant 108 in response to the voice command. The host computer 106 may generate a visual cue in response to the voice command that indicates the digital voice controlled assistant 108 was activated and is ready to receive another voice command. The visual cue may be transmitted to the speaker 102 over the communication network 110. The visual cue may be converted, or formatted, into a converted visual cue for display on the display 104. The user may see the converted visual cue on the display 104 and then proceed to interact with the digital voice controlled assistant
108 even though the user is outside of an audible range of a microphone on the host computer 106.
[0028] FIG. 3 illustrates a flow diagram of an example method 300 for displaying visual cues on speakers. In one example, the method 300 may be performed by the speaker 102 or an apparatus 400 described below and illustrated in FIG. 4.
[0029] At block 302, the method 300 begins. At block 304, the method 300 receives a voice command. For example, a microphone of the speaker may capture the voice command from a user. The voice command may be stored in a computer readable storage medium of the speaker.
[0030] At block 306, the method 300 transmits the voice command to a linked host computer, wherein the linked host computer executes a digital voice controlled assistant. In one example, the voice command may be retrieved from the computer readable storage medium of the speaker and transmitted over a communication path to the linked host computer. In another example, the voice command may be transmitted directly from the microphone to the linked host computer without saving the voice command in a computer readable storage medium.
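The two alternatives in paragraph [0030] (transmit a stored recording, or forward the microphone data without saving it) could look like the following sketch, which reuses a length-prefixed framing; the file path, the chunk iterable, and the end-of-command marker are assumptions.

```python
import struct

def send_buffered(sock, path="/tmp/last_command.wav"):
    """First alternative: the command was stored first, then transmitted."""
    with open(path, "rb") as f:
        audio = f.read()
    sock.sendall(struct.pack("!I", len(audio)) + audio)

def send_streamed(sock, mic_chunks):
    """Second alternative: forward microphone buffers without saving them.

    mic_chunks is assumed to be an iterable of raw audio buffers produced by
    the microphone driver; a zero-length frame marks the end of the command.
    """
    for chunk in mic_chunks:
        sock.sendall(struct.pack("!I", len(chunk)) + chunk)
    sock.sendall(struct.pack("!I", 0))
```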
[0031] In one example, the speaker that receives the voice command may be located remotely from the linked host computer. In other words, the speaker may be located outside of an audible range that can be detected by the linked host computer. For example, the speaker may be located in separate rooms of a house or building from the linked host computer.
[0032] In one implementation, the communication path to the linked host computer may be established before the voice command is received in block 304. For example, the communication path may be established over a local area network (e.g., a Wi-Fi connection, a local Ethernet connection, a
Bluetooth® connection, and the like) during an initial set-up process to connect the speaker to the linked host computer.
[0033] At block 308, the method 300 receives a visual cue that is generated in response to the voice command by the digital voice controlled assistant. In one example, the linked host computer may receive the voice command and
activate the digital voice controlled assistant in response to the voice command. For example, the voice command may be a word or phrase that is used to "wake" the digital voice controlled assistant on the linked host computer.
[0034] The linked host computer may generate and display a visual cue on a local display of the linked host computer when the digital voice controlled assistant is activated. However, since the user is located near the speaker that is located remotely from the linked host computer, the user may not see the visual cue.
[0035] In one implementation, the visual cue that is generated may be transmitted over the communication path to the speaker. Notably, the visual cue that is generated is not modified by the linked host computer. Rather, the visual cue is transmitted "as-is" (e.g., without modification) to the speaker. In other examples, the host computer may convert the format of the visual cue to be compatible with the speaker rather than having the speaker perform the conversion.
[0036] At block 310, the method 300 converts the visual cue in accordance with a display of the speaker. For example, the display of the speaker may be much smaller than the local display associated with the linked host computer. As a result, the display of the speaker may not be able to display the entire visual cue that was received "as-is." Thus, a processor or a graphical processing unit in the speaker may translate, convert, or format the visual cue into a converted visual cue that is used for the display.
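A compact sketch of the conversion in block 310, assuming Pillow, a PNG-encoded cue, and an assumed `display_kind` label describing the panel; the branch names and default size are illustrative only.

```python
from io import BytesIO
from PIL import Image   # Pillow; assumed to be available on the speaker

def convert_cue(cue_png: bytes, cue_text: str, display_kind: str, size=(160, 80)):
    """Adapt the cue received "as-is" to the speaker's display (block 310)."""
    if display_kind == "text":
        return cue_text                    # text-only panel: keep the words
    img = Image.open(BytesIO(cue_png))
    img.thumbnail(size)                    # shrink to the small panel
    if display_kind == "mono":
        return img.convert("1")            # black-and-white panel
    return img                             # small color (RGB) panel
```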
[0037] In one example, the translation, conversion or the formatting may be based on specifications or capabilities of the display. For example, the display may be a text display that cannot display graphical images. As a result, a visual cue that is a graphical image may be translated into text. For example, an image of a spinning circle may be displayed on the display of the speaker as "a spinning circle is being displayed" or something similar.
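The graphical-cue-to-text translation described in paragraph [0037] could be as simple as a lookup table keyed by a cue identifier; apart from the spinning-circle phrase quoted above, the identifiers and phrases below are assumptions.

```python
# Assumed mapping from cue identifiers reported by the host computer to short
# phrases that a scrolling text display can show.
CUE_TEXT = {
    "spinning_circle": "a spinning circle is being displayed",
    "waving_line": "the assistant is listening",
    "pop_up_dialog": "the assistant has a response ready",
}

def describe_cue(cue_id: str) -> str:
    """Translate a graphical visual cue into text for a text-only display."""
    return CUE_TEXT.get(cue_id, "assistant activated")
```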
[0038] In one example, the display may be a graphical display. For example, the graphical display may be a color display, but smaller than the local display of the linked host computer. As a result, the entire graphical user interface image of the visual cue from the linked host computer may be cropped
into a sub-image that includes the visual cue. For example, the sub-image may include a portion or a subset of the entire graphical user interface image of the visual cue.
[0039] In another example, the display may be a low resolution black and white display. As a result, some graphical images generated by the linked host computer may not be properly displayed on the display. Thus, a minimum amount of the visual cue that conveys the visual cue may be identified and displayed. For example, if the visual cue is an animated image with moving background images, a non-animated version of the image without the moving background images may be displayed on the display.
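A sketch of the reduction described in paragraph [0039]: keep a single non-animated frame of the cue and fit it to a low resolution black and white panel. Pillow, the GIF-encoded input, and the 128x64 panel size are assumptions.

```python
from io import BytesIO
from PIL import Image   # Pillow; assumed to be available on the speaker

def static_frame_for_mono(cue_gif: bytes, size=(128, 64)) -> Image.Image:
    """Keep one non-animated frame of an animated cue for a low-resolution
    black-and-white display."""
    img = Image.open(BytesIO(cue_gif))
    img.seek(0)                        # first frame only; drop the animation
    frame = img.convert("L")           # drop color
    frame.thumbnail(size)              # fit the low-resolution panel
    return frame.convert("1")          # 1-bit black and white
```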
[0040] At block 312, the method 300 displays the visual cue that is converted on the display of the speaker. For example, the converted visual cue that is displayed may notify a user that the digital voice controlled assistant was successfully activated on the linked host computer and that the digital voice controlled assistant is ready to receive another voice command, or for interaction with the user. At block 314, the method 300 ends.
[0041] FIG. 4 illustrates an example of an apparatus 400. In one example, the apparatus 400 may be the speaker 102. In one example, the apparatus 400 may include a processor 402 and a non-transitory computer readable storage medium 404. The non-transitory computer readable storage medium 404 may include instructions 406, 408, 410, 412 and 414 that, when executed by the processor 402, cause the processor 402 to perform various functions.
[0042] In one example, the instructions 406 may include instructions to record a voice command. The instructions 408 may include instructions to transmit the voice command to a remotely located computer. The instructions 410 may include instructions to receive a visual cue that is generated by a digital voice controlled assistant in response to the voice command, wherein the digital voice controlled assistant is executed by the remotely located computer. The instructions 412 may include instructions to format the visual cue in accordance with a display of the speaker. The instructions 414 may include instructions to display the visual cue that is formatted on the display of the speaker.
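As a summary sketch of instructions 406-414, one possible arrangement on the speaker is a small client class; the collaborator objects (microphone, socket, display wrapper) and their method names are assumptions, not part of the disclosure.

```python
class SpeakerAssistantClient:
    """Sketch of instructions 406-414 on the speaker side."""

    def __init__(self, sock, display):
        self.sock = sock        # two-way path to the remotely located computer
        self.display = display  # object wrapping the speaker's display panel

    def record_voice_command(self, microphone) -> bytes:   # instructions 406
        return microphone.read_until_silence()

    def transmit_voice_command(self, voice_cmd: bytes):    # instructions 408
        self.sock.sendall(voice_cmd)

    def receive_visual_cue(self) -> bytes:                 # instructions 410
        return self.sock.recv(65536)

    def format_visual_cue(self, cue: bytes):               # instructions 412
        return self.display.fit(cue)

    def display_visual_cue(self, formatted_cue):           # instructions 414
        self.display.draw(formatted_cue)
```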
[0043] It will be appreciated that variants of the above-disclosed and other features and functions, or alternatives thereof, may be combined into many other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations, or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.
Claims
1. A method, comprising:
receiving, by a processor of a speaker, a voice command;
transmitting, by the processor, the voice command to a linked host computer, wherein the linked host computer executes a digital voice controlled assistant;
receiving, by the processor, a visual cue that is generated in response to the voice command by the digital voice controlled assistant;
converting, by the processor, the visual cue in accordance with a display of the speaker; and
displaying, by the processor, the visual cue that is converted on the display of the speaker.
2. The method of claim 1, comprising:
establishing, by the processor, a communication path to the linked host computer before the receiving the voice command.
3. The method of claim 1, wherein the speaker is located outside of an audible range of a microphone in the linked host computer.
4. The method of claim 1, wherein the converting comprises:
translating, by the processor, the visual cue into text, wherein the visual cue that is displayed is the text.
5. The method of claim 1, wherein the converting comprises:
receiving, by the processor, an entire graphical user interface image of the visual cue generated by the linked host computer; and
cropping the entire graphical user interface image to a sub-image that includes the visual cue, wherein the sub-image that includes the visual cue is displayed.
6. The method of claim 1, wherein the converting comprises:
identifying, by the processor, a minimum amount of the visual cue that conveys the visual cue, wherein the minimum amount of the visual cue is displayed.
7. A speaker, comprising:
a microphone to receive a voice command;
a communication device to establish a two-way communication path to a computer to transmit the voice command to the computer and receive a visual cue that is generated in response to the voice command by a digital voice controlled assistant executed by the computer;
a display to display a converted visual cue; and
a processor in communication with the microphone, the communication device and the display to control the microphone, the communication device and the display, wherein the processor converts the visual cue into the converted visual cue.
8. The speaker of claim 7, comprising:
a computer readable storage medium to record the voice command that is received.
9. The speaker of claim 7, wherein the speaker is located outside of an audible range of a microphone in the computer.
10. The speaker of claim 7, wherein the display comprises a text display that displays the converted visual cue comprising text.
11. The speaker of claim 7, wherein the display comprises a graphical display that displays the converted visual cue comprising a sub-image that includes the visual cue, wherein the sub-image is a portion that is cropped from an entire graphical user interface image of the visual cue generated by the computer.
12. The speaker of claim 7, wherein the display comprises a graphical display that displays the visual cue comprising a minimum amount of the visual cue that conveys the visual cue, wherein the minimum amount of the visual cue is displayed.
13. A non-transitory computer readable storage medium encoded with instructions executable by a processor of a speaker, the non-transitory computer-readable storage medium comprising:
instructions to record a voice command;
instructions to transmit the voice command to a remotely located computer;
instructions to receive a visual cue that is generated by a digital voice controlled assistant in response to the voice command, wherein the digital voice controlled assistant is executed by the remotely located computer;
instructions to format the visual cue in accordance with a display of the speaker; and
instructions to display the visual cue that is formatted on the display of the speaker.
14. The non-transitory computer readable storage medium of claim 13, wherein the instructions to format comprise:
instructions to translate the visual cue into text.
15. The non-transitory computer readable storage medium of claim 13, wherein the instructions to format comprise:
instructions to generate an image formatted for the display that contains a minimum amount of the visual cue that conveys the visual cue.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2017/025695 WO2018186821A1 (en) | 2017-04-03 | 2017-04-03 | Displaying visual cues on speakers |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018186821A1 (en) | 2018-10-11 |
Family
Family ID: 63713250
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2017/025695 WO2018186821A1 (en) (Ceased) | Displaying visual cues on speakers | 2017-04-03 | 2017-04-03 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2018186821A1 (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20050268234A1 (en) * | 2004-05-28 | 2005-12-01 | Microsoft Corporation | Strategies for providing just-in-time user assistance |
| US20080132209A1 (en) * | 2006-12-05 | 2008-06-05 | Research In Motion Limited | User interface methods and apparatus for processing voice call requests from a mobile station based on communication conditions |
| US20120326976A1 (en) * | 2010-01-15 | 2012-12-27 | Microsoft Corporation | Directed Performance In Motion Capture System |
| US20160077733A1 (en) * | 2012-04-16 | 2016-03-17 | Blackberry Limited | Method and device having touchscreen keyboard with visual cues |
- 2017-04-03 WO PCT/US2017/025695 patent/WO2018186821A1/en not_active Ceased
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US9344674B2 (en) | Method and system for routing video calls to a target queue based upon dynamically selected or statically defined parameters | |
| CN104520787B (en) | Head-mounted computer as secondary display with automatic speech recognition and head-tracking input | |
| JP2019534492A (en) | Interpretation device and method (DEVICE AND METHOD OF TRANSLATING A LANGUAGE INTO ANOTHER LANGUAGE) | |
| US10049498B2 (en) | Video conversion method, apparatus and system | |
| US11074912B2 (en) | Identifying a valid wake input | |
| JP6443124B2 (en) | Add documents fairly to collaborative sessions | |
| CN114501090B (en) | Screen projection method, device, equipment and computer readable storage medium | |
| JP2020021025A (en) | Information processing apparatus, information processing method and program | |
| JP7467636B2 (en) | User terminal, broadcasting device, broadcasting system including same, and control method thereof | |
| CN109753259B (en) | Screen projection system and control method | |
| US8297497B2 (en) | Transmitting device, receiving device, screen frame transmission system and method | |
| CN105743862B (en) | Two-way mirror system for sound data | |
| CN107038024B (en) | Operation configuration method and equipment thereof | |
| WO2018186821A1 (en) | Displaying visual cues on speakers | |
| CN111050137B (en) | Portable life search and rescue command box and system | |
| CN101383955A (en) | Wireless digital projection device and method | |
| US10121124B2 (en) | Information processing device, information processing method and program | |
| US20240040373A1 (en) | Data transmission method, device, conference system, wireless screen transmitter and storage medium | |
| KR20210091003A (en) | Electronic apparatus and controlling method thereof | |
| JP2018139397A (en) | Voice display device and voice display program | |
| CN105607691A (en) | Expanded display equipment | |
| CN213536914U (en) | Emergency rescue system of elevator | |
| KR20210065636A (en) | System and apparatus for real-time transmitting and receiving picture of mobile multimedia teminal | |
| CN215300807U (en) | Control device, display device, and control system | |
| JP2011123127A (en) | Image processing apparatus, image displaying device, and image transmission system |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17904467; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 17904467; Country of ref document: EP; Kind code of ref document: A1 |