HK1159805B - Multi-tiered voice feedback in an electronic device - Google Patents
Description
Technical Field
The present disclosure is directed to providing multi-tier voice feedback in an electronic device.
Background
Many electronic devices provide a large number of functions or operations that a user may invoke. The number of available functions or operations often exceeds the number of inputs available from the device's input mechanisms. To let a user reach an operation that does not have a dedicated input (e.g., an operation that is not associated with a particular key sequence or button press, the way the Menu button on an iPod, available from Apple Inc., is), the electronic device may provide a menu of selectable options associated with the device's operations. For example, in response to receiving a menu input from an input mechanism (e.g., a Menu button), the electronic device may display a menu of selectable options on the display.
Because menus are typically shown on the electronic device's display, the user may be required to look at the display to select a particular option. This is not always desirable. For example, if a user wishes to conserve power (e.g., in a portable electronic device), requiring the device to display a menu and move a user-manipulated highlight region to make a selection may drain power. As another example, if the user is in a dark environment and the display has no backlight, the user may be unable to read the menu options. As yet another example, if the user is blind or visually impaired, the user may not be able to view the displayed menu.
To overcome this problem, some systems may provide audio feedback in response to detecting a user input or a change in battery state, as described in commonly assigned U.S. Patent Publication No. 2008/0129520, entitled "Electronic Device with Enhanced Audio Feedback" (Attorney Docket No. P4250US1), which is incorporated herein by reference in its entirety. In some cases, the electronic device may provide voice feedback describing user-selectable options or operations that the user may instruct the electronic device to perform. If several menus are displayed simultaneously, or if the display comprises different modules or display areas (e.g., several views), it can be difficult for the electronic device to determine the objects or menu options, or the order of the objects or menu options, for which voice feedback is provided.
Disclosure of Invention
The present invention is directed to systems and methods for providing multi-tiered voice feedback to a user. In particular, the present invention is directed to providing voice feedback for a number of display objects (e.g., menu items) in a predetermined order (e.g., based on a hierarchy associated with each display object).
In some embodiments, a method, electronic device, and computer-readable medium for providing voice feedback to a user of an electronic device may be provided. The electronic device may display a number of elements and identify at least two of the elements for which voice feedback is provided. The electronic device can determine a hierarchy associated with the display of each identified element, wherein the hierarchy defines the relative importance of each displayed element. The electronic device can then provide voice feedback for the identified elements in the order of the determined hierarchy, e.g., such that voice feedback is provided for the most important element first, followed by the second most important element, and so on until voice feedback has been provided for each element.
In some embodiments, methods, electronic devices, and computer-readable media may be provided that provide audio feedback for displayed content. The electronic device may instruct the display to display a number of elements, wherein speakable properties are associated with at least two of the elements. The electronic device can determine a hierarchy associated with each of the at least two elements and generate a queue including the at least two elements. The determined hierarchy may set the order of the elements in the generated queue. The electronic device may instruct the audio output to recite each queue element in turn in the order of the queue, wherein the audio output includes voice feedback associated with each of the at least two elements.
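The queue-generation step described above can be sketched as a simple sort keyed on each element's tier. This is an illustrative assumption, not the patent's actual implementation; the element layout and tier encoding are hypothetical.

```python
# Lower number = higher importance; voice feedback is provided in ascending order.
TIERS = {"context": 0, "focus": 1, "selection": 2, "property": 3, "detail": 4, "idle": 5}

def build_speech_queue(elements):
    """Return speakable elements ordered by tier (most important first).

    Each element is a dict; those without a 'speakable' text property are
    skipped, mirroring how non-speakable display elements are ignored.
    """
    speakable = [e for e in elements if e.get("speakable")]
    return sorted(speakable, key=lambda e: TIERS[e["tier"]])

display = [
    {"speakable": "iPod",  "tier": "context"},    # title bar
    {"speakable": "Music", "tier": "focus"},      # highlighted menu item
    {"tier": "selection"},                        # no speakable property: skipped
]
queue = build_speech_queue(display)
print([e["speakable"] for e in queue])  # ['iPod', 'Music']
```

Because Python's sort is stable, elements sharing a tier keep their display order, which matches the behavior of the queues discussed below.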
In some embodiments, a method, electronic device, and computer-readable medium may be provided for speaking the text of elements displayed by an electronic device. The electronic device may display several elements associated with a speakable property. The speakable property may identify the text to speak for each element. The electronic device may display the several elements in several views, where each view is associated with a speakable order. The electronic device may generate a queue that includes the several elements, where the order of the elements in the queue is set by the speakable order of each view (e.g., such that an element with a higher speakable order is nearer the head of the queue). The electronic device may wait for a first timeout period to elapse and identify an audio file associated with each element of the queue. During the first timeout period, the electronic device may modify audio playback so that the spoken feedback is easier to hear, and may refrain from speaking if another transaction is detected. Each audio file may include the speakable-property text spoken for its element. The electronic device may play back the identified audio files sequentially in the order of the queue and then pause for a second timeout period. The second timeout period allows the electronic device to return audio playback to its pre-speech configuration (e.g., music playback). In some embodiments, the electronic device may receive the audio files from a host device that generates them with a text-to-speech engine from the speakable-property text of each element.
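The speak sequence above (duck other audio, wait a first timeout during which a follow-on transaction can cancel speech, play the queued audio files in order, wait a second timeout, then restore the prior audio configuration) can be sketched as follows. All timing values and callback names are illustrative assumptions.

```python
import time

def speak_queue(queue, play, duck, restore,
                pre_delay=0.0, post_delay=0.0, cancelled=lambda: False):
    """Play the queued audio files with the two timeout periods described above."""
    duck()                      # lower other audio so speech is easy to hear
    time.sleep(pre_delay)       # first timeout: absorb bursts of transactions
    if not cancelled():         # a new transaction may have invalidated the queue
        for audio_file in queue:
            play(audio_file)    # spoken text for each element, in queue order
    time.sleep(post_delay)      # second timeout before resuming e.g. music
    restore()                   # return audio output to its prior configuration

log = []
speak_queue(["ipod.wav", "music.wav"],
            play=log.append,
            duck=lambda: log.append("duck"),
            restore=lambda: log.append("restore"))
print(log)  # ['duck', 'ipod.wav', 'music.wav', 'restore']
```

The `cancelled` callback models the behavior of skipping speech when another transaction arrives within the first timeout period.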
Drawings
The above and other features of the present invention, its nature and various advantages will be more apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic diagram of an electronic device in accordance with one embodiment of the invention;
FIG. 2 is a schematic diagram of an illustrative display screen having content to which voice feedback may be applied in accordance with one embodiment of the invention;
FIG. 3 is a schematic diagram of an illustrative queue of speakable items for playback associated with the display of FIG. 2 in accordance with one embodiment of the invention;
FIG. 4 is a schematic illustration of an electronic device display after receiving a user selection of an option of the display of FIG. 2 in accordance with an embodiment of the present invention;
FIG. 5 is a schematic diagram of an illustrative queue of speakable items for playback associated with the display of FIG. 4 in accordance with one embodiment of the invention;
FIG. 6 is a schematic view of the electronic device display of FIG. 4 with different tagging options, in accordance with one embodiment of the present invention;
FIG. 7 is a schematic diagram of an illustrative queue of speakable items for playback associated with the display of FIG. 6 in accordance with one embodiment of the invention;
FIG. 8 is a schematic illustration of an electronic device display provided in response to a user selecting the highlighted menu option of FIG. 6, in accordance with one embodiment of the invention;
FIG. 9 is a schematic diagram of an illustrative queue of speakable items for playback associated with the display of FIG. 8 in accordance with one embodiment of the invention;
FIG. 10 is a schematic diagram of an illustrative "now playing" display in accordance with one embodiment of the present invention;
FIG. 11 is a schematic diagram of an illustrative queue of speakable items for a now playing display in accordance with one embodiment of the invention;
FIG. 12 is an illustrative state diagram for speaking a speakable string (spoken string), in accordance with one embodiment of the invention;
FIG. 13 is a schematic diagram of an illustrative communication system including an electronic device and a host device in accordance with one embodiment of the present invention;
FIG. 14 is a flow diagram of an illustrative process for providing a static string to an electronic device; and
FIG. 15 is a flow diagram of an illustrative process for providing dynamic strings to an electronic device.
Detailed Description
An electronic device is provided that selectively delivers voice feedback based on the tiers associated with displayed options.
The electronic device may include a processor and a display. The electronic device may display any suitable information to the user. For example, the display may include a title bar, a menu with selectable options, an information area that displays information associated with one or more options, information identifying media or files available for selection, or any other suitable information. When the user uses the display, the electronic device may provide voice feedback for different display elements.
Each display element may be associated with different attributes. In some embodiments, display elements for which voice feedback is to be provided may be associated with a speakable property. The speakable property may include the text to speak for the associated element. Additionally, as part of the view used to display each element, the element may be associated with a speakable order or tier. When the electronic device displays elements (e.g., as part of a view), the electronic device may determine the text for which voice feedback is provided (e.g., the text to be spoken) and the order or tier associated with each element based on the speakable properties and the speakable order. The electronic device may select the element with the highest tier and provide voice feedback for it (e.g., speak its text). The electronic device may then successively select each element having the next highest tier and provide voice feedback for those elements in tier order (e.g., using a queue in which the order of the elements is set by the tier associated with each element). When providing voice feedback, the electronic device may ignore or skip elements that do not include a speakable property or speakable order (e.g., elements for which voice feedback is not provided).
The electronic device may determine which element to speak at a particular time using any suitable approach. In some embodiments, the electronic device may provide voice feedback in response to detecting a transaction (e.g., a decision as to which elements can be spoken). For example, the electronic device may detect a transaction in response to determining that the display has transitioned, or in response to receiving a user action that causes the display to change (e.g., the user selecting an option, or moving the highlight region). In response to detecting the transaction, the electronic device may identify the speakable elements of the updated display and the tier associated with each (e.g., the elements to speak sequentially within the transaction). The electronic device may then create a new queue of elements for which voice feedback is to be provided, based on the identified elements of the updated display, and provide voice feedback from the newly created queue. In some embodiments, the new queue may be constructed by replacing those items of the existing queue that have not yet been read and are of the same or a lower tier. The particular elements recited, and the order in which they are recited, may thus vary with each transaction.
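A rough sketch of rebuilding the queue when a transaction is detected might look like the following: items already spoken stay spoken, and unread items of the same or a lower tier than the incoming elements are replaced. The tier encoding and data layout are illustrative assumptions, not the patent's implementation.

```python
TIERS = {"context": 0, "focus": 1, "selection": 2}  # lower value = spoken first

def rebuild_queue(existing, spoken_count, new_items):
    """existing/new_items: lists of (text, tier); spoken_count: items already read."""
    unread = existing[spoken_count:]
    if not new_items:
        return unread
    top_new = min(TIERS[t] for _, t in new_items)
    # keep only unread items strictly more important than the new elements
    kept = [(s, t) for s, t in unread if TIERS[t] < top_new]
    return sorted(kept + new_items, key=lambda item: TIERS[item[1]])

# After "iPod" has been spoken, the user selects "Music": the new display's
# elements replace the unread remainder of the old queue.
old = [("iPod", "context"), ("Music", "focus")]
new = rebuild_queue(old, spoken_count=1,
                    new_items=[("Music", "context"), ("Cover Flow", "focus")])
print(new)  # [('Music', 'context'), ('Cover Flow', 'focus')]
```

This reflects the idea that a transaction invalidates only the unread, equal-or-lower-tier remainder of the queue rather than restarting all feedback from scratch.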
Any suitable approach may be used to generate the audio files played back in response to an instruction to provide voice feedback for a particular displayed element. In some embodiments, to provide high-quality audio from a text-to-speech (TTS) engine, the audio files may be received from a host device connected to the electronic device. This approach is particularly desirable when the resources of the electronic device are limited (e.g., the storage, processing, and power limitations inherent in a portable electronic device). The electronic device may provide the host device with a file enumerating the strings associated with each element to be spoken. The host device may then convert each string to speech using a text-to-speech engine and provide audio files of the speech to the electronic device. The electronic device may then consult a mapping of strings to audio files to select the appropriate audio file for playback in response to determining that voice feedback for a displayed element is to be provided.
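The string-to-audio-file mapping could be keyed on a stable digest of each string, for example. The naming scheme below (a truncated MD5 hash) is purely an illustrative assumption; the patent only requires that the device can look up the audio file for a given string.

```python
import hashlib

def audio_filename(text):
    """Derive a stable, collision-resistant filename for the TTS audio of `text`."""
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()[:8]
    return f"{digest}.wav"

def build_audio_map(strings):
    """Host side: map each speakable string to its synthesized audio file."""
    return {s: audio_filename(s) for s in strings}

mapping = build_audio_map(["iPod", "Music", "Cover Flow"])
# Device side: look up the audio file for a string before playback.
print(mapping["Music"])
```

Using a content-derived name means the device and host agree on filenames without exchanging an explicit index, though a database table would serve equally well.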
FIG. 1 is a schematic diagram of an electronic device in accordance with one embodiment of the invention. The electronic device 100 may include a processor 102, a storage 104, a memory 106, an input mechanism 108, an audio output 110, a display 112, and communication circuitry 114. In some embodiments, one or more components of electronic device 100 may be combined or omitted (e.g., storage 104 and memory 106 may be combined). In some embodiments, electronic device 100 may include other components not combined with or included among those shown in FIG. 1 (e.g., a power supply or a bus), or several instances of the components shown in FIG. 1. For simplicity, only one of each component is shown in FIG. 1.
Processor 102 may include any processing circuitry for controlling the operation and performance of electronic device 100. For example, the processor 102 may be used to run an operating system application, a firmware application, a media playback application, a media editing application, or any other application. In some embodiments, the processor may drive the display and process inputs received from the user interface.
For example, storage 104 may include one or more storage media including a hard disk drive, a solid state drive, flash memory, permanent memory such as ROM, any other suitable type of storage component, or any combination thereof. For example, storage 104 may store media data (e.g., music and video files), application data (e.g., for implementing various functions on device 100), firmware, user preference information data (e.g., media playback preferences), authentication information (e.g., a library of data associated with authorized users), lifestyle information data (e.g., food preferences), fitness information data (e.g., information obtained by fitness monitoring equipment), transaction information data (e.g., information such as credit card information), wireless connection information data (e.g., information that enables electronic device 100 to establish a wireless connection), subscription information data (e.g., information that records podcasts or television shows or other media subscribed to by the user), contact information data (e.g., telephone numbers and email addresses), calendar information data, and any other suitable data, or any combination thereof.
Memory 106 may include cache memory, semi-permanent memory such as RAM, and/or one or more different types of memory for temporarily storing data. In some embodiments, the memory 106 may also be used to store data for operating electronic device applications, or any other type of data that may be stored in the storage 104. In some embodiments, memory 106 and storage 104 may be combined into a single storage medium.
The input mechanism 108 may provide input to input/output circuitry of the electronic device. The input mechanism 108 may include any suitable input mechanism, such as, for example, a button, a keypad, a dial, a click wheel, or a touch screen. In some embodiments, electronic device 100 may include a capacitive sensing mechanism, or a multi-touch capacitive sensing mechanism. Such sensing mechanisms are described in commonly owned U.S. patent application No. 10/902,964, entitled "Gestures for Touch Sensitive Input Devices," filed in 2004, and U.S. patent application No. 11/028,590, entitled "Mode-Based Graphical User Interfaces for Touch Sensitive Input Devices," filed in 2005, both of which are hereby incorporated by reference in their entirety.
The audio output 110 may include one or more speakers (e.g., a mono speaker or a stereo speaker) built into the electronic device 100, or an audio connector (e.g., an audio jack or suitable bluetooth connection) coupled with an audio output mechanism. For example, the audio output 110 may provide audio data to a headset, headphones, or earpieces using a wired or wireless connection.
Display 112 may include display circuitry (e.g., a screen or a projection system) for providing a display viewable by a user. For example, display 112 may include a screen (e.g., an LCD screen) incorporated into electronic device 100. As another example, display 112 may include a removable display or a projection system (e.g., a video projector) that provides for the display of content on a surface remote from electronic device 100. In some embodiments, display 112 may include an encoder/decoder (codec) to convert digital media data into an analog signal. For example, display 112 (or other suitable circuitry within electronic device 100) may include a video codec, an audio codec, or any other suitable type of codec.
The display 112 may also include display driver circuitry, circuitry for driving display drivers, or both. Under the direction of processor 102, display 112 may display content (e.g., media playback information, an application screen for an application implemented on the electronic device, information regarding an ongoing communication operation, information regarding an incoming communication request, or a device operation screen).
One or more of the input mechanism 108, the audio output 110, and the display 112 may be coupled to input/output circuitry. The input/output circuitry may convert (and encode/decode, if desired) analog and other signals into digital data. In some embodiments, the input/output circuitry may also convert digital data into any other type of signal, and vice versa. For example, the input/output circuitry may receive and convert physical contact input (e.g., from a multi-touch screen), physical movement (e.g., from a mouse or sensor), analog audio signals (e.g., from a microphone), or any other input. The digital data may be provided to or received from the processor 102, the storage 104, the memory 106, or any other component of the electronic device 100. In some embodiments, several instances of input/output circuitry may be included in electronic device 100.
The communication circuitry 114 may communicate with other devices or with one or more servers using any suitable communication protocol. Electronic device 100 may include one or more instances of communication circuitry to perform several communication operations simultaneously using different communication networks. For example, the communication circuitry may support Wi-Fi (e.g., an 802.11 protocol), Ethernet, Bluetooth™ (a trademark owned by Bluetooth SIG, Inc.), a radio-frequency system, a cellular network (e.g., GSM, AMPS, GPRS, CDMA, EV-DO, EDGE, 3GSM, DECT, IS-136/TDMA, iDEN, LTE, or any other suitable cellular network or protocol), infrared, TCP/IP (e.g., any of the protocols used in each of the TCP/IP layers), HTTP, BitTorrent, FTP, RTP, RTSP, SSH, voice over IP (VoIP), any other communication protocol, or any combination thereof. In some embodiments, communications circuitry 114 may include one or more communications ports that provide a wired communications link between electronic device 100 and a host device. For example, a portable electronic device may include one or more connectors (e.g., a 30-pin connector or a USB connector) that receive a cable coupling the portable electronic device to a host computer. The portable device can then communicate with the host computer using software on the host computer (e.g., iTunes, available from Apple Inc.).
In some embodiments, the electronic device 100 may include a bus that provides a data transfer path for transferring data to, from, or between the processor 102, the storage 104, the memory 106, the input mechanism 108, the audio output 110, and any other components included in the electronic device.
The electronic device can provide voice feedback for any suitable display content, including, for example, menu options or content available for playback to the user (e.g., voice feedback for metadata associated with the media, such as artist name, media title, or album). FIG. 2 is a schematic diagram of an illustrative display screen having content to which voice feedback may be applied in accordance with an embodiment of the present invention. The display 200 includes several areas in which content is displayed. For example, the display 200 may include a title bar 210, a menu 220, and additional information 230. The title bar 210 may include a title 212 indicating the mode or application in use by the electronic device. For example, the title 212 may include iPod (e.g., the top-most title when no application is selected), Music, Videos, Photos, Podcasts, Extras, and Settings. Other titles may be available, for example, when an accessory device is coupled to the electronic device (e.g., a radio accessory or a fitness accessory). The title bar 210 may also include any other suitable information, including, for example, a battery indicator 214.
Menu 220 may include a number of selectable options 222, including, for example, options for selecting a mode or application, or options associated with a particular mode or application selected. The user may select an option from menu 220 by navigating highlight region 224 over the option. When the highlight region is over a particular option, the user may provide a selection instruction (e.g., by pressing a button, or providing any other suitable input) to select the particular option. Additional information 230 may include any suitable information, including, for example, information associated with the mode or application identified by title 212, one or more displayed options 222, a particular option identified by highlight region 224, or any other suitable information.
The electronic device may generate display 200, or any other display, using any suitable approach. In some embodiments, a model-view-controller (MVC) architecture or design may be used. The model may include any suitable information associated with the view for display by the controller (e.g., the controller may query the model to compose the view, or modify the view's association with the model at runtime). For example, a model may include one or more strings or images. Each view may be configured to display (e.g., support) one or more types of elements. The view may pass its supported types to a get_property call, in response to which the model may provide the data associated with the supported types to the view for display by the device. Several views may be combined to form each display. For example, the display 200 may include at least one view for each region of the display.
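A minimal sketch of this model-view relationship is shown below. The class and property names are hypothetical; the point is that a view pulls only the data types it supports from its model via a get_property-style call.

```python
class Model:
    """Holds display data (e.g., strings, images, speakable text)."""
    def __init__(self, **properties):
        self._properties = properties

    def get_property(self, prop_type):
        # return the data for a supported type, or None if absent
        return self._properties.get(prop_type)

class View:
    supported_types = ("title_string",)  # types this view can display

    def __init__(self, model):
        self.model = model

    def render(self):
        # query the model for each supported type and display the result
        return {t: self.model.get_property(t) for t in self.supported_types}

model = Model(title_string="Music", icon="music.png", speakable="Music")
view = View(model)
print(view.render())  # {'title_string': 'Music'}
```

Note that the model may carry more data than any one view consumes (here, an icon and speakable text), which is what lets voice-feedback fields ride alongside the visual data, as described next.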
To make it easier to provide voice feedback for displayed content, the electronic device may incorporate voice feedback variables and settings into the MVC architecture associated with the actual display of the content. In some embodiments, the model may include an additional speakable property field. The speakable property field may include any suitable information needed or available to provide voice feedback. In some embodiments, the speakable property field may include an indication (e.g., a switch setting) that voice feedback is to be provided. The electronic device may determine the text to speak using any suitable approach. In some embodiments, the view or scheduling system may query the property ID type associated with the view. In some embodiments, a fixed-size ID generated from a property ID (e.g., using a hash table) may alternatively or additionally be provided to identify the text for which voice feedback is provided. In some embodiments, the speakable property may alternatively or additionally include a string of text to be spoken by the electronic device, or a pointer to a field of the model having the text to be displayed.
The electronic device may encode the tier or importance in any suitable component of the MVC architecture, including, for example, as a speakable-order variable associated with each view. The speakable order may indicate the importance of the speakable element displayed in the corresponding view, e.g., relative to other text in other views that may be displayed, thereby defining a speech hierarchy. The electronic device may define any suitable speakable orders or tiers, including, for example, context (e.g., associated with a menu title), focus (e.g., a list control such as the highlight-region location), selection (e.g., an option associated with a list item), property (e.g., a specification or lyrics of the media), detail, and idle. Each view may be associated with one or more tiers or speakable orders, depending on the model or elements displayed in the view. For example, if a menu option and its associated setting (e.g., backlight option 224 and setting 226) are displayed simultaneously within a view, the view may be associated with several tiers. Alternatively, the menu option and setting may be provided in different views.
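One possible encoding of these tiers is an ordered enumeration, so that tiers compare directly and a single view can carry more than one. This encoding is an illustrative assumption.

```python
from enum import IntEnum

class Tier(IntEnum):
    """Speakable tiers, most to least important (lower value speaks first)."""
    CONTEXT = 0    # e.g., menu title
    FOCUS = 1      # e.g., highlight-region location in a list
    SELECTION = 2  # e.g., option associated with a list item
    PROPERTY = 3   # e.g., media specification or lyrics
    DETAIL = 4
    IDLE = 5

# A single view showing a menu option together with its setting can carry
# two tiers, as in the backlight example above:
view_tiers = {"Backlight": Tier.FOCUS, "On": Tier.SELECTION}
order = sorted(view_tiers, key=view_tiers.get)
print(order)  # ['Backlight', 'On']
```

Because `IntEnum` members are integers, the tier comparison needed by the speech scheduler (`Tier.CONTEXT < Tier.FOCUS`) works with no extra code.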
When one or several views are displayed as part of the display, the electronic device may retrieve from each model the elements to be displayed and the manner in which they are displayed. In addition, the electronic device may retrieve the speakable property from each model and the speakable order from each displayed view. The electronic device may provide voice feedback for any suitable speakable element of the display. For example, the electronic device can provide voice feedback for one or more views. As another example, the electronic device can provide voice feedback for one or more elements in a particular view. In some embodiments, the electronic device can provide voice feedback for only one element at each tier in a particular view (e.g., provide voice feedback for only one element in menu 220, each option in menu 220 being associated with a particular tier).
To provide voice feedback for displayed speakable elements in the proper order, a speech scheduler of the electronic device may define a queue of items for which voice feedback is provided (e.g., speakable items), where the speakable order or tier sets the order of the elements in the queue. The electronic device may recite any suitable combination of displayed elements. For example, the electronic device may recite only one menu item (e.g., the menu item identified by the highlight region). As another example, the electronic device may recite several menu items (e.g., all menu items following the highlighted menu item). As yet another example, the electronic device may recite all menu items. To ensure that the electronic device first speaks the menu item identified by the highlight region, the electronic device may associate a higher tier or order with that menu item. The present discussion uses the terms "speaking" a speakable element or string and "playing" an audio file associated with the speakable element or string interchangeably to describe providing voice feedback for a speakable element.
In some embodiments, the speech scheduler may include only one speakable element per tier per view in the queue. For example, this may provide the electronic device with a simple mechanism for speaking only the highlighted menu item (e.g., by assigning the focus tier only to the "Music" menu option, only "Music," and not other items in menu 220, is spoken). If several displayed items within a given tier of a view change during a single transaction, the speech scheduler may place only the most recently changed item in the queue. To provide voice feedback for several items associated with the same speakable order in a single transaction, the electronic device may display the items in different views associated with that speakable order. The speech scheduler may use any suitable approach to order voice feedback for different elements of a view that share the same tier (e.g., the idle tier in the Now Playing display, described in more detail below). For example, the speech scheduler may follow the order of the elements in one or more resource files, an order based on the graphical position of the views, alphabetical order, or any other suitable order.
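The "one speakable element per tier per view" rule above can be sketched as a dictionary keyed on (view, tier): within a transaction, later changes simply overwrite earlier ones, so only the most recently changed item survives. The data layout is an illustrative assumption.

```python
def enqueue_per_view(changes):
    """changes: list of (view_id, tier, text), in the order they changed.

    Returns one (tier, text) entry per (view_id, tier), keeping only the
    latest change, ordered by tier (lower tier number speaks first).
    """
    latest = {}
    for view_id, tier, text in changes:
        latest[(view_id, tier)] = text  # later changes overwrite earlier ones
    return [(tier, text)
            for (view_id, tier), text in sorted(latest.items(),
                                                key=lambda kv: kv[0][1])]

changes = [
    ("menu", 1, "Videos"),   # highlight moved over Videos...
    ("menu", 1, "Music"),    # ...then over Music within the same transaction
    ("title", 0, "iPod"),
]
print(enqueue_per_view(changes))  # [(0, 'iPod'), (1, 'Music')]
```

Only "Music" is queued for the menu view's focus tier, matching the rule that intermediate highlight positions within one transaction are never spoken.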
FIG. 3 is a schematic diagram of an illustrative queue of speakable items for playback associated with the display of FIG. 2 in accordance with one embodiment of the invention. Queue 300 may be described using any suitable approach. In the example of FIG. 3, queue 300 may include a list 310 of speakable strings to speak in succession. As part of the view, each speakable string may be associated with a speakable tier identified in corresponding column 340. Using the elements from display 200 (FIG. 2), the speakable strings may include iPod string 312 having context tier 342 and Music string 313 having focus tier 343 (e.g., the menu item identified by the highlight region is the only menu item spoken). In implementations in which all menu items are spoken (e.g., not just the menu item identified by the highlight region), the speakable strings may also include, for example, a Videos string, a Photos string, a Podcasts string, an Extras string, a Settings string, a Shuffle Songs string, and a Backlight string, all having the selection tier (e.g., a tier below the focus tier of Music string 313). Additionally, since the backlight option may be displayed with its associated setting, queue 300 may also include an On string associated with the property tier that is spoken after the Backlight string. In implementations in which only the highlighted option is recited, the electronic device may assign the focus tier to the Backlight string and the selection tier to the On string in response to detecting that the highlight region has been placed over the backlight option in the menu. The electronic device may identify the audio file associated with each speakable string (e.g., using a hash or database) and play back each identified audio file in succession, in the order set by queue 300.
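As a worked example, the "speak every menu item" variant of queue 300 can be reproduced by sorting the display's strings on their tiers; ties keep display order because the sort is stable. The integer tier values are illustrative.

```python
CONTEXT, FOCUS, SELECTION, PROPERTY = 0, 1, 2, 3  # lower value speaks first

items = [
    ("iPod", CONTEXT), ("Music", FOCUS),
    ("Videos", SELECTION), ("Photos", SELECTION), ("Podcasts", SELECTION),
    ("Extras", SELECTION), ("Settings", SELECTION),
    ("Shuffle Songs", SELECTION), ("Backlight", SELECTION),
    ("On", PROPERTY),    # the Backlight setting, spoken after "Backlight"
]
queue_300 = [text for text, tier in sorted(items, key=lambda it: it[1])]
print(queue_300[:2])   # ['iPod', 'Music'] -- context first, then focused item
print(queue_300[-2:])  # ['Backlight', 'On']
```

The result matches the order described for queue 300: the context title, the focused menu item, the remaining menu items, and finally the setting associated with the last option.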
When the content on the display of the electronic device changes, the electronic device can modify the provided voice feedback to reflect the changed display. FIG. 4 is a schematic illustration of an electronic device display after receiving a user selection of an option of the display of FIG. 2 in accordance with one embodiment of the present invention. Like display 200 (FIG. 2), display 400 includes several regions in which content is displayed. For example, the display 400 may include a title bar 410, a menu 420, and additional information 430. The title bar 410 may include a title 412 indicating the mode or application in use by the electronic device. In the example of FIG. 4, title 412 may read Music, indicating the option of menu 220 (FIG. 2) that was selected.
Menu 420 may include a number of selectable options 422, including, for example, options associated with the particular mode or application selected. The user may select an option from menu 420 by navigating highlight region 424 over the option. When the highlight region is over a particular option, the user may provide a selection instruction (e.g., by pressing a button, or providing any other suitable input) to select that option. In the example of FIG. 4, options 422 may include Cover Flow, Playlists, Artists, Albums, Songs, Genres, Composers, Audiobooks, and Search. Additional information 430 may include any suitable information, including, for example, information associated with the mode or application identified by title 412, one or more of the displayed options 422, the particular option identified by highlight region 424, or any other suitable information.
In response to determining that the displayed content has changed (e.g., in response to detecting a transaction), the speech scheduler may update or modify the queue of speakable items that provides voice feedback for the display. For example, the speech scheduler may determine the speakable properties associated with each view of the changed display to generate the queue. FIG. 5 is a schematic diagram of an illustrative queue of speakable items for playback associated with the display of FIG. 4 in accordance with one embodiment of the invention. Queue 500 may be described using any suitable approach. In the example of FIG. 5, queue 500 includes a list 510 of speakable strings to speak successively. As part of the view, each speakable string may be associated with a speakable tier identified in corresponding column 540. Using the elements from display 400 (FIG. 4), the speakable strings may include music string 512 having context tier 542 and Cover Flow string 513 having focus tier 543 (e.g., the menu option identified by the highlight region). In implementations in which all menu options are read aloud, queue 500 may include, for example, a playlist string, an artist string, an album string, a song string, a genre string, a composer string, an audiobook string, and a search string, all having a selection tier (e.g., a tier below focus tier 543 of Cover Flow string 513). The electronic device may identify the audio file associated with each speakable string (e.g., using a hash or database) and play back each identified audio file in succession, in the order set by queue 500.
In some embodiments, the voice feedback provided by the electronic device changes when the displayed content remains the same but a user-controlled marker (e.g., the highlight region) changes. This allows the user, as the marker moves, to identify the action that will be performed in response to selecting the option identified by the marker. FIG. 6 is a schematic view of the electronic device display of FIG. 4 with a different option identified by the highlight region, in accordance with one embodiment of the present invention. Similar to display 400 (FIG. 4), display 600 includes several regions that display content. For example, display 600 includes title bar 610, menu 620, and additional information 630. Title bar 610 includes a title 612 indicating a mode or application used by the electronic device, which may be the same mode (e.g., Music) as in display 400.
Menu 620 may include the same selectable options 622 as display 400. As shown in FIG. 6, the user has navigated highlight region 624 over the Artists option (e.g., instead of the Cover Flow option as in display 400). The displayed additional information 630 may include any suitable information, including, for example, information associated with the mode or application identified by title 612, one or more of the displayed options 622, the particular option identified by highlight region 624, or any other suitable information. In the examples of FIGS. 4 and 6, the additional information displayed may be different, reflecting the location of highlight region 624.
In response to determining that the location of the highlight region has changed (e.g., in response to detecting a transaction), the speech scheduler may update the queue of speakable items that provides voice feedback for the display. For example, the speech scheduler may determine the modified, altered, or updated speakable properties associated with each view of the changed display to generate the queue. FIG. 7 is a schematic diagram of an illustrative queue of speakable items for playback associated with the display of FIG. 6 in accordance with one embodiment of the invention. Queue 700 may be described using any suitable approach. In the example of FIG. 7, queue 700 includes a list 710 of speakable strings to speak successively. As part of the view, each speakable string is associated with a speakable tier identified in corresponding column 740. Using the elements from display 600 (FIG. 6), the speakable strings may include music string 712 having context tier 742 and artist string 713 having focus tier 743 (e.g., the menu option identified by the highlight region). In particular, the listing of speakable strings in queue 700 may be different from the listing of speakable strings in queue 500 (FIG. 5) to reflect that the highlight region has moved down to the artist option. For example, speakable strings that would be spoken earlier in queue 500 than in queue 700 may be removed from queue 700. The electronic device may identify the audio file associated with each speakable string (e.g., using a hash or database) and play back each identified audio file in succession, in the order set by queue 700. In implementations in which voice feedback for non-highlighted menu options is provided, queue 700 may include, for example, album strings, song strings, genre strings, composer strings, audiobook strings, search strings, Cover Flow strings, and playlist strings, all having a selection tier (e.g., a tier below focus tier 743 of artist string 713).
The other menu options may be ordered in any suitable manner, including for example as a repeating list starting with the menu item identified by the highlight region.
In response to detecting a transaction, the electronic device may replay any suitable portion of the audio files for the speakable elements. In some embodiments, if the electronic device has begun playing back an audio file associated with display 200 when the user provides an instruction to access display 400, or has begun playing back an audio file associated with a speakable string of display 400 when the user moves the highlight region to the position reflected in display 600, the electronic device may selectively stop playing back the audio file or continue playing back the audio file based on the tier associated with the audio file and/or the modification of the speech scheduler's queue of speakable items. In some embodiments, the speech scheduler first determines an updated queue and compares the initial queue with the updated queue. In particular, the speech scheduler may determine, from the beginning of the queue, the portions of the initial queue and the updated queue that remain the same, and the location in the updated queue at which the order of speakable elements begins to change. For example, as the speech scheduler moves from queue 300 to queue 500, the speech scheduler may determine that the two queues do not share any common speakable strings and thus differ starting from the first position in the queue. As another example, as the speech scheduler moves from queue 500 to queue 700, the speech scheduler may determine that the two queues share the speakable string associated with the context tier, but become different starting with the speakable string associated with the focus tier.
The speech scheduler may also determine the position, in the initial queue and in the updated queue (if present in each), of the speakable string for which audio is currently being provided. For example, as the speech scheduler moves from queue 500 to queue 700, the speech scheduler may determine whether the speakable string whose audio file is being played back is the speakable string "music" (e.g., a speakable string shared by queues 500 and 700) or a different speakable string (e.g., a speakable string not shared by queues 500 and 700). If the speech scheduler determines that the currently spoken speakable string belongs to the speakable strings shared by the initial queue and the updated queue, the speech scheduler may continue to speak or play back audio associated with that speakable string, and then continue to play back audio associated with the speakable strings of the updated queue in the order set by the updated queue. For example, if the electronic device is playing back audio associated with the speakable string "music" (which has a context tier) when the user changes the display from display 400 to display 600, the electronic device may provide audio associated with the speakable string "artist" (the next item in the queue associated with display 600) when the electronic device finishes playing back audio associated with the speakable string "music" (e.g., instead of audio associated with the speakable string "Cover Flow," which is the next speakable string in the queue associated with display 400).
If the speech scheduler instead determines that the currently spoken speakable string does not fall within the range of speakable strings common to the initial queue and the updated queue, the electronic device may cease playing back audio associated with the currently spoken speakable string. For example, the electronic device may stop playing back audio as soon as the speech scheduler determines that the currently spoken string is not within the range of common speakable strings. The electronic device may then resume playing back audio associated with any suitable speakable strings of the updated queue, including, for example, the speakable strings of the updated queue beginning with the speakable string at which the order of speakable elements changed. For example, if the electronic device is currently speaking the speakable string "Cover Flow" when the user navigates from display 400 to display 600, the electronic device may stop playing back audio associated with the speakable string "Cover Flow" (e.g., after having played back only a portion of it) and begin playing back audio associated with the speakable string "artist" (e.g., the first speakable string of queue 700 that differs from queue 500). In implementations in which all menu items are spoken, if the electronic device is currently speaking the speakable string "genre" when the user navigates from display 400 to display 600, the electronic device may stop playing back audio associated with the speakable string "genre" and begin playing back audio associated with the speakable string "artist". The speakable string "genre" may then be spoken again when it is reached in the queue (e.g., queue 700) associated with display 600.
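The continue-or-stop decision described above reduces to finding the common prefix of the initial and updated queues and checking where the currently spoken string falls. A minimal sketch, with illustrative function names and simplified queues of bare strings:

```python
# Sketch of the queue comparison described above: find how much of the
# initial and updated queues match from the beginning, then decide whether
# the currently spoken string may finish or must be cut off.
# Function and variable names are illustrative, not from the patent.

def common_prefix_len(initial, updated):
    """Number of leading positions at which the two queues agree."""
    n = 0
    for a, b in zip(initial, updated):
        if a != b:
            break
        n += 1
    return n

def next_action(initial, updated, current_index):
    """current_index: position in `initial` of the string now being spoken."""
    shared = common_prefix_len(initial, updated)
    if current_index < shared:
        # Currently spoken string is common to both queues: finish it, then
        # continue with the updated queue from the divergence point.
        return ("continue", shared)
    # Otherwise stop mid-string and restart where the queues diverge.
    return ("stop", shared)

# Moving from queue 500 to queue 700 (focus tier changed, context shared):
q500 = ["Music", "Cover Flow"]
q700 = ["Music", "Artists"]
print(next_action(q500, q700, 0))  # ('continue', 1): "Music" is shared
print(next_action(q500, q700, 1))  # ('stop', 1): "Cover Flow" is cut off
```

In the queue 300 to queue 500 example from the text, the common prefix is empty, so playback would stop immediately and restart from the first element of the updated queue.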
Thus, if the user moves the highlight region along the options of display 400 quickly enough, the electronic device may play back only a portion (e.g., the first syllable) of the audio for each option of display 400.
In some embodiments, the electronic device may provide voice feedback for menu items not statically provided by the electronic device firmware or operating system. For example, the electronic device may provide voice feedback of dynamic strings generated from content provided to the electronic device by a user (e.g., from a host device). In some embodiments, the electronic device may provide voice feedback for media transmitted by the user to the electronic device (e.g., according to metadata associated with the transmitted media). FIG. 8 is a schematic illustration of an electronic device display provided in response to a user selecting the highlighted menu option of FIG. 6, in accordance with one embodiment of the invention. Similar to display 600 (FIG. 6), display 800 may include several regions that display content. For example, the display 800 may include a title bar 810, a menu 820, and additional information 830. The title bar 810 may include a title 812 (e.g., "artist") indicating the mode or application in use by the electronic device.
Menu 820 may include any suitable list associated with an "artist" mode, such as a list 822 including the artist names of media available to the electronic device (e.g., media saved on the electronic device). The electronic device may collect the artist names using any suitable method, including, for example, collecting them from metadata associated with the media. The displayed additional information 830 may include any suitable information, including, for example, information associated with one or more artists identified in menu 820 (e.g., information related to media available from the artist identified by highlight region 824), or the mode or application identified by title 812.
In response to detecting a transaction (e.g., a user selection of the Artists option in display 600 of FIG. 6), the speech scheduler may update the queue of speakable items to reflect the displayed dynamic artist names. For example, the speech scheduler may determine the modified, changed, or updated speakable properties associated with each view of the changed display to generate the queue. FIG. 9 is a schematic diagram of an illustrative queue of speakable items for playback associated with the display of FIG. 8 in accordance with one embodiment of the invention. Queue 900 may be described using any suitable approach. In the example of FIG. 9, queue 900 includes a list 910 of speakable strings to speak successively. As part of the view, each speakable string may be associated with a speakable tier identified in corresponding column 940. Using the elements from display 800 (FIG. 8), the speakable strings may include artist string 912 having context tier 942 and Common string 913 having focus tier 943 (e.g., the artist identified by the highlight region). In implementations in which voice feedback for non-highlighted menu options is provided, queue 900 may include, for example, Corrs strings, Craig David strings, Creed strings, D12 strings, Da Brat strings, and Daniel Bedingfield strings, all having a selection tier (e.g., a tier below focus tier 943 of Common string 913). The other artists may be ordered in any suitable manner, including, for example, as a repeating list starting with the artist identified by the highlight region.
In some embodiments, the electronic device can selectively provide voice feedback based on the status of media playback. For example, when the electronic device is playing back media, the electronic device may not provide voice feedback for a particular element or in a particular mode. FIG. 10 is a schematic diagram of an illustrative "now playing" display in accordance with one embodiment of the present invention. Display 1000 includes a title bar 1010, a menu 1020, and additional information 1030. Title bar 1010 includes a title 1012 indicating the mode or application the electronic device is using. For example, title 1012 may include iPod (e.g., the top-most title when no application is selected), Music, Videos, Photos, Podcasts, Extras, Settings, or Now Playing. Title bar 1010 may also include any other suitable information, including, for example, a battery indicator 1014.
Menu 1020 may include a number of selectable options 1022, including, for example, options for selecting a mode or application, or options associated with a particular mode or application selected. The user may select an option from menu 1020 by navigating highlight region 1024 over the option. While the highlight region is placed over a particular option, the user may provide a selection instruction (e.g., by pressing a button or providing any other suitable input) to select the particular option. For example, to view information associated with currently played media (e.g., currently playing or paused media), the user may select the now playing option. In response to receiving a user selection of the now playing option, the electronic device can display additional information 1030 related to the now playing media. For example, the additional information 1030 may include an artist 1032, a title 1034, and an album 1036 overlaid on the album jacket. In some embodiments, each of artist 1032, title 1034, and album 1036 may be associated with the same or different views (e.g., different views that allow for voice feedback of additional information by using the same hierarchy for all additional information elements).
In response to receiving a selection of the now playing option of display 1000 (FIG. 10), the speech scheduler may update the queue of speakable items to speak the one or more strings associated with the now playing media. For example, the speech scheduler may determine modified, changed, or updated speakable properties associated with each view of the changed display to generate the queue. FIG. 11 is a schematic diagram of an illustrative queue of speakable items for a now playing display in accordance with one embodiment of the invention. Queue 1100 may be described using any suitable approach. In the example of FIG. 11, queue 1100 includes a list 1110 of speakable strings to speak successively. As part of the view, each speakable string may be associated with a speakable tier identified in corresponding column 1140. Using the elements from display 1000 (FIG. 10), the speakable strings may include an iPod string 1112 having a context tier 1142, a now playing string 1113 having a focus tier 1143 (e.g., a menu option identified with a highlight region), a Mika string 1114 having an idle tier 1144, a Grace Kelly string 1115 having an idle tier 1145, and a Life in Cartoon Motion string 1116 having an idle tier 1146.
To ensure that voice feedback for artists, titles, and albums is not provided at inappropriate times, the electronic device may not provide voice feedback for speakable elements associated with the idle tier while media is being played back (e.g., not paused). For example, the electronic device may first determine whether media is being played back. In response to determining that no media is being played back, the electronic device can provide voice feedback for all elements in queue 1100, including the elements associated with the idle tier. If the electronic device instead determines that media is currently being played back, the electronic device can provide voice feedback only for the elements of the views in queue 1100 associated with tiers other than the idle tier. In response to detecting that media is being played back, the speech scheduler may remove elements associated with the idle tier from queue 1100, or instead skip such elements in queue 1100. The electronic device may assign the idle tier to any suitable displayed information, including, for example, information displayed in additional information windows or regions (e.g., the number of songs or photos saved on the device).
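The idle-tier filtering above amounts to a conditional filter over the queue. A minimal sketch, using the elements of queue 1100 from FIG. 11; the tier labels and function name are illustrative:

```python
# Sketch of the idle-tier filtering described above: when media is playing,
# skip queue elements assigned the idle tier; when paused or stopped, speak
# everything. Names are illustrative, not from the patent.

IDLE = "idle"

# (speakable string, tier) pairs corresponding to queue 1100 of FIG. 11.
queue_1100 = [
    ("iPod", "context"),
    ("Now Playing", "focus"),
    ("Mika", IDLE),
    ("Grace Kelly", IDLE),
    ("Life in Cartoon Motion", IDLE),
]

def strings_to_speak(queue, media_playing):
    if not media_playing:
        return [text for text, tier in queue]              # speak everything
    return [text for text, tier in queue if tier != IDLE]  # skip idle tier

print(strings_to_speak(queue_1100, media_playing=True))
# ['iPod', 'Now Playing']
```

Whether the scheduler removes the idle elements from the queue or merely skips them at speaking time, as the text allows, the audible result is the same.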
The electronic device may determine which character strings to speak, and when, using any suitable method. FIG. 12 is an illustrative state diagram for speaking speakable strings in accordance with one embodiment of the invention. State diagram 1200 may include a number of states and a number of paths for reaching each of those states. The electronic device may begin in an idle state 1202. For example, the electronic device may remain idle while no content is displayed. As another example, the electronic device may remain idle while content is displayed but the displayed content is not associated with voice feedback (e.g., album art is displayed). As yet another example, the electronic device may remain idle when speakable content is displayed but all of the speakable content has already been spoken.
While in idle state 1202, the electronic device may monitor the display for transactions. Any event that changes the electronic device's determination of which elements to speak may constitute a transaction. A transaction can be initiated (and detected by the electronic device) in several different ways. For example, a transaction may be detected in response to receiving a user instruction (e.g., a user selection of a selectable option that causes the display to change). As another example, a transaction may be detected in response to a transition in the display (e.g., a change in the display due to, for example, a timeout, or due to the user moving the highlight region). In response to detecting a transaction, the electronic device may enter update step 1204. At update step 1204, the electronic device can update variables or fields associated with providing voice feedback. For example, the speech scheduler may generate a queue of items for the electronic device to speak, e.g., according to fields available from one or more models used to generate the views of the post-transaction display. After update step 1204, the electronic device may enter PreSpeakTimeout state 1206.
In PreSpeakTimeout state 1206, the electronic device may pause for a first timeout. Within the timeout, the electronic device may perform any suitable operation, including, for example, generating the queue of speakable strings to speak, identifying the audio files associated with the speakable strings and performing initial operations to prepare the audio files for playback, ducking or attenuating a prior audio output (e.g., an output due to music playback), or performing any other suitable operation. For example, the electronic device may reduce (e.g., duck) prior audio output so that the spoken string is clearer. As another example, during voice feedback, the electronic device may pause playback of the media (so that the user does not miss any media). As yet another example, the electronic device may use the PreSpeakTimeout state to ensure that no further transactions (e.g., subsequent movement of the highlight region) are detected, to avoid speaking text only partially. The electronic device may remain in PreSpeakTimeout state 1206 for any suitable duration, including, for example, a duration in the range of 0 ms to 500 ms (e.g., 100 ms). Once the first timeout associated with PreSpeakTimeout state 1206 has elapsed, the electronic device may enter resume step 1208, thereby entering speaking state 1210.
In speaking state 1210, the electronic device speaks the speakable items placed in the queue generated during update step 1204. For example, the electronic device may identify the audio file associated with a speakable item in the generated queue and play back the identified audio file. When the electronic device completes the first entry in the voice feedback queue generated by the speech scheduler, the electronic device can determine that the appropriate voice feedback has been provided and proceed to complete step 1212. At complete step 1212, the speech scheduler may remove the speakable element from the queue or move a pointer to the next speakable element in the queue. In some embodiments, the electronic device may instead remove a speakable element from the queue just prior to speaking it (e.g., while in speaking state 1210), such that the first speakable element identified by the electronic device is the next element to speak when the electronic device returns to speaking state 1210 after complete step 1212. The electronic device may continue to move between speaking state 1210 and complete step 1212 until all of the speakable items in the queue generated in the update step (e.g., update step 1204) have been spoken (i.e., the queue is empty, or the pointer has reached the end of the queue), or until the display changes and a new update step is performed.
In response to detecting a transaction while in speaking state 1210 (e.g., as described above), the electronic device may enter update step 1214. At an update step 1214, the electronic device can update the variables or fields associated with providing voice feedback to coincide with the display caused by the transaction. For example, the speech scheduler may update the speakable elements and the order of speakable elements for which voice feedback is provided in the updated voice feedback queue based on the display after the transaction. In some embodiments, the electronic device may also determine, starting with the first speakable element of the queue, the portion of the updated queue that matches the initial voice feedback queue (e.g., prior to step 1214), and identify the current speakable element for which voice feedback is being provided. If the electronic device determines that the current speakable element is within the portion of speakable elements common to the initial queue and the updated queue, the electronic device may return to speaking state 1210 and continue to speak the next speakable element of the updated queue (e.g., using complete step 1212 and speaking state 1210). If the electronic device instead determines that the current speakable element is not within the portion of speakable elements common to the initial queue and the updated queue, the electronic device may cease speaking the current speakable element (e.g., cease playing back the audio file associated with the current speakable element) and return to speaking state 1210. Upon returning to speaking state 1210, the electronic device may provide voice feedback that updates the speakable elements of the queue, e.g., starting with the first speakable element of the queue after the determined portion of common speakable elements.
Once the electronic device has provided voice feedback for each element in the queue generated by the speech scheduler (e.g., once the queue is empty), the electronic device may proceed to no_ready_queue step 1216. At no_ready_queue step 1216, the electronic device may receive an indication from the speech scheduler that the queue of speakable items is empty (e.g., a no_ready_queue variable). From no_ready_queue step 1216, the electronic device may enter PostSpeakTimeout state 1218. In state 1218, the electronic device pauses for a second timeout. Within the timeout period, the electronic device may perform any suitable operation, including, for example, preparing other audio for playback, initiating a user-selected operation (e.g., in response to detecting a selection instruction for one of the displayed and spoken menu options), or any other suitable operation. The electronic device may instead or additionally restore audio output from a ducked or attenuated mode (e.g., from a ducked or attenuated mode initiated during PreSpeakTimeout state 1206 back to a normal mode for playing back audio or other media). Alternatively, the electronic device may resume playback of the paused media. The electronic device may remain in PostSpeakTimeout state 1218 for any suitable duration, including, for example, a duration in the range of 0 ms to 500 ms (e.g., 100 ms). Once the second timeout associated with PostSpeakTimeout state 1218 has elapsed, the electronic device proceeds to resume step 1220 and returns to idle state 1202.
In some embodiments, while in PostSpeakTimeout state 1218, the electronic device may detect a transaction (e.g., the transaction described above) and proceed to update step 1222. The updating step 1222 may include some or all of the features of the updating step 1214. At update step 1222, the electronic device may update the variables or fields associated with providing voice feedback to coincide with the display caused by the transaction. For example, the speech scheduler may update the speakable elements and the order of speakable elements for which voice feedback is provided in the updated voice feedback queue based on the display after the transaction. Additionally, in some embodiments, the electronic device may determine, starting with the first speakable element of the queue, the portion of the updated queue that matches the initial voice feedback queue (e.g., prior to step 1222), and identify the current speakable element for which voice feedback is being provided (e.g., as described above in connection with update step 1214). The electronic device then returns to speaking state 1210 and provides voice feedback for the speakable elements of the updated queue, e.g., starting with the first speakable element of the queue after the determined portion of common speakable elements.
In some embodiments, the electronic device may detect an error in the speaking process. For example, at play_error step 1224, the electronic device may receive an indication of an error associated with speaking state 1210. The electronic device may receive any suitable indication of an error at step 1224, including, for example, a play_error variable. The electronic device may then enter ErrorSpeaking state 1226. In ErrorSpeaking state 1226, the electronic device may perform any suitable operations. For example, the electronic device may perform debugging operations, or other operations for identifying the source of the error. As another example, the electronic device may collect information associated with the error to provide to a developer of the software for debugging or modification. When the electronic device completes the one or more operations associated with ErrorSpeaking state 1226, the electronic device may proceed to complete step 1228, returning to speaking state 1210 to continue providing voice feedback for the speakable elements in the queue generated by the speech scheduler.
Alternatively, if the electronic device fails to complete all of the operations associated with ErrorSpeaking state 1226, the electronic device may enter restart step 1230, returning to speaking state 1210. The electronic device may take this path for any suitable reason, including, for example, failing to receive a valid "complete" message, receiving a user instruction to cancel the ErrorSpeaking operation or return to speaking state 1210, or an error timeout elapsing (e.g., 100 ms).
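The core transitions of state diagram 1200 can be sketched compactly. This is a simplified model, not the patent's implementation: real timeouts, the pre/post-speak housekeeping, and the error-recovery operations are reduced to bare transitions, and all class and method names are illustrative.

```python
# Compact sketch of the FIG. 12 state machine: Idle -> (transaction/update) ->
# PreSpeakTimeout -> Speaking <-> complete -> PostSpeakTimeout -> Idle.
# Names and the event interface are illustrative simplifications.

IDLE, PRE_SPEAK, SPEAKING, POST_SPEAK, ERROR = (
    "Idle", "PreSpeakTimeout", "Speaking", "PostSpeakTimeout", "ErrorSpeaking")

class VoiceFeedback:
    def __init__(self):
        self.state = IDLE
        self.queue = []

    def on_transaction(self, new_queue):
        # Update step: regenerate the queue, then wait out the first timeout.
        self.queue = list(new_queue)
        self.state = PRE_SPEAK

    def on_timeout(self):
        if self.state == PRE_SPEAK:
            self.state = SPEAKING       # resume step into speaking state
        elif self.state == POST_SPEAK:
            self.state = IDLE           # resume step back to idle

    def on_item_complete(self):
        # Complete step: drop the spoken element; an empty queue corresponds
        # to the no_ready_queue indication and the post-speak timeout.
        if self.state == SPEAKING and self.queue:
            self.queue.pop(0)
        if not self.queue:
            self.state = POST_SPEAK

    def on_play_error(self):
        self.state = ERROR

fb = VoiceFeedback()
fb.on_transaction(["iPod", "Music"])
fb.on_timeout()          # PreSpeakTimeout elapses -> Speaking
fb.on_item_complete()    # "iPod" spoken
fb.on_item_complete()    # "Music" spoken, queue empty -> PostSpeakTimeout
fb.on_timeout()          # second timeout elapses -> Idle
print(fb.state)  # Idle
```

A transaction arriving in any speaking-related state would, per the text, route back through an update step; the sketch models that by allowing `on_transaction` from any state.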
The electronic device may obtain the audio file associated with each speakable element using any suitable approach. In some embodiments, the audio files may be saved locally by the electronic device, for example as part of the device's firmware or software. An inherent limitation of this approach, however, is that firmware is typically provided globally to all electronic devices, which are sold or used in different locations with different languages and accents. To ensure that voice feedback is provided in the proper language or with the proper accent, the firmware used by each device would need to be personalized. This can be quite costly due to the need to store and provide several versions of the firmware, and can be significantly more complex, as a firmware or software provider may need to manage the distribution of different firmware or software to different devices. In addition, audio files (as opposed to text files, for example) may be large and thus poorly suited to being provided in a firmware or software update.
In some embodiments, the electronic device may generate the audio files locally using a text-to-speech (TTS) engine running on the device. In this approach, the electronic device provides the text strings associated with different menu options, in a language associated with the device, to the device's TTS engine to generate audio files for voice feedback. This approach facilitates easier firmware or software updates, since only the text strings on which the TTS engine operates need to change to reflect changes in the displays in which speakable elements reside. However, the TTS engine available to the electronic device may limit this approach. In particular, if the electronic device has limited resources, such as limited memory, processing power, or battery power (e.g., limitations associated with portable electronic devices), the quality of the speech produced by the TTS engine may be reduced. For example, intonations associated with dialects or accents may not be available, or speech in a particular language (e.g., a language substantially different from a default language) may not be supported.
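Combining the audio sources discussed above suggests a simple fallback: prefer a pre-rendered audio file (e.g., one transferred from a host device, as described next) when one exists for a string, and fall back to on-device TTS otherwise. The sketch below is purely hypothetical: both the file-store lookup and the TTS call are stand-ins, and none of the names come from the patent.

```python
# Hypothetical sketch of an audio-sourcing fallback: use a pre-rendered,
# host-provided audio file when available, otherwise synthesize locally.
# Both functions below are stand-ins, not real device or library APIs.

def host_audio_lookup(text, store):
    # A host device may have synthesized higher-quality audio and
    # transferred it to the device, keyed by the speakable string.
    return store.get(text)

def local_tts(text):
    # Placeholder for a resource-limited on-device TTS engine.
    return f"tts:{text}"

def audio_for(text, store):
    return host_audio_lookup(text, store) or local_tts(text)

store = {"Music": "music_en_us.wav"}   # illustrative host-provided files
print(audio_for("Music", store))       # music_en_us.wav
print(audio_for("Podcasts", store))    # tts:Podcasts
```

This kind of layering would let static menu strings ship with high-quality audio while dynamic strings (e.g., artist names from metadata) still receive some feedback.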
In some embodiments, the electronic device may instead or additionally receive audio files associated with speakable elements from a host device to which the electronic device is connected. FIG. 13 is a schematic diagram of an illustrative communication system including an electronic device and a host device in accordance with one embodiment of the present invention. Communication system 1300 includes electronic device 1302 and communication network 1310, which electronic device 1302 can use to communicate, by wire or wirelessly, with other devices within communication network 1310. For example, electronic device 1302 may communicate with host device 1320 via communication network 1310. Although communication system 1300 may include several electronic devices 1302 and host devices 1320, only one of each is shown in FIG. 13 to avoid overcomplicating the drawing.
Communication network 1310 may be established using any suitable circuitry, device, system, or combination thereof (e.g., a wireless communication infrastructure including communication towers and telecommunication servers) operable to establish a communication network. Communication network 1310 can provide wireless communication using any suitable short-range or long-range communication protocol. In some embodiments, for example, communication network 1310 may support Wi-Fi (e.g., an 802.11 protocol), Bluetooth (registered trademark), radio frequency systems (e.g., 900 MHz, 2.4 GHz, and 5.6 GHz communication systems), infrared, protocols used by wireless and cellular telephones and personal email devices, or any other protocol that supports wireless communication between electronic device 1302 and host device 1320. Communication network 1310 may instead or additionally be capable of providing wired communication between electronic device 1302 and host device 1320, for example by utilizing any suitable port (e.g., 30-pin, USB, FireWire, serial, or Ethernet) on electronic device 1302 and/or host device 1320.
Electronic device 1302 may include any suitable device that receives media or data. For example, electronic device 1302 may include one or more features of electronic device 100 (FIG. 1). Electronic device 1302 may be coupled with host device 1320 over communication link 1340 using any suitable approach. For example, electronic device 1302 may connect to host device 1320 over communication link 1340 using any suitable wireless communication protocol. As another example, communication link 1340 may be a wired link (e.g., an Ethernet cable) that couples both electronic device 1302 and host device 1320. As yet another example, communication link 1340 may comprise a combination of wired and wireless links (e.g., an accessory device for wirelessly communicating with host device 1320 may be coupled to electronic device 1302). In some embodiments, any suitable connector, adapter (dongle), or docking station may be used as part of communication link 1340 to couple electronic device 1302 and host device 1320.
Host device 1320 may include any suitable type of device that provides audio files to electronic device 1302. For example, host device 1320 may include a computer (e.g., a desktop or laptop computer), a server (e.g., a server accessible over the Internet or over a dedicated communication link), a kiosk, or any other suitable device. Host device 1320 may provide audio files for the speakable elements of the electronic device using any suitable approach. For example, host device 1320 may include a TTS engine that has access to more resources than are locally available on electronic device 1302. Using the more comprehensive host device TTS engine, host device 1320 may generate audio files associated with the text strings of the electronic device's speakable elements. The host device TTS engine enables the electronic device to provide voice feedback in different languages or with personalized accents or voice patterns (e.g., using a celebrity voice or the accent of a particular region). The TTS engine may include a general speech dictionary and pronunciation rules for different sounds to produce audio for the provided text, and may convert the produced audio into a format suitable for playback by the electronic device (e.g., AIFF files). In some embodiments, the TTS engine may include a preprocessor for performing music-specific processing (e.g., replacing the string "feat." or "ft." with "featuring"). In some embodiments, host device 1320 may limit the amount of media transferred to the electronic device to account for the storage space required to store the audio files associated with providing voice feedback (e.g., calculate the space expected to be required for voice feedback audio files based on the expected number of media files stored on the electronic device).
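The music-specific preprocessing described above can be sketched as a simple substitution pass run before the text is handed to the TTS engine. This is a minimal illustration of the idea using a regular expression; the patent does not specify how the preprocessor is implemented, and the function name is hypothetical.

```python
import re

def preprocess_music_text(text: str) -> str:
    """Expand music-specific abbreviations (e.g. "feat." or "ft.")
    into full words before passing the string to a TTS engine.
    Hypothetical sketch; the actual preprocessor is not specified."""
    # \b restricts the match to the abbreviation as a standalone token.
    return re.sub(r"\b(?:feat|ft)\.", "featuring", text, flags=re.IGNORECASE)

print(preprocess_music_text("Song Title (feat. Some Artist)"))
# Would print: Song Title (featuring Some Artist)
```

A real preprocessor would likely handle additional abbreviations (e.g. "vs.", "No.") with the same substitution mechanism.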
The host device may identify the text strings for which audio files are to be provided using any suitable approach. In some embodiments, the host device may identify text strings associated with data transmitted from the host device to the electronic device and provide the identified text strings to the TTS engine to generate corresponding audio files. For example, this approach may be used for text strings associated with the metadata (e.g., title, artist, album, genre, or any other metadata) of media files (e.g., music or video) transferred from the host device to the electronic device. In some embodiments, the electronic device may identify to the host device the specific metadata for which audio feedback is to be provided (e.g., the electronic device identifies title, artist, and album metadata). The host device may name the audio files and store them on the electronic device using any suitable approach. For example, the audio file name and storage location (e.g., directory number) may be the result of applying a hash to the text string to be spoken.
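A hash-based naming scheme of the kind described above can be sketched as follows. This is a hypothetical illustration: the patent does not specify which hash function is used or how directory numbers are derived, so MD5 and a fixed directory count are assumptions made for the example.

```python
import hashlib

def audio_file_location(spoken_text: str, num_dirs: int = 16) -> tuple:
    """Derive a directory number and file name from the text to be spoken,
    so the host and the device agree on where each voice-feedback audio
    file lives without exchanging an explicit index.
    Hash function and bucket count are assumptions for illustration."""
    digest = hashlib.md5(spoken_text.encode("utf-8")).hexdigest()
    # Bucket files into num_dirs directories using the leading hash bytes.
    directory = str(int(digest[:8], 16) % num_dirs)
    return directory, digest + ".aiff"
```

Because the mapping is deterministic, the electronic device can later locate the audio file for any displayed string by recomputing the same hash.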
However, for speakable elements (e.g., the text of menu options in the electronic device firmware) that are not transferred from the host device to the electronic device, the host device is unaware of the text strings for which the TTS engine is to provide audio files. In some embodiments, the electronic device may provide to the host device a text file (e.g., an XML file) that includes a string for each static speakable element for which voice feedback is to be provided. The electronic device may generate the text file of speakable element strings at any suitable time. In some embodiments, the file may be generated each time the electronic device boots, from data extracted from the firmware or software source code during compilation. For example, when the electronic device compiles the source code associated with the model and view of a display, the electronic device may identify elements having speakable properties (e.g., speakable elements) and extract the text string and speaking priority associated with each speakable element. In some embodiments, the electronic device may instead or additionally generate the text file in response to detecting a change in the voice feedback language or voice.
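The text file of static speakable elements described above might be produced as follows. This is a minimal sketch assuming a flat XML layout with one element per speakable string; the actual schema, element names, and attributes are not given in the source and are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_speakable_manifest(elements, language="en-US"):
    """Serialize static speakable elements (text plus speaking priority)
    into an XML data file for the host device's TTS engine.
    Schema and attribute names are assumptions for illustration."""
    root = ET.Element("speakable_elements", language=language)
    for text, priority in elements:
        item = ET.SubElement(root, "element", priority=str(priority))
        item.text = text
    return ET.tostring(root, encoding="unicode")

manifest = build_speakable_manifest([("Music", 1), ("Now Playing", 2)])
```

The `language` attribute models the language-change indication mentioned above, letting the host know which voice to use when regenerating audio files.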
The extracted text may be provided to the host device using a data file (e.g., an XML file) generated at startup of the electronic device. This approach enables speakable elements to be changed more easily with firmware or software updates, because the compiled firmware or software code may include the extracted speakable element information the host device needs to generate the audio files for voice feedback. In response to receiving the text file, the host device may generate an audio file for each speakable element using the TTS engine. In some embodiments, the text file may include an indication of a language change to instruct the host device to generate new audio files for the changed text, or with the changed voice or language. Systems and methods for generating audio files from received text files are described in more detail in commonly assigned U.S. Patent Publication No. 2006/0095848, entitled "AUDIO USER INTERFACE FOR COMPUTING DEVICES" (Attorney Docket No. P3504US1), which is hereby incorporated by reference in its entirety.
The following flow charts illustrate illustrative processes for providing audio files for voice feedback to an electronic device. FIG. 14 is a flow chart of an illustrative process for providing static strings to an electronic device. Process 1400 begins at step 1402. At step 1404, the electronic device generates a data file that enumerates the static strings. For example, the electronic device may extract from the firmware the text strings displayed by the electronic device for which voice feedback may be provided. At step 1406, the electronic device provides the file to the host device. For example, the electronic device may provide the file to the host device over a wired or wireless communication path.
At step 1408, the host device may convert the static strings of the provided data file into an audio file. For example, the host device may generate audio for each static string using a TTS engine (e.g., generate audio, compress the audio, and convert the audio to a file format that may be played back by the electronic device). At step 1410, the host device may transmit the generated audio to the electronic device. For example, the host device may transmit the generated audio file to the electronic device over a communication path. The process 1400 then ends at step 1412. The host device may store the audio file at any suitable location on the electronic device, including, for example, at a location or directory number resulting from a hash of the text string to be spoken.
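Steps 1408 and 1410 can be sketched as a host-side conversion pass over the device's data file. In this illustration the TTS engine is replaced by a stub that emits a short silent WAV payload, since no real engine is available in a self-contained example; the function names, the WAV output format, and the MD5-based file naming are all assumptions for illustration.

```python
import hashlib
import io
import wave
import xml.etree.ElementTree as ET

def synthesize(text: str) -> bytes:
    """Stand-in for a real TTS engine: emits ~10 ms of silent mono WAV.
    A production host device would invoke its TTS engine here."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)       # mono
        w.setsampwidth(2)       # 16-bit samples
        w.setframerate(22050)
        w.writeframes(b"\x00\x00" * 220)
    return buf.getvalue()

def convert_manifest(manifest_xml: str) -> dict:
    """Convert each static string in the device's data file into an
    audio payload, keyed by a hashed file name (steps 1408-1410)."""
    audio = {}
    for element in ET.fromstring(manifest_xml):
        name = hashlib.md5(element.text.encode("utf-8")).hexdigest() + ".wav"
        audio[name] = synthesize(element.text)
    return audio
```

The resulting mapping of hashed names to audio payloads models what the host transmits to the electronic device for storage at the corresponding directory locations.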
FIG. 15 is a flow chart of an illustrative process for providing dynamic strings to an electronic device. Process 1500 begins at step 1502. At step 1504, the host device may identify media to be transferred to the electronic device. For example, the host device may retrieve a list of media (e.g., media within a playlist) to be transferred to the electronic device. At step 1506, the host device may identify metadata strings associated with the identified media. For example, the host device may retrieve the specific metadata strings (e.g., artist, title, and album strings) identified for each media item to be transferred to the electronic device.
At step 1508, the host device may convert the identified metadata string (e.g., dynamic string) to an audio file. For example, the host device may generate audio for each dynamic string using a TTS engine (e.g., generate audio, compress the audio, and convert the audio to a file format that may be played back by the electronic device). At step 1510, the host device may transmit the generated audio to the electronic device. For example, the host device may transmit the generated audio file to the electronic device over a communication path. Process 1500 then ends at step 1512. The host device may store the audio file at any suitable location on the electronic device, including, for example, at a location or directory number generated by a hash of the text string to be spoken.
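The metadata-gathering portion of process 1500 (steps 1504 and 1506) can be sketched as follows. The media items are modeled here as plain dictionaries, which is an assumption for illustration; the point is that missing fields are skipped and duplicate strings are collected only once, so each dynamic string is synthesized a single time.

```python
def metadata_strings(media_items, fields=("artist", "title", "album")):
    """Collect the metadata strings (dynamic strings) for each media item
    queued for transfer (steps 1504-1506), skipping missing fields and
    duplicates so each string is converted to audio only once.
    Dict-based media items are an assumption for this sketch."""
    seen = []
    for item in media_items:
        for field in fields:
            value = item.get(field)
            if value and value not in seen:
                seen.append(value)
    return seen
```

De-duplicating before synthesis matters in practice: a playlist of one album repeats the same artist and album strings for every track, and each unique string needs only one audio file under the hash-based naming scheme.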
The above-described embodiments of the present invention are presented for purposes of illustration and not of limitation, and the present invention is limited only by the claims that follow.
Claims (15)
1. A method of providing voice feedback to a user of an electronic device, comprising:
displaying a plurality of elements;
identifying at least two of the plurality of elements for which voice feedback is provided, wherein a voice feedback hierarchy is associated with each of the at least two of the plurality of elements;
determining a voice feedback hierarchy associated with the display of each of the identified at least two of the plurality of elements;
responsive to the identifying and determining, generating an initial queue comprising the identified at least two of the plurality of elements;
ordering the identified elements in the initial queue based on the determined hierarchy, and
providing voice feedback for the identified at least two of the plurality of elements in the determined order of hierarchy.
2. The method of claim 1, further comprising:
retrieving an audio file associated with each of the identified at least two of the plurality of elements, and
the retrieved audio file is played back.
3. The method of claim 1, further comprising:
changing at least one of the displayed plurality of elements, and
updating at least a portion of the initial queue in response to the change.
4. The method of claim 3, further comprising:
in response to the change, re-identifying at least two of the plurality of elements for which voice feedback is provided;
re-determining a hierarchy associated with the display of each of the re-identified at least two of the plurality of elements, and
generating a revision queue comprising the re-identified at least two of the plurality of elements.
5. The method of claim 4, further comprising:
detecting the identified elements for which voice feedback is provided during the change;
comparing the initial queue and the revised queue to identify common portions of the initial queue and the revised queue;
determining that the detected element is not in a portion of the revised queue that is common to the initial queue, and
ceasing to provide voice feedback for the detected element.
6. A system for reciting text of an element displayed by an electronic device, the system comprising:
means for defining a plurality of elements with which speakable properties are associated;
means for displaying the plurality of elements in a plurality of views, wherein each view is associated with a speakable order;
means for generating a queue comprising the plurality of elements, wherein an order of the plurality of elements in the queue is set according to the speakable order;
means for pausing for a first timeout;
means for identifying an audio file associated with each of the plurality of elements in the queue, wherein the audio file includes text to speak for each element;
means for sequentially playing back the identified audio files in the order of the queue, and
means for pausing for a second timeout.
7. The system of claim 6, wherein the means for identifying an audio file associated with each of the plurality of elements in the queue further comprises:
means for retrieving an audio file associated with each of the plurality of elements based on the hash of the text to speak.
8. The system of claim 6, wherein the host device generates the audio file using a text-to-speech engine.
9. The system of claim 8, further comprising:
means for providing the text to be recited for each of the plurality of elements to the host device, and
means for receiving an audio file generated by utilizing the text-to-speech engine applied to the speakable text of each of the provided plurality of elements.
10. The system of claim 6, further comprising:
means for changing at least one of the displayed plurality of elements, and
means for generating a revision queue comprising the changed displayed plurality of elements ordered according to a speakable order associated with the displayed view.
11. A system for providing voice feedback to a user of an electronic device, comprising:
means for displaying a plurality of elements;
means for identifying at least two elements of the plurality of elements for which voice feedback is provided, wherein a voice feedback hierarchy is associated with each element of the at least two elements of the plurality of elements;
means for determining a voice feedback hierarchy associated with the display of each of the identified at least two of the plurality of elements;
means for generating, in response to the identifying and determining, an initial queue comprising the identified at least two of the plurality of elements;
means for ordering the identified elements in the initial queue based on the determined hierarchy, and
means for providing voice feedback for the identified at least two of the plurality of elements in the determined hierarchical order.
12. The system of claim 11, further comprising:
means for retrieving an audio file associated with each of the identified at least two of the plurality of elements, and
means for playing back the retrieved audio file.
13. The system of claim 11, further comprising:
means for changing at least one of the displayed plurality of elements, and
means for updating at least a portion of the initial queue in response to the change.
14. The system of claim 13, further comprising:
means for re-identifying, in response to the change, at least two of the plurality of elements for which voice feedback is provided;
means for re-determining a hierarchy associated with the display of each of the re-identified at least two of the plurality of elements, and
means for generating a revision queue comprising the re-identified at least two of the plurality of elements.
15. The system of claim 14, further comprising:
means for detecting the identified elements for which voice feedback is provided during the change;
means for comparing the initial queue and the revised queue to identify common portions of the initial queue and the revised queue;
means for determining that the detected element is not in a portion of the revised queue that is common to the initial queue, and
means for ceasing to provide voice feedback for the detected element.
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US12/205,780 | 2008-09-05 | | |
Related Parent Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| HK14108041.6A Division HK1194833A (en) | 2008-09-05 | 2012-01-05 | Multi-tiered voice feedback in an electronic device |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| HK14108041.6A Addition HK1194833A (en) | 2008-09-05 | 2012-01-05 | Multi-tiered voice feedback in an electronic device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1159805A HK1159805A (en) | 2012-08-03 |
| HK1159805B true HK1159805B (en) | 2015-02-18 |