WO1999048088A1 - Voice-controlled Web browser - Google Patents
Voice-controlled Web browser
- Publication number
- WO1999048088A1 (PCT/US1999/006072)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- speech
- hyperlink
- grammar
- command
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/487—Arrangements for providing information services, e.g. recorded voice services or time announcements
- H04M3/493—Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
- H04M3/4938—Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals comprising a voice browser which renders and interprets, e.g. VoiceXML
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/228—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
Definitions
- the present invention relates to the field of Web browsers and, in particular, to methods and systems for controlling a Web browser by the use of voice commands.
- Intranets are local area networks containing one or more Web servers and client computers operating in a manner similar to the Internet as described above. Typically, all of the computers interconnected via an intranet operate within a company or organization.
- the protocols include the file transfer protocol (FTP) used for exchanging files and the hypertext transfer protocol (HTTP) used for accessing data on the World Wide Web, often referred to simply as "the Web.”
- FTP file transfer protocol
- HTTP hypertext transfer protocol
- the Web is an information service on the Internet providing documents and hyperlinks between documents.
- the Web is made up of numerous Web sites around the world that maintain and distribute electronic documents.
- a Web site may use one or more Web server computers that store and distribute documents in various formats, including the hypertext markup language (HTML).
- HTML hypertext markup language
- an HTML document contains text and metadata, that is, commands providing formatting information. HTML documents also include embedded "hyperlinks" that reference other data or documents located on any Web server computer. The referenced documents may represent text, graphics, audio, or video in respective formats.
- a Web browser is a client application or operating system utility program that communicates with server computers via one or more Internet protocols, such as FTP and HTTP. Basically, Web browsers receive electronic HTML documents from server computers over the network and present them to users.
- the HotJava Web browser available from Sun Microsystems, Palo Alto, California, is an example of a popular Web browser application.
- the information includes company data residing on a network server, company Intranet information, or even information available on the Internet. These workers have been characterized as "locally mobile" workers.
- a locally mobile production worker might need access to blueprints, reference manuals, and the like, to properly perform a particular job.
- this worker would have to cease working, leave their workspace, obtain the information, and return to their workspace.
- Some information may not be transportable. Even if the worker could return with the necessary information, the demands of the job may still make it extremely difficult for the worker to view the retrieved information while performing manual tasks at the same time.
- the system and method would transmit data between a server computer and a mobile computer carried by the mobile worker.
- a system and method wherein a user employs a browser program to view and enter information, and wherein voice commands are used to control the browser program.
- a system and method would include a mechanism that allows a user to navigate between information pages and also allows a user to manipulate user interface controls by the use of voice commands.
- such a system and method would include alternate mechanisms for creating a speech-recognition grammar.
- One desirable mechanism includes the dynamic creation of a speech-recognition grammar after an information page is received by the user.
- a second desirable mechanism includes precompiling the information page to create a speech-recognition grammar that is transmitted with the information page to the user's computer.
- the present invention is directed to providing such a system and method with such associated mechanisms.
- the present invention includes a voice-activated Web browser program executing on a wearable computer.
- the browser program provides three mechanisms for allowing a user to employ voice commands to navigate pages.
- a "speech hint," or index value corresponding to each hyperlink in a Web page is determined and displayed on the Web page.
- a unique identifier, or index value, is appended to the end of the hyperlink text.
- in the second mechanism, when a voice command is received, a determination is made of whether the voice command corresponds to the text associated with a hyperlink on the current Web page. If the voice command corresponds to the text associated with a hyperlink, the associated hyperlink is activated to retrieve additional data.
- a voice command causes a list of hyperlinks to be displayed. Each hyperlink is displayed with a corresponding index value.
- all three mechanisms are presented to a user, providing a user with a choice of using any mechanism to control the browser program.
- an external speech grammar referenced by the Web document is dynamically compiled by the Web browser after receiving a Web document.
- the speech grammar is activated by the Web browser for use in processing subsequent voice commands whenever the Web document in question is displayed. This mechanism allows Web document developers to customize the speech features of a specific Web page.
- a speech grammar corresponding to a Web document is compiled on a server computer and stored at the server computer.
- the corresponding compiled speech grammar is transmitted to the Web browser.
- the speech grammar is received at the browser and used to process voice commands pertaining to the Web page.
- the present invention provides a mechanism for controlling a browser program executing on a wearable computer by the use of voice commands.
- the invention provides flexibility to a user.
- the mechanism also provides flexibility to a Web page author, who may optimally design the Web page to be used according to one or more of the mechanisms.
- the invention also provides a mechanism for controlling a browser when a Web page author has not designed the Web page to include voice-activated control.
- the invention can be used when a Web page author has built a Web page with a speech grammar or when a Web page author has not built a speech grammar corresponding to the Web page.
- FIGURE 1A is a block diagram of a wearable computer system for implementing the present invention
- FIGURE 1B is a pictorial illustration of the wearable computer system of FIGURE 1A
- FIGURE 2 illustrates an exemplary Web page displayed on a wearable computer, in accordance with the present invention
- FIGURE 3 is a block diagram illustrating a system for implementing a voice-controlled Web browser in accordance with the present invention
- FIGURE 4 is a block diagram illustrating an alternative system for implementing a voice-controlled Web browser
- FIGURE 5 is a flow diagram illustrating a process of generating a speech-recognition grammar for use in a voice-controlled Web browser program
- FIGURE 6 is a flow diagram illustrating the process of displaying a Web document and handling a voice command, in accordance with the present invention.
- the present invention is a mechanism and method for implementing a voice-controlled Web browser program executing on a wearable computer that communicates with one or more server computers.
- the mechanism and method of the invention generate a voice-recognition grammar.
- the mechanism and method of the invention utilize the voice-recognition grammar to determine which command was received and the received command is used to control and manipulate the Web browser program.
- a Web browser program executes on a wearable computer.
- FIGURE 1A and the following discussion are intended to provide a brief, general description of a wearable computer upon which the invention may be implemented.
- an exemplary system for implementing the invention includes a wearable computer 102 with a central processing unit 104 and a system memory 106.
- the system memory 106 may include both volatile and nonvolatile memory (not shown).
- a second bus such as a PCI bus 110, communicates with the system bus 108 and transfers data to and from peripheral components.
- a video controller 112 connected to the PCI bus 110 controls the display of information on a video screen 114.
- An audio controller 116 connected to the PCI bus 110 controls a speaker device 118.
- the speaker device 118 may optionally be built into a headset 134 (shown in FIGURE 1B).
- the audio controller 116 also receives inputs from a microphone 120.
- the wearable computer 102 includes various other components, such as a power supply and a system clock, that are not illustrated in FIGURE 1.
- a wearable computer system for use with the present invention is described in commonly assigned U.S. Patent Application, Serial No. 09/045,260, pending, the disclosure of which is incorporated herein by reference in its entirety.
- FIGURE 1B illustrates an embodiment of a wearable computer 102 that is used to implement the present invention.
- a CPU 104 and a memory 106 are contained within a base unit 130 that may be attached to a belt 132.
- a headset 134 includes a speaker device 118, a display screen 114, and a microphone 120.
- the wearable computer 102 communicates with a server computer (not shown).
- the server computer transmits Web documents, such as HTML documents, to the wearable computer 102, which displays the documents to a user.
- the documents are displayed on the display screen 114.
- the wearable computer may also play audio data via the speaker device 118.
- the video screen 114 is not present or is inactive.
- the video screen 114 may also be employed to selectively present Web documents, while other select Web documents are played only as audio data via the speaker device 118.
- FIGURE 2 illustrates an exemplary Web document 150 that is displayed on the video screen 114.
- the Web document 150 contains hyperlinks, which each include a representative symbol, such as text or a graphic symbol.
- the symbol may also be an audio signal that is presented to the user.
- the representative symbol is referred to as an "anchor tag" 152.
- Each hyperlink also includes an address.
- a Uniform Resource Locator is one form of addressing that is commonly used in Web documents.
- An address can be a file system pathname or other value used to indicate the location of additional data.
- the present invention includes three mechanisms that allow a user to employ voice commands to navigate Web pages: speakable indices, an index menu, and "speakable hyperlinks.” Preferably, all three mechanisms are included, and a user has the option of using one or more of the mechanisms.
- the speakable indices mechanism includes a speech-specific parser 206
- the speech hint 154 is an index number and is inserted immediately before each corresponding anchor tag 152.
- the superscripted index number is incremented with each successive anchor tag 152, so duplicate index numbers do not occur.
- the speech hint 154 appears before each hyperlink anchor tag 152.
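The index-numbering behavior described above can be sketched as follows (a minimal illustration in Python; the patent specifies no implementation language, and the function name and the use of a `<sup>` element to render the superscripted hint are assumptions of this sketch):

```python
import re

def insert_speech_hints(html: str) -> str:
    """Insert an incrementing index number immediately before each
    opening hyperlink anchor tag, mirroring the displayed speech hints."""
    counter = 0

    def add_hint(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        # A <sup> element stands in for the superscripted hint display.
        return f"<sup>{counter}</sup>{match.group(0)}"

    # Match opening anchor tags only; closing tags are left untouched,
    # so duplicate index numbers do not occur.
    return re.sub(r"<a\s[^>]*>", add_hint, html, flags=re.IGNORECASE)

page = '<p><a href="/docs">Blueprints</a> and <a href="/ref">Manuals</a></p>'
print(insert_speech_hints(page))
```

Each successive anchor tag receives the next index, so a user can speak the displayed number to follow the corresponding hyperlink.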
- the speakable indexing feature is preferably enabled and disabled via two speakable commands: "index enable” and "index disable.”
- a user speaks the words "index enable”
- the speakable index feature is enabled.
- a user speaks the words "index disable”
- the speakable index feature is disabled.
- this feature allows a user to speak a hyperlink tag's unique index number to follow the hyperlink.
- a speech-recognition engine 212 shown in FIGURE 3
- the index menu mechanism provides a second method of following hyperlinks.
- the mechanism and method of the invention display a dialog box to the user.
- the dialog box includes a scrollable list of hyperlinks and their associated unique indices. A user may navigate this list using verbal scrolling commands, or may speak the unique index number corresponding to the hyperlink that they wish to follow.
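Building the ordered hyperlink list behind such an index-menu dialog can be sketched with Python's standard HTML parser (an illustrative sketch only; the class name is hypothetical):

```python
from html.parser import HTMLParser

class HyperlinkIndexer(HTMLParser):
    """Collect (index, anchor text) pairs for an index-menu dialog."""

    def __init__(self):
        super().__init__()
        self.links = []          # ordered (index, anchor text) pairs
        self._in_anchor = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._text = []

    def handle_data(self, data):
        if self._in_anchor:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            self._in_anchor = False
            # Indices increment with each anchor, so they stay unique.
            self.links.append((len(self.links) + 1, "".join(self._text).strip()))

parser = HyperlinkIndexer()
parser.feed('<a href="/a">Back to top</a> <a href="/b">Next page</a>')
for index, text in parser.links:
    print(f"{index}: {text}")
```

The resulting pairs would populate the scrollable list, and a spoken index number selects the matching entry.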
- a user speaks the contents of a hyperlink anchor tag 152.
- the speech-recognition engine 212 (shown in FIGURE 3) generates a corresponding speech event that is translated into a user command to follow the corresponding hyperlink.
- An HTML rendering engine 208 navigates to linked Web content based on the user selection.
- a Web page author anticipating the use of speakable hyperlinks creates a Web page that does not have two hyperlink anchor tags that may sound similar.
- Controls and images also have corresponding speech hints.
- selection controls 156 have corresponding speech hints 158.
- Edit controls 160 have corresponding speech hints 162.
- the image 164 has a corresponding speech hint 166 positioned at the upper left corner.
- Activating a control sets the focus of the browser to the control, so that additional voice input is directed to the control.
- the use of speech grammars to select controls is similar to the use of speech grammars to select hyperlinks.
- the present invention provides a mechanism and method for dynamically generating speech grammars upon receipt of Web pages, and a mechanism and method for pre-compiling speech grammars prior to transmitting Web pages to the wearable computer.
- FIGURE 3 is a functional block diagram which illustrates components of a wearable computer system 200 that dynamically generates speech grammars upon receipt of Web pages.
- An HTML parser 204 receives an HTML document from the Internet or an intranet 202 and parses the document content to generate an internal representation 205 of the HTML document.
- the internal representation 205 is passed to a speech-specific parser 206.
- the speech-specific parser 206 locates hyperlinks or other interactive controls that may be the target of a voice command and generates speech grammars 209.
- the speech-specific parser 206 also generates visual speech hints 154 (shown in FIGURE 2).
- the revised internal representation 207 of the HTML document 205 is passed to an HTML rendering engine 208.
- the HTML rendering engine 208 generates a visual Web page 150 (shown in FIGURE 2) based upon the revised internal representation of the HTML document 207.
- the visual Web page 150 is displayed on the display screen 114 (shown in FIGURE 1 A).
- Speech grammars 209 are generated from the HTML text by the speech-specific parser 206 and are passed to a speech grammar compiler 210.
- the speech grammar compiler 210 translates the speech grammars 209 into a compiled speech grammar 211 that is used by a grammar-based speech-recognition engine 212.
- Many speech engine providers, including IBM and Lernout & Hauspie, provide grammar compilers with their speech engine products.
- the speech-recognition engine 212 receives static, or precompiled, grammars 214 that are used for controlling the Web browser.
- grammars 214 include browser commands that are not Web page specific, such as "back" and "forward.”
- the speech-recognition engine 212 receives voice audio input from the microphone 120 and uses the compiled speech grammars 211 and static speech grammars 214 to determine the command or text spoken into the microphone 120.
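The interplay of static and page-specific grammars can be illustrated with a toy stand-in for a grammar-based recognizer, where each active "grammar" is reduced to the set of phrases the engine will accept at a given moment (the phrase wording and function names below are assumptions, not the patent's actual grammar syntax):

```python
# Static grammar: browser commands that are not Web page specific.
STATIC_GRAMMAR = {"back", "forward", "index enable", "index disable"}

def make_dynamic_grammar(link_count):
    """One 'goto link number <n>' phrase per hyperlink on the current page."""
    return {f"goto link number {n}" for n in range(1, link_count + 1)}

def recognize(utterance, link_count):
    """Return the normalized phrase if an active grammar covers it, else None."""
    active = STATIC_GRAMMAR | make_dynamic_grammar(link_count)
    phrase = utterance.lower().strip()
    return phrase if phrase in active else None

print(recognize("Back", 3))                # covered by the static grammar
print(recognize("goto link number 2", 3))  # covered by the page grammar
print(recognize("goto link number 9", 3))  # outside both grammars
```

A real engine matches acoustic input against compiled grammars rather than strings, but the constraint is the same: only phrases licensed by an active grammar can be recognized as commands.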
- ViaVoice, a product licensed by IBM, is a commercially available speech-recognition engine that can be used as the speech-recognition engine 212 in the present invention.
- in response to voice audio input, the speech-recognition engine 212 generates speech events 213.
- the speech events 213 generated by the speech-recognition engine 212 are handled by corresponding software speech controls 218.
- the speech controls 218 translate the speech events 213 into user commands 215, which are passed to the HTML rendering engine 208.
- the HTML rendering engine 208 performs an action corresponding to the user commands 215. For example, if a user command 215 designates that a particular hyperlink has been selected by a voice audio input, the HTML rendering engine 208 performs the action of retrieving the Web page corresponding to the hyperlink.
- GUI controls 216 can receive input from a mechanical device, such as a mouse, or other control (not shown).
- the GUI controls 216 generate user commands 217, which are passed to the HTML rendering engine 208 for appropriate handling, as described above.
- the speech controls 218 may also generate audio prompts, which are presented to the user via a headset or other speaker device 118.
- FIGURE 4 is a functional block diagram which illustrates a voice-controlled Web browser system 300 that uses precompiled speech-recognition grammars 214.
- the system 300 is similar to the system 200 illustrated in FIGURE 3. The following discussion describes the important differences between the system 200, which uses dynamically generated speech-recognition grammars 209, and the system 300, which uses precompiled speech-recognition grammars.
- a speech grammar compiler, such as the speech grammar compiler 210 illustrated in FIGURE 3, is used to generate a voice-recognition grammar at a Web server operating over the Internet or an intranet 202.
- a speech-specific parser 206 on the wearable computer receives an internal representation of the HTML document 205 and one or more previously compiled speech grammars 302. The speech-specific parser 206 passes the received speech grammars 302 to the grammar-based speech-recognition engine 212.
- the recognition engine 212 receives compiled speech grammars similar to the dynamic grammar system 200 of FIGURE 3.
- the precompiled grammar system 300 need not include the speech grammar compiler 210 of FIGURE 3.
- the speech-specific parser 206 passes the revised internal representation of the HTML document 207 to the HTML rendering engine 208.
- the HTML rendering engine 208, the GUI controls 216, the speech controls 218, and the speech-recognition engine 212 perform operations as described above with respect to the dynamic grammar system 200 of FIGURE 3.
- commands pertaining to speech grammars are embedded within HTML comment fields.
- the following HTML code segment shows, by way of example, instructions used to specify the location of a dictionary and grammar files.
- the name field has two valid values. One value is the name of the element to which the grammar is attached. Currently, this value is for form fields only. The other value is the word "document.” Use of the word "document” associates the grammar to a document level context.
- the name field is optional and, if not specified, the grammar is considered to be a document level grammar.
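The patent's actual HTML listing for these comment-embedded instructions is not reproduced in this text, so the directive syntax in the sketch below is a hypothetical stand-in invented for illustration; only the name-field semantics (optional, defaulting to a document-level grammar) follow the description above:

```python
import re

# Hypothetical comment-directive format, e.g.:
#   <!-- speech:grammar name="document" src="page.gram" -->
# The "speech:grammar" keyword and attribute names are assumptions.
DIRECTIVE = re.compile(
    r'<!--\s*speech:grammar\s+(?:name="(?P<name>[^"]*)"\s+)?src="(?P<src>[^"]*)"\s*-->'
)

def find_grammar_directives(html):
    """Return (name, src) pairs; per the description above, an omitted
    name field yields a document-level grammar."""
    return [
        (m.group("name") or "document", m.group("src"))
        for m in DIRECTIVE.finditer(html)
    ]

page = '''<html><body>
<!-- speech:grammar src="page.gram" -->
<!-- speech:grammar name="search_field" src="search.gram" -->
</body></html>'''
print(find_grammar_directives(page))
```

Because the directives live inside comments, browsers without speech support render the page unchanged, which is the point of the comment-embedding convention.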
- FIGURE 2 illustrates a portion of an exemplary Web document 150 that is displayed on the display screen 114 (FIGURE 1A) in response to receiving a corresponding HTML document.
- An HTML code segment for the corresponding HTML document is listed below.
- These grammar files are retrieved by the speech-specific parser 206 in the precompiled grammar system 300 of FIGURE 4.
- a grammar compiler can be used by a document author to prepare an HTML document for speech recognition.
- IBM's ViaVoice toolkit, discussed above, is a grammar compiler that accepts HTML documents and supporting grammar files as input.
- the toolkit produces speech-enabled HTML documents, grammar files, and dictionary files.
- FIGURE 5 illustrates a process 502 of dynamically generating and compiling speech-recognition grammars in accordance with the present invention.
- a new HTML document is received and loaded into the HTML parser 204 (shown in FIGURE 3).
- the HTML parser 204 parses the HTML instructions within the newly received HTML document.
- the HTML parser 204 creates an internal representation 205 of the HTML document.
- the internal representation includes one or more parse tags.
- the internal representation is then passed to the speech-specific parser 206.
- the speech-specific parser 206 retrieves a parse tag from the internal representation of the HTML document.
- Speakable HTML entities include anchors, image maps, applets, inputs, and select items. If the current parse tag does not represent a speakable entity, processing proceeds to step 516, where a determination is made of whether the current parse tag is the last parse tag of the current HTML document. If the tag is not the last parse tag, processing returns to step 508 to retrieve the next parse tag.
- the speech-specific parser 206 determines that the current parse tag represents a speakable entity, at step 512, a new rule for the dynamic grammar is created, or, at step 514, an existing rule is appended.
- the rule adheres to the form:
- <rulename> "Goto link number <n>".
- the rule is subsequently used for numerical index navigation. This is the form used to specify a grammar rule.
- the set of rules is then compiled using the grammar compiler, described above.
- processing proceeds to step 516 to determine whether the current parse tag is the last parse tag of the HTML document, as described above. If the tag is not the last parse tag, flow control proceeds back to step 508 to retrieve and process the next parse tag. If, at step 516, the speech-specific parser 206 determines that the current parse tag is the last parse tag, processing proceeds to step 518 where the speech grammar compiler 210 compiles the generated rules into a compiled speech grammar.
- the generated rules are in the form of ASCII text
- the compiled speech grammar is a machine representation specific to the speech-recognition engine 212.
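The rule-generation loop of FIGURE 5 can be sketched as follows (illustrative only; the tag names in `SPEAKABLE_TAGS` and the `<link_n>` rule names are assumptions, while the rule text follows the form given above):

```python
# Anchors, image maps, applets, inputs, and select items are the
# speakable entities named above; these tag names are illustrative.
SPEAKABLE_TAGS = {"a", "map", "applet", "input", "select"}

def build_grammar_rules(parse_tags):
    """Walk the parse tags and emit one ASCII rule per speakable entity,
    in the '<rulename> "Goto link number <n>"' form described above."""
    rules = []
    for tag in parse_tags:
        if tag in SPEAKABLE_TAGS:
            n = len(rules) + 1
            rules.append(f'<link_{n}> "Goto link number {n}"')
    return rules

# A simplified stand-in for the internal representation's parse tags.
tags = ["html", "body", "a", "p", "a", "input", "body"]
for rule in build_grammar_rules(tags):
    print(rule)
```

Once the last parse tag is processed, the accumulated ASCII rules would be handed to the grammar compiler to produce the engine-specific machine representation.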
- FIGURE 6 illustrates a process 601 that is performed on a wearable computer for displaying a Web document and handling a voice command.
- the wearable computer receives a Web document containing one or more hyperlinks.
- an ordered list of hyperlinks within the Web document is determined.
- the Web document is displayed at the wearable computer.
- a voice command is received from a user.
- the command is an index menu command
- a list of hyperlinks and their corresponding index numbers is displayed.
- the list is displayed within a dialog window.
- the list may be presented as speech over the wearable computer speakers.
- a voice command is received.
- a determination is made of the hyperlink corresponding to the voice command. If, at step 610, the voice command is not an index menu command, processing proceeds to step 616 to determine a corresponding hyperlink.
- Step 616 may include determining whether the text of a hyperlink was spoken, or whether the index number corresponding to a hyperlink was spoken.
- the hyperlink is activated. Activation of the hyperlink may include retrieving a new Web document. Alternatively, activation may comprise displaying a different portion of the same Web document.
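The command-handling flow of FIGURE 6 can be sketched as a simple dispatcher (a sketch under assumed phrase wording; the command strings and return values here are hypothetical, not the patent's actual event protocol):

```python
def handle_voice_command(command, links):
    """Dispatch a recognized voice command against the current page's
    ordered hyperlink list (anchor texts, 1-based indices)."""
    phrase = command.lower().strip()
    if phrase == "index menu":
        # Display a dialog listing each hyperlink with its index number.
        return "menu: " + ", ".join(
            f"{i}. {text}" for i, text in enumerate(links, start=1))
    if phrase.startswith("goto link number "):
        n = int(phrase.rsplit(" ", 1)[-1])       # spoken index number
        return f"activate: {links[n - 1]}"
    for text in links:
        if phrase == text.lower():               # spoken anchor text
            return f"activate: {text}"
    return "unrecognized"

links = ["Blueprints", "Reference Manuals"]
print(handle_voice_command("index menu", links))
print(handle_voice_command("goto link number 2", links))
print(handle_voice_command("Blueprints", links))
```

Activation of the returned hyperlink would then retrieve a new Web document or scroll to a different portion of the current one, as described above.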
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Information Transfer Between Computers (AREA)
Abstract
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU31045/99A AU3104599A (en) | 1998-03-20 | 1999-03-19 | Voice controlled web browser |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US7893798P | 1998-03-20 | 1998-03-20 | |
| US60/078,937 | 1998-03-20 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO1999048088A1 (fr) | 1999-09-23 |
Family
ID=22147129
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US1999/006072 Ceased WO1999048088A1 (fr) | 1998-03-20 | 1999-03-19 | Navigateur web a commande vocale |
Country Status (2)
| Country | Link |
|---|---|
| AU (1) | AU3104599A (fr) |
| WO (1) | WO1999048088A1 (fr) |
Cited By (50)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2000029936A1 (fr) * | 1998-11-12 | 2000-05-25 | Microsoft Corporation | Systeme de reconnaissance vocale avec changements de grammaires et commande d'assistance grammaticale |
| WO2001028187A1 (fr) * | 1999-10-08 | 2001-04-19 | Blue Wireless, Inc. | Dispositif de navigateur portable a reconnaissance vocale et a capacite de retroaction |
| EP1122636A2 (fr) | 2000-02-03 | 2001-08-08 | Siemens Corporate Research, Inc. | Système et méthode d'analyse, de description et d'entrée interactive activée par la voie à des formulaires HTML |
| WO2001069592A1 (fr) * | 2000-03-15 | 2001-09-20 | Bayerische Motoren Werke Aktiengesellschaft | Dispositif et procede pour l'entree vocale d'une destination selon un dialogue d'entree defini dans un systeme de guidage |
| GB2362017A (en) * | 2000-03-29 | 2001-11-07 | John Pepin | Network access |
| WO2001095087A1 (fr) * | 2000-06-08 | 2001-12-13 | Interactive Speech Technologies | Systeme de commande vocale d'une page stockee sur un serveur et telechargeable en vue de sa visualisation sur un dispositif client |
| KR20020012364A (ko) * | 2000-08-07 | 2002-02-16 | 최중인 | 음성 웹 서버를 이용한 전자상거래 방법 |
| WO2002073599A1 (fr) * | 2001-03-12 | 2002-09-19 | Mediavoice S.R.L. | Procede permettant l'interaction vocale d'une page web ou d'un site web |
| EP1246439A1 (fr) * | 2001-03-26 | 2002-10-02 | Alcatel | Système et méthode pour naviger l'internet avec la voix avec le moyen d'une connection permanente canal D |
| EP1209660A3 (fr) * | 2000-11-23 | 2002-11-20 | International Business Machines Corporation | Navigation vocale dans des applications sur internet |
| WO2002099786A1 (fr) * | 2001-06-01 | 2002-12-12 | Nokia Corporation | Procede et dispositif de navigation interactive multimodale |
| KR20030027359A (ko) * | 2001-09-28 | 2003-04-07 | 박기철 | 보이스 브라우저와 기존 웹 브라우저의 연동을 위한 방법및 시스템 |
| WO2002044887A3 (fr) * | 2000-12-01 | 2003-04-24 | Univ Columbia | Procede et systeme pour pages web a activation vocale |
| US7146323B2 (en) | 2000-11-23 | 2006-12-05 | International Business Machines Corporation | Method and system for gathering information by voice input |
| EP1729284A1 (fr) * | 2005-05-30 | 2006-12-06 | International Business Machines Corporation | Procédé et systèmes d'accès à des données en épelant des lettres discriminantes ou des noms de liens |
| US7219123B1 (en) * | 1999-10-08 | 2007-05-15 | At Road, Inc. | Portable browser device with adaptive personalization capability |
| US7228495B2 (en) | 2001-02-27 | 2007-06-05 | International Business Machines Corporation | Method and system for providing an index to linked sites on a web page for individuals with visual disabilities |
| EP1881685A1 (fr) * | 2000-12-01 | 2008-01-23 | The Trustees Of Columbia University In The City Of New York | Procédé et système pour l'activation vocale de pages Web |
| US7382770B2 (en) | 2000-08-30 | 2008-06-03 | Nokia Corporation | Multi-modal content and automatic speech recognition in wireless telecommunication systems |
| CN100424630C (zh) * | 2004-03-26 | 2008-10-08 | 宏碁股份有限公司 | 网页语音接口的操作方法 |
| CN100444097C (zh) * | 2005-06-16 | 2008-12-17 | 国际商业机器公司 | 在多模式浏览器中显示可用菜单选项的方法和系统 |
| EP1899952A4 (fr) * | 2005-07-07 | 2009-07-22 | Enable Inc V | Systeme et procede permettant de chercher un contenu sur un reseau, dans un systeme multimodal, a partir de mots-cles vocaux |
| EP2182452A1 (fr) * | 2008-10-29 | 2010-05-05 | LG Electronics Inc. | Terminal mobile et son procédé de contrôle |
| GB2467451A (en) * | 2009-06-30 | 2010-08-04 | Saad Ul Haq | Voice activated launching of hyperlinks using discrete characters or letters |
| US20110010180A1 (en) * | 2009-07-09 | 2011-01-13 | International Business Machines Corporation | Speech Enabled Media Sharing In A Multimodal Application |
| CN103136285A (zh) * | 2011-12-05 | 2013-06-05 | 英顺源(上海)科技有限公司 | 用于手持装置的翻译查询与操作系统及其方法 |
| EP2518722A3 (fr) * | 2011-04-28 | 2013-08-28 | Samsung Electronics Co., Ltd. | Procédé de fourniture de liste de liens et dispositif dýaffichage l'appliquant |
| US8600755B2 (en) | 2006-09-11 | 2013-12-03 | Nuance Communications, Inc. | Establishing a multimodal personality for a multimodal application in dependence upon attributes of user interaction |
| US8612230B2 (en) | 2007-01-03 | 2013-12-17 | Nuance Communications, Inc. | Automatic speech recognition with a selection list |
| US8706490B2 (en) | 2007-03-20 | 2014-04-22 | Nuance Communications, Inc. | Indexing digitized speech with words represented in the digitized speech |
| US8768711B2 (en) | 2004-06-17 | 2014-07-01 | Nuance Communications, Inc. | Method and apparatus for voice-enabling an application |
| US8781840B2 (en) | 2005-09-12 | 2014-07-15 | Nuance Communications, Inc. | Retrieval and presentation of network service results for mobile device using a multimodal browser |
| US8843376B2 (en) | 2007-03-13 | 2014-09-23 | Nuance Communications, Inc. | Speech-enabled web content searching using a multimodal browser |
| US8862471B2 (en) | 2006-09-12 | 2014-10-14 | Nuance Communications, Inc. | Establishing a multimodal advertising personality for a sponsor of a multimodal application |
| US8862475B2 (en) | 2007-04-12 | 2014-10-14 | Nuance Communications, Inc. | Speech-enabled content navigation and control of a distributed multimodal browser |
| US8909532B2 (en) | 2007-03-23 | 2014-12-09 | Nuance Communications, Inc. | Supporting multi-lingual user interaction with a multimodal application |
| US8938392B2 (en) | 2007-02-27 | 2015-01-20 | Nuance Communications, Inc. | Configuring a speech engine for a multimodal application based on location |
| US9076454B2 (en) | 2008-04-24 | 2015-07-07 | Nuance Communications, Inc. | Adjusting a speech engine for a mobile computing device based on background noise |
| US9083798B2 (en) | 2004-12-22 | 2015-07-14 | Nuance Communications, Inc. | Enabling voice selection of user preferences |
| US9202467B2 (en) | 2003-06-06 | 2015-12-01 | The Trustees Of Columbia University In The City Of New York | System and method for voice activating web pages |
| US9208783B2 (en) | 2007-02-27 | 2015-12-08 | Nuance Communications, Inc. | Altering behavior of a multimodal application based on location |
| US9208785B2 (en) | 2006-05-10 | 2015-12-08 | Nuance Communications, Inc. | Synchronizing distributed speech recognition |
| US9292183B2 (en) | 2006-09-11 | 2016-03-22 | Nuance Communications, Inc. | Establishing a preferred mode of interaction between a user and a multimodal application |
| US9349367B2 (en) | 2008-04-24 | 2016-05-24 | Nuance Communications, Inc. | Records disambiguation in a multimodal application operating on a multimodal device |
| US9396721B2 (en) | 2008-04-24 | 2016-07-19 | Nuance Communications, Inc. | Testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise |
| EP3401797A1 (fr) * | 2017-05-12 | 2018-11-14 | Samsung Electronics Co., Ltd. | Voice control for navigation in multilingual web pages |
| US11594218B2 (en) * | 2020-09-18 | 2023-02-28 | Servicenow, Inc. | Enabling speech interactions on web-based user interfaces |
| CN116340685A (zh) * | 2023-03-28 | 2023-06-27 | 广东保伦电子股份有限公司 | Method and system for generating a web page based on speech |
| US11995698B2 (en) * | 2015-11-20 | 2024-05-28 | Voicemonk, Inc. | System for virtual agents to help customers and businesses |
| US12430155B2 (en) | 2017-05-12 | 2025-09-30 | Samsung Electronics Co., Ltd. | Display apparatus and controlling method thereof |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5819220A (en) * | 1996-09-30 | 1998-10-06 | Hewlett-Packard Company | Web triggered word set boosting for speech interfaces to the world wide web |
1999
- 1999-03-19 WO PCT/US1999/006072 patent/WO1999048088A1/fr not_active Ceased
- 1999-03-19 AU AU31045/99A patent/AU3104599A/en not_active Abandoned
Non-Patent Citations (5)
| Title |
|---|
| "Helping the Web", IEEE Spectrum, IEEE Inc., New York, US, vol. 36, no. 3, 1 March 1999, pp. 54-59, ISSN 0018-9235, XP002919110 * |
| Bayer, S., "Embedding Speech in Web Interfaces", Proceedings of the International Conference on Spoken Language Processing, vol. 3, 1 October 1996, pp. 1684-1687, XP002919109 * |
| Hemphill, C. T., Thrift, P. R., Linn, J. C., "Speech-Aware Multimedia", IEEE Multimedia, IEEE Service Center, New York, NY, US, no. 1, 1 January 1996, pp. 74-78, ISSN 1070-986X, DOI 10.1109/93.486706, XP002919107 * |
| Kaneen, E., Wyard, P., "A Spoken Language Interface to Interactive Multimedia Services", IEE Colloquium on Advances in Interactive Voice Technologies for Telecommunication Services, IEE, London, GB, 12 June 1997, pp. 1-7, XP002919111 * |
| Zue, V. W., "Navigating the Information Superhighway Using Spoken Language Interfaces", IEEE Expert, IEEE Service Center, New York, NY, US, vol. 10, no. 5, 1 October 1995, pp. 39-43, ISSN 0885-9000, DOI 10.1109/64.464929, XP002919108 * |
Cited By (68)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6298324B1 (en) | 1998-01-05 | 2001-10-02 | Microsoft Corporation | Speech recognition system with changing grammars and grammar help command |
| WO2000029936A1 (fr) * | 1998-11-12 | 2000-05-25 | Microsoft Corporation | Speech recognition system with changing grammars and grammar help command |
| US7203721B1 (en) | 1999-10-08 | 2007-04-10 | At Road, Inc. | Portable browser device with voice recognition and feedback capability |
| WO2001028187A1 (fr) * | 1999-10-08 | 2001-04-19 | Blue Wireless, Inc. | Portable browser device with voice recognition and feedback capability |
| US7219123B1 (en) * | 1999-10-08 | 2007-05-15 | At Road, Inc. | Portable browser device with adaptive personalization capability |
| EP1122636A2 (fr) | 2000-02-03 | 2001-08-08 | Siemens Corporate Research, Inc. | System and method for analysis, description and voice-enabled interactive entry into HTML forms |
| EP1122636A3 (fr) * | 2000-02-03 | 2007-11-14 | Siemens Corporate Research, Inc. | System and method for analysis, description and voice-enabled interactive entry into HTML forms |
| WO2001069592A1 (fr) * | 2000-03-15 | 2001-09-20 | Bayerische Motoren Werke Aktiengesellschaft | Device and method for speech input of a destination by means of a defined input dialogue in a route guidance system |
| US7209884B2 (en) | 2000-03-15 | 2007-04-24 | Bayerische Motoren Werke Aktiengesellschaft | Speech input into a destination guiding system |
| GB2362017A (en) * | 2000-03-29 | 2001-11-07 | John Pepin | Network access |
| WO2001095087A1 (fr) * | 2000-06-08 | 2001-12-13 | Interactive Speech Technologies | System for voice control of a page stored on a server and downloadable for viewing on a client device |
| FR2810125A1 (fr) * | 2000-06-08 | 2001-12-14 | Interactive Speech Technologies | System for voice control of a page stored on a server and downloadable for viewing on a client device |
| KR20020012364A (ko) * | 2000-08-07 | 2002-02-16 | 최중인 | Electronic commerce method using a voice web server |
| US7382770B2 (en) | 2000-08-30 | 2008-06-03 | Nokia Corporation | Multi-modal content and automatic speech recognition in wireless telecommunication systems |
| US7146323B2 (en) | 2000-11-23 | 2006-12-05 | International Business Machines Corporation | Method and system for gathering information by voice input |
| EP1209660A3 (fr) * | 2000-11-23 | 2002-11-20 | International Business Machines Corporation | Voice navigation in internet applications |
| US7640163B2 (en) | 2000-12-01 | 2009-12-29 | The Trustees Of Columbia University In The City Of New York | Method and system for voice activating web pages |
| WO2002044887A3 (fr) * | 2000-12-01 | 2003-04-24 | Univ Columbia | Method and system for voice-activated web pages |
| EP1881685A1 (fr) * | 2000-12-01 | 2008-01-23 | The Trustees Of Columbia University In The City Of New York | Method and system for voice activation of web pages |
| US7228495B2 (en) | 2001-02-27 | 2007-06-05 | International Business Machines Corporation | Method and system for providing an index to linked sites on a web page for individuals with visual disabilities |
| WO2002073599A1 (fr) * | 2001-03-12 | 2002-09-19 | Mediavoice S.R.L. | Method for enabling voice interaction with a web page or website |
| EP1246439A1 (fr) * | 2001-03-26 | 2002-10-02 | Alcatel | System and method for voice browsing of the Internet by means of a permanent D-channel connection |
| WO2002099786A1 (fr) * | 2001-06-01 | 2002-12-12 | Nokia Corporation | Method and device for multimodal interactive browsing |
| KR20030027359A (ko) * | 2001-09-28 | 2003-04-07 | 박기철 | Method and system for interworking a voice browser with an existing web browser |
| US9202467B2 (en) | 2003-06-06 | 2015-12-01 | The Trustees Of Columbia University In The City Of New York | System and method for voice activating web pages |
| CN100424630C (zh) * | 2004-03-26 | 2008-10-08 | 宏碁股份有限公司 | Method for operating a web page voice interface |
| US8768711B2 (en) | 2004-06-17 | 2014-07-01 | Nuance Communications, Inc. | Method and apparatus for voice-enabling an application |
| US9083798B2 (en) | 2004-12-22 | 2015-07-14 | Nuance Communications, Inc. | Enabling voice selection of user preferences |
| EP1729284A1 (fr) * | 2005-05-30 | 2006-12-06 | International Business Machines Corporation | Method and systems for accessing data by spelling discriminating letters or link names |
| CN100444097C (zh) * | 2005-06-16 | 2008-12-17 | 国际商业机器公司 | Method and system for displaying available menu options in a multimodal browser |
| EP1899952A4 (fr) * | 2005-07-07 | 2009-07-22 | Enable Inc V | System and method for searching content on a network from spoken keywords in a multimodal system |
| US8781840B2 (en) | 2005-09-12 | 2014-07-15 | Nuance Communications, Inc. | Retrieval and presentation of network service results for mobile device using a multimodal browser |
| US9208785B2 (en) | 2006-05-10 | 2015-12-08 | Nuance Communications, Inc. | Synchronizing distributed speech recognition |
| US9292183B2 (en) | 2006-09-11 | 2016-03-22 | Nuance Communications, Inc. | Establishing a preferred mode of interaction between a user and a multimodal application |
| US8600755B2 (en) | 2006-09-11 | 2013-12-03 | Nuance Communications, Inc. | Establishing a multimodal personality for a multimodal application in dependence upon attributes of user interaction |
| US9343064B2 (en) | 2006-09-11 | 2016-05-17 | Nuance Communications, Inc. | Establishing a multimodal personality for a multimodal application in dependence upon attributes of user interaction |
| US8862471B2 (en) | 2006-09-12 | 2014-10-14 | Nuance Communications, Inc. | Establishing a multimodal advertising personality for a sponsor of a multimodal application |
| US8612230B2 (en) | 2007-01-03 | 2013-12-17 | Nuance Communications, Inc. | Automatic speech recognition with a selection list |
| US8938392B2 (en) | 2007-02-27 | 2015-01-20 | Nuance Communications, Inc. | Configuring a speech engine for a multimodal application based on location |
| US9208783B2 (en) | 2007-02-27 | 2015-12-08 | Nuance Communications, Inc. | Altering behavior of a multimodal application based on location |
| US8843376B2 (en) | 2007-03-13 | 2014-09-23 | Nuance Communications, Inc. | Speech-enabled web content searching using a multimodal browser |
| US8706490B2 (en) | 2007-03-20 | 2014-04-22 | Nuance Communications, Inc. | Indexing digitized speech with words represented in the digitized speech |
| US9123337B2 (en) | 2007-03-20 | 2015-09-01 | Nuance Communications, Inc. | Indexing digitized speech with words represented in the digitized speech |
| US8909532B2 (en) | 2007-03-23 | 2014-12-09 | Nuance Communications, Inc. | Supporting multi-lingual user interaction with a multimodal application |
| US8862475B2 (en) | 2007-04-12 | 2014-10-14 | Nuance Communications, Inc. | Speech-enabled content navigation and control of a distributed multimodal browser |
| US9396721B2 (en) | 2008-04-24 | 2016-07-19 | Nuance Communications, Inc. | Testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise |
| US9076454B2 (en) | 2008-04-24 | 2015-07-07 | Nuance Communications, Inc. | Adjusting a speech engine for a mobile computing device based on background noise |
| US9349367B2 (en) | 2008-04-24 | 2016-05-24 | Nuance Communications, Inc. | Records disambiguation in a multimodal application operating on a multimodal device |
| US9129011B2 (en) | 2008-10-29 | 2015-09-08 | Lg Electronics Inc. | Mobile terminal and control method thereof |
| EP2182452A1 (fr) * | 2008-10-29 | 2010-05-05 | LG Electronics Inc. | Mobile terminal and control method thereof |
| GB2467451B (en) * | 2009-06-30 | 2011-06-01 | Saad Ul Haq | Discrete voice command navigator |
| GB2467451A (en) * | 2009-06-30 | 2010-08-04 | Saad Ul Haq | Voice activated launching of hyperlinks using discrete characters or letters |
| US20110010180A1 (en) * | 2009-07-09 | 2011-01-13 | International Business Machines Corporation | Speech Enabled Media Sharing In A Multimodal Application |
| US8510117B2 (en) * | 2009-07-09 | 2013-08-13 | Nuance Communications, Inc. | Speech enabled media sharing in a multimodal application |
| EP2518722A3 (fr) * | 2011-04-28 | 2013-08-28 | Samsung Electronics Co., Ltd. | Method for providing a link list and display device applying the same |
| CN103136285A (zh) * | 2011-12-05 | 2013-06-05 | 英顺源(上海)科技有限公司 | Translation lookup and operation system for a handheld device and method thereof |
| US11995698B2 (en) * | 2015-11-20 | 2024-05-28 | Voicemonk, Inc. | System for virtual agents to help customers and businesses |
| US20240311888A1 (en) * | 2015-11-20 | 2024-09-19 | Voicemonk, Inc. | System for virtual agents to help customers and businesses |
| US12346945B2 (en) | 2015-11-20 | 2025-07-01 | Voicemonk, Inc. | System for virtual agents to help customers and businesses |
| US20250328939A1 (en) * | 2015-11-20 | 2025-10-23 | Voicemonk, Inc. | System for virtual agents to help customers and businesses |
| EP3401797A1 (fr) * | 2017-05-12 | 2018-11-14 | Samsung Electronics Co., Ltd. | Voice control for navigation in multilingual web pages |
| US10802851B2 (en) | 2017-05-12 | 2020-10-13 | Samsung Electronics Co., Ltd. | Display apparatus and controlling method thereof |
| US11726806B2 (en) | 2017-05-12 | 2023-08-15 | Samsung Electronics Co., Ltd. | Display apparatus and controlling method thereof |
| US12430155B2 (en) | 2017-05-12 | 2025-09-30 | Samsung Electronics Co., Ltd. | Display apparatus and controlling method thereof |
| US11594218B2 (en) * | 2020-09-18 | 2023-02-28 | Servicenow, Inc. | Enabling speech interactions on web-based user interfaces |
| US12142275B2 (en) | 2020-09-18 | 2024-11-12 | Servicenow, Inc. | Enabling speech interactions on web-based user interfaces |
| CN116340685A (zh) * | 2023-03-28 | 2023-06-27 | 广东保伦电子股份有限公司 | Method and system for generating a web page based on speech |
| CN116340685B (zh) * | 2023-03-28 | 2024-01-30 | 广东保伦电子股份有限公司 | Method and system for generating a web page based on speech |
Also Published As
| Publication number | Publication date |
|---|---|
| AU3104599A (en) | 1999-10-11 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO1999048088A1 (fr) | Voice-controlled web browser | |
| JP3432076B2 (ja) | Voice-interactive video screen display system | |
| US6311177B1 (en) | Accessing databases when viewing text on the web | |
| US5899975A (en) | Style sheets for speech-based presentation of web pages | |
| US6829746B1 (en) | Electronic document delivery system employing distributed document object model (DOM) based transcoding | |
| US7054952B1 (en) | Electronic document delivery system employing distributed document object model (DOM) based transcoding and providing interactive javascript support | |
| US6456974B1 (en) | System and method for adding speech recognition capabilities to java | |
| US5903727A (en) | Processing HTML to embed sound in a web page | |
| US6725424B1 (en) | Electronic document delivery system employing distributed document object model (DOM) based transcoding and providing assistive technology support | |
| US6088675A (en) | Auditorially representing pages of SGML data | |
| US7212971B2 (en) | Control apparatus for enabling a user to communicate by speech with a processor-controlled apparatus | |
| US8572209B2 (en) | Methods and systems for authoring of mixed-initiative multi-modal interactions and related browsing mechanisms | |
| CA2372544C (fr) | Information access method, information access system and related program | |
| US20020077823A1 (en) | Software development systems and methods | |
| US7756849B2 (en) | Method of searching for text in browser frames | |
| EP1837779A1 (fr) | Multimodal content presentation | |
| US20020143821A1 (en) | Site mining stylesheet generator | |
| JPH10275162A (ja) | Wireless voice-activated control device for controlling a processor-based host system | |
| WO2001050257A2 (fr) | Incorporating a user interface mechanism of different origin into a user interface | |
| EP1163665A1 (fr) | System and method for bidirectional communication between a user and a system | |
| JPH10111785A (ja) | Method and apparatus for presenting client-side image maps | |
| EP1280055A1 (fr) | Method and computer system for creating and processing a browser-compliant human interface description | |
| US20050273487A1 (en) | Automatic multimodal enabling of existing web content | |
| EP1280053B1 (fr) | Method and computer system for providing and processing a human interface description | |
| EP1349083A1 (fr) | Rule-based extraction of data from web pages |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AK | Designated states |
Kind code of ref document: A1 Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW SD SL SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | | |
| DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
| NENP | Non-entry into the national phase |
Ref country code: KR |
|
| REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
| 122 | EP: PCT application non-entry in European phase |