GB2394589A

GB2394589A - Speech recognition device

Info

Publication number: GB2394589A
Application number: GB0224797A
Authority: GB
Inventors: Douglas Ralph Ealey
Original assignee: Motorola Inc
Current assignee: Motorola Solutions Inc
Priority date: 2002-10-25
Filing date: 2002-10-25
Publication date: 2004-04-28
Anticipated expiration: 2022-10-25
Also published as: GB2394589B; HK1063372A1; WO2004038695A1; GB0224797D0; AU2003282109A1

Abstract

A speech recognition device (300), comprising one or more microphones and means to determine the direction of a signal source received by the microphones, together with means for permitting recognition processing by the device. Permission is determined by comparing the direction of the signal source with a target direction (314), and allowing recognition processing if the source direction is within an angular threshold (θ) of the target direction.

Description

SPEECH RECOGNITION DEVICE AND METHOD

Technical Field

The invention relates to the field of speech recognition.

Background

There is an emergent market for voice-controlled multi-modal, multimedia and telematic devices. This market raises the problem that such devices must be able to discern whether or not an utterance was addressed to that particular device or to some third party.

With devices that use voiced keywords or voice dialling, one does not want the device to perform actions as a consequence of overhearing chance instances of those keywords or names whilst it rests on a desk or in a pocket.

The consequences may be for the device to inadvertently call someone mentioned in an overheard conversation, or to change mode or application, so that the user is confused to find the device interface changing apparently 'randomly' from use to use. The user would see both behaviours as a severe disadvantage of voice control.

One cannot simply rely on the volume level to differentiate between overheard casual speech and close-talking use, as with a normal 'phone, because multimedia and multi-modal devices and telematic control systems are generally intended to be used at arm's length, either to view a display or because the device is on the dashboard.

Thus there is a need for an alternative method.

The general approach in the art is to precede interaction with the device by a non-obvious keyword, for example by giving the device a name to be addressed by. However, having to call one's personal appliances by name is unlikely to be a practical or particularly desirable solution in many circumstances.

Summary of the Invention

In a first aspect, the present invention provides a speech recognition device, as claimed in claim 1.

In a second aspect, the present invention provides a method for controlling speech recognition, as claimed in claim 12.

Further aspects are as claimed in the dependent claims.

Brief description of the drawings

FIG. 1 illustrates a possible microphone configuration in accordance with a preferred embodiment of the invention.

FIG. 2 illustrates a target direction for the user's voice, together with an angular threshold of deviation theta from the target direction.

FIG. 3 illustrates an arc of possible target directions for the user's voice, ranging between 'normal to the plane of the user interface' and 'parallel to the plane of the user interface'.

FIG. 4 illustrates a system for the control of speech recognition in accordance with the preferred embodiment.

Detailed description of preferred embodiment

In a preferred embodiment, a speech recognition device is described in accordance with figure 4, comprising an input signal 404 and an estimate of the direction 406 of the input signal. A processor 430 compares the direction 406 of the input signal with the target direction selected from the target options held in store 412. This comparison uses a threshold of deviation from the target direction, held in store 410, to determine whether the input signal 404 should be passed as output 408 to a recogniser.
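
The FIG. 4 data flow can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: directions are assumed to be 3-D vectors in device coordinates, the threshold is assumed to be in degrees, and the names `DirectionGate`, `target_options`, `threshold_deg` and `process` are hypothetical stand-ins for stores 412 and 410 and processor 430.

```python
import math
from typing import Optional, Sequence

Vector = tuple[float, float, float]


def angle_between_deg(a: Vector, b: Vector) -> float:
    """Angle between two non-zero 3-D vectors, in degrees."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp so rounding cannot push the cosine outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


class DirectionGate:
    """Hypothetical model of processor 430 in FIG. 4."""

    def __init__(self, target_options: dict[str, Vector], threshold_deg: float):
        if not target_options:
            raise ValueError("at least one target option is required")
        self.target_options = target_options  # stands in for store 412
        self.threshold_deg = threshold_deg  # stands in for store 410
        self.selected = next(iter(target_options))  # first option by default

    def select_target(self, name: str) -> None:
        if name not in self.target_options:
            raise KeyError(name)
        self.selected = name

    def process(self, signal: Sequence[float], direction: Vector) -> Optional[Sequence[float]]:
        """Return the signal (output 408) if it may go to the recogniser, else None."""
        target = self.target_options[self.selected]
        if angle_between_deg(direction, target) <= self.threshold_deg:
            return signal
        return None
```

For example, with a single target `(0.0, 0.0, 1.0)` and a 30-degree threshold, a direction of `(0.0, 0.3, 1.0)` (about 17 degrees off the target) is passed through, while `(1.0, 0.0, 0.0)` is rejected.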

Figure 1 illustrates the preferred embodiment, comprising at least three microphones 160, 162, 164 that provide input to a means to determine the direction of a signal source.

The speech recognition process is then controlled by:
(i) comparing the direction of the signal source with a given target direction (target directions are illustrated as direction 210 of figure 2, and as the directions generally shown as 314 in figure 3); and
(ii) permitting recognition processing if the source direction is within an angular threshold 220, 320 of the target direction.
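
Steps (i) and (ii) amount to a cone test. Writing d for the estimated source direction, t for the target direction and θ for the angular threshold (notation introduced here for illustration), recognition is permitted when:

```latex
\alpha = \arccos\!\left(\frac{\mathbf{d}\cdot\mathbf{t}}{\lVert\mathbf{d}\rVert\,\lVert\mathbf{t}\rVert}\right),
\qquad \text{recognition permitted} \iff \alpha \le \theta .
```

Because arccos is decreasing on [-1, 1], for unit vectors the same test can be written as d · t ≥ cos θ, which avoids evaluating the arccos; with θ = 30°, for instance, any direction whose dot product with the target is at least cos 30° ≈ 0.866 passes.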

FIG. 2 illustrates a target direction for the user's voice that is substantially normal to the plane of the user interface and microphones, together with an angular threshold of deviation theta from the target direction.

In figure 2, the initial target direction 210 assumes that the user will wish to look at the user interface 250 of device 200 when controlling it. Consequently, in this preferred embodiment the microphones should be distributed to form a plane substantially parallel to the plane of the user interface 250. The target direction 210 can then be taken to be normal to the plane of the microphones, and thus implicitly to the plane of user interface 250. It is anticipated that the user will rarely be aligned exactly normal to the plane of the user interface 250, and so an angular threshold 220 is introduced, whereby the speech recognition process is still permitted if the user is within angle 230 of the target direction.
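
Taking the target direction 210 as the normal to the plane of the microphones can be illustrated with the cross product of two inter-microphone vectors. This is a sketch under assumed conditions: the microphone positions are hypothetical coordinates in metres, the three microphones are not collinear, and which side of the device counts as the front is fixed by a hypothetical `front_hint` vector pointing out of the user interface.

```python
import math

Vector = tuple[float, float, float]


def sub(a: Vector, b: Vector) -> Vector:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vector, b: Vector) -> Vector:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def microphone_plane_normal(p0: Vector, p1: Vector, p2: Vector, front_hint: Vector) -> Vector:
    """Unit normal to the plane through three microphones, oriented towards front_hint."""
    n = cross(sub(p1, p0), sub(p2, p0))
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("microphones are collinear; the plane is undefined")
    n = tuple(c / length for c in n)
    # Flip the normal if it points away from the user-interface side.
    if sum(a * b for a, b in zip(n, front_hint)) < 0:
        n = tuple(-c for c in n)
    return n
```

With microphones at `(0, 0, 0)`, `(0.1, 0, 0)` and `(0, 0.1, 0)` and a front hint of `(0, 0, 1)`, the returned target is `(0, 0, 1)`. A planar layout alone cannot tell front from back, which is why some external cue such as `front_hint` is needed.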

An alternative target direction can be selected either by the user via a user interface 450, or by automatic control 440 if the device is placed in a power and/or data cradle or is left alone on a substantially horizontal surface. This target direction is described in accordance with figure 3.
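
The automatic selection by control 440 might be sketched as a small decision function. The inputs (`in_cradle`, `stationary_s`, `stationary_limit_s`), the 10-second default and the option names `"normal"` and `"arc"` are hypothetical. The rule that cradle placement or a predetermined stationary period switches to the arc target follows claim 4; letting an explicit user selection take priority is an assumption.

```python
from typing import Optional


def choose_target(
    user_choice: Optional[str],
    in_cradle: bool,
    stationary_s: float,
    stationary_limit_s: float = 10.0,  # hypothetical predetermined period
) -> str:
    """Return the name of the target-direction option to use."""
    # Assumption: an explicit choice made via the user interface wins.
    if user_choice is not None:
        return user_choice
    # Cradle placement, or resting unmoved long enough, selects the arc target.
    if in_cradle or stationary_s >= stationary_limit_s:
        return "arc"
    # Otherwise assume the user is looking at the display (FIG. 2 case).
    return "normal"
```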

FIG. 3 illustrates an arc of possible target directions for the user's voice, ranging between 'normal to the plane of the user interface' and 'parallel to the plane of the user interface' and extending downwards along the vertical axis of the plane of the user interface, together with an angular threshold of deviation theta from the possible target directions.

So the alternative target direction may be anywhere on an arc 314 that is centred on the vertical axis of the plane of the user interface 350. Arc 314 extends from the normal 310 to the plane of the user interface and proceeds down the vertical axis until it is parallel 312 to the plane of the user interface.
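
Testing a direction against arc 314 means measuring its angle to the nearest point on the arc rather than to a single target. The sketch below assumes a coordinate frame not given in the source: the user-interface plane is the x-y plane, +z points out of the display (direction 310), and -y points down the vertical axis of the interface (direction 312), so the arc is the quarter great circle joining +z and -y. The function names are hypothetical.

```python
import math

Vector = tuple[float, float, float]


def _angle_deg(a: Vector, b: Vector) -> float:
    """Angle between two unit vectors, in degrees."""
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.degrees(math.acos(dot))


def angle_to_arc_deg(d: Vector) -> float:
    """Smallest angle between unit vector d and the arc from +z (phi=0) to -y (phi=90 deg)."""
    # Arc points are p(phi) = (0, -sin phi, cos phi), so d . p(phi) = r cos(phi - phi0),
    # where r and phi0 describe the projection of d onto the y-z plane.
    phi0 = math.atan2(-d[1], d[2])
    if 0.0 <= phi0 <= math.pi / 2:
        # The nearest arc point is the normalised projection of d.
        p = (0.0, -math.sin(phi0), math.cos(phi0))
        return _angle_deg(d, p)
    # Otherwise the nearest point is one of the two arc endpoints.
    return min(_angle_deg(d, (0.0, 0.0, 1.0)), _angle_deg(d, (0.0, -1.0, 0.0)))


def within_arc_threshold(d: Vector, threshold_deg: float) -> bool:
    return angle_to_arc_deg(d) <= threshold_deg
```

Because r cos(phi - phi0) is largest at phi0, the closest arc point is the projection when phi0 lies on the arc and one of the endpoints otherwise. A talker directly above the device, `(0, 1, 0)`, is therefore 90 degrees from the arc, while any direction in the vertical plane between the screen normal and straight down scores 0.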

It is anticipated that the user need not be exactly aligned on this arc. Therefore an angular threshold 320 is introduced, whereby the speech recognition process is still permitted if the user is within angle 330 of the target direction.

The angular control of the speech recognition system can be specified by the user in situations where the user needs to control the device from a relative position other than those permitted by the configurations above. In such an instance the user may manually indicate, via the user interface 450, the wish to redefine the target direction, and then speak from that direction to set the device.
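
The manual redefinition can be pictured as a two-step interaction: the user requests a new target via user interface 450, and the next usable direction estimate is captured as the target. The class and method names below are hypothetical, and capturing a single estimate (rather than, say, averaging several) is a simplifying assumption.

```python
import math

Vector = tuple[float, float, float]


class TargetCalibrator:
    """Hypothetical holder of the current target direction."""

    def __init__(self, target: Vector):
        self.target = target
        self._awaiting = False

    def request_redefine(self) -> None:
        """Called when the user asks, via the interface, to set a new target."""
        self._awaiting = True

    def on_direction_estimate(self, direction: Vector) -> bool:
        """Feed each direction estimate; return True if it was captured as the target."""
        if not self._awaiting:
            return False
        norm = math.sqrt(sum(c * c for c in direction))
        if norm == 0.0:
            return False  # unusable estimate; keep waiting
        self.target = tuple(c / norm for c in direction)  # store as a unit vector
        self._awaiting = False
        return True
```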

In the preferred embodiment, the user's direction is determined by comparing the relative signal delay between pairs of microphones 160, 162, 164, and then using these delays and the positions of the corresponding microphones to calculate the signal direction.
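
Converting the pairwise delays into a direction can be sketched with a far-field (plane-wave) model, which is an assumption not stated in the source. For a unit vector u pointing from the device towards the talker and speed of sound c, a microphone at position p hears the wavefront earlier by (p · u)/c, so the delay of microphone j relative to microphone i is τ_ij = -((p_j - p_i) · u)/c. With three non-collinear microphones in the plane z = 0, the pairs (0, 1) and (0, 2) give two linear equations for u_x and u_y; u_z then follows from |u| = 1, taking the talker to be in front of the device (u_z ≥ 0), since a planar array cannot distinguish front from back. The delays themselves would typically be estimated by cross-correlation (for example GCC-PHAT), which is not shown; the function name and constant are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 degrees C


def direction_from_delays(
    mics: list[tuple[float, float]],
    tau01: float,
    tau02: float,
    c: float = SPEED_OF_SOUND,
) -> tuple[float, float, float]:
    """Unit direction towards the source from two pairwise delays (seconds).

    mics: (x, y) positions in metres of microphones 0, 1, 2, all in the plane z = 0.
    tau0j: arrival time at microphone j minus arrival time at microphone 0.
    """
    (x0, y0), (x1, y1), (x2, y2) = mics
    # Rows of the 2x2 system A @ (ux, uy) = b, with b = -c * tau.
    a11, a12 = x1 - x0, y1 - y0
    a21, a22 = x2 - x0, y2 - y0
    b1, b2 = -c * tau01, -c * tau02
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("microphones are collinear")
    # Cramer's rule for the in-plane components.
    ux = (b1 * a22 - a12 * b2) / det
    uy = (a11 * b2 - a21 * b1) / det
    # Noise can push the in-plane part slightly past 1; clamp before sqrt.
    uz = math.sqrt(max(0.0, 1.0 - ux * ux - uy * uy))
    return (ux, uy, uz)
```

For example, with microphones at `(0, 0)`, `(0.1, 0)` and `(0, 0.1)`, a talker in direction `(0.6, 0, 0.8)` reaches microphone 1 about 0.17 ms before microphone 0 and microphone 2 at the same time as microphone 0, and the function recovers `(0.6, 0, 0.8)` from those two delays.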

The angular control of the speech recognition system can be overridden automatically in situations where the signal from any one microphone 160, 162, 164 falls below an amplitude ratio with respect to the signals from the remaining microphones, as when the device is held in a typical phone position, so favouring reception by any microphone near the mouth.
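
The amplitude-ratio override might be sketched as follows. Using RMS as the amplitude measure, comparing each microphone with the mean of the others, and the default ratio of 0.5 are all assumptions; the source specifies only that the override applies when one microphone's signal falls below an amplitude ratio relative to the remaining microphones. The function names are hypothetical.

```python
import math
from typing import Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one microphone's frame (0.0 for an empty frame)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def angular_override(frames: Sequence[Sequence[float]], ratio: float = 0.5) -> bool:
    """True if any microphone's RMS is below `ratio` times the mean RMS of the others."""
    levels = [rms(f) for f in frames]
    if len(levels) < 2:
        return False  # nothing to compare against
    for i, level in enumerate(levels):
        others = levels[:i] + levels[i + 1:]
        mean_others = sum(others) / len(others)
        if mean_others > 0.0 and level < ratio * mean_others:
            return True
    return False
```

In a handset position where one microphone is near the mouth and picks up, say, ten times the level of the other two, each of the quieter microphones falls well below half the mean of its companions, so the override fires; at arm's length the levels are similar and it does not.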

Finally, in the preferred embodiment it is desirable to provide a facility to reversibly enable or disable the angular control of the speech recognition system via the user interface 450, for example in a dictation scenario.
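
Taken together, the controls described in this embodiment reduce to a short decision rule. The sketch below is one hypothetical way of combining them, with illustrative parameter names: the user-controlled enable flag is checked first, then the amplitude-ratio override, and only then the angular test.

```python
def permit_recognition(
    angular_control_enabled: bool,
    amplitude_override: bool,
    within_threshold: bool,
) -> bool:
    """Decide whether the current input may be passed to the recogniser."""
    if not angular_control_enabled:
        return True  # e.g. dictation: the user has disabled the angular control
    if amplitude_override:
        return True  # e.g. device held close to the mouth like a phone
    return within_threshold  # normal case: apply the angular threshold
```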

Claims

1. A speech recognition device (100), comprising: one or more microphones (160, 162, 164); means to determine the direction of a signal source received by the one or more microphones (160, 162, 164); means for permitting recognition processing by the device, characterized by: means for comparing the direction of the signal source with a target direction (210, 314), and permitting recognition processing if the source direction is within an angular threshold (theta) of the target direction (210, 314).

2. A speech recognition device (100), according to claim 1, wherein the target direction (210) is substantially normal to the plane of the user interface.

3. A speech recognition device (100), according to claim 1 or claim 2, wherein the target direction (314) is anywhere on an arc (314) centred on the vertical axis of the plane of the user interface, the arc being from substantially normal (310) to the plane of the user interface and proceeding down the vertical axis until substantially parallel (312) to the plane of the user interface.

4. A speech recognition device (100), according to claim 1 or claim 2, wherein the device is adapted to change the target direction automatically to that of claim 3 when either the device is placed in a cradle provided for it, or the device determines that it has been stationary for a predetermined period of time.

5. A speech recognition device (100), according to any previous claim, wherein user-operable means are provided, the user-operable means being adapted to enable the operator to redefine the target direction manually.

6. A speech recognition device (100) according to any previous claim, wherein the microphones (160, 162, 164) are distributed in a plane substantially parallel to the plane of the user interface of the device.

7. A speech recognition device (100) according to any previous claim, further comprising means for comparing the relative signal delay between any given pair of microphones (160, 162, 164).

8. A speech recognition device (100) according to any previous claim, comprising means for calculating signal direction using relative signal delays between microphone pairs and the positions of the microphones (160, 162, 164).

9. A speech recognition device (100) according to any previous claim, wherein the use of an angular threshold to permit recognition processing is overridden if the signal from any one microphone (160, 162, 164) falls below an amplitude ratio with respect to the signals from the remaining microphones.

10. A speech recognition device (100) according to any previous claim, wherein the use of an angular threshold to permit recognition processing may be reversibly enabled or disabled by the user via the user interface.

11. A speech recognition device (100) according to any previous claim, wherein the recognition comprises identifying the talker.

12. A method for control of speech recognition, comprising: provision of at least one microphone (160, 162, 164), the at least one microphone providing input to a means for determining the direction of a signal source; provision of means for permitting recognition processing by the device, characterized by: comparing the direction of the signal source with a target direction (210, 314), and permitting recognition processing if the source direction is within an angular threshold (theta) of the target direction.
