WO2015087372A1 - Unidirectional close-talking microphone

Info

Publication number: WO2015087372A1
Application number: PCT/JP2013/007335
Authority: WO
Inventors: 睦朗古閑; 良平上瀧; 泰彦野村; 齊藤　哲也
Original assignee: 救救ｃｏｍ株式会社; プロトラスト株式会社; 日本エムエムアイテクノロジー株式会社
Priority date: 2013-12-12
Filing date: 2013-12-12
Publication date: 2015-06-18

Abstract

[Problem] To provide a unidirectional close-talking microphone that reduces stress for the user and collects audio while removing noise to such an extent that a speaker's voice can be clearly recognized, even when there is a considerable amount of unwanted noise. [Solution] Provided is a unidirectional close-talking microphone consisting of a single microphone and a noise processing means. The noise processing means consists of an impedance converter comprising an N-channel junction field effect transistor, and an optimization circuit for optimizing audio data. The current supplied to the drain of the N-channel junction field effect transistor, which has a grounded source, is 90 to 100 μA.

Description

Unidirectional close-talking microphone

The present invention relates to a microphone for collecting speech, and in particular to a unidirectional close-talking microphone that can collect sound while removing noise to such an extent that a speaker's speech can be clearly recognized even when ambient noise is loud.

Conventionally, there are many microphones that collect sound and convert it into an electric signal, and microphones with various functions and characteristics have been developed and used for different applications. In particular, many microphones have been developed that remove ambient noise and collect only the clear voice of the speaker.

Many microphones have structures and mechanisms that increase directivity so that the speaker's voice can be picked up precisely and reliably. However, sound is not always collected in a quiet place; a microphone may be used where there is a lot of noise, such as under an overpass or in an emergency vehicle. In such a case, an ordinary directional microphone also picks up noise arriving from the direction of the speaker, so there is a problem that clear sound cannot be collected sufficiently.

As a technique for solving such a problem, Japanese Patent Application Laid-Open No. 2011-13403, for example, discloses a technique for detecting and removing noise that reduces the number of noise detection microphones to suppress an increase in cost, without increasing the size of the speaker housing. In this technique, a speaker is driven by outputting an acoustic signal, the electromotive force generated in the speaker by ambient noise is detected, and a signal with the opposite phase of the ambient noise is generated from that electromotive force and added to the acoustic signal to cancel the ambient noise.

According to this technology, noise can be cancelled by generating a cancel signal corresponding to the electromotive force generated in the speaker. However, the audio signal that is cancelled is not necessarily an unwanted signal such as noise. In some cases the necessary sound may also be attenuated, so the noise could not always be cancelled as intended.

Japanese Patent Application Laid-Open No. 2012-231468 discloses a technique related to an audio headset that removes noise from the voice signal spoken by the wearer so that the signal can be transmitted to a remote listener. In this technique, a first audio signal is output from voice vibration transmitted by internal bone conduction, a second audio signal is output from the sound vibration of the wearer's voice, and the first and second audio signals are combined to obtain the voice produced by the wearer. The first audio signal is also used to calculate a cutoff frequency and the probability that no voice is present.

This technology is considered able to extract the speaker's voice signal and thereby eliminate noise, but there is a problem that the structure of the device becomes complicated and difficult to build. In addition, because the device must be kept in close contact with the speaker, there is a problem that the stress on the speaker during use increases.

If voice can be collected without picking up noise even in a very noisy place, accurate information transmission becomes possible in any environment. In particular, in a place with a high noise level, such as inside an ambulance, the risk of transmitting erroneous information can be reduced. There is also the advantage that accuracy improves when text processing is performed to convert speech into character data.
Therefore, it has been desired to develop a unidirectional close-talking microphone with a highly accurate noise canceling function that is not overly complicated in structure, is easy to use, and enables accurate information transmission in any environment.
JP 2011-13403 A
JP 2012-231468 A

In order to solve the above problems, an object of the present invention is to provide a microphone that collects voice and converts it into an electric signal, and in particular a unidirectional close-talking microphone that reduces the stress on the user and collects sound while removing noise to a level at which the speaker's voice can be clearly recognized, even when ambient noise is loud.

In order to achieve the above object, a unidirectional close-talking microphone according to the present invention comprises a single microphone that collects sound and converts it into an electrical signal, and a noise processing means that removes noise. The noise processing means includes an impedance converter comprising an N-channel junction field effect transistor for transmitting the electrical signal, and an optimization circuit for optimizing audio data by removing noise. In order to eliminate ambient noise effectively, the supply current to the drain of the N-channel junction field effect transistor, which has a grounded source, is set to 90 μA to 100 μA.

The unidirectional close-talking microphone further comprises: a speech extraction means; a feature extraction means for extracting features of the speech from which noise has been removed by the speech extraction means and the noise processing means; a recognition processing means for recognizing speech by applying acoustic model data distributed from an acoustic model data section to vocabulary data selected by a vocabulary processing section and to grammar data whose application is determined by a grammar processing section; a post-recognition processing means for generating character data; and an output means for outputting the generated character data.

Since the present invention is configured as described in detail above, the following effects are obtained.
1. Since a single microphone is used, the structure does not become too complicated and can be easily configured. Further, since the optimization circuit is added to the noise processing means, maintenance is facilitated. In addition, since the current supplied to the drain of the N-channel junction field effect transistor, which has a grounded source, is set to 90 μA to 100 μA, noise other than the voice of the speaker can be cancelled to a sufficient degree, and the voice data can be easily optimized.

2. The speech is extracted, its features are then extracted, and grammar data and acoustic model data are applied to the speech information to recognize the speech and generate character data. Therefore, even in a very noisy place, the voice of the speaker can be converted into usable character data.

Hereinafter, a unidirectional close-talking microphone according to the present invention will be described in detail based on an embodiment shown in the drawings. FIG. 1 is a perspective view of a unidirectional close-talking microphone according to the present invention, and FIG. 2 is a schematic circuit diagram of the unidirectional close-talking microphone. FIG. 3 is a graph showing an optimization area of voice information, and FIG. 4 is an operation flowchart of the voice recognition system.

The unidirectional close-talking microphone according to the present invention comprises a main body 10, a microphone 20, a noise processing means 30, and an earphone 40. It is used simply by fixing the main body 10 to the speaker's neck or the like, and is a microphone that can perform noise cancellation with a simple structure.

The main body 10 is the main part of the unidirectional close-talking microphone and is fixed to the speaker's neck or the like. As shown in FIG. 1, in this embodiment the main body 10 has an elastic semicircular shape. The microphone 20 is integrally held at the tip of the main body 10. A port 14 for connecting a noise processing means 30 and an earphone 40, described later, is provided on the side of the main body 10 opposite to where the microphone 20 is installed. Furthermore, a pair of fixing members 12 are provided on the inner side of both ends of the semicircular shape to secure the main body to the neck or the like. The main body 10 is made of plastic in the present embodiment, but is not limited thereto and may be any material having elasticity. For example, a wide range of resin materials can be used. Further, the overall shape is not limited to a neck-worn shape.

The microphone 20 is a member for collecting the voice of the speaker. As shown in FIG. 1, the microphone 20 is a single unit and has a sound collecting unit 22 at its end. When the speaker speaks toward the sound collecting unit 22, the microphone picks up the sound, converts it into an electric signal, and transmits the signal to the outside. The microphone 20 is bent in a U shape and is rotatably connected to the main body 10. That is, by rotating the microphone 20 about its connection with the main body 10, the microphone 20 can be moved closer to the speaker's mouth when the main body 10 is attached to the neck or the like.

The noise processing means 30 is a member that takes in the voice data collected by the microphone 20 and converted into an electrical signal, and removes the noise contained in the data. In this embodiment, as shown in FIG. 1, the noise processing means 30 is detachably attached via the cable 16 to the port 14, which is on the side of the main body 10 opposite to where the microphone 20 is connected.

The noise processing means 30 is composed of an impedance converter 32 and an optimization circuit 34 as shown in FIG. The impedance converter 32 is a device for transmitting an electrical signal, and includes an N-channel junction field effect transistor (FET) 32a.

The optimization circuit 34 is a device for removing noise and optimizing audio data, and is connected to the impedance converter 32 and processes the audio data transmitted from the impedance converter 32. In this embodiment, the optimization circuit 34 is composed of a plurality of bridge-connected resistors 34a and 34b.

The noise processing by the noise processing means 30 is performed by the impedance converter 32 and the optimization circuit 34 working together. That is, the noise processing means 30 electrically processes the audio data, which consists of the electric signal produced from the sound collected by the microphone 20, so as to remove noise effectively. To this end, the current values and the like are adjusted, and the supply current to the drain of the N-channel junction field effect transistor 32a, which has a grounded source, is set within the range of 90 μA to 100 μA.

In the prior art, some microphones perform noise removal on the collected sound by providing two microphone units and connecting them in opposite phase so that cancellation occurs. With this arrangement, the distance from the point of utterance to each unit differs greatly for the speaker's voice, so a phase difference arises between the outputs of the units. Ambient noise, on the other hand, arrives almost in phase because the difference in distance from the sound source to each unit is small, so it is readily cancelled when the units are connected in opposite phase.

However, with this method, even if noise is cancelled by matching the phases in an ideal environment, the sound pressure cancelled is at most about 20 dB, which cannot be said to remove noise sufficiently. In particular, when the microphone is used in a noisy place such as under an overpass or in an emergency vehicle, the noise reduction processing often makes the voice itself difficult to recognize as well, and noise removal adequate for practical use was not possible.
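The two-unit antiphase arrangement described above can be illustrated numerically. The following is a minimal sketch under assumptions that do not come from this document: an ideal point source in free field with 1/r spreading, a unit spacing of 2 cm, a tone at 200 Hz, and a speed of sound of 343 m/s. The function names are hypothetical and the figures it prints are model values, not measurements.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def pressure(distance_m, freq_hz):
    """Complex pressure of an ideal point source at a distance (1/r spreading)."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    return cmath.exp(-1j * k * distance_m) / distance_m


def antiphase_level_db(source_distance_m, unit_spacing_m, freq_hz):
    """Level of the antiphase (difference) output relative to the front unit alone."""
    front = pressure(source_distance_m, freq_hz)
    rear = pressure(source_distance_m + unit_spacing_m, freq_hz)
    return 20.0 * math.log10(abs(front - rear) / abs(front))


# Assumed geometry: voice 2 cm from the front unit, noise source 2 m away.
voice_db = antiphase_level_db(0.02, 0.02, 200.0)
noise_db = antiphase_level_db(2.0, 0.02, 200.0)

print(f"close voice: {voice_db:.1f} dB, distant noise: {noise_db:.1f} dB")
```

In this idealized model the distant source is attenuated considerably more than the close voice, which is the effect the antiphase connection relies on. Real units, housings, and reverberant spaces will behave less favorably.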

In the configuration of the noise processing means 30 according to the present embodiment, the supply current to the drain of the N-channel junction field effect transistor 32a is reduced from the normal value of 500 μA to 90 μA by adjusting the potential applied to the gate. According to actual measurement, 90 μA was the optimum value, giving low sensitivity to noise from distant sound sources. By operating the noise processing means 30 with the current supplied to the drain of the N-channel junction field effect transistor 32a adjusted to this value, noise can be removed sufficiently effectively. This is clear from the graph.
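How a gate potential sets the drain current can be sketched with the textbook square-law (Shockley) model of a JFET, I_D = I_DSS (1 - V_GS / V_P)^2. This document does not give device parameters, so the values below are assumptions for illustration only: I_DSS is taken to equal the 500 μA normal current, the pinch-off voltage V_P is taken as -1.0 V, and both function names are hypothetical.

```python
import math


def gate_bias_for_drain_current(i_d_ua, i_dss_ua, v_pinch_off):
    """Gate-source voltage that yields drain current i_d_ua in the square-law model.

    Solves I_D = I_DSS * (1 - V_GS / V_P)**2 for V_GS, for an N-channel
    device (V_P negative) operated between pinch-off and zero bias.
    """
    if not 0.0 <= i_d_ua <= i_dss_ua:
        raise ValueError("drain current must lie between 0 and I_DSS")
    return v_pinch_off * (1.0 - math.sqrt(i_d_ua / i_dss_ua))


def drain_current(v_gs, i_dss_ua, v_pinch_off):
    """Drain current in microamperes for a given gate-source voltage."""
    return i_dss_ua * (1.0 - v_gs / v_pinch_off) ** 2


I_DSS_UA = 500.0  # assumed equal to the "normal value" quoted in the text
V_P = -1.0        # assumed pinch-off voltage, not from the document

for target_ua in (90.0, 100.0):
    v_gs = gate_bias_for_drain_current(target_ua, I_DSS_UA, V_P)
    print(f"{target_ua:.0f} uA -> V_GS = {v_gs:.3f} V")
```

With these assumed parameters the 90 μA to 100 μA window corresponds to a gate bias a little more than half way toward pinch-off. An actual device would need its own measured I_DSS and V_P.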

Further, as shown in FIG. 3, because this adjustment lowers the gain from the microphone 20 by about 30 dB relative to normal, the audio signal is not distorted up to about 140 dB SPL. As a result, the signal-to-noise ratio (S/N ratio) is significantly improved, the speaker's voice is reliably output, and noise whose source is far away can be reliably removed.
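To put the decibel figures quoted above in linear terms, the following short sketch converts a gain change in dB to an amplitude ratio and a sound pressure level to pascals. It assumes the standard airborne reference pressure of 20 μPa; the function names are hypothetical.

```python
REFERENCE_PRESSURE_PA = 20e-6  # standard reference for dB SPL in air


def db_to_amplitude_ratio(gain_db):
    """Amplitude (voltage or pressure) ratio corresponding to a gain in dB."""
    return 10.0 ** (gain_db / 20.0)


def spl_to_pascal(spl_db):
    """RMS sound pressure in pascals for a level given in dB SPL."""
    return REFERENCE_PRESSURE_PA * db_to_amplitude_ratio(spl_db)


print(f"-30 dB gain     -> x{db_to_amplitude_ratio(-30.0):.4f} amplitude")
print(f"-20 dB (2 units)-> x{db_to_amplitude_ratio(-20.0):.4f} amplitude")
print(f"100 dB SPL      -> {spl_to_pascal(100.0):.1f} Pa")
print(f"140 dB SPL      -> {spl_to_pascal(140.0):.1f} Pa")
```

A 30 dB reduction therefore scales the signal amplitude to roughly 3 % of its normal value, which is consistent with the capsule tolerating far higher sound pressure before distortion.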

The supply current to the drain of the N-channel junction field effect transistor 32a is 90 μA in this embodiment, but as the graph shows, it can be given a certain latitude. Measurements have shown that the value is preferably in the range of 90 μA to 100 μA. The sound pressure of the voice uttered by the speaker is desirably 100 dB or more, so it is desirable to use the microphone 20 with the speaker's mouth as close to it as possible. Since the initial sound pressure of the voice often exceeds 100 dB, actual measurement has confirmed that the speaker's voice can be clearly recognized if the speaker's mouth is sufficiently close to the microphone.

The earphone 40 is a member for transmitting voice information from the outside to the speaker. As shown in FIG. 1, it is detachably attached via a cable 44 to a port 42 provided at the end of the main body 10 opposite to where the microphone 20 is connected. When the unidirectional close-talking microphone according to the present invention is used, the speaker needs to be able to obtain information such as instructions from the outside easily, even in a very noisy place. In this embodiment, a receiving device (not shown) is provided inside the main body 10 so that the speaker can hear the voice data received by the receiving device via the earphone 40.

As shown in FIG. 4, the unidirectional close-talking microphone according to the present invention further includes, in addition to the noise processing means 30, a speech extraction means 50, a feature extraction means 60, a recognition processing means 70, a post-recognition processing means 80, and an output means 90.

The speech extraction means 50 is a means for further analyzing the data from which noise has been removed by the noise processing means 30 and extracting the sound recognized as speech. This makes it possible to build the basic data for converting voice data into text (character data). The feature extraction means 60 is a means for extracting the features of the speech from which noise has been removed by the speech extraction means 50 and the noise processing means 30. By this means, each sound in the speech is identified from its characteristics, and the preparation for text conversion is completed.

The data processed by the feature extraction means 60 is transmitted to the recognition processing means 70. As shown in FIG. 4, the recognition processing means 70 is a means for recognizing speech by applying acoustic model data to vocabulary data and grammar data. The vocabulary data is selected by the vocabulary processing unit 72, which analyzes the speech data and performs vocabulary selection processing, and the grammar data is processed by the grammar processing unit 74, which determines the appropriate grammar to apply from the selected vocabulary. By applying the acoustic model distributed from the acoustic model data unit 76 to these, the content of the speech can be recognized accurately.

The data processed by the recognition processing means 70 is transmitted to the post-recognition processing means 80. The post-recognition processing means 80 is a means for generating character data. That is, it analyzes the speech data correctly recognized by the recognition processing means 70 and converts the speech into text. The character data generated by the post-recognition processing means 80 is transmitted to the output means 90. The output means 90 outputs the generated character data to an external PC or storage medium. This makes it possible to use the voice data that has been converted into text for any purpose.

As described above, since the voice data optimized by the noise processing means 30 is used for text conversion, more accurate conversion of voice into character data can be realized. That is, even in a very noisy place such as the inside of an emergency vehicle, the voice uttered by the speaker can be transmitted accurately and can also be converted accurately into text (character data).
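The order of processing from the noise processing means 30 through the output means 90 can be sketched as a chain of stages. The following is only a toy illustration of that data flow: every stage body is a hypothetical stand-in (a simple amplitude gate, fixed-length framing, a mean-amplitude feature, and a two-word lookup table), none of which is described in this document.

```python
from typing import Dict, List, Sequence


def noise_processing(samples: Sequence[float], floor: float = 0.1) -> List[float]:
    """Stand-in for means 30: zero samples below an assumed amplitude floor."""
    return [s if abs(s) >= floor else 0.0 for s in samples]


def speech_extraction(samples: Sequence[float], frame_len: int = 4) -> List[List[float]]:
    """Stand-in for means 50: frame the signal and keep frames that contain signal."""
    frames = [list(samples[i:i + frame_len]) for i in range(0, len(samples), frame_len)]
    return [frame for frame in frames if any(x != 0.0 for x in frame)]


def feature_extraction(frames: List[List[float]]) -> List[float]:
    """Stand-in for means 60: one feature per frame (mean absolute amplitude)."""
    return [sum(abs(x) for x in frame) / len(frame) for frame in frames]


# Hypothetical toy model mapping a word to a typical feature value.
TOY_MODEL: Dict[str, float] = {"low": 0.2, "high": 0.8}


def recognition_processing(features: List[float]) -> List[str]:
    """Stand-in for means 70: choose the word whose model value is nearest."""
    return [min(TOY_MODEL, key=lambda w: abs(TOY_MODEL[w] - f)) for f in features]


def post_recognition_processing(words: List[str]) -> str:
    """Stand-in for means 80: assemble recognized words into character data."""
    return " ".join(words)


def output(text: str) -> str:
    """Stand-in for means 90: hand the character data to the caller."""
    return text


def transcribe(samples: Sequence[float]) -> str:
    """Run the stages in the order described for the speech recognition system."""
    data = samples
    for stage in (noise_processing, speech_extraction, feature_extraction,
                  recognition_processing, post_recognition_processing, output):
        data = stage(data)
    return data


signal = [0.05, 0.02, 0.01, 0.03, 0.9, 0.8, 0.7, 0.8, 0.2, 0.25, 0.15, 0.2]
print(transcribe(signal))
```

Keeping each means as a separate function mirrors the block structure of the description, so any one stand-in could be replaced by a real implementation without touching the others.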

FIG. 1 is a perspective view of a unidirectional close-talking microphone according to the present invention.
FIG. 2 is a schematic circuit diagram of the unidirectional close-talking microphone.
FIG. 3 is a graph showing the optimization area of voice information.
FIG. 4 is an operation flowchart of the speech recognition system.

DESCRIPTION OF SYMBOLS
10 Main body
12 Fixing member
14 Port
16 Cable
20 Microphone
22 Sound collecting unit
30 Noise processing means
32 Impedance converter
32a N-channel junction field effect transistor
34 Optimization circuit
34a Resistor
40 Earphone
42 Port
44 Cable
50 Speech extraction means
60 Feature extraction means
70 Recognition processing means
80 Post-recognition processing means
90 Output means

Claims

1. A unidirectional close-talking microphone comprising a single microphone that collects sound and converts it into an electrical signal, and a noise processing means for removing noise, characterized in that the noise processing means includes an impedance converter composed of an N-channel junction field effect transistor that transmits the electrical signal, and an optimization circuit that optimizes audio data by removing noise, and in that, in order to eliminate ambient noise effectively, the supply current to the drain of the N-channel junction field effect transistor, which has a grounded source, is 90 μA to 100 μA.
2. The unidirectional close-talking microphone according to claim 1, further comprising: speech extraction means; feature extraction means for extracting features of the speech from which noise has been removed by the speech extraction means and the noise processing means; recognition processing means for recognizing speech by applying acoustic model data distributed from an acoustic model data section to vocabulary data selected by a vocabulary processing section and to grammar data whose application is determined by a grammar processing section; post-recognition processing means for generating character data; and output means for outputting the generated character data.