JP5790021B2 - Audio output system

Info

Publication number: JP5790021B2
Application number: JP2011037362A
Authority: JP
Inventors: 亮大内; 佳崇井出; 詠子小林; 紀行畑
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2011-02-23
Filing date: 2011-02-23
Publication date: 2015-10-07
Anticipated expiration: 2031-02-23
Also published as: JP2012173630A

Description

The present invention relates to an audio output system comprising a masker sound output device that outputs a masker sound and a voice terminal device.

Conventionally, for the case where a plurality of audio contents are reproduced in the same space, a technique has been proposed that keeps the reproduced sounds of the respective contents from interfering with one another (see, for example, Patent Document 1). The apparatus of Patent Document 1 lowers the volume while no content is being reproduced in the other areas and raises the volume when content is reproduced in another area.

In recent years, systems have also become known in which a speaker is attached to a dialogue counter at a bank, a dispensing pharmacy, or the like, and a sound with low relevance to the speaker's voice is output as a masker sound, making the speaker's voice (the content of the conversation) hard for other people waiting their turn to make out.

Patent Document 1: JP 2008-76985 A

A masker sound yields little masking effect when its volume is low. The masker sound therefore has to be output at a certain volume; if its volume is too high, however, the sound the listener wants to hear (for example, a calling voice) can no longer be heard.

Accordingly, an object of the present invention is to provide an audio output device that achieves a sufficient masking effect while allowing the desired sound to be heard at an appropriate volume.

The audio output system of the present invention includes a masker sound output device that outputs a masker sound and a terminal device carried by a user. The masker sound output device is installed so as to output the masker sound toward the user. The terminal device outputs the audio needed by each individual user.

With this configuration, the audio the user wants to hear is output from the terminal device (voice terminal) carried by the user, in the user's immediate vicinity. Even when the volume of the masker sound is raised to obtain a masking effect, the user can therefore hear the desired sound at an appropriate volume.

In the above audio output system, the masking effect can be further enhanced by having the voice terminal output audio that assists the masker sound. The assisting audio may be the same masker sound that the masker sound output device outputs, a background sound such as the murmur of a brook or the rustling of trees, or a highly theatrical sound (effect sound) such as intermittently occurring musical tones.

In the above audio output system, the voice terminal can also output the audio of predetermined content. Outputting content audio from the voice terminal draws the user's attention away from the audio to be masked and toward the content audio, which further enhances the masking effect.

It is desirable to provide reception means that accepts a content selection from the user and to output the audio of the content the user wants, which enhances the masking effect still further.

In practice, it is desirable to let the user select from among the contents shown on a plurality of display devices installed in the waiting area of a bank, a dispensing pharmacy, or the like, and to output the corresponding audio.

It is also desirable that the audio assisting the masker sound be output when no content selection has been accepted from the user.

According to the present invention, a sufficient masking effect can be obtained, and the user can hear the desired sound at an appropriate volume.

FIG. 1 is a layout diagram showing the configuration of the audio output system.

FIG. 2(A) is a block diagram showing the configuration of the voice terminal, and FIG. 2(B) is an external view of the voice terminal.

FIG. 3 is a block diagram showing the configuration of the masker sound output device.

FIG. 4 is a block diagram showing the configuration of the server.

FIG. 5(A) is a flowchart showing the operation of the server, the voice terminal, and the masker sound output device when a collected audio signal is transmitted to the server, and FIG. 5(B) is a flowchart showing the operation of the server and another information processing apparatus (for example, a home PC) when conversation content is recorded.

FIG. 6 is a flowchart showing the operation of the server and the voice terminal.

FIG. 7 is a flowchart showing the operation of the server and the voice terminal.

FIG. 1 is a layout diagram showing an outline of the audio output system. The audio output system is installed, for example, at the dialogue counters and waiting area of a bank, a dispensing pharmacy, or the like. A masker sound output device 3 that outputs a masker sound is installed near the dialogue counters and emits the masker sound toward the waiting area. The masker sound masks what the people conversing at a dialogue counter say, so that people in the waiting area cannot understand it.

In FIG. 1, a user 90 and a staff member 91 are present at each of three dialogue counters, and a plurality of users 92 are present in a waiting area away from the dialogue counters. The staff member 91 is, for example, a pharmacist explaining a medicine, the user 90 is a patient listening to the explanation, and each user 92 is a patient waiting for a turn.

Each user 92 receives a voice terminal 1 from a staff member 93 at the reception desk and carries it. The voice terminal 1 has a speaker, which outputs a calling voice (a synthesized voice, a staff member's actual voice, a beep, or the like) when the user's turn comes. Hearing the calling voice tells the user 92 that the turn has come. The user 92 then carries the voice terminal 1 to the dialogue counter and returns it to the staff member 91 there. The voice terminal 1 thus functions as a wireless call receiver (a so-called pager). Because the calling voice is output from the voice terminal 1 carried by the user, each user can hear the audio he or she needs (the calling voice, in this embodiment) at an appropriate volume even when the masker sound output device 3 outputs the masker sound at a certain volume to obtain a masking effect.

In the audio output system of this embodiment, display devices 7 are provided in the waiting area. Each display device 7 is a general-purpose information display of the kind commonly installed in waiting areas and shows the video of predetermined content. In this example, three display devices 7 are installed, each showing the content video of its own channel, such as a trivia channel, a health channel, or an advertising channel. The voice terminal 1 also outputs the audio of these contents. The user selects the content whose audio is output manually, by operating the voice terminal 1. By listening to the content audio output from the voice terminal 1, the user 92 attends to the content audio rather than to the audio to be masked (the conversation at the dialogue counter), which further enhances the masking effect.

The specific configuration and operation that realize the above audio output system are described below. FIG. 2(A) is a block diagram showing the configuration of the voice terminal 1, and FIG. 2(B) is an external view of the voice terminal 1. FIG. 3 is a block diagram showing the configuration of the masker sound output device 3, and FIG. 4 is a block diagram showing the configuration of the server 5. FIG. 5 is a flowchart showing the operation of the server 5 and the masker sound output device 3. FIGS. 6 and 7 are flowcharts showing the operation of the server 5 and the voice terminal 1.

The voice terminal 1 includes a microphone 11, an A/D converter 12, a signal processing unit 13, a D/A converter 14, a speaker 15, a control unit 16, a communication unit 17, and an operation unit 18.

The masker sound output device 3 includes a communication unit 31, a control unit 32, a signal processing unit 33, a D/A converter 34, and a speaker 35.

The server 5 includes a communication unit 51, a control unit 52, a masker sound generation unit 53, a masker sound storage unit 54, a content storage unit 55, and an output interface (I/F) 56.

The masker sound output device 3 is connected to the communication unit 51 of the server 5 via the communication unit 31 and exchanges various data with the server 5. Here, it mainly receives sound data for the masker sound from the server 5.

The control unit 52 of the server 5 instructs the masker sound generation unit 53 to generate a masker sound, and outputs the sound data for the masker sound generated by the masker sound generation unit 53 to the masker sound output device 3 via the communication unit 51.

The masker sound generation unit 53 reads the various sound data stored in the masker sound storage unit 54, generates sound data for the masker sound, and outputs it to the control unit 52. The masker sound may be any sound capable of masking speech; for example, it is generated by combining the disturbing sound, background sound, and effect sound stored in the masker sound storage unit 54.

The disturbing sound is a sound that disturbs the speech to be masked. It is human speech modified on the time axis or the frequency axis so that it carries no lexical meaning (its content cannot be understood). When the disturbing sound is speech modified on the time axis, the speech of specific speakers (several people, including men and women) is recorded in advance, the audio signal is divided into sections of fixed length at predetermined time intervals, and each section is, for example, read out in the reverse direction, yielding speech with no lexical meaning. When the modification is made on the frequency axis, the peaks of the spectral envelope (formants) are extracted and specific formants that affect the lexical content are changed, again yielding speech with no lexical meaning.
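The time-axis modification described above can be illustrated with a minimal sketch. It assumes the audio is already available as a plain list of samples; the function name scramble_time_axis and the parameter segment_len are hypothetical and do not appear in the patent, which does not specify a section length.

```python
def scramble_time_axis(samples, segment_len):
    """Split samples into fixed-length sections and reverse each section.

    Illustrative only: the patent describes reading each section out
    in the reverse direction but gives no concrete section length.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    scrambled = []
    for start in range(0, len(samples), segment_len):
        section = samples[start:start + segment_len]
        # Reverse playback order within this section only.
        scrambled.extend(reversed(section))
    return scrambled
```

The overall order of the sections is kept, so the result retains the rough envelope of the original speech while the content within each section is played backwards. A trailing section shorter than segment_len is simply reversed as it is.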

The disturbing sound may instead be generated each time, by capturing the voice of the speakers at the dialogue counter with the microphone of the voice terminal 1 and modifying the captured speech.

The background sound is a sound that listeners tend not to attend to and that causes no discomfort, such as the murmur of a brook or the rustling of trees. It raises the background noise level and makes the unnaturalness of the disturbing sound less noticeable.

The effect sound is a highly theatrical sound, such as intermittently occurring musical tones. It draws the listener's attention to the effect sound as well and, psychoacoustically, makes the unnaturalness of the disturbing sound less noticeable. Having the user 92 listen to a masker sound that combines the disturbing sound, background sound, and effect sound makes it possible to mask the speaker's voice while reducing discomfort.

The sound data for the disturbing sound, background sound, and effect sound stored in the masker sound storage unit 54 is not limited to one item each; a plurality of sound data items may be stored. In that case, the masker sound generation unit 53 selects and reads specific sound data from among them. When a plurality of sound data items are stored, they may be selected according to a predefined combination table (a table stored in the masker sound storage unit 54). The table may also record the volume, readout timing, and so on of each sound, so that the volume and readout timing of each sound can be changed individually. It is also possible to store the sound data as an already synthesized masker sound and simply reproduce it.
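One way to picture the combination table is as a list of entries, each naming a stored sound together with its volume and readout timing. The sketch below is an assumption about how such a table could drive mixing; the tuple layout (name, gain, start offset in samples) and the function name mix_masker are hypothetical, since the patent does not define the table format.

```python
def mix_masker(table, sounds, length):
    """Mix stored sounds into one masker signal of the given length.

    table:  list of (name, gain, start) entries (hypothetical layout);
            start is a non-negative sample offset.
    sounds: dict mapping a sound name to its list of samples.
    """
    mixed = [0.0] * length
    for name, gain, start in table:
        for i, sample in enumerate(sounds[name]):
            pos = start + i
            if pos >= length:
                break  # Ignore anything past the end of the output.
            mixed[pos] += gain * sample
    return mixed
```

For example, a table of [("disturbing", 0.5, 1), ("background", 1.0, 0)] would play the background sound from the start at full volume and add the disturbing sound at half volume one sample later.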

The server 5 generates sound data for such a masker sound and transmits it to the masker sound output device 3. The control unit 32 of the masker sound output device 3 receives the sound data via the communication unit 31 and performs playback processing. For example, if the sound data is encoded, compressed data, the control unit 32 decodes it, converts it into a digital audio signal, and outputs it to the signal processing unit 33. The signal processing unit 33 adjusts the volume, frequency characteristics, and so on of the input digital audio signal and outputs it to the D/A converter 34. The D/A converter 34 converts the digital audio signal into an analog audio signal, which is emitted from the speaker 35. In this way, the masker sound is output to the users 92 in the waiting area.

Next, the functions and operation of the server 5 and the voice terminal 1 are described. The voice terminal 1 is connected to the communication unit 51 of the server 5 via the communication unit 17 and exchanges various data with the server 5. Here, it mainly receives sound data for the calling voice and sound data for content from the server 5.

The control unit 52 of the server 5 reads the sound data and video data of content from the content storage unit 55 and transmits the content sound data to the voice terminal 1 via the communication unit 51. The control unit 52 also outputs the content video data to each display device 7 via the output I/F 56. A plurality of kinds of content sound data and video data are stored, and as many as there are display devices (three in this embodiment) are read out simultaneously.

As for the sound data, several items may be read out simultaneously and broadcast to all voice terminals 1; alternatively, the sound data of the content requested by a voice terminal 1 may be read out and sent by unicast. In this embodiment, for example, the three display devices 7 show the content video of ch. 1 (trivia channel), ch. 2 (health channel), and ch. 3 (advertising channel), respectively, as shown in FIG. 1. The user therefore presses one of the buttons labeled "1", "2", and "3" on the operation unit 18 of the voice terminal 1, as shown in FIG. 2(B). When the user presses the button labeled "1", for example, the control unit 16 requests distribution of the sound data of the ch. 1 content. The control unit 52 of the server 5 then transmits the sound data of the ch. 1 content to the voice terminal 1 that made the request.

In this way, the server 5 transmits the content sound data to the voice terminal 1. The control unit 16 of the voice terminal 1 receives the content sound data via the communication unit 17 and performs playback processing. For example, if the sound data is encoded, compressed data, the control unit 16 decodes it, converts it into a digital audio signal, and outputs it to the signal processing unit 13. When the server 5 transmits a plurality of sound data items simultaneously, the control unit 16 outputs to the signal processing unit 13 only the digital audio signal of the channel corresponding to the button pressed on the operation unit 18.
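For the broadcast case, the channel selection on the terminal side amounts to a filter over incoming data. The following is a minimal sketch under the assumption that each broadcast frame arrives as a (channel, chunk) pair; that frame layout and the name frames_for_channel are hypothetical and not taken from the patent.

```python
def frames_for_channel(frames, pressed_channel):
    """Keep only the audio chunks of the channel the user selected.

    frames: iterable of (channel, chunk) pairs (hypothetical layout).
    pressed_channel: channel number of the pressed button, or None
    when no channel button has been pressed.
    """
    return [chunk for channel, chunk in frames if channel == pressed_channel]
```

Only the selected chunks would then be passed on for decoding and signal processing; everything else is discarded at the terminal.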

The signal processing unit 13 adjusts the volume, frequency characteristics, and so on of the input digital audio signal and outputs it to the D/A converter 14. The D/A converter 14 converts the digital audio signal into an analog audio signal, which is emitted from the speaker 15. In this way, the content audio is output to each user 92 in the waiting area. Each user 92 may also listen to the content audio through headphones rather than the speaker.

As shown in FIG. 2(B), the voice terminal 1 has a button labeled "OFF" in addition to the buttons labeled "1", "2", and "3". When the user presses the "OFF" button, the control unit 16 either stops the playback processing of the sound data or requests the server 5 to stop distributing the sound data. The terminal can thus be set not to output content audio. In this case, audio that assists the masker sound may be output in place of the content audio. The assisting audio may be the same masker sound that the masker sound output device 3 outputs, or a part of it (background sound only, effect sound only, disturbing sound plus background sound, disturbing sound plus effect sound, background sound plus effect sound, and so on). In either case, when the user presses the "OFF" button, the control unit 16 requests distribution of the assisting audio. The control unit 52 of the server 5 then transmits the audio that assists the masker sound (the masker sound being output to the masker sound output device 3, or a part of it) to the voice terminal 1 that made the request. Because the assisting audio is output in the user's immediate vicinity, a sufficient masking effect can be obtained even if the masker sound output from the masker sound output device 3 is low in volume.

The voice terminal 1 of this embodiment also includes the microphone 11, so that it can pick up the speech of the staff member 91 and the user 90 at the dialogue counter. The microphone 11 outputs an analog audio signal of the collected speech to the A/D converter 12. The A/D converter 12 converts the input analog audio signal into a digital audio signal and outputs it to the control unit 16. The control unit 16 transmits the input digital audio signal to the server 5 via the communication unit 17, either as it is or after encoding it into compressed data such as MP3.

The sound data of the conversation at the dialogue counter that is sent to the server 5 is accumulated as recorded data in the content storage unit 55 via the control unit 52. The masker sound generation unit 53 uses the sound data sent to the server 5 to generate the disturbing sound. In addition, the user can retrieve the recorded data over the Internet using a home PC or the like and listen to the conversation (for example, the explanation of a medicine) again.

FIG. 5(A) is a flowchart showing the operation of the server 5, the voice terminal 1, and the masker sound output device 3 when sound data of the audio signal collected by the voice terminal 1 is transmitted to the server 5 and the server 5 generates the masker sound. First, the control unit 16 of the voice terminal 1 determines whether an audio signal at or above a predetermined level (a level distinguishable from noise) has been input, that is, whether the microphone 11 has picked up speech (s1). When the control unit 16 determines that speech is being picked up (s1, Yes), it transmits the input audio signal to the server 5, either as it is or after encoding it into compressed data such as MP3 (s2). The control unit 52 of the server 5 receives the audio signal (sound data) transmitted from the voice terminal 1 (s3) and stores it in the content storage unit 55 as recorded data (s4).
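The level check in step s1 could be sketched as follows. The patent only requires a "predetermined level" distinguishable from noise; measuring the level as the root mean square (RMS) of a block of samples is an assumption made here for illustration, and the names rms and should_transmit are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_transmit(samples, threshold):
    """Step s1 (sketch): treat the block as speech when its level
    reaches the threshold. RMS is an assumed level measure."""
    return rms(samples) >= threshold
```

A block for which should_transmit returns True would then be sent to the server as in step s2; quieter blocks are treated as noise and not transmitted.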

When audio signals (sound data) are received from a plurality of voice terminals 1, the recorded data items are preferably stored in the content storage unit 55 so that they can be told apart. For example, when each user 92 receives the voice terminal 1 from the staff member 93 at the reception desk, unique identification information (an ID) is issued for that user. When the user 92 is called and goes to the dialogue counter, the staff member 91 at the counter operates a dedicated terminal (not shown) installed nearby to send the user's identification information, together with the identification information (serial number or the like) of the voice terminal 1 the user is using, to the server 5. Alternatively, the staff member operates the voice terminal 1 received from the user 92 to send the user's identification information. The voice terminal 1 then transmits its own identification information (serial number or the like) together with the audio signal, which the server 5 receives in step s3 described above. The server 5 stores the received audio signal, the user's identification information, and the identification information of the voice terminal 1 in the content storage unit 55 in association with one another. This allows the recorded data to be reproduced by each user to be identified in the recorded-data playback operation described later (FIG. 5(B)). If that playback operation is not performed, the audio signal received from the voice terminal 1 only needs to be held temporarily for masker sound generation and does not have to be accumulated in the content storage unit 55 as recorded data.
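The association between user ID, terminal ID, and recorded audio can be pictured as a small lookup structure. This is a minimal in-memory sketch, not the patent's storage design; the class RecordingStore and its method names are hypothetical.

```python
class RecordingStore:
    """In-memory sketch of associating recordings with users (hypothetical)."""

    def __init__(self):
        self._terminal_to_user = {}  # terminal ID -> user ID
        self._recordings = {}        # user ID -> list of (terminal ID, audio)

    def register(self, user_id, terminal_id):
        """Record which user is currently using which terminal."""
        self._terminal_to_user[terminal_id] = user_id

    def store(self, terminal_id, audio):
        """Store audio sent with a terminal ID; False if the terminal is unknown."""
        user_id = self._terminal_to_user.get(terminal_id)
        if user_id is None:
            return False
        self._recordings.setdefault(user_id, []).append((terminal_id, audio))
        return True

    def recordings_for(self, user_id):
        """Return a copy of the recordings associated with a user ID."""
        return list(self._recordings.get(user_id, []))
```

Here register corresponds to the staff member sending the user ID and terminal ID from the counter, and store corresponds to the server receiving an audio signal accompanied by the terminal's own ID.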

Next, the control unit 52 performs processing to generate the masker sound (disturbing sound) (s5). The disturbing sound is preferably generated from the audio signal collected at the dialogue counter where a conversation is currently taking place. That is, the audio signal currently being received from the voice terminal 1 is modified on the time axis or the frequency axis to generate the disturbing sound. The generated disturbing sound is combined with the other sound data (background sound and effect sound) stored in the masker sound storage unit 54 to form the masker sound. The disturbing sound may of course also be generated by reading out the most recent of the recorded data stored in the content storage unit 55. When audio signals (speakers' voices) are received from a plurality of voice terminals 1, the disturbing sound is preferably generated by mixing these signals and then modifying the result on the time axis or the frequency axis.

The control unit 52 then outputs the generated masker sound to the masker sound output device 3 (s6). The masker sound output device 3 receives the masker sound transmitted by the server 5 (s7) and performs playback processing (s8). In this way, an optimal disturbing sound matched to the conversation at the dialogue counter can be generated.

Next, FIG. 5(B) is a flowchart showing the operation of the server 5 and another information processing apparatus (for example, a home PC) when conversation content is recorded. The operation of the home PC shown in the figure is realized by an application installed on the home PC, a specific script running in a web browser, or the like. The hardware configuration of the home PC is the same as that of an ordinary personal computer, so its illustration and description are omitted.

First, the home PC determines whether the user has operated it to instruct playback of recorded data (s71), for example by determining whether the user has operated the application and entered the unique identification information (ID). When the ID has been entered and playback of recorded data has been instructed (s71, Yes), the home PC transmits the entered ID to the server 5 over the Internet (s72). The server 5 receives the ID from the home PC (s73) and reads, from the recorded data stored in the content storage unit 55, the recorded data associated with the received ID (s74). It then transmits the recorded data to the home PC (s75). The home PC receives the recorded data (s76) and performs playback processing (s77). When a plurality of recorded data items are accumulated for an ID, the server 5, after the processing of s73, transmits a list of the recorded data (summary information such as recording dates and times) to the home PC and accepts a choice of which item to reproduce. When the user operates the home PC and selects the desired item from the received list, the selected recorded data is transmitted from the server 5 to the home PC.
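The server-side part of this flow (s73 to s75, including the list step for multiple recordings) can be sketched as a single function. The storage layout, the return-value convention, and the name handle_playback_request are all assumptions made for illustration.

```python
def handle_playback_request(store, user_id, choice=None):
    """Server-side sketch of s73 to s75 (hypothetical interface).

    store:  dict mapping a user ID to a list of (recorded_at, audio).
    choice: the recorded_at value picked from a previously sent list.
    Returns ("audio", data), ("list", [recorded_at, ...]),
    or ("not_found", None).
    """
    recordings = store.get(user_id, [])
    if not recordings:
        return ("not_found", None)
    if len(recordings) == 1:
        # A single recording is sent straight away (s74, s75).
        return ("audio", recordings[0][1])
    if choice is None:
        # Several recordings: send the list first and wait for a choice.
        return ("list", [recorded_at for recorded_at, _ in recordings])
    for recorded_at, audio in recordings:
        if recorded_at == choice:
            return ("audio", audio)
    return ("not_found", None)
```

The home PC would call this once with the ID alone and, if it gets a list back, call it again with the user's selection.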

In this way, conversation content recorded using the voice terminal 1 is stored on the server 5 as recorded data, so that the user, the user's family, and others can play back the conversation content on the home PC and reconfirm the explanation of the medicine, precautions, and the like.

Next, the operations of the server 5 and the voice terminal 1 when a user checks in at a bank, pharmacy, or the like and receives the voice terminal 1 will be described in detail with reference to the flowcharts of FIGS. 6 and 7. The operations shown in FIGS. 6 and 7 start when the voice terminal 1 is powered on. For example, when the user receives the voice terminal 1 at the reception counter, the staff 93 powers on the voice terminal 1, and these operations start. These operations also start when the user presses any button on the operation unit 18.

First, in FIG. 6, the control unit 16 of the voice terminal 1 checks whether the user has pressed one of the channel buttons on the operation unit 18 to designate a channel (s11). If a channel has been designated (s11, Yes), the control unit 16 requests the server 5 to distribute the sound data of that channel (s12). Upon receiving the distribution request (s13), the control unit 52 of the server 5 transmits the sound data of the content of the requested channel to the requesting voice terminal 1 (s14). The voice terminal 1 then receives the transmitted sound data of the content (s15) and performs playback processing (s16).

On the other hand, when no channel has been designated (s11, No), that is, when the "OFF" button has been pressed or immediately after power-on, the control unit 16 requests distribution of a sound that assists the masker sound (s17). Upon receiving the distribution request (s18), the control unit 52 of the server 5 transmits the sound that assists the masker sound (for example, the same sound data as the masker sound being transmitted to the masker sound output device 3) to the requesting voice terminal 1 (s19). The voice terminal 1 then receives the transmitted sound (sound data) that assists the masker sound (s20) and performs playback processing (s21).
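The branch at s11 and the corresponding server responses can be sketched as follows. This is a minimal sketch under stated assumptions: the request format (a plain dictionary), the constant `MASKER_ASSIST`, and the function names `build_request` and `serve_request` are hypothetical and are not defined in the specification.

```python
MASKER_ASSIST = "masker_assist"  # hypothetical request type name


def build_request(selected_channel):
    """Terminal side (s11, s12, s17): decide what to request from the server.

    `selected_channel` is None when the "OFF" button was pressed or the
    terminal has just been powered on.
    """
    if selected_channel is not None:
        return {"type": "channel", "channel": selected_channel}  # s12
    return {"type": MASKER_ASSIST}  # s17


def serve_request(request, channels, masker_sound):
    """Server side (s13/s14 and s18/s19): pick the sound data to transmit.

    `channels` maps channel names to content sound data; `masker_sound`
    is the same sound data sent to the masker sound output device.
    """
    if request["type"] == "channel":
        return channels[request["channel"]]  # s14
    return masker_sound  # s19
```

With no channel selected, `serve_request(build_request(None), channels, masker_sound)` returns the masker sound data, so the terminal reinforces the masker sound rather than staying silent.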

Note that the example of FIG. 6 outputs the sound that assists the masker sound when no channel is designated and immediately after power-on; alternatively, the sound of predetermined content (for example, an advertising channel) may be output from power-on until a channel is first designated.

Next, in FIG. 7, the control unit 52 of the server 5 determines whether a terminal to be called has been designated (s31). This designation is made, for example, by operating a dedicated terminal (not shown) installed near the staff 91 inside the dialogue counter. At this time, the user to be called is also designated. Alternatively, the staff 91 inside the dialogue counter may receive the voice terminal 1 back from the user 90 and, after the explanation has ended and the user 90 has left, operate the returned voice terminal 1 to designate the call target. When the staff 91 performs the call operation, a call signal is transmitted from the dedicated terminal (or the voice terminal 1) to the server 5, and the determination in s31 proceeds to Yes.

The control unit 52 transmits sound data for the calling voice to the voice terminal 1 to be called (s32). The called voice terminal 1 then receives the sound data for the calling voice (s33) and plays back the calling voice (s34).
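The call flow of s31 to s34 amounts to delivering the calling voice only to the designated terminal. The following is an illustrative sketch, not the implementation of the embodiment: the `Terminal` class, the `dispatch_call` function, and the terminal identifiers are hypothetical, and playback is modeled by simply recording what was received.

```python
class Terminal:
    """Hypothetical stand-in for a voice terminal 1."""

    def __init__(self, terminal_id):
        self.terminal_id = terminal_id
        self.played = []  # sound data that has been played back

    def receive(self, sound_data):
        # s33: receive the sound data; s34: play it back (modeled as a log).
        self.played.append(sound_data)


def dispatch_call(terminals, target_id, call_sound):
    """Server side (s31, s32): send the calling voice to the target only.

    Returns True if the designated terminal exists and received the call.
    """
    terminal = terminals.get(target_id)
    if terminal is None:
        return False  # no such terminal is registered
    terminal.receive(call_sound)
    return True
```

Because only the designated terminal receives the sound data, other waiting users continue to hear their own content or the masker-assisting sound.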

As described above, in the audio output system of this embodiment, the masker sound output device 3 outputs the masker sound at a certain volume to secure the masking effect, while the voice terminal 1 carried by the user outputs the calling voice, so that the user can hear the necessary sound at an appropriate volume.

In this embodiment, an example has been shown in which sound data for the masker sound is downloaded (or streamed) from the server 5 to the masker sound output device 3; however, a storage unit may instead be provided in the masker sound output device 3, and the sound data for the masker sound may be read from that internal storage unit. In that case, the masker sound output device 3 does not need a communication function.

The voice terminal 1 need not be a device dedicated to the audio output system shown in this embodiment; it can also be realized using a general portable terminal, such as a mobile phone, together with software.

Likewise, the masker sound output device 3 and the server 5 need not be devices dedicated to the audio output system shown in this embodiment; they can be realized using the hardware and software of a general information processing apparatus such as a personal computer.

Although this embodiment has shown an example in which audio related to the content displayed on the display device 7 is output from the voice terminal 1, predetermined content (for example, music) may also be output without the display device 7.

Description of symbols
1 ... Voice terminal
3 ... Masker sound output device
5 ... Server
7 ... Display device
11 ... Microphone
12 ... A/D converter
13 ... Signal processing unit
14 ... D/A converter
15 ... Speaker
16 ... Control unit
17 ... Communication unit
18 ... Operation unit
31 ... Communication unit
32 ... Control unit
33 ... Signal processing unit
34 ... D/A converter
35 ... Speaker

Claims

An audio output system comprising:
a masker sound output device that outputs a masker sound; and
a terminal device carried by a user, wherein
the masker sound output device is installed so as to output the masker sound to the user,
the terminal device includes audio output means for outputting audio required by each user, and
the audio output means of the terminal device outputs audio that assists the masker sound.

The audio output system according to claim 1, wherein the audio output means of the terminal device outputs audio of predetermined content.

An audio output system comprising:
a masker sound output device that outputs a masker sound; and
a terminal device carried by a user, wherein
the masker sound output device is installed so as to output the masker sound to the user,
the terminal device includes audio output means for outputting audio required by each user, and
the audio output means of the terminal device outputs audio of predetermined content.

The audio output system according to claim 2 or 3, wherein
the terminal device includes accepting means for accepting a selection of the content from the user, and
the audio output means outputs the audio of the content selected via the accepting means.

The audio output system according to claim 1 or 2, wherein
the terminal device includes accepting means for accepting a selection of content from the user, and
the audio output means outputs the audio that assists the masker sound when no selection of content has been accepted by the accepting means.

The audio output system according to any one of claims 1 to 5, further comprising a display device that displays video of content, wherein
the audio output means outputs audio of the content corresponding to the video displayed on the display device.