JP2019140517A

JP2019140517A - Information processing device and program

Info

Publication number: JP2019140517A
Application number: JP2018021826A
Authority: JP
Inventors: 靖飯田; Yasushi Iida
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2018-02-09
Filing date: 2018-02-09
Publication date: 2019-08-22

Abstract

To prevent the generation of echoes even when a microphone and a speaker are provided for each member of a group and a plurality of members are in the same room.

SOLUTION: Information processing devices 10A-10C used in a web conference transmit a voice signal representing collected voice to a server device 20 if the sound pressure level of the voice collected by a microphone is equal to or greater than a predetermined threshold value. The server device 20 transmits the voice signal received from any of the information processing devices 10A-10C to the information processing devices 10A-10C. The information processing devices 10A-10C do not output the voice represented by a received voice signal if the device that transmitted that signal is in the same room.

SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing apparatus and a program.

As an invention for preventing echoes in an audio conference system, there is, for example, the audio conference system disclosed in Patent Document 1. This system connects a plurality of audio conference devices, each having a microphone and a speaker. When the voice of the conference attendee assigned to a device is not being input, the device does not output the microphone signal to the other devices and emits from its speaker the sound collected by the other devices. When the voice of the attendee assigned to the device is being input, the device outputs the microphone signal to the other devices and turns off its speaker.

Patent Document 1: JP 2008-147822 A

When a group converses over a communication line with members at a remote location, one approach is for several members to gather in the same room, each conversing through a terminal device equipped with a microphone and a speaker. In that case, in the room containing several members, the voice of a speaking member is picked up by the microphone of another member's terminal device in the same room, transmitted over the communication line to the speaking member's terminal device, output from the speaker of that terminal device, and thus reaches the speaking member.

An object of the present invention is to suppress the occurrence of echoes even when a microphone and a speaker are provided for each member of a group and a plurality of members are in the same room.

An information processing apparatus according to claim 1 of the present invention includes: first acquisition means for acquiring an audio signal supplied from a microphone; second acquisition means for acquiring an audio signal supplied via a communication line; output means that outputs the audio signal acquired by the first acquisition means to the communication line when the sound pressure level of the sound represented by that audio signal is equal to or higher than a predetermined threshold, and does not output it to the communication line when the sound pressure level is lower than the predetermined threshold; and supply means that supplies the audio signal acquired by the second acquisition means to a speaker, and does not supply it to the speaker when the sound pressure level is equal to or higher than the predetermined threshold.

The information processing apparatus according to claim 2 of the present invention further includes determination means for determining whether the audio signal acquired by the second acquisition means was output from a device in the same room as the apparatus itself. When the sound pressure level is lower than the predetermined threshold and the determination means determines that the audio signal acquired by the second acquisition means was output from a device in the same room as the apparatus itself, the supply means does not supply that audio signal to the speaker.

In the information processing apparatus according to claim 3 of the present invention, the audio signal includes an identifier of the device that output it, and the determination means determines that an audio signal acquired by the second acquisition means was output from a device in the same room as the apparatus itself when the identifier included in that audio signal is an identifier registered in advance.

In the information processing apparatus according to claim 4 of the present invention, the determination means compares the audio signal acquired by the first acquisition means with the audio signal acquired by the second acquisition means and, when they match, determines that the audio signal acquired by the second acquisition means was output from a device in the same room as the apparatus itself.

In the information processing apparatus according to claim 5 of the present invention, when the audio signals acquired by the second acquisition means include an audio signal output from a device in a room different from that of the apparatus itself, the supply means supplies the audio signal output from the device in the different room to the speaker.

In the information processing apparatus according to claim 6 of the present invention, when the second acquisition means acquires both an audio signal output from a device in a room different from that of the apparatus itself and an audio signal output from a device in the same room as the apparatus itself, the supply means also supplies the audio signal output from the device in the same room to the speaker.

The information processing apparatus according to claim 7 of the present invention includes imaging means for photographing the user of the apparatus and generating an image of the user, and recognition means for recognizing the state of the user in the image generated by the imaging means. The output means controls output of the audio signal acquired by the first acquisition means to the communication line according to the state recognized by the recognition means.

In the information processing apparatus according to claim 8 of the present invention, the recognition means recognizes the direction of the user's line of sight as the state, and the output means does not output the audio signal acquired by the first acquisition means to the communication line when the line-of-sight direction recognized by the recognition means is a predetermined direction.

In the information processing apparatus according to claim 9 of the present invention, the recognition means recognizes the direction in which the user's face is turned as the state, and when the direction recognized by the recognition means is a predetermined direction, the audio signal acquired by the first acquisition means is not output to the communication line.

A program according to claim 10 of the present invention causes a computer to function as: first acquisition means for acquiring an audio signal supplied from a microphone; second acquisition means for acquiring an audio signal supplied via a communication line; output means that outputs the audio signal acquired by the first acquisition means to the communication line when the sound pressure level of the sound represented by that audio signal is equal to or higher than a predetermined threshold, and does not output it to the communication line when the sound pressure level is lower than the predetermined threshold; and supply means that supplies the audio signal acquired by the second acquisition means to a speaker, and does not supply it to the speaker when the sound pressure level is equal to or higher than the predetermined threshold.

According to the information processing apparatus of the first aspect of the present invention, the occurrence of echoes can be suppressed even when a microphone and a speaker are provided for each member of the group and a plurality of members are in the same room.
According to the information processing apparatus of the second aspect of the present invention, the sound input to the microphone is not output from the speaker of the apparatus in the same room, and echo can be suppressed.
According to the information processing apparatus of the third aspect of the present invention, it can be accurately determined whether the audio signal supplied from the communication line is supplied from an apparatus in the same room.
According to the information processing apparatus of the fourth aspect of the present invention, it can be accurately determined whether the audio signal supplied from the communication line is supplied from an apparatus in the same room.
According to the information processing apparatus of the fifth aspect of the present invention, it is possible to hear the sound of another room.
According to the information processing apparatus of the sixth aspect of the present invention, only the sound of another room can be heard.
According to the information processing apparatus of the seventh aspect of the present invention, it is possible to prevent a voice that is not desired to be transmitted to another room from being output to the other room.
According to the information processing apparatus of the eighth aspect of the present invention, it is possible to accurately specify a voice that is not desired to be transmitted to another room.
According to the information processing apparatus of the ninth aspect of the present invention, it is possible to accurately specify a voice that is not desired to be transmitted to another room.
According to the program according to claim 10 of the present invention, a microphone and a speaker are provided for each member of the group, and the occurrence of echo can be suppressed even if there are a plurality of members in the same room.

FIG. 1 is a diagram showing an example of a usage scene of information processing apparatuses 10A to 10C according to an embodiment of the present invention.
FIG. 2 is a diagram showing the hardware configuration of the information processing apparatus 10.
FIG. 3 is a functional block diagram of the functions realized when the control unit 101 executes a program.
FIG. 4 is a diagram showing an example of a screen displayed by the information processing apparatus 10.
FIG. 5 is a flowchart of the process that controls output of the audio signal generated by the audio processing unit 107.
FIG. 6 is a flowchart of the process that controls supply of the audio signal to the speaker.

[Embodiment]
FIG. 1 is a diagram showing an example of a usage scene of information processing apparatuses 10A to 10C according to the present invention. The information processing apparatuses 10A to 10C are so-called laptop computers, each equipped with a camera and a microphone. The information processing apparatus 10 is not limited to a laptop; it may be a desktop computer or a portable device such as a smartphone or a tablet terminal. Because the information processing apparatuses 10A to 10C have the same configuration, they are referred to below as the information processing apparatus 10 when there is no need to distinguish them. The server device 20 provides a web conference service. The participants 2A to 2C hold a web conference by having their information processing apparatuses 10 transmit and receive video and audio via the communication line 3 and the server device 20. The conference participants 2A to 2C are an example of a group according to the present invention, and each of them is an example of a member of that group. FIG. 1 shows a state in which the participant 2A, the information processing apparatus 10A used by the participant 2A, the participant 2B, and the information processing apparatus 10B used by the participant 2B are in a room 4A, while the participant 2C and the information processing apparatus 10C used by the participant 2C are in a room 4B different from the room 4A.

FIG. 2 is a diagram showing an example of the portion of the hardware configuration of the information processing apparatus 10 that relates to the present invention. The operation unit 104 has input devices, such as a keyboard and a touch pad, that accept input from the operator. The display unit 103 has a display device and displays characters, a GUI (Graphical User Interface), images, and the like. The communication unit 105 functions as a communication interface for communicating via the communication line 3.

The audio processing unit 107 has a microphone and a speaker. The audio processing unit 107 converts the digital audio signal that the communication unit 105 receives from the server device 20 into an analog audio signal and supplies it to the speaker. The audio processing unit 107 also converts the analog audio signal representing the sound collected by the microphone into a digital audio signal and supplies it to the communication unit 105. This digital audio signal is transmitted from the communication unit 105 to the other information processing apparatuses 10 via the communication line 3 and the server device 20.

The camera 106 includes an image sensor, an optical system that forms an image on the image sensor, an aperture that limits the light entering the image sensor, and the like. The camera 106 photographs the user of the information processing apparatus 10 and generates a video signal representing the captured image. When a web conference is held using the information processing apparatus 10, the video signal generated by the camera 106 is transmitted from the communication unit 105 to the other information processing apparatuses 10 via the communication line 3 and the server device 20.

The storage unit 102 is a computer-readable recording medium, for example a hard disk. The storage unit 102 stores the program executed by the control unit 101 and the information used by the control unit 101 when executing the program. The program may also be acquired via a telecommunication line.

The control unit 101 has a CPU (Central Processing Unit) and a memory. The memory is a computer-readable recording medium, for example a RAM (Random Access Memory). The functions of the information processing apparatus 10 are realized by loading the program stored in the storage unit 102 onto hardware such as the CPU and the memory, whereupon the CPU performs computation and controls the storage unit 102, the communication unit 105, the audio processing unit 107, the camera 106, and the reading and/or writing of information in the memory and the storage unit 102.

FIG. 3 is a functional block diagram showing, among the functions realized when the control unit 101 executes the program stored in the storage unit 102, the configuration of the functions relating to the present invention. The first acquisition unit 1001 acquires the audio signal supplied from the microphone of the audio processing unit 107. The first acquisition unit 1001 is an example of the first acquisition means according to the present invention. The second acquisition unit 1002 acquires the audio signal received by the communication unit 105 via the communication line 3. The second acquisition unit 1002 is an example of the second acquisition means according to the present invention. When the sound pressure level of the sound represented by the audio signal acquired by the first acquisition unit 1001 is equal to or higher than a predetermined threshold, the output unit 1003 controls the communication unit 105 to output the acquired audio signal to the communication line 3; when the sound pressure level is lower than the predetermined threshold, the output unit 1003 does not output the audio signal acquired by the first acquisition unit 1001 to the communication line 3. The output unit 1003 is an example of the output means according to the present invention. The determination unit 1005 determines whether the audio signal acquired by the second acquisition unit 1002 was output from a device in the same room as the apparatus itself. The determination unit 1005 is an example of the determination means according to the present invention. The supply unit 1004 supplies the audio signal acquired by the second acquisition unit 1002 to the speaker; when the sound pressure level of the sound represented by the audio signal acquired by the first acquisition unit 1001 is equal to or higher than the predetermined threshold, the supply unit 1004 does not supply the audio signal acquired by the second acquisition unit 1002 to the speaker. In addition, when that sound pressure level is lower than the predetermined threshold and the determination unit 1005 determines that the audio signal acquired by the second acquisition unit 1002 was output from a device in the same room as the apparatus itself, the supply unit 1004 does not supply the audio signal acquired by the second acquisition unit 1002 to the speaker. The supply unit 1004 is an example of the supply means according to the present invention.
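
The rules applied by the output unit 1003 and the supply unit 1004 can be summarised as a small decision function. The following is a minimal Python sketch, not the implementation of the embodiment: the names `Decision` and `decide`, the use of decibel values for the level and threshold, and the boolean `from_same_room` (standing in for the result of the determination unit 1005) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Outcome for one moment in time (hypothetical helper type)."""
    send_to_line: bool      # role of the output unit 1003
    play_on_speaker: bool   # role of the supply unit 1004


def decide(mic_level_db: float, threshold_db: float, from_same_room: bool) -> Decision:
    """Apply the threshold rule to the local microphone level.

    mic_level_db   -- level of the locally collected sound (assumed unit: dB)
    threshold_db   -- the predetermined threshold (assumed unit: dB)
    from_same_room -- whether the received signal came from a same-room device
    """
    if mic_level_db >= threshold_db:
        # The local user is speaking: transmit, and keep the speaker silent.
        return Decision(send_to_line=True, play_on_speaker=False)
    # The local user is silent: do not transmit, and play received audio
    # only if it did not originate in the same room.
    return Decision(send_to_line=False, play_on_speaker=not from_same_room)
```

For example, `decide(-50.0, -30.0, from_same_room=True)` yields a decision that neither transmits nor plays, which is the case that keeps a room-mate's voice from being played back in the same room.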

Next, an operation example of the information processing apparatus 10 will be described. To hold a web conference, each of the participants 2A to 2C operates the operation unit 104 of the information processing apparatus 10 that he or she uses to access the server device 20 and connect the information processing apparatus 10 to a virtual conference room provided in the server device 20. Once the information processing apparatuses 10A to 10C are connected to the virtual conference room, they transmit the video signals generated by their cameras 106 to the server device 20. Each video signal includes a participant identifier that identifies the participant using the information processing apparatus 10. For example, the video signal transmitted by the information processing apparatus 10A includes a participant identifier that identifies the participant 2A. The server device 20 transmits a video signal received from one information processing apparatus 10 to the other information processing apparatuses 10 connected to the same virtual conference room.

For example, the video signal transmitted from the information processing apparatus 10A is transmitted via the server device 20 to the information processing apparatuses 10B and 10C. Likewise, the video signal transmitted from the information processing apparatus 10B is transmitted via the server device 20 to the information processing apparatuses 10A and 10C, and the video signal transmitted from the information processing apparatus 10C is transmitted via the server device 20 to the information processing apparatuses 10A and 10B.

The information processing apparatus 10 receives the video signals transmitted from the server device 20 and displays, on the display unit 103, the video represented by each received video signal together with the participant identifier included in that signal. FIG. 4 is a diagram showing an example of the screen displayed by the information processing apparatus 10A at this point. The information processing apparatus 10A displays the video of the participant 2B and the video of the participant 2C. It also displays the participant identifier of the participant 2B below the video of the participant 2B, and the participant identifier of the participant 2C below the video of the participant 2C. The participant 2A can thus see the faces of the participants 2B and 2C on the video.

The information processing apparatus 10B displays the video and participant identifier of the participant 2A and the video and participant identifier of the participant 2C, so the participant 2B can see the faces of the participants 2A and 2C on the video. Similarly, the information processing apparatus 10C displays the video and participant identifier of the participant 2A and the video and participant identifier of the participant 2B, so the participant 2C can see the faces of the participants 2A and 2B on the video.

Once the participants' videos are displayed, the web conference participants 2A to 2C, before starting the conference, perform an operation to designate which participants are in the same room. As shown in FIG. 4, radio buttons B1 and B2, GUI elements for designating whether the displayed participant is in the same room, are displayed below each participant's video. When the participants 2A and 2B are in the same room 4A as shown in FIG. 1, the participant 2A operates the information processing apparatus 10A and clicks the radio button B1 displayed below the video of the participant 2B. When this operation is performed, the information processing apparatus 10A stores the participant identifier of the participant 2B, displayed above the radio button B1, as the participant identifier of a participant in the same room. When the participants 2A and 2C are not in the same room, as shown in FIG. 1, the participant 2A operates the information processing apparatus 10A and clicks the radio button B2 displayed below the video of the participant 2C. If the participant 2A clicks the radio button B2 below the video of the participant 2B, the information processing apparatus 10A deletes the stored participant identifier of the participant 2B.

The participant 2B operates the information processing apparatus 10B and clicks the radio button B1 displayed below the video of the participant 2A. When this operation is performed, the information processing apparatus 10B stores the participant identifier of the participant 2A, displayed above the radio button B1, as the participant identifier of a participant in the same room. In the usage scene shown in FIG. 1, there is no other participant in the room 4B, so the participant 2C clicks the radio button B2 displayed below the video of the participant 2A and the radio button B2 displayed below the video of the participant 2B. Because the radio button B1 is never clicked on the information processing apparatus 10C, neither the participant identifier of the participant 2A nor that of the participant 2B is stored in its storage unit 102.
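
The bookkeeping behind the radio buttons B1 and B2 amounts to adding and removing participant identifiers from a stored set. A minimal Python sketch follows; the class `SameRoomRegistry` and its method names are hypothetical, the identifier strings are placeholders, and an in-memory set stands in for the storage unit 102.

```python
from typing import Set


class SameRoomRegistry:
    """Identifiers of participants designated as being in the same room."""

    def __init__(self) -> None:
        self._ids: Set[str] = set()

    def click_b1(self, participant_id: str) -> None:
        # B1: designate the displayed participant as being in the same room.
        self._ids.add(participant_id)

    def click_b2(self, participant_id: str) -> None:
        # B2: not in the same room; delete the identifier if it was stored.
        self._ids.discard(participant_id)

    def is_same_room(self, participant_id: str) -> bool:
        return participant_id in self._ids

    def has_any(self) -> bool:
        # True when at least one same-room identifier is stored.
        return bool(self._ids)
```

In the scene of FIG. 1, the registry on the apparatus 10A would end up holding only the identifier of the participant 2B, while the registry on the apparatus 10C would stay empty.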

Next, an example of processing the audio signal generated by the audio processing unit 107 will be described. FIG. 5 is a flowchart of the process that controls output of the audio signal generated by the audio processing unit 107. The information processing apparatus 10 analyzes the audio signal representing the sound collected by the microphone and determines the sound pressure level of that sound (step SA1). The information processing apparatus 10 then controls output of the audio signal representing the sound collected by the microphone according to the sound pressure level determined in step SA1.

Specifically, when the sound pressure level of the sound collected by the microphone is equal to or higher than a predetermined threshold (YES in step SA2), the information processing apparatus 10 transmits the audio signal representing the sound collected by the microphone to the server device 20 (step SA3). The audio signal transmitted to the server device 20 includes the participant identifier of the participant who uses the information processing apparatus 10. When the sound pressure level of the sound collected by the microphone is lower than the predetermined threshold (NO in step SA2), the information processing apparatus 10 does not transmit the audio signal to the server device 20 (step SA4).
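
Steps SA1 to SA4 can be sketched as follows in Python. The text does not say how the sound pressure level is derived from the audio signal, so this sketch assumes an RMS level in decibels relative to full scale (dBFS) over one frame of samples normalised to the range -1.0 to 1.0; an absolute sound pressure level would additionally need microphone calibration. The function names, the dictionary layout of the outgoing signal, and the -40 dB default threshold are illustrative assumptions.

```python
import math
from typing import Optional, Sequence


def level_dbfs(samples: Sequence[float]) -> float:
    """Step SA1 (assumed method): RMS level of one frame in dBFS."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    # 10*log10(mean square) equals 20*log10(RMS).
    return 10.0 * math.log10(mean_square)


def gate_frame(samples: Sequence[float],
               participant_id: str,
               threshold_db: float = -40.0) -> Optional[dict]:
    """Steps SA2 to SA4: return the signal to transmit, or None to stay silent."""
    level = level_dbfs(samples)
    if level >= threshold_db:                        # SA2: YES
        # SA3: the transmitted signal carries the participant identifier.
        return {"participant_id": participant_id, "samples": list(samples)}
    return None                                      # SA2: NO, so SA4
```

A caller would run `gate_frame` once per captured frame and hand any non-`None` result to the transport layer; frames below the threshold are simply dropped.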

The server device 20 receives the audio signal transmitted from an information processing apparatus 10 and transmits it to every information processing apparatus 10 connected to the virtual conference room. In the present embodiment, therefore, each information processing apparatus 10 also receives from the server device 20 the audio signal that it transmitted itself.
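The relay behavior of the server device 20 can be pictured with a small in-memory sketch. The class name `Relay`, its methods, and the callback-based delivery are hypothetical stand-ins for real network connections; the only behavior taken from the text is that every connected apparatus, including the sender, receives each audio signal.

```python
from typing import Callable, Dict


class Relay:
    """Toy stand-in for server device 20: forwards each packet to all connected devices."""

    def __init__(self) -> None:
        # Maps a device name (e.g. "10A") to a callback that delivers a packet to it.
        self.clients: Dict[str, Callable[[dict], None]] = {}

    def connect(self, device_id: str, deliver: Callable[[dict], None]) -> None:
        self.clients[device_id] = deliver

    def on_audio(self, packet: dict) -> None:
        # No sender check: the originating device receives its own signal as well.
        for deliver in self.clients.values():
            deliver(packet)


relay = Relay()
inbox_a, inbox_c = [], []
relay.connect("10A", inbox_a.append)
relay.connect("10C", inbox_c.append)
relay.on_audio({"participant_id": "2A", "samples": [0.1]})
```

After the last call, both `inbox_a` and `inbox_c` hold the packet from participant 2A, which is why each apparatus must decide locally whether to play what it receives.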

Next, an example of how an audio signal received from the server device 20 is processed will be described. FIG. 6 is a flowchart of the process that controls the supply of audio signals to the speaker. On receiving an audio signal transmitted from the server device 20, the information processing apparatus 10 determines whether a participant identifier of a participant in the same room is stored in the storage unit 102 (step SB1).

When no participant identifier of a participant in the same room is stored in the storage unit 102 (NO in step SB1), the information processing apparatus 10 determines whether the participant identifiers included in the received audio signals contain any identifier other than the one the apparatus itself transmits (step SB2). If they do (YES in step SB2), the information processing apparatus 10 converts the audio signal received from the server device 20 into an analog audio signal and supplies the analog audio signal to the speaker (step SB3). If they do not (NO in step SB2), the information processing apparatus 10 stops supplying the audio signal to the speaker (step SB4).

For example, information processing apparatus 10C, which stores no participant identifier of a participant in the same room, converts the received audio signal into an analog audio signal and supplies it to the speaker (step SB3) when it receives only the audio signal that was transmitted by information processing apparatus 10A and includes the participant identifier of participant 2A. In this case, the voice of participant 2A is output from the speaker of information processing apparatus 10C. When information processing apparatus 10C receives, in addition to the audio signal transmitted by information processing apparatus 10A, an audio signal that includes the participant identifier of participant 2C, it likewise converts the received audio signals into analog audio signals and supplies them to the speaker (step SB3). In this case, the voices of participant 2A and participant 2C are output from the speaker of information processing apparatus 10C. When information processing apparatus 10C receives only the audio signal that includes the participant identifier of participant 2C, it stops supplying the audio signal to the speaker (step SB4).

When a participant identifier of a participant in the same room is stored in the storage unit 102 (YES in step SB1), the information processing apparatus 10 determines whether the participant identifiers included in the received audio signals contain any identifier other than the one the apparatus itself transmits and those stored in the storage unit 102 (step SB5). If they do (YES in step SB5), the information processing apparatus 10 converts the audio signal transmitted from the server device 20 into an analog audio signal and supplies the analog audio signal to the speaker (step SB6).

For example, when information processing apparatus 10A receives only the audio signal transmitted by information processing apparatus 10C, it converts the received audio signal into an analog audio signal and supplies it to the speaker (step SB6). In this case, the voice of participant 2C is output from the speaker of information processing apparatus 10A. When information processing apparatus 10A receives the audio signal it transmitted itself in addition to the audio signal transmitted by information processing apparatus 10C, it converts the received audio signals into analog audio signals and supplies them to the speaker (step SB6). In this case, the voices of participant 2A and participant 2C are output from the speaker of information processing apparatus 10A.

When information processing apparatus 10B receives only the audio signal transmitted by information processing apparatus 10C, it converts the received audio signal into an analog audio signal and supplies it to the speaker (step SB6). In this case, the voice of participant 2C is output from the speaker of information processing apparatus 10B. When information processing apparatus 10B receives the audio signal transmitted by information processing apparatus 10A in addition to the audio signal transmitted by information processing apparatus 10C, it converts the received audio signals into analog audio signals and supplies them to the speaker (step SB6). In this case, the voices of participant 2A and participant 2C are output from the speaker of information processing apparatus 10B.

When a participant identifier of a participant in the same room is stored in the storage unit 102 (YES in step SB1) and the participant identifiers included in the received audio signals contain no identifier other than the one the apparatus itself transmits and those stored in the storage unit 102 (NO in step SB5), that is, when none of the received audio signals was transmitted from an information processing apparatus 10 in another room, the information processing apparatus 10 stops supplying the audio signal to the speaker (step SB7).

For example, information processing apparatus 10A stops supplying the audio signal to the speaker (step SB7) when it receives only the audio signal transmitted by information processing apparatus 10B, or when it receives the audio signal it transmitted itself in addition to the audio signal transmitted by information processing apparatus 10B. Likewise, information processing apparatus 10B stops supplying the audio signal to the speaker (step SB7) when it receives only the audio signal transmitted by information processing apparatus 10A, or when it receives the audio signal transmitted by information processing apparatus 10A in addition to the audio signal it transmitted itself.
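The decision made in steps SB1 to SB7 reduces to one question: does any received signal carry an identifier that is neither the apparatus's own nor registered as being in the same room? The sketch below expresses that rule under the assumption that identifiers are plain strings; the function name and parameters are hypothetical. An empty set of same-room identifiers corresponds to the NO branch of step SB1.

```python
from typing import Iterable


def should_supply_speaker(
    received_ids: Iterable[str],
    own_id: str,
    same_room_ids: Iterable[str] = (),
) -> bool:
    """True when at least one received signal comes from a device in another room.

    True corresponds to steps SB3 and SB6 (supply to the speaker);
    False corresponds to steps SB4 and SB7 (stop the supply).
    """
    local = {own_id} | set(same_room_ids)
    return any(pid not in local for pid in received_ids)


# Apparatus 10C (alone in room 4B, nothing registered):
should_supply_speaker({"2A"}, own_id="2C")          # True  (SB3)
should_supply_speaker({"2C"}, own_id="2C")          # False (SB4)

# Apparatus 10A (participant 2B registered as being in the same room):
should_supply_speaker({"2C", "2A"}, "2A", {"2B"})   # True  (SB6)
should_supply_speaker({"2A", "2B"}, "2A", {"2B"})   # False (SB7)
```

Note that in the embodiment a True result means all received signals are supplied to the speaker, including those from the same room, which is why the voices of participants 2A and 2C are both output from apparatus 10A in the example above.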

[Modifications]
An embodiment of the present invention has been described above, but the present invention is not limited to that embodiment and can be implemented in various other forms. For example, the embodiment may be modified as follows to implement the present invention. The embodiment described above and the following modifications may also be combined with one another.

In the embodiment described above, each information processing apparatus 10 exchanges audio signals and video signals that include participant identifiers via the server device 20. Alternatively, the web conference may be held with information processing apparatuses 10A to 10C connected directly to one another, exchanging video signals and audio signals that include participant identifiers without going through the server device 20.

In the present invention, the microphone of the audio processing unit 107 may be unidirectional. In addition, the speaker of the audio processing unit 107 may be a narrow-directivity speaker.

In the present invention, while the information processing apparatus 10 is receiving an audio signal, it may display an image indicating that the participant is speaking within the area that shows the video of the video signal transmitted from the information processing apparatus 10 that transmitted the received audio signal.

In the present invention, the information processing apparatus 10 may analyze the video signal generated by the camera 106 and control transmission of the audio signal to the server device 20 according to the analysis result. For example, the information processing apparatus 10 analyzes the video signal generated by the camera 106 and identifies the direction of the line of sight of the participant captured by the camera 106. The information processing apparatus 10 may refrain from transmitting the audio signal to the server device 20 when the identified line-of-sight direction is neither the direction of the camera 106 nor the direction of the display unit 103. According to this modification, the information processing apparatus 10 does not transmit the audio signal to the server device 20 when the participant is looking not toward the camera 106 or the display unit 103 but, for example, toward a participant in the same room. When a participant speaks while looking at another participant in the same room, the voice of that conversation is not transmitted to the server device 20, so participants who want to talk only among themselves in the same room are not overheard by participants in other rooms. The information processing apparatus 10 may also include the recognition means according to the present invention, recognize the face of the photographed participant, and perform the process of FIG. 5 when the participant's face enters the state that precedes speaking or is in the state of speaking.
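The line-of-sight modification can be combined with the level gate of FIG. 5 as in the sketch below. How the gaze direction is estimated from the video signal is outside the scope of this sketch; the `GazeTarget` enumeration, the function name, and the decibel-valued level and threshold are hypothetical and assume that some upstream analysis has already classified the gaze.

```python
from enum import Enum


class GazeTarget(Enum):
    """Hypothetical classification of the participant's line of sight."""
    CAMERA = "camera"        # looking toward camera 106
    DISPLAY = "display"      # looking toward display unit 103
    ELSEWHERE = "elsewhere"  # e.g. looking at a participant in the same room


def should_send(gaze: GazeTarget, level_db: float, threshold_db: float) -> bool:
    """Send only when the participant faces the conference and speaks loudly enough."""
    facing_conference = gaze in (GazeTarget.CAMERA, GazeTarget.DISPLAY)
    return facing_conference and level_db >= threshold_db
```

With this rule, speech directed at a neighbor in the same room is withheld even when it is loud enough to pass the threshold.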

In the present invention, when the server device 20 transmits a received audio signal to the information processing apparatuses 10, it may refrain from transmitting that audio signal to the information processing apparatus 10 that originally transmitted it.

In the present invention, when the participant identifier included in a received audio signal is the same as the participant identifier that the information processing apparatus 10 itself transmits, the information processing apparatus 10 may refrain from converting that audio signal into an analog audio signal and from supplying it to the speaker. Likewise, when the participant identifier included in a received audio signal is the same as a participant identifier stored as that of a participant in the same room, the information processing apparatus 10 may refrain from converting the audio signal that includes that participant identifier into an analog audio signal and from supplying it to the speaker.
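This modification filters each received signal individually rather than making one decision for all of them. A minimal sketch, assuming each received signal is a dictionary with a `participant_id` key (a hypothetical layout not given in the specification):

```python
from typing import Iterable, List


def signals_to_play(
    packets: Iterable[dict],
    own_id: str,
    same_room_ids: Iterable[str] = (),
) -> List[dict]:
    """Keep only signals whose sender is neither this apparatus nor in the same room."""
    local = {own_id} | set(same_room_ids)
    return [p for p in packets if p["participant_id"] not in local]


received = [
    {"participant_id": "2A", "samples": [0.2]},
    {"participant_id": "2B", "samples": [0.3]},
    {"participant_id": "2C", "samples": [0.1]},
]
playable = signals_to_play(received, own_id="2A", same_room_ids={"2B"})
```

Here `playable` contains only the signal from participant 2C. Unlike the embodiment, the voices of participant 2A and participant 2B are never output from the speaker of apparatus 10A, even while a participant in another room is speaking.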

The information processing apparatus 10 may match the sound collected by the microphone against the sound represented by the audio signal received by the communication unit 105 and, when it determines that the two match, refrain from converting the received audio signal into an analog audio signal and from supplying it to the speaker.
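The specification does not say how the matching is performed. One possible approach, shown here purely as an assumption, is normalized cross-correlation over a small range of delays, on the reasoning that a signal from the same room arrives as a delayed copy of what the local microphone already picked up. The function names, the lag range, and the 0.9 score threshold are all hypothetical.

```python
import math
from typing import Sequence


def normalized_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two sample sequences, truncated to the shorter length."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    dot = sum(x * y for x, y in zip(a[:n], b[:n]))
    norm_a = math.sqrt(sum(x * x for x in a[:n]))
    norm_b = math.sqrt(sum(y * y for y in b[:n]))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def matches_microphone(
    mic: Sequence[float],
    received: Sequence[float],
    max_lag: int = 32,       # assumed maximum delay of the received copy, in samples
    min_score: float = 0.9,  # assumed similarity needed to call it a match
) -> bool:
    """True when `received` looks like a copy of `mic` delayed by 0..max_lag samples."""
    best = 0.0
    for lag in range(max_lag + 1):
        # A copy delayed by `lag` satisfies received[i + lag] == mic[i].
        score = normalized_correlation(mic[: len(mic) - lag], received[lag:])
        best = max(best, score)
    return best >= min_score
```

A practical system would need to cope with codec distortion, room acoustics, and much longer network delays, so this sketch only illustrates the idea of the comparison.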

In the embodiment described above, the user of the information processing apparatus 10 is a participant in a web conference, but the user is not limited to a web conference participant. For example, the user of the information processing apparatus 10 may be a player of an online game played over the communication line 3.

2A to 2C: participants; 3: communication line; 4A, 4B: rooms; 10, 10A to 10C: information processing apparatus; 101: control unit; 102: storage unit; 103: display unit; 104: operation unit; 105: communication unit; 106: camera; 107: audio processing unit; 1001: first acquisition unit; 1002: second acquisition unit; 1003: output unit; 1004: supply unit; 1005: determination unit.

Claims

An information processing apparatus comprising:
first acquisition means for acquiring an audio signal supplied from a microphone;
second acquisition means for acquiring an audio signal supplied via a communication line;
output means for outputting the audio signal acquired by the first acquisition means to the communication line when the sound pressure level of the sound represented by that audio signal is equal to or higher than a predetermined threshold, and for not outputting that audio signal to the communication line when the sound pressure level is lower than the predetermined threshold; and
supply means for supplying the audio signal acquired by the second acquisition means to a speaker, the supply means not supplying the audio signal acquired by the second acquisition means to the speaker when the sound pressure level is equal to or higher than the predetermined threshold.

The information processing apparatus according to claim 1, further comprising determination means for determining whether the audio signal acquired by the second acquisition means is an audio signal output from a device in the same room as the own device,
wherein the supply means does not supply the audio signal acquired by the second acquisition means to the speaker when the sound pressure level is lower than the predetermined threshold and the determination means determines that the audio signal acquired by the second acquisition means is an audio signal output from a device in the same room as the own device.

The information processing apparatus according to claim 2, wherein the audio signal includes an identifier of the device that output the audio signal, and
the determination means determines that the audio signal acquired by the second acquisition means is an audio signal output from a device in the same room as the own device when the identifier included in that audio signal is a pre-registered identifier.

The information processing apparatus according to claim 2, wherein the determination means matches the audio signal acquired by the first acquisition means against the audio signal acquired by the second acquisition means and, when the two match, determines that the audio signal acquired by the second acquisition means is an audio signal output from a device in the same room as the own device.

The information processing apparatus according to any one of claims 1 to 4, wherein, when the audio signals acquired by the second acquisition means include an audio signal output from a device in a room different from that of the own device, the supply means supplies the audio signal output from the device in the different room to the speaker.

The information processing apparatus according to claim 5, wherein, when the second acquisition means acquires both an audio signal output from a device in a room different from that of the own device and an audio signal output from a device in the same room as the own device, the supply means also supplies the audio signal output from the device in the same room to the speaker.

The information processing apparatus according to claim 1, further comprising:
imaging means for photographing a user of the own device and generating an image of the user; and
recognition means for recognizing the state of the user in the image generated by the imaging means,
wherein the output means controls output of the audio signal acquired by the first acquisition means to the communication line according to the state recognized by the recognition means.

The information processing apparatus according to claim 7, wherein the recognition means recognizes the direction of the user's line of sight as the state, and
the output means does not output the audio signal acquired by the first acquisition means to the communication line when the direction of the line of sight recognized by the recognition means is a predetermined direction.

The information processing apparatus according to claim 7, wherein the recognition means recognizes the direction of the user's face as the state, and
the output means does not output the audio signal acquired by the first acquisition means to the communication line when the direction recognized by the recognition means is a predetermined direction.

A program for causing a computer to function as:
first acquisition means for acquiring an audio signal supplied from a microphone;
second acquisition means for acquiring an audio signal supplied via a communication line;
output means for outputting the audio signal acquired by the first acquisition means to the communication line when the sound pressure level of the sound represented by that audio signal is equal to or higher than a predetermined threshold, and for not outputting that audio signal to the communication line when the sound pressure level is lower than the predetermined threshold; and
supply means for supplying the audio signal acquired by the second acquisition means to a speaker, the supply means not supplying the audio signal acquired by the second acquisition means to the speaker when the sound pressure level is equal to or higher than the predetermined threshold.