US20190222804A1 - Controlling focus of audio signals on speaker during videoconference - Google Patents
- Publication number
- US20190222804A1 (application Ser. No. 15/872,450)
- Authority
- US
- United States
- Prior art keywords
- signal
- single speaker
- microphones
- audio signals
- speaker
- Prior art date
- Legal status: Granted (the status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/147—Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/323—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only for loudspeakers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/40—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
- H04R1/406—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R5/00—Stereophonic arrangements
- H04R5/04—Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/008—Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2201/00—Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
- H04R2201/40—Details of arrangements for obtaining desired directional characteristic by combining a number of identical transducers covered by H04R1/40 but not provided for in any of its subgroups
- H04R2201/401—2D or 3D arrays of transducers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2203/00—Details of circuits for transducers, loudspeakers or microphones covered by H04R3/00 but not provided for in any of its subgroups
- H04R2203/12—Beamforming aspects for stereophonic sound reproduction with loudspeaker arrays
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2430/00—Signal processing covered by H04R, not provided for in its groups
- H04R2430/20—Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic
- H04R2430/23—Direction finding using a sum-delay beam-former
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2499/00—Aspects covered by H04R or H04S not otherwise provided for in their subgroups
- H04R2499/10—General applications
- H04R2499/15—Transducers incorporated in visual displaying devices, e.g. televisions, computer displays, laptops
Definitions
- a single person can be speaking at a time.
- a video camera can aim and/or focus on the single person who is speaking.
- Persons at a receiving end of the videoconference can perceive noise originating from sources other than the speaker as originating from the same direction as the speaker, which can be perceived as unnatural.
- a non-transitory computer-readable storage medium may include instructions stored thereon. When executed by at least one processor, the instructions may be configured to cause a computing system to determine that a video system is aiming at a single speaker of a plurality of people, receive audio signals from a plurality of microphones, the received audio signals including audio signals generated by the single speaker, based on determining that the video system is aiming at the single speaker, transmit a monophonic signal, the monophonic signal being based on the received audio signals, determine that the video system is not aiming at the single speaker, and based on the determining that the video system is not aiming at the single speaker, transmit a stereophonic signal, the stereophonic signal being based on the received audio signals.
- a non-transitory computer-readable storage medium may include instructions stored thereon. When executed by at least one processor, the instructions may be configured to cause a computing system to determine a first direction of a speaker that a video system is aiming at, receive audio signals from a plurality of microphones, generate a first audio signal based on the received audio signals and focusing on the first direction, determine a second direction of a noise source other than the speaker, generate a second audio signal based on the received audio signals and focusing on the second direction, and generate a stereophonic signal based on the first audio signal and the second audio signal.
- a method may be performed by a computing system.
- the method may comprise determining that a video system is aiming at a single speaker, determining a first direction of the single speaker from an array of microphones, based on determining that the video system is aiming at the single speaker and the first direction of the single speaker, generating a first beamformed signal based on beamforming, in the first direction, multiple first direction audio signals received by the array of microphones, determining a second direction of a noise source other than the single speaker, generating a second beamformed signal based on beamforming, in the second direction, multiple second direction audio signals received by the array of microphones in the second direction, generating a monophonic signal based on the first beamformed signal and the second beamformed signal, the first beamformed signal having greater weight relative to the second beamformed signal, determining that the video system is not aiming at the single speaker, and based on determining that the video system is not aiming at the single speaker, generating a stereophonic signal, the stereophonic signal including the first beamformed signal and the second beamformed signal.
- FIG. 1 is a diagram of a videoconferencing system according to an example.
- FIG. 2 is a block diagram of a computing system that can implement features of the videoconferencing system according to an example.
- FIG. 3 is a diagram showing directions of beamforming within a location from which the videoconferencing system receives input according to an example.
- FIG. 4A is a diagram showing weights of beamformed signals when the video camera is focusing on a single person according to an example.
- FIG. 4B is a diagram showing weights of beamformed signals when the video camera has zoomed out and is aiming and/or focusing on multiple persons according to an example.
- FIG. 4C is a diagram showing weights of beamformed signals when the video camera is aiming and/or focusing on a single person and the video conferencing system is performing beamforming on the single person and multiple noise sources according to another example.
- FIG. 5 is a diagram showing microphones and directions of beamforming toward different sources of audio signals according to an example.
- FIG. 6 is a diagram showing microphones and a number of wavelengths between the microphones along a direction of beamforming according to an example.
- FIG. 7 is a flowchart showing a method according to an example.
- FIG. 8 is a flowchart showing a method according to another example.
- FIG. 9 is a flowchart showing a method according to another example.
- FIG. 10 shows an example of a computer device and a mobile computer device that can be used to implement the techniques described here.
- a computing system can generate and/or transmit monophonic audio signals when a video system, such as a video camera, is aiming at and/or focusing on a single speaker.
- the monophonic audio signals can be focused on the single speaker, and can be generated by beamforming and/or preferentially weighting audio signals emitted along a path, toward the object such as the single, human speaker, when the video system generating the video signals is focusing on, and/or aiming at, the single speaker.
- two audio signals can be generated by beamforming in two different directions.
- a technical problem with simply beamforming in two different directions, which are independent of a speech source, such as to the left and to the right, to form a left audio channel and a right audio channel, is that the speech source, the human speaker, is not targeted, resulting in less than optimal capturing of the speech from the human speaker.
- a technical problem with beamforming in only one direction, toward the human speaker, is that when the audio signals are reproduced at a receiving end, noise from other sources will seem to originate from the same direction as the speech source.
- a technical solution to these technical problems of beamforming in two different directions and beamforming in a single direction is to generate one or more beamformed signals in the direction of the speech source and/or human speaker, and a second beamformed signal in a direction of a noise source other than the speech source and/or human speaker, and to attenuate and/or reduce the weight of the second beamformed signal relative to the beamformed signal(s) in the direction of the speech source and/or human speaker.
- Technical advantages of beamforming in the direction of the speech source and/or human speaker and in the direction of the noise source include the speech being clearly reproduced and the noise from the noise source(s) being reproduced with a quality of being received from a direction other than the direction of the speech source and/or human speaker.
- a further technical advantage is that the audio signals focusing on the single speaker when the video camera is focusing on and/or aiming at the single speaker can overcome the otherwise unnatural experience of hearing sounds from different sources during a videoconference, compared to a face-to-face meeting in which participants would turn their heads toward the person who is currently speaking.
- the computing system can generate a single monophonic signal focusing on the speech source and/or single speaker, such as by beamforming in a direction of the speech source and/or single speaker.
- a technical problem of generating a single monophonic signal focusing on the speech source and/or single speaker is that when the video system is no longer aiming at and/or focusing on the speech source and/or single speaker, the audio signal, which focuses on the single speaker, will not correspond to the video signal, which is capturing more objects and/or persons than only the single speaker.
- the computing system can generate a stereophonic signal with audio signals received from different directions.
- a technical problem of generating the stereophonic signal is that when a single human speaker is speaking and the video system is generating an image of only the single speaker, the audio signals capturing noises from different directions will not correspond to the video image.
- a technical solution for these technical problems is for the computing system to transition from the monophonic signal to a stereophonic signal when the video system is no longer aiming at and/or focusing on the single speaker, such as when the video system zooms out and shows persons other than the single speaker.
- the stereophonic signal can include the monophonic signal generated and/or transmitted when the video system was aiming at and/or focusing on the single speaker, as well as an additional audio signal, which can include audio signals from a different set of microphones and/or focused on a different direction. Controlling the focus of the audio signals on the speaker based on whether the video system is focusing on the single speaker can create a more natural experience for the viewer(s)/listener(s).
- the computing system can also create a natural sounding combined audio signal and/or stereophonic signal by generating a second and/or additional audio signal, such as by beamforming or preferentially weighting received audio signals, toward a noise source other than the single speaker, and combining the audio signals from the single speaker and the noise source, with the audio signals from the single speaker having greater weight than the audio signals from the noise source.
- the stereophonic signal including the audio signals from the single speaker and the noise source avoids the unnatural experience, on the part of listeners and viewers, of noise from sources other than the speaker seeming to originate from the same direction as the speaker.
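The weighted combination described above can be read as a simple linear mix of two beamformed signals. The sketch below is an illustration only, not the patent's implementation: the 0.8 weight, the panning scheme (speaker centered, noise panned to one channel), and the function name are assumptions.

```python
import numpy as np

def combine_beams(speaker_beam, noise_beam, speaker_weight=0.8):
    """Mix a speaker-focused beam and a noise-focused beam into a
    two-channel (stereophonic) signal.

    The speaker beam is placed equally in both channels so the voice
    stays centered, while the noise beam is panned to one channel so it
    is perceived as arriving from a different direction. The description
    only requires that the speaker beam carry greater weight than the
    noise beam; the specific weight and panning here are assumptions.
    """
    noise_weight = 1.0 - speaker_weight
    left = speaker_weight * speaker_beam
    right = speaker_weight * speaker_beam + noise_weight * noise_beam
    return np.stack([left, right])
```

In this sketch the speaker's voice dominates both channels, while the attenuated noise beam appears only on one side, approximating the "noise from a different direction" effect described in the text.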
- FIG. 1 is a diagram of a videoconferencing system according to an example.
- the videoconferencing system can send video and audio signals from a first location (e.g., a first conference room) 102 to a second location (e.g., a second conference room) 106 via a network 104 , enabling one or more persons 124 in the second location 106 to see and hear one or more persons 114 A, 114 B, 114 C in the first location 102 .
- the videoconferencing system can include any combination of components shown in FIG. 1 : components in both locations 102 , 106 ; components in both locations 102 , 106 and a server 122 ; components in only the location 102 ; or components in the location 102 and the server 122 , as non-limiting examples.
- the location 102 can include one or more persons 114 A, 114 B, 114 C, any number of whom may be speaking and/or may be sources of noise and/or audio signals. While three persons 114 A, 114 B, 114 C are shown in the example location 102 of FIG. 1 , any number of persons 114 A, 114 B, 114 C can be in the location 102 . In some examples, the persons 114 A, 114 B, 114 C can each sit on chairs 116 A, 116 B, 116 C behind a desk 118 .
- the location 102 can include a doorway 120 , which can be a source of noise and/or audio signals, such as from noise generated by a door of the doorway 120 opening and closing, or from noise originating outside the location 102 and entering the location 102 through the doorway 120 .
- the videoconferencing system can include a video camera 108 in the first location 102 .
- the video camera 108 can be part of a video system, and can capture optical signals and/or video signals within the location 102 .
- the video camera 108 can zoom in to a small part of the location 102 , such as to aim at, focus on, and/or capture images of a single human speaker such as the person 114 B, and/or can zoom out to receive and/or process video signals from a larger part of the location 102 , such as to capture images of, aim at, and/or focus on all or multiple of the persons 114 A, 114 B, 114 C sitting at the desk 118 .
- the video camera 108 can also pan left and right, and/or up and down, to change the person 114 A, 114 B, 114 C and/or portion of the location 102 that the video camera 108 is focusing on.
- the video camera 108 can be controlled manually, or by software that causes the video camera 108 to focus on an active speaker in the location 102 , such as by heuristics or machine learning techniques.
- the video camera 108 can send a signal to a computing device 112 and/or microphones 110 indicating a direction in which the video camera 108 is focusing.
- the videoconferencing system can include multiple and/or a plurality of microphones 110 , and/or an array of microphones 110 , in the first location 102 .
- the microphones 110 can capture audio signals in the location 102 .
- the microphones 110 , the computing device 112 receiving audio signals from the microphones 110 , and/or other components of the videoconferencing system, can generate audio signals such as one or more beamformed signals based on the received audio signals that each focus on audio signals received from a particular direction and/or are received along a particular path.
- the microphones 110 and/or computing device 112 can generate the beamformed signals by, for example, beamforming audio signals received by the microphones 110 in a same direction that the video camera 108 is aiming and/or focusing, such as a direction of a single speaker that the video camera 108 is aiming at and/or focusing on.
- the generation of beamformed signals by beamforming can include shifting phases of received audio signals so that signals received by the microphones 110 from the direction in which the video camera 108 is aiming and/or focusing constructively interfere with each other, and/or increasing or decreasing amplitudes of signals received by different microphones based on the locations of the microphones and the direction of the focus.
- the shifting can be based on the direction, a known speed of sound, and a known distance between the microphones. In this way, audio signals coming from the direction in which the video camera 108 is aiming and/or focusing are processed as if they were received by the two (or more) microphones 110 at the same time, causing constructive interference, whereas audio signals coming from other directions are processed as if they were received at different times, resulting in destructive interference.
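The delay computation described above can be sketched as a basic delay-and-sum beamformer. This is a simplified illustration, not the patent's implementation: delays are rounded to whole samples, the wraparound of `np.roll` is ignored, and the microphone geometry and function names are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate speed of sound at room temperature

def delay_and_sum(signals, mic_positions, direction, sample_rate):
    """Align and average microphone signals so that sound arriving from
    `direction` interferes constructively.

    signals: (num_mics, num_samples) array of captured audio
    mic_positions: (num_mics, 2) array of microphone coordinates in meters
    direction: 2-D vector pointing from the array toward the source
    """
    direction = np.asarray(direction, dtype=float)
    direction /= np.linalg.norm(direction)
    out = np.zeros(signals.shape[1])
    for sig, pos in zip(signals, mic_positions):
        # A microphone farther along `direction` is closer to the source
        # and hears the wavefront earlier; delay its signal to compensate.
        delay_seconds = np.dot(pos, direction) / SPEED_OF_SOUND
        shift = int(round(delay_seconds * sample_rate))
        out += np.roll(sig, shift)
    return out / len(signals)
```

After this alignment, signals from the steered direction add coherently, while signals from other directions remain misaligned and partially cancel, which is the constructive/destructive interference described in the text.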
- the videoconferencing system can include a computing device 112 in the location 102 .
- the computing device 112 can receive video signals from the video camera 108 and can receive audio signals from the microphones 110 .
- the computing device 112 can control the direction, aim, and/or focus of the video camera 108 based on determinations by the computing device 112 of which person(s) 114 A, 114 B, 114 C is actively speaking.
- the computing device 112 can control the direction of focus and/or generation of focused and/or beamformed audio signals such as by beamforming by the microphones 110 , and/or perform beamforming of audio signals received by the computing device 112 from the microphones 110 .
- the computing device 112 in the location 102 in which the video and audio signals of the speaker are recorded can be considered a local computing device.
- the videoconferencing system can generate monophonic signals based on audio signals received by the microphones 110 when the video camera 108 is aiming at and/or focusing on the single speaker, and transmit a stereophonic signal based on audio signals received by the microphones when the video camera 108 has stopped and/or is no longer aiming at and/or focusing on the single speaker.
- a receiving system can output the same monophonic signal from all of its speakers, and can output a first signal of the stereophonic signal from a first speaker (or speakers) and a second signal of the stereophonic signal from a second speaker (or speakers).
- the monophonic signal can be based on signals received from a set of microphones, which can include some or all of the microphones 110 .
- the stereophonic signal can include a first audio signal received from a first microphone and/or first set of microphones from the microphones 110 , and a second audio signal received from a second microphone and/or second set of microphones from the microphones 110 , the first set being different from the second set.
- the videoconferencing system can generate monophonic signals by focusing audio signals in a specific direction, such as based on first beamformed audio signals that are beamformed in a direction that the video camera 108 is focusing and/or aiming, such as in a direction of a single speaker that the video camera 108 is focusing on and/or aiming at. If and/or when the video camera 108 stops focusing on and/or aiming at an object, and/or is no longer focusing on and/or aiming at an object, the videoconferencing system can generate stereophonic signals such as by generating a second (or more) beamformed signal and combining the second beamformed signal with the first beamformed signal that focuses in the direction that the video camera 108 is focusing and/or aiming.
- the generation of stereophonic signals based on multiple beamformed signals can cause noise from more parts of the location 102 to be transmitted to remote participants of the videoconference along with audio signals transmitted from the speaker.
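The mono-versus-stereo selection described above might be sketched as a simple switch on whether the camera is aiming at a single speaker. The return format and all names below are illustrative assumptions, not the patent's actual interface:

```python
def build_outgoing_audio(speaker_beam, noise_beam, aiming_at_single_speaker):
    """Select the outgoing signal format based on where the camera aims.

    When the video system is aiming at a single speaker, only the
    speaker-focused beam is sent as a monophonic signal; otherwise the
    speaker-focused and noise-focused beams form the two channels of a
    stereophonic signal. Beams are plain lists of samples here.
    """
    if aiming_at_single_speaker:
        return {"channels": 1, "data": [speaker_beam]}
    return {"channels": 2, "data": [speaker_beam, noise_beam]}
```

When the camera zooms out, the switch from one to two channels carries noise from more of the room to the remote participants, matching the wider video frame.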
- the videoconferencing system can generate multiple focused and/or beamformed audio signals by beamforming audio signals in multiple directions.
- the videoconferencing system can, for example, generate a first beamformed signal focusing on a first direction based on beamforming, in a first direction of a human speaker, audio signals received from the first direction.
- the videoconferencing system can also generate a second beamformed signal focusing on a second direction based on beamforming, in a second direction of a noise source, different from the first direction, audio signals received from the second direction.
- the videoconferencing system can generate a combined signal and/or stereophonic signal based on combining the first beamformed signal and the second beamformed signal.
- the first beamformed signal can have greater weight within the combined and/or stereophonic signal, making the voice of the human speaker in the first direction easily audible, but still providing some of the background noise from the noise source to create a sound that is more similar to that experienced by a person actually in the location 102 and near the video camera 108 and microphones 110 .
- the video camera 108 , microphones 110 , and/or computing device 112 can be combined into one apparatus, or can be set up in the location 102 as standalone components and communicate with each other via wired or wireless interfaces.
- the computing device 112 can be in the same location 102 as the video camera 108 and microphones 110 , or can be outside the location 102 and communicate with the video camera 108 and microphones 110 via wired or wireless interfaces.
- the videoconference system can also include a display and/or speakers in the location 102 , so that the persons 114 A, 114 B, 114 C from whom the video camera 108 and microphones 110 are capturing video and audio input can view and listen to persons in remote locations, such as a second location 106 .
- the computing device 112 can communicate with a computing device 132 in a remote, second location 106 , and/or a remote server 122 , via a network 104 .
- the network 104 can include multiple interfaces and/or devices facilitating communication between computing devices, such as the Internet or, in the example of a videoconference system maintained within a corporate or college campus, a local area network (LAN).
- the server 122 can perform any combination of the functions, methods, and/or techniques described herein, such as controlling the focus, aim, and/or direction of the video camera 108 , beamforming audio signals received by the microphones 110 , and/or combining the beamformed signals and/or signals from different microphones to generate stereophonic signals, or may simply transmit the video and audio data between computing devices 112 , 132 . While two locations 102 , 106 are shown in the videoconference system of FIG. 1 , any number of locations may be included in the videoconference system, with persons in each location viewing and listening to one or more human speakers in a remote location(s) on a display and from electrical speakers.
- the second location 106 can be remote from the first location 102 .
- the second location 106 can include the computing device 132 .
- the computing device 132 in the second location 106 can receive video and audio signals from the computing device 112 in the first location 102 and/or the server 122 .
- the computing device 132 in the second location 106 can transmit the video and audio signals to a display 128 and electronic speakers 130 A, 130 B, respectively, to present the video and audio to a person 124 in the second location 106 .
- a first electronic speaker 130 A can, based on a combined and/or stereophonic signal received from the computing device 112 via the computing device 132 , output a first monophonic and/or audio signal such as words spoken by a human speaker
- a second electronic speaker 130 B can, based on the combined and/or stereophonic signal received from the computing device 112 via the computing device 132 , output a second monophonic and/or audio signal such as noise generated by a noise source other than the speaker.
- both speakers 130 A, 130 B can output the same monophonic signal.
- the computing device 132 in the second location 106 in which the video and audio of the speaker are presented, rather than recorded and/or captured, can be considered a remote computing device.
- the second location 106 can include a display 128 and one or more speakers 130 A, 130 B.
- the display 128 can present images based on the video data received by the display 128 from the computing device 132 in the second location 106 , which may be the video captured by the video camera 108 .
- the display 128 can include a traditional screen that generates images by projecting light toward the viewers, such as a cathode ray tube (CRT) display, plasma display, a light-emitting diode (LED) display, a liquid crystal display (LCD), a projector that projects images onto a screen, or a holographic system that creates a holographic image of the speaker and/or other persons in the first location 102 , as non-limiting examples.
- the speaker(s) 130 A, 130 B can output sound based on audio signals received from the computing device 132 in the second location 106 , which may be based on the combined signal(s) generated by the computing device 112 in the first location 102 and/or microphones 110 .
- the speaker(s) 130 A, 130 B can output the same sound, or in examples of receiving stereophonic signals, the speakers 130 A, 130 B can each output different sounds, such as sounds based on different audio signals generated based on beamforming in different directions or audio signals received by different sets of microphones.
- a person 124 can be in the second location 106 , watching and listening to the person(s) 114 A, 114 B, 114 C who are in the first location 102 , on the display 128 and from the speaker(s) 130 A, 130 B.
- the person 124 can sit on a chair 126 .
- the second location 106 can also include a video camera and microphones for capturing video and audio signals from the person 124 to present and/or output to persons in other locations, such as the first location 102 .
- FIG. 2 is a block diagram of a computing system 200 that can implement features of the videoconferencing system according to an example.
- the features of the computing system 200 described herein can be included in, and/or performed by, the computing device 112 in the first location 102 , the server 122 , the computing device 132 in the second location 106 , or any combination of the computing device 112 , server 122 , and/or computing device 132 .
- the computing system 200 can include an aim determiner 202 .
- the aim determiner 202 can determine a direction of aim and/or focus of the video camera 108 .
- the aim determiner 202 can determine that the video camera 108 is aiming at and/or focusing on a single, human speaker, and determine the direction of the single speaker from the video camera 108 and/or microphones 110 .
- the video camera 108 can aim at and/or focus on the single, human speaker by pointing in the direction of the speaker so that the speaker is in or near the middle of an image captured by the video camera, and/or can focus on the single, human speaker by adjusting a lens of the video camera 108 so that light reflected from the speaker converges on a sensor of the video camera 108 .
- the aim determiner 202 can determine the direction of aim and/or focus by the video camera 108 based on receiving and/or processing a single speaker signal from the video camera 108 .
- the single speaker signal can indicate that the video camera 108 is aiming at and/or focusing on the single speaker and/or is capturing an image of only a single person 114 B in the location 102 , and can indicate a direction of the single speaker.
- the video camera 108 may have determined that a single speaker is speaking based on video data, such as facial expressions of the single speaker including lip movement, body language of other persons captured by the video camera 108 such as the other persons facing or angling their bodies toward the single speaker, or the video camera 108 capturing the image of only the single person 114 B and not capturing images of other persons 114 A, 114 C in the location 102 .
- the direction of the single speaker can be determined based on the direction that the camera 108 is pointing, and/or based on a location of the speaker within a captured image.
- the computing system 200 can focus, and/or perform a beamforming operation, in the direction of the single speaker, and send a single speaker audio signal to the remote computing device 132 .
- the single speaker audio signal can include the combined signal (discussed below) and an indication that only a single speaker is speaking, which can prompt the remote computing device 132 to output the audio as either stereophonic audio output or monophonic audio output.
- the aim determiner 202 can determine that the video camera 108 is no longer aiming at and/or focusing on, and/or has stopped aiming at and/or focusing on, the single speaker, based on receiving and/or processing a multiple speaker signal from the video camera 108 .
- the multiple speaker signal can indicate that the video camera 108 is aiming at and/or focusing on multiple speakers, and/or capturing a wide view that includes multiple persons 114 A, 114 B, 114 C.
- the aim determiner 202 can determine that the video camera 108 is no longer aiming at and/or focusing on the single speaker such as the person 114 B and/or that the video camera 108 has stopped aiming at and/or focusing on the single speaker.
- the aim determiner 202 can determine that the video camera 108 is no longer and/or has stopped aiming at and/or focusing on the single speaker based on receiving a multiple speaker signal from the video camera 108 , or based on multiple persons being in the image captured by the video camera 108 , according to example embodiments.
- the computing system 200 can send a multiple speaker audio signal to the remote computing device 132 .
- the multiple speaker audio signal can include the combined and/or stereophonic signal (discussed below) and an indication that multiple human speakers are speaking, which can prompt the remote computing device 132 to output the audio stereophonically, such as outputting focused and/or beamformed audio signals from a first human speaker through a first electronic speaker and outputting focused and/or beamformed audio signals from a second human speaker through a second electronic speaker.
- in response to the video camera 108 resuming aim and/or focus on the single speaker, and/or aiming at and/or focusing on a new single speaker, the computing system 200 can generate a monophonic signal focusing on the single speaker and transmit the generated monophonic signal to the remote computing device.
- the computing system 200 can include a direction determiner 204 .
- the direction determiner 204 can determine one or more directions in which to focus, beamform, and/or preferentially weight audio signals.
- the direction determiner 204 can determine that the computing system 200 should focus and/or beamform audio signals in a first direction that the aim determiner 202 has determined that the video camera 108 is aiming and/or focusing, such as a direction of a single speaker. In some examples, the direction determiner 204 can also determine the first direction of the single speaker based on first direction audio signals received by the microphones 110 in a first direction, such as audio signals indicating human speech, and comparing times of receiving and/or processing the audio signals by the different microphones 110 .
- the direction determiner 204 can, for example, determine delays between audio signals received and/or processed by the different microphones, and determine the direction based on the determined delays, known speed of sound, and known distances between microphones (for example, if the delay between two microphones is equal to the time it takes sound to travel between the microphones, then the direction would be on or near a line extending through the two microphones in the direction of the microphone that first received and/or processed the audio signal).
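The delay-based geometry described above can be sketched as follows. This is a minimal illustration under a far-field assumption with a single pair of microphones; the function and constant names are the editor's own, not the patent's.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate speed of sound in air at 20 °C

def direction_from_delay(delay_s: float, mic_spacing_m: float) -> float:
    """Estimate the angle (radians) between the source direction and the
    line through a pair of microphones, from the measured arrival delay.

    delay_s: time by which the farther microphone lags the nearer one.
    mic_spacing_m: known distance between the two microphones.
    """
    # Path-length difference = delay * speed of sound; for a far-field
    # source this equals mic_spacing * cos(angle).
    ratio = (delay_s * SPEED_OF_SOUND) / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp against measurement noise
    return math.acos(ratio)
```

As the parenthetical above notes, a delay equal to the full inter-microphone travel time places the source on the line through the two microphones (angle zero), while zero delay places it broadside to the pair.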
- the direction determiner 204 can determine the first direction of the single speaker based on determining that multiple first direction audio signals in the first direction are changing as a function of time, such as by performing beamforming operations in multiple directions and determining the direction that has the greatest changes in audio amplitude over a sampling period.
- the direction determiner 204 can, for example, perform beamforming operations in multiple directions over the sampling period to generate multiple beamformed signals, and determine that the direction of the beamformed signal with greatest change over the sampling period is in the direction of the speaker based on an assumption that human speech has a high degree of variation (for example, based on pauses between words and/or sentences).
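The variation-based selection can be sketched as below, using signal variance over the sampling period as a simple proxy for "greatest change in audio amplitude"; the function name and the use of variance are illustrative assumptions, not the patent's specified measure.

```python
import numpy as np

def most_active_direction(beamformed_by_direction: dict) -> float:
    """Return the beamforming direction whose signal varies most over the
    sampling period -- a heuristic for human speech, which has a high
    degree of variation (pauses between words and sentences)."""
    return max(beamformed_by_direction,
               key=lambda d: float(np.var(beamformed_by_direction[d])))
```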
- the direction determiner 204 can determine a second direction in which the computing system 200 should focus and/or beamform audio signals.
- the second direction can be a noise source other than the single speaker.
- the noise source can be a second human speaker, or other types of noise such as people speaking in the background, a door opening and/or closing, or papers or chairs being moved, as non-limiting examples.
- the direction determiner 204 can determine the second direction of a noise source based on comparing times of receiving and/or processing second direction audio signals received by the different microphones 110 in a second direction.
- the direction determiner 204 can determine a third direction in which the computing system 200 should focus and/or beamform audio signals.
- the third direction can be a noise source other than the single speaker.
- the noise source can be a second or third human speaker, or other types of noise such as people speaking in the background, a door opening and/or closing, or papers or chairs being moved, as non-limiting examples.
- the direction determiner 204 can determine the third direction of a noise source based on comparing times of receiving and/or processing the audio signals by the different microphones 110 .
- the computing system 200 can include a beamformer 206 .
- the beamformer 206 can focus on audio signals received along a path, which may be a straight line or may bend in examples of reflected audio signals, to generate focused audio signals and/or beamformed signals.
- the beamformer 206 can generate focused audio signals and/or beamformed signals by combining and/or modifying signals received by and/or from the microphones 110 . Audio signals received by multiple microphones from the direction of focus and/or beamforming experience constructive interference and/or are amplified, while audio signals received from directions other than the direction of focus and/or beamforming experience destructive interference and/or are reduced in magnitude.
- the beamformer 206 can beamform multiple audio signals received from a direction of the single speaker 114 B, and/or can beamform multiple audio signals received from a direction other than the single speaker 114 B.
- the beamformer 206 can include a microphone selector 208 .
- the microphone selector 208 can select multiple microphones 110 , such as two microphones 110 , for which a line intersecting the two microphones 110 is most closely parallel to the direction in which the beamforming is performed.
- the beamformer 206 can include a phase shifter 210 .
- the phase shifter 210 can shift the phase of the audio signal received by one of the selected microphones 110 so that the audio signals received by the selected microphones 110 constructively interfere with each other, amplifying the audio signals received in the direction of beamforming.
- the phase shifter 210 can modify and/or shift the phase(s) of the audio signals based on a distance between the selected microphones 110 and a speed of sound, delaying the phase of the microphone 110 closer to the noise source so that with respect to audio signals received from the noise source in the direction of focus and/or beamforming, the phase-shifted signal of the selected microphone 110 closer to the noise source matches the signal of the selected microphone 110 farther from the noise source.
- Noise sources in directions other than the direction of focus and/or beamforming will experience varying degrees of destructive interference between the selected microphones 110 , reducing the amplitude of audio signals received from noise sources in directions other than the direction of focus and/or beamforming.
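The phase-shifting scheme described above is commonly known as delay-and-sum beamforming; a minimal two-microphone sketch follows. The integer-sample delay and the function signature are simplifying assumptions of this illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum(near_sig: np.ndarray, far_sig: np.ndarray,
                  mic_spacing_m: float, sample_rate: int) -> np.ndarray:
    """Two-microphone delay-and-sum beamformer along the microphone axis.

    The nearer microphone's signal is delayed by the inter-microphone
    travel time, so sound arriving along the axis adds constructively,
    while sound from other directions is attenuated by destructive
    interference (strongest at frequencies where the residual
    misalignment is half a period).
    """
    delay_samples = int(round(mic_spacing_m / SPEED_OF_SOUND * sample_rate))
    shifted = np.concatenate([np.zeros(delay_samples), near_sig])[:len(near_sig)]
    return 0.5 * (shifted + far_sig)
```

With this sketch, a tone arriving along the microphone axis passes through at full amplitude, while a broadside tone at the null frequency is nearly cancelled.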
- the beamformer 206 can process signals only from the selected microphones 110 in an example in which the beamformer 206 narrowly focuses on the direction of beamforming, so that all audio signals processed by the beamformer 206 experience constructive interference in the direction of beamforming.
- the beamformer 206 can also process signals from microphones 110 other than the selected microphones, to process audio signals from noise sources in directions other than the direction of beamforming and/or the direction of the selected noise source.
- the beamformer 206 can reduce the weight of signals received from the microphones 110 other than the selected microphones 110 to narrow the beamforming (and/or increase the focus in the direction of focus) when the video camera 108 zooms in on the speaker, and/or can increase the weight of signals received from the microphones 110 other than the selected microphones 110 to broaden the beamforming (and/or decrease the focus in the direction of focus) when the video camera 108 zooms out away from the speaker, according to example implementations.
- the beamformer 206 can reduce the focus and/or beamforming by broadening beamforming, such as by increasing the weight of signals received from the microphones 110 other than the selected microphones 110 , and/or by increasing the weight of a beamformed signal(s) other than the beamformed signal focusing in the direction of the single speaker. In some examples, the beamformer 206 can reduce beamforming by ceasing beamforming, such as ceasing and/or stopping the shifting of phases of signals received from microphones 110 .
- the computing system 200 can include a signal combiner 212 .
- the signal combiner 212 can combine audio signals processed by the beamformer 206 , which may be focused and/or beamformed in different directions, and/or may combine audio signals received by different sets of microphones.
- the signal combiner 212 can, for example, combine a first focused and/or beamformed signal for which beamforming was performed in a direction of an active human speaker and/or a single human speaker with a second, additional, and/or third beamformed signal(s) for which beamforming was performed in a direction(s) of a noise source(s) other than the direction of the active human speaker and/or a single human speaker.
- the signal combiner 212 can add the first focused and/or beamformed signal to the second focused and/or beamformed signal to generate a monophonic signal, or may include both the first focused and/or beamformed signal and the second focused and/or beamformed signal as distinct audio signals to generate a stereophonic signal that includes multiple focused and/or beamformed signals.
- the signal combiner 212 can include a signal weighter 214 .
- the signal weighter 214 can weight the signals of the audio signals combined by the signal combiner 212 .
- the signal weighter 214 can, for example, reduce the weight and/or amplitude of certain signals, such as the signals processed or generated by the beamformer 206 in directions other than the direction of the active human speaker and/or a single human speaker and/or outside the path along which the beamformer 206 is focusing and/or performing beamforming.
- the signal weighter can preferentially weight beamformed audio signals, such as audio signals emitted along a path passing through at least one of the plurality of microphones and the speaker, as compared with sounds emitted from outside the path.
- the signal weighter 214 can reduce the relative weights and/or amplitudes of signals processed or generated by the beamformer 206 in directions other than the direction of the active human speaker and/or a single human speaker compared to the weight and/or amplitude of the signals processed or generated by the beamformer 206 in the directions of the active human speaker and/or a single human speaker.
- the signal weighter 214 can increase the relative weights and/or amplitudes of signals processed or generated by the beamformer 206 in directions other than the direction of the active human speaker and/or a single human speaker compared to the weight and/or amplitude of the signals processed or generated by the beamformer 206 in the directions of the active human speaker and/or a single human speaker.
- the combined signal generated by the signal combiner 212 can include multiple focused and/or beamformed signals, with one focused and/or beamformed signal for each direction in which beamforming was performed, forming a stereophonic signal.
- Each focused and/or beamformed signal can include a single beamformed signal and an indication of a direction in which the beamforming was performed.
- the combined and/or stereophonic signal can include a first focused and/or beamformed signal including the first beamformed signal and an indicator of the first direction, and a second focused and/or beamformed signal including the second beamformed signal and an indicator of the second direction.
- the computing device 112 can send the combined and/or stereophonic signal to the computing device 132 , and the computing device 132 can transmit one focused and/or beamformed signal to each speaker 130 A, 130 B, based on the indicated direction, creating a stereo effect in the second location 106 .
- the computing system 200 can include at least one processor 216 .
- the at least one processor 216 can include one or more processors, and can be included in one or more computing devices.
- the at least one processor 216 can execute instructions, such as instructions stored in memory, to cause the computing system 200 to perform any combination of methods, functions, and/or techniques described herein.
- the computing system 200 can include at least one memory device 218 .
- the at least one memory device 218 can be included in one or more computing devices.
- the at least one memory device 218 can include a non-transitory computer-readable storage medium.
- the at least one memory device 218 can store instructions that, when executed by the at least one processor 216 , cause the computing system 200 to perform any combination of methods, functions, and/or techniques described herein.
- the at least one memory device 218 can store data accessed to perform, and/or generated by, any combination of methods, functions, and/or techniques described herein.
- the computing system 200 can include input/output nodes 220 .
- the input/output nodes 220 can receive and/or send signals from and/or to other computing devices.
- the input/output nodes 220 can include one or more video cameras 108 , microphones 110 , displays 128 , and/or speakers 130 A, 130 B.
- the input/output nodes 220 can include devices for receiving input from a user, such as via a keyboard, mouse, and/or touchscreen.
- the input/output nodes 220 can also include devices for providing output to a user, such as a screen or monitor, printer, or speaker.
- the input/output nodes 220 can also include devices for communicating with other computing devices, such as networking and/or communication interfaces including wired interfaces (such as Ethernet (Institute of Electrical and Electronics Engineers (IEEE) 802.3), Universal Serial Bus (USB), coaxial cable, and/or High-Definition Multimedia Interface (HDMI)), and/or wireless interfaces (such as Wireless Fidelity (IEEE 802.11), Bluetooth (IEEE 802.15), and/or a cellular network protocol such as Long-Term Evolution (LTE) and/or LTE-Advanced), as non-limiting examples.
- FIG. 3 is a diagram showing directions 302 , 304 , 306 of beamforming within the location 102 from which the videoconferencing system receives input according to an example.
- the directions of beamforming can represent directions of focus by the computing system 200 and/or microphones 110 .
- the microphones 110 , computing system 200 , and/or videoconferencing system can focus and/or perform beamforming in a first direction 302 toward a single person 114 B who is an active speaker to generate a first focused and/or beamformed signal.
- the microphones 110 , computing system 200 , and/or videoconferencing system can focus and/or perform beamforming in a second direction 304 toward another noise source such as a person 114 A who may be speaking at a same time as the person 114 B to generate a second focused and/or beamformed signal.
- the microphones 110 , computing system 200 , and/or videoconferencing system can focus and/or perform beamforming in a third direction 306 toward a noise source such as the doorway 120 , which may allow noise to travel into the location 102 from outside the location 102 and/or may generate noise from a door in the doorway 120 opening and/or closing, to generate a third focused and/or beamformed signal.
- the focused and/or beamformed audio signal generated based on beamforming in the first direction 302 can be combined with the second audio signal and/or third audio signal to generate a combined signal and/or stereophonic signal.
- FIG. 4A is a diagram showing weights 410 , 412 of beamformed signals when the video camera 108 is focusing on a single person 114 B according to an example.
- the video camera 108 is focused on the single person 114 B, and an image 402 A generated by the video camera 108 , computing device 112 , computing system 200 , and/or videoconference system shows, presents, and/or displays a person image 414 B of the single person 114 B who is the active speaker.
- the signal combiner 212 can generate a combined signal 404 A, which can be monophonic, based on a first signal 406 , which can be a beamformed signal in the first direction 302 toward the person 114 B who is the active speaker, and a second signal 408 and/or additional signal, which can be a beamformed signal in the second direction toward a noise source such as a person 114 A other than the person 114 B who is the active speaker.
- the signal weighter 214 can, based on the determination that the video camera 108 and/or video system is focusing on the active and/or single speaker in the first direction 302 , give the first signal 406 a greater weight 410 in the combined signal 404 A than the weight 412 of the second signal 408 .
- in examples in which the weight 412 of the second signal 408 is zero, the combined signal 404 A includes only the first signal 406 . In examples in which the weight 412 of the second signal 408 is greater than zero, the combined signal 404 A can include both the first signal 406 and the second signal 408 .
- FIG. 4B is a diagram showing weights 410 , 412 of beamformed signals when the video camera 108 has zoomed out and is aiming at and/or focusing on multiple persons 114 A, 114 B, 114 C according to an example.
- the video camera 108 is no longer aiming at and/or focusing on, and/or has stopped aiming at and/or focusing on, the person 114 B who is the single speaker and/or the active speaker.
- the video camera 108 has zoomed out to present a broader image 402 B, which includes three person images 414 A, 414 B, 414 C (which are representations of the persons 114 A, 114 B, 114 C) sitting at the desk image 418 (which is a representation of the desk 118 ).
- the computing system 200 can reduce the beamforming, such as by increasing the weight 412 of the second signal 408 relative to the weight 410 of the first signal 406 , and/or decreasing the weight 410 of the first signal 406 relative to the weight 412 of the second signal 408 , within the combined signal 404 B.
- the first signal 406 can have less weight in the combined signal 404 B after the video camera 108 has zoomed out than in the combined signal 404 A when the video camera 108 was aiming at and/or focusing on the single person.
- the combined signal 404 B can be a monophonic signal that includes approximately equal contributions from the audio signals 406 , 408 , and the same combined monophonic signal can be outputted by both of the speakers 130 A, 130 B.
- the combined signal 404 B can be a stereophonic signal that includes distinct audio signals from each of the first signal 406 and second signal 408 , and each of the first signal 406 and second signal 408 can be outputted by a different speaker 130 A, 130 B.
- FIG. 4C is a diagram showing weights of beamformed signals when the video camera 108 is aiming at and/or focusing on a single person 114 B and the video conferencing system is performing beamforming on the single person 114 B and multiple noise sources according to another example.
- the video camera 108 is aiming at and/or focusing on the person 114 B who is the single speaker and/or active speaker, but has zoomed out to present a broader image 402 C, which includes the three person images 414 A, 414 B, 414 C sitting at the desk image 418 and the doorway image 420 (which is a representation of the doorway 120 ).
- the computing system 200 can perform beamforming in the first direction 302 on the person 114 B (represented by the person image 414 B) to generate a first beamformed signal 406 , in the second direction 304 on a first noise source such as the person 114 A (represented by the person image 414 A) to generate a second beamformed signal 408 and/or second additional signal, and in a third direction 306 on a second noise source such as the doorway 120 (represented by the doorway image 420 ) to generate a third beamformed signal 422 .
- the second direction 304 can be away from and/or different from the first direction 302
- the third direction 306 can be away from and/or different from both the first direction 302 and the second direction 304 .
- the weighted sum of the first signal 406 , second signal 408 , and third signal 422 used to generate a combined signal 404 C can give the first signal 406 a greater weight 410 than the weight 412 of the second signal 408 and the weight 424 of the third signal 422 .
- the combined signal 404 C can be a combined monophonic signal that will focus on the single speaker due to the emphasis on the first signal 406 but also include background noise due to the contributions from the second and third signals 408 , 422 .
- FIG. 5 is a diagram showing microphones 110 and directions 302 , 304 , 306 of beamforming toward different sources of audio signals according to an example.
- the directions 302 , 304 , 306 can be paths along which audio signals travel from the noise sources (such as the persons 114 A, 114 B and doorway 120 ) to the microphones 110 , and/or paths along which optical beams travel from the objects 114 A, 114 B (and/or persons), 120 (and/or doorway), based on which the images 414 A, 414 B, 420 are created, toward the video camera 108 .
- the noise sources can include the person 114 B in a first direction 302 from the microphones 110 , the person 114 A in a second direction 304 from the microphones 110 , and the doorway 120 in a third direction 306 from the microphones 110 .
- the multiple microphones 110 form an array of microphones 110 .
- the array of microphones 110 includes eight microphones 110 A, 110 B, 110 C, 110 D, 110 E, 110 F, 110 G, 110 H arranged in a circular pattern.
- Each of the microphones 110 A, 110 B, 110 C, 110 D, 110 E, 110 F, 110 G, 110 H can be in a different location than each of the other microphones 110 A, 110 B, 110 C, 110 D, 110 E, 110 F, 110 G, 110 H.
- the computing system 200 can determine a pair of microphones 110 A, 110 B, 110 C, 110 D, 110 E, 110 F, 110 G, 110 H for which a line or ray drawn through the two microphones is most closely parallel, among all pairs of microphones 110 A, 110 B, 110 C, 110 D, 110 E, 110 F, 110 G, 110 H, to the direction of the noise source in which to focus and/or beamform.
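The pair-selection step can be sketched as below, comparing each pair's axis against the beamforming direction by the magnitude of their cosine; the function name and the 2-D coordinate representation are assumptions of this illustration.

```python
import itertools
import math

def most_parallel_pair(mic_positions, direction):
    """Select the pair of microphones whose connecting line is most nearly
    parallel to the beamforming direction.

    mic_positions: list of (x, y) coordinates; direction: (dx, dy) vector.
    Returns a pair of indices into mic_positions.
    """
    dnorm = math.hypot(*direction)

    def alignment(pair):
        (x1, y1), (x2, y2) = mic_positions[pair[0]], mic_positions[pair[1]]
        vx, vy = x2 - x1, y2 - y1
        # |cosine| of the angle between the pair's axis and the direction
        return abs(vx * direction[0] + vy * direction[1]) / (math.hypot(vx, vy) * dnorm)

    return max(itertools.combinations(range(len(mic_positions)), 2), key=alignment)
```

For the eight-microphone circular array of FIG. 5 (with illustrative coordinates), beamforming along the x-axis selects a pair whose connecting line is horizontal.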
- the microphones 110 A, 110 E form a line most closely parallel to the first direction 302 .
- the microphone selector 208 can select microphones 110 A, 110 E for focusing and/or performing beamforming in the first direction 302 , and the phase shifter 210 can delay the signals from the microphone 110 A (which is closer than the microphone 110 E to the person 114 B who is the noise source) by an amount of time sound takes to travel the distance from the microphone 110 A to the microphone 110 E, thereby causing audio signals received by both microphones 110 A, 110 E from any noise source along the line of the first direction 302 to constructively interfere with each other.
- the microphones 110 H, 110 E form a line most closely parallel to the second direction 304 .
- the microphone selector 208 can select microphones 110 H, 110 E for focusing and/or performing beamforming in the second direction 304 , and the phase shifter 210 can delay the signals from the microphone 110 H (which is closer than the microphone 110 E to the person 114 A who is the noise source) by an amount of time sound takes to travel the distance from the microphone 110 H to the microphone 110 E, thereby causing audio signals received by both microphones 110 H, 110 E from any noise source along the line of the second direction 304 to constructively interfere with each other.
- the microphones 110 C, 110 D form a line most closely parallel to the third direction 306 .
- the microphone selector 208 can select microphones 110 C, 110 D for performing beamforming in the third direction 306
- the phase shifter 210 can delay the signals from the microphone 110 C (which is closer than the microphone 110 D to the doorway 120 which is the noise source) by an amount of time sound takes to travel the distance from the microphone 110 C to the microphone 110 D, thereby causing audio signals received by both microphones 110 C, 110 D from any noise source along the line of the third direction 306 to constructively interfere with each other.
- FIG. 6 is a diagram showing microphones 110 A, 110 E and a number of wavelengths λ between the microphones 110 A, 110 E along a direction 302 of beamforming according to an example.
- the microphones 110 A, 110 E are four-and-a-half wavelengths apart.
- the distance between the microphones 110 A, 110 E may have been predetermined and stored in the memory 218 of the computing system 200 .
- Distances between other pairs of microphones 110 A, 110 B, 110 C, 110 D, 110 E, 110 F, 110 G, 110 H may also have been predetermined and stored in the memory 218 of the computing system 200 .
- the phase shifter 210 can delay the phase of the audio signals received by the microphone 110 A by the amount of time sound takes to travel the distance between the microphones (in this example, four-and-a-half wavelengths from the microphone 110 A to the microphone 110 E , or some other distance and/or number of wavelengths for other pairs of microphones 110 A, 110 B, 110 C, 110 D, 110 E, 110 F, 110 G, 110 H), and/or the difference between the distance from the microphone 110 A to the single speaker 114 B and the distance from the microphone 110 E to the single speaker 114 B . The delay can be determined by dividing the distance between the microphones 110 A, 110 E , and/or the difference in distances, by the known speed of sound.
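The delay computation described above is a single division; a sketch with the FIG. 6 example (taking an assumed 1 kHz tone so that four-and-a-half wavelengths gives a concrete distance) follows.

```python
SPEED_OF_SOUND = 343.0  # m/s, approximate speed of sound in air

def pair_delay_seconds(distance_m: float) -> float:
    """Delay to apply to the nearer microphone's signal: the distance
    (the inter-microphone spacing, or the difference between each
    microphone's distance to the speaker) divided by the speed of sound."""
    return distance_m / SPEED_OF_SOUND

# Example: 4.5 wavelengths of an assumed 1 kHz tone (wavelength = 343/1000 m)
wavelength_m = SPEED_OF_SOUND / 1000.0
delay = pair_delay_seconds(4.5 * wavelength_m)  # 4.5 ms
```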
- FIG. 7 is a flowchart showing a method 700 according to an example.
- the method 700 includes the aim determiner 202 determining that a video system is aiming at a single speaker of a plurality of people ( 702 ).
- the method 700 can also include the computing system 200 receiving audio signals from a plurality of microphones 110 , the received audio signals including audio signals generated by the single speaker ( 704 ).
- the method 700 can also include the computing system 200 , based on determining that the video system is aiming at the single speaker, transmitting a monophonic signal, the monophonic signal being based on the received audio signals ( 706 ).
- the method 700 can also include the aim determiner 202 determining that the video system is not aiming at the single speaker ( 708 ).
- the method 700 can also include the computing system 200 , based on the determining that the video system is not aiming at the single speaker, transmitting a stereophonic signal, the stereophonic signal being based on the received audio signals ( 710 ).
- the monophonic signal can be based on the received audio signals and can focus on the single speaker
- the stereophonic signal can include the monophonic signal and an additional signal.
- the additional signal can be based on the received audio signals and can focus on a noise source other than the single speaker.
- the method 700 can further include the computing system 200 generating the monophonic signal by performing a beamforming operation on the received audio signals in a direction of the single speaker.
- the method 700 can further include the computing system 200 generating the monophonic signal by preferentially weighting audio signals emitted along a path passing through at least one of the plurality of microphones and the speaker as compared with sounds emitted from outside the path.
- the determining that the video system is aiming at the single speaker can include processing a single speaker signal from the video system, the single speaker signal indicating that the video system is aiming at the single speaker, and the determining that the video system is not aiming at the single speaker can include processing a multiple speaker signal from the video system, the multiple speaker signal indicating that the video system is aiming at multiple speakers.
- the stereophonic signal can include a first audio signal based on a first microphone of the plurality of microphones and a second audio signal based on a second microphone of the plurality of microphones.
- the method 700 can further include the computing system 200 generating the monophonic signal by shifting a phase of an audio signal received from at least one microphone of the plurality of microphones relative to at least one other microphone of the plurality of microphones, the shifting being based on differences in distances between the at least one microphone and the single speaker, and the at least one other microphone and the single speaker.
- the method 700 can further include the computing system 200 generating the monophonic signal by shifting a phase of at least a first audio signal received by a first microphone of the plurality of microphones from the single speaker so that at least a portion of the first audio signal received from the single speaker constructively interferes with at least a portion of a second audio signal received by a second microphone of the plurality of microphones, the second microphone being in a different location than the first microphone.
- the method 700 can further include the computing system 200 , based on determining that the video system is aiming at the single speaker, generating a first audio signal by beamforming multiple audio signals received by the plurality of microphones from a direction of the single speaker, generating a second audio signal by beamforming multiple audio signals received by the plurality of microphones from a direction away from the single speaker, and generating the monophonic signal based on a weighted sum of the first audio signal and the second audio signal, the first audio signal receiving a greater weight relative to the second audio signal.
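By way of illustration, the weighted sum described above can be sketched as a per-sample combination of the two beamformed signals. The weight values below are assumptions for the sketch; the disclosure requires only that the first audio signal receive the greater weight:

```python
def monophonic_mix(speaker_beam, noise_beam, speaker_weight=0.8):
    """Weighted sum of two beamformed signals, sample by sample.

    speaker_weight > 0.5 gives the speaker-direction beam the greater
    weight; the remainder (1 - speaker_weight) goes to the away beam.
    """
    noise_weight = 1.0 - speaker_weight
    return [speaker_weight * s + noise_weight * n
            for s, n in zip(speaker_beam, noise_beam)]

# e.g., two illustrative two-sample beams
mono = monophonic_mix([1.0, 0.5], [0.2, 0.2])
# mono[0] is approximately 0.8*1.0 + 0.2*0.2 = 0.84
```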
- the transmitting the stereophonic signal can include transmitting the first audio signal and the second audio signal as distinct audio signals.
- the computing system 200 can include a video camera configured to aim at the single speaker and capture images of the single speaker, the plurality of microphones configured to capture the received audio signals in a direction of the single speaker, and a local computing device configured to receive the captured images from the video camera, send the captured images to a remote computing device, receive the audio signals from the plurality of microphones, determine that the video camera is aiming at the single speaker, based on the determining that the video camera is aiming at the single speaker, beamform the received audio signals in the direction of the single speaker to generate a first beamformed signal, based on the beamforming, transmit the monophonic signal to the remote computing device, determine that the video camera is not aiming at the single speaker, based on the determining that the video camera is not aiming at the single speaker, beamform the received audio signals in a direction other than the direction of the single speaker to generate a second beamformed signal, and transmit the stereophonic signal to the remote computing device, the stereophonic signal including the first beamformed signal and the second beamformed signal.
- FIG. 8 is a flowchart showing a method 800 according to another example.
- the method 800 can include the direction determiner 204 determining a first direction of a speaker that a video system is aiming at ( 802 ).
- the method 800 can also include the computing system 200 receiving audio signals from a plurality of microphones 110 ( 804 ).
- the method 800 can also include the beamformer 206 generating a first audio signal based on the received audio signals and focusing on the first direction ( 806 ).
- the method 800 can also include the direction determiner 204 determining a second direction of a noise source other than the speaker ( 808 ).
- the method 800 can also include the beamformer 206 generating a second audio signal based on the received audio signals and focusing on the second direction ( 810 ).
- the method 800 can also include the signal combiner 212 generating a combined and/or stereophonic signal based on the first audio signal and the second audio signal ( 812 ).
- the determining the first direction ( 802 ) can include determining that the first audio signal is changing as a function of time.
- the generating the first audio signal ( 806 ) can include beamforming the received audio signals in the first direction.
- the generating the second audio signal ( 810 ) can include beamforming the received audio signals in the second direction.
- the generating the stereophonic signal ( 812 ) can include generating the stereophonic signal based on a weighted sum of the first audio signal and the second audio signal, the first audio signal receiving a greater weight relative to the second audio signal.
- the stereophonic signal can include the first audio signal and an indicator of the first direction, and the second audio signal and an indicator of the second direction.
- the noise source can be a first noise source.
- the method 800 can further include determining a third direction of a second noise source, the third direction being different than the first direction and the second direction, the second direction being different than the first direction, and generating a third audio signal based on the received audio signals and the third direction.
- the generating the stereophonic signal ( 812 ) can include generating the stereophonic signal based on a weighted sum of the first audio signal, the second audio signal, and the third audio signal, the first audio signal receiving a greater weight relative to the second audio signal and the third audio signal.
- the computing system 200 can include the video system configured to aim at the speaker in the first direction, the plurality of microphones configured to receive the audio signals, and a local computing device configured to send video signals received by the video system to a remote computing device, determine the first direction, generate the first audio signal, determine the second direction, generate the second audio signal, generate the stereophonic signal, and send the stereophonic signal to the remote computing device.
- the method 800 can further include outputting, by at least two electronic speakers 130 A, 130 B that are remote from the computing system 200, an audio signal based on the stereophonic signal.
- FIG. 9 is a flowchart showing a method 900 according to another example.
- the method 900 can be performed by the computing system 200 .
- the method 900 can include the aim determiner 202 determining that a video system is aiming at a single speaker ( 902 ).
- the method 900 can also include the direction determiner 204 determining a first direction of the single speaker from an array of microphones 110 ( 904 ).
- the method 900 can also include, based on determining that the video system is aiming at the single speaker and the first direction of the single speaker, the beamformer 206 generating a first beamformed signal based on beamforming, in the first direction, multiple first direction audio signals received by the array of microphones 110 ( 906 ).
- the method 900 can also include the direction determiner 204 determining a second direction of a noise source other than the single speaker ( 908 ).
- the method 900 can also include the beamformer 206 generating a second beamformed signal based on beamforming, in the second direction, multiple second direction audio signals received by the array of microphones in the second direction ( 910 ).
- the method 900 can also include the signal combiner 212 generating a monophonic signal based on the first beamformed signal and the second beamformed signal, the first beamformed signal having greater weight relative to the second beamformed signal ( 912 ).
- the method 900 can also include the aim determiner 202 determining that the video system is not aiming at the single speaker ( 914 ).
- the method 900 can also include the signal combiner 212 , based on determining that the video system is not aiming at the single speaker, generating a stereophonic signal, the stereophonic signal including the first beamformed signal and the second beamformed signal as distinct signals ( 916 ).
- the method 900 can also include sending the monophonic signal to a videoconference system that is remote from the computing system, and sending the stereophonic signal to the videoconference system.
- the generating the first beamformed signal ( 906 ) can include modifying phases of audio signals received by the array of microphones, the modifications being based on differences in distances between microphones in the array of microphones and the single speaker.
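The phase modification described above can be illustrated with a delay-and-sum sketch: when the second microphone's signal is advanced by the inter-microphone delay, in-beam sound adds constructively, whereas summing without the correction causes partial cancellation. The tone frequency, delay, and sampling rate below are assumptions for the sketch:

```python
import math

SAMPLE_RATE = 48000
TONE_HZ = 1000
DELAY_SAMPLES = 10  # assumed extra travel time to the second microphone

def delayed_tone(delay, n=96):
    """A 1 kHz tone as heard after `delay` samples of extra travel."""
    return [math.sin(2 * math.pi * TONE_HZ * (i - delay) / SAMPLE_RATE)
            for i in range(n)]

def rms(signal):
    return math.sqrt(sum(v * v for v in signal) / len(signal))

mic_a = delayed_tone(0)
mic_b = delayed_tone(DELAY_SAMPLES)

# phase-corrected (delay-and-sum): constructive interference
aligned = [a + b for a, b in zip(mic_a, mic_b[DELAY_SAMPLES:])]
# uncorrected: the phase offset causes partial cancellation
unaligned = [a + b for a, b in zip(mic_a, mic_b)]

assert rms(aligned) > rms(unaligned)
```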
- FIG. 10 shows an example of a generic computer device 1000 and a generic mobile computer device 1050 , which may be used with the techniques described here.
- Computing device 1000 is intended to represent various forms of digital computers, such as laptops, desktops, tablets, workstations, personal digital assistants, televisions, servers, blade servers, mainframes, and other appropriate computing devices.
- Computing device 1050 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, and other similar computing devices.
- the components shown here, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed in this document.
- Computing device 1000 includes a processor 1002 , memory 1004 , a storage device 1006 , a high-speed interface 1008 connecting to memory 1004 and high-speed expansion ports 1010 , and a low speed interface 1012 connecting to low speed bus 1014 and storage device 1006 .
- the processor 1002 can be a semiconductor-based processor.
- the memory 1004 can be a semiconductor-based memory.
- Each of the components 1002, 1004, 1006, 1008, 1010, and 1012 is interconnected using various busses, and may be mounted on a common motherboard or in other manners as appropriate.
- the processor 1002 can process instructions for execution within the computing device 1000 , including instructions stored in the memory 1004 or on the storage device 1006 to display graphical information for a GUI on an external input/output device, such as display 1016 coupled to high speed interface 1008 .
- multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory.
- multiple computing devices 1000 may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).
- the memory 1004 stores information within the computing device 1000 .
- in one implementation, the memory 1004 is a volatile memory unit or units; in another implementation, the memory 1004 is a non-volatile memory unit or units.
- the memory 1004 may also be another form of computer-readable medium, such as a magnetic or optical disk.
- the storage device 1006 is capable of providing mass storage for the computing device 1000 .
- the storage device 1006 may be or contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations.
- a computer program product can be tangibly embodied in an information carrier.
- the computer program product may also contain instructions that, when executed, perform one or more methods, such as those described above.
- the information carrier is a computer- or machine-readable medium, such as the memory 1004 , the storage device 1006 , or memory on processor 1002 .
- the high speed controller 1008 manages bandwidth-intensive operations for the computing device 1000 , while the low speed controller 1012 manages lower bandwidth-intensive operations. Such allocation of functions is exemplary only.
- the high-speed controller 1008 is coupled to memory 1004 , display 1016 (e.g., through a graphics processor or accelerator), and to high-speed expansion ports 1010 , which may accept various expansion cards (not shown).
- low-speed controller 1012 is coupled to storage device 1006 and low-speed expansion port 1014 .
- the low-speed expansion port, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.
- the computing device 1000 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 1020 , or multiple times in a group of such servers. It may also be implemented as part of a rack server system 1024 . In addition, it may be implemented in a personal computer such as a laptop computer 1022 . Alternatively, components from computing device 1000 may be combined with other components in a mobile device (not shown), such as device 1050 . Each of such devices may contain one or more of computing device 1000 , 1050 , and an entire system may be made up of multiple computing devices 1000 , 1050 communicating with each other.
- implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof.
- These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
- the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer.
- Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
- the systems and techniques described here can be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components.
- the components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), and the Internet.
- the computing system can include clients and servers.
- a client and server are generally remote from each other and typically interact through a communication network.
- the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
Description
- During videoconferences, a single person can be speaking at a time. A video camera can aim and/or focus on the single person who is speaking. Persons at a receiving end of the videoconference can perceive noise originating from sources other than the speaker as originating from the same direction as the speaker, which can be perceived as unnatural.
- According to an example, a non-transitory computer-readable storage medium may include instructions stored thereon. When executed by at least one processor, the instructions may be configured to cause a computing system to determine that a video system is aiming at a single speaker of a plurality of people, receive audio signals from a plurality of microphones, the received audio signals including audio signals generated by the single speaker, based on determining that the video system is aiming at the single speaker, transmit a monophonic signal, the monophonic signal being based on the received audio signals, determine that the video system is not aiming at the single speaker, and based on the determining that the video system is not aiming at the single speaker, transmit a stereophonic signal, the stereophonic signal being based on the received audio signals.
- According to an example, a non-transitory computer-readable storage medium may include instructions stored thereon. When executed by at least one processor, the instructions may be configured to cause a computing system to determine a first direction of a speaker that a video system is aiming at, receive audio signals from a plurality of microphones, generate a first audio signal based on the received audio signals and focusing on the first direction, determine a second direction of a noise source other than the speaker, generate a second audio signal based on the received audio signals and focusing on the second direction, and generate a stereophonic signal based on the first audio signal and the second audio signal.
- According to an example, a method may be performed by a computing system. The method may comprise determining that a video system is aiming at a single speaker, determining a first direction of the single speaker from an array of microphones, based on determining that the video system is aiming at the single speaker and the first direction of the single speaker, generating a first beamformed signal based on beamforming, in the first direction, multiple first direction audio signals received by the array of microphones, determining a second direction of a noise source other than the single speaker, generating a second beamformed signal based on beamforming, in the second direction, multiple second direction audio signals received by the array of microphones in the second direction, generating a monophonic signal based on the first beamformed signal and the second beamformed signal, the first beamformed signal having greater weight relative to the second beamformed signal, determining that the video system is not aiming at the single speaker, and based on determining that the video system is not aiming at the single speaker, generating a stereophonic signal, the stereophonic signal including the first beamformed signal and the second beamformed signal as distinct signals.
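The branching in the method above can be sketched as a small dispatch function; the names, weights, and two-beam simplification are assumptions for illustration only:

```python
def outgoing_audio(aiming_at_single_speaker, speaker_beam, noise_beam,
                   speaker_weight=0.8):
    """Select monophonic vs stereophonic output from the two beamformed
    signals, based on whether the video system is aiming at the speaker."""
    if aiming_at_single_speaker:
        # monophonic: weighted sum, speaker beam dominating
        w = speaker_weight
        mono = [w * s + (1.0 - w) * n
                for s, n in zip(speaker_beam, noise_beam)]
        return ("mono", mono)
    # not aiming at the single speaker: keep the beams as distinct channels
    return ("stereo", (speaker_beam, noise_beam))
```

When the video system zooms out, the same two beamformed signals are reused, but sent as distinct channels rather than mixed down.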
- The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features will be apparent from the description and drawings, and from the claims.
- FIG. 1 is a diagram of a videoconferencing system according to an example.
- FIG. 2 is a block diagram of a computing system that can implement features of the videoconferencing system according to an example.
- FIG. 3 is a diagram showing directions of beamforming within a location from which the videoconferencing system receives input according to an example.
- FIG. 4A is a diagram showing weights of beamformed signals when the video camera is focusing on a single person according to an example.
- FIG. 4B is a diagram showing weights of beamformed signals when the video camera has zoomed out and is aiming and/or focusing on multiple persons according to an example.
- FIG. 4C is a diagram showing weights of beamformed signals when the video camera is aiming and/or focusing on a single person and the video conferencing system is performing beamforming on the single person and multiple noise sources according to another example.
- FIG. 5 is a diagram showing microphones and directions of beamforming toward different sources of audio signals according to an example.
- FIG. 6 is a diagram showing microphones and a number of wavelengths between the microphones along a direction of beamforming according to an example.
- FIG. 7 is a flowchart showing a method according to an example.
- FIG. 8 is a flowchart showing a method according to another example.
- FIG. 9 is a flowchart showing a method according to another example.
- FIG. 10 shows an example of a computer device and a mobile computer device that can be used to implement the techniques described here.
- A computing system can generate and/or transmit monophonic audio signals when a video system, such as a video camera, is aiming at and/or focusing on a single speaker. The monophonic audio signals can be focused on the single speaker, and can be generated by beamforming and/or preferentially weighting audio signals emitted along a path toward the object, such as the single human speaker, when the video system generating the video signals is focusing on, and/or aiming at, the single speaker.
- In stereo audio conferencing, two audio signals can be generated by beamforming in two different directions. A technical problem with simply beamforming in two different directions that are independent of a speech source, such as to the left and to the right, to form a left audio channel and a right audio channel, is that the speech source, the human speaker, is not targeted, resulting in less than optimal capturing of the speech from the human speaker. A technical problem with beamforming in only one direction, toward the human speaker, is that when the audio signals are reproduced at a receiving end, noise from other sources will seem to originate from the same direction as the speech source. A technical solution to these technical problems is to generate one or more beamformed signals in the direction of the speech source and/or human speaker, and a second beamformed signal in a direction of a noise source other than the speech source and/or human speaker, and to attenuate and/or reduce the weight of the beamformed signal in the direction of the noise source relative to the beamformed signal(s) in the direction of the speech source and/or human speaker. Technical advantages of beamforming in the direction of the speech source and/or human speaker and in the direction of the noise source include the speech being clearly reproduced and the noise from the noise source(s) being reproduced with the quality of being received from a direction other than the direction of the speech source and/or human speaker. A further technical advantage is that focusing the audio signals on the single speaker when the video camera is focusing on and/or aiming at the single speaker can overcome the otherwise unnatural experience of hearing sounds from different sources during a videoconference, compared to a face-to-face meeting in which participants would turn their heads toward the person who is currently speaking.
- At times, the computing system can generate a single monophonic signal focusing on the speech source and/or single speaker, such as by beamforming in a direction of the speech source and/or single speaker. A technical problem of generating a single monophonic signal focusing on the speech source and/or single speaker is that when the video system is no longer aiming at and/or focusing on the speech source and/or single speaker, the audio signal, which focuses on the single speaker, will not correspond to the video signal, which is capturing more objects and/or persons than only the single speaker. At times, the computing system can generate a stereophonic signal with audio signals received from different directions. A technical problem of generating the stereophonic signal is that when a single human speaker is speaking and the video system is generating an image of only the single speaker, the audio signals capturing noises from different directions will not correspond to the video image. A technical solution for these technical problems is for the computing system to transition from the monophonic signal to a stereophonic signal when the video system is no longer aiming at and/or focusing on the single speaker, such as when the video system zooms out and shows persons other than the single speaker. Technical advantages of transitioning to the stereophonic signal when the video system is no longer aiming at and/or focusing on the single speaker include matching the audio output to the video output and reducing an unnatural experience of seeing a group of people but hearing sounds from only one of them even though others may also be making noise, such as by whispering or shuffling papers. 
The stereophonic signal can include the monophonic signal generated and/or transmitted when the video system was aiming at and/or focusing on the single speaker, as well as an additional audio signal, which can include audio signals from a different set of microphones and/or focused on a different direction. Controlling the focus of the audio signals on the speaker based on whether the video system is focusing on the single speaker can create a more natural experience for the viewer(s)/listener(s). The computing system can also create a natural-sounding combined audio signal and/or stereophonic signal by generating a second and/or additional audio signal, such as by beamforming or preferentially weighting received audio signals, toward a noise source other than the single speaker, and combining the audio signals from the single speaker and the noise source, with the audio signals from the single speaker having greater weight than the audio signals from the noise source. The stereophonic signal including the audio signals from the single speaker and the noise source avoids the unnatural experience, on the part of listeners and viewers, of noise from sources other than the speaker seeming to originate from the same direction as the speaker.
-
FIG. 1 is a diagram of a videoconferencing system according to an example. The videoconferencing system can send video and audio signals from a first location (e.g., a first conference room) 102 to a second location (e.g., a second conference room) 106 via anetwork 104, enabling one ormore persons 124 in thesecond location 106 to see and hear one or 114A, 114B, 114C in themore persons first location 102. The videoconferencing system can include any combination of components shown inFIG. 1 , such as components in both 102, 106, components in bothlocations 102, 106 and alocations server 122, components in only thelocation 102, or components in thelocation 102 and theserver 122, as non-limiting examples. - The
location 102 can include one or 114A, 114B, 114C, any number of whom may be speaking and/or may be sources of noise and/or audio signals. While threemore persons 114A, 114B, 114C are shown in thepersons example location 102 ofFIG. 1 , any number of 114A, 114B, 114C can be in thepersons location 102. In some examples, the 114A, 114B, 114C can each sit onpersons 116A, 116B, 116C behind achairs desk 118. Thelocation 102 can include adoorway 120, which can be a source of noise and/or audio signals, such as from noise generated by a door of thedoorway 120 opening and closing, or from noise originating outside thelocation 102 and entering thelocation 102 through thedoorway 120. - The videoconferencing system can include a
video camera 108 in thefirst location 102. Thevideo camera 108 can be part of a video system, and can capture optical signals and/or video signals within thelocation 102. Thevideo camera 108 can zoom in to a small part of thelocation 102, such as to aim at, focus on, and/or capture images of a single human speaker such as theperson 114B, and/or can zoom out to receive and/or process video signals from a larger part of thelocation 102, such as to capture images of, aim at, and/or focus on all or multiple of the 114A, 114B, 114C sitting at thepersons desk 118. Thevideo camera 108 can also pan left and right, and/or up and down, to change the 114A, 114B, 114C and/or portion of theperson location 102 that thevideo camera 108 is focusing on. Thevideo camera 108 can be controlled manually, or by software that causes thevideo camera 108 to focus on an active speaker in thelocation 102, such as by heuristics or machine learning techniques. In some examples, thevideo camera 108 can send a signal to acomputing device 112 and/ormicrophones 110 indicating a direction in which thevideo camera 108 is focusing. - The videoconferencing system can include multiple and/or a plurality of
microphones 110, and/or an array ofmicrophones 110, in thefirst location 102. Themicrophones 110 can capture audio signals in thelocation 102. Themicrophones 110, thecomputing device 112 receiving audio signals from themicrophones 110, and/or other components of the videoconferencing system, can generate audio signals such as one or more beamformed signals based on the received audio signals that each focus on audio signals received from a particular direction and/or are received along a particular path. Themicrophones 110 and/orcomputing device 112 can generate the beamformed signals by, for example, beamforming audio signals received by themicrophones 110 in a same direction that thevideo camera 108 is aiming and/or focusing, such as a direction of a single speaker that thevideo camera 108 is aiming at and/or focusing on. The generation of beamformed signals by beamforming can include shifting phases of received audio signals so that signals received by themicrophones 110 from the direction in which thevideo camera 108 is aiming and/or focusing constructively interfere with each other, and/or increasing or decreasing amplitudes of signals received by different microphones based on the locations of the microphones and the direction of the focus. The shifting can be based on the direction, a known speed of sound, and a known distance between the microphones, so that the constructive interference is caused by audio signals received by two (or more)microphones 110 coming from the direction in which thevideo camera 108 is aiming and/or focusing being processed as if the audio signals were received by the two (or more)microphones 110 at the same time, whereas audio signals received bymicrophones 110 coming from directions other than the direction in which thevideo camera 108 is aiming and/or focusing are processed as if the audio signals were received at different times, resulting in destructive interference. - The videoconferencing system can include a
computing device 112 in the location 102. The computing device 112 can receive video signals from the video camera 108 and can receive audio signals from the microphones 110. In some examples, the computing device 112 can control the direction, aim, and/or focus of the video camera 108 based on determinations by the computing device 112 of which person(s) 114A, 114B, 114C is actively speaking. In some examples, the computing device 112 can control the direction of focus and/or the generation of focused and/or beamformed audio signals, such as by beamforming by the microphones 110, and/or can perform beamforming of audio signals received by the computing device 112 from the microphones 110. The computing device 112 in the location 102 in which the video and audio signals of the speaker are recorded can be considered a local computing device. - In some examples, the videoconferencing system can generate monophonic signals based on audio signals received by the
microphones 110 when the video camera 108 is aiming at and/or focusing on the single speaker, and transmit a stereophonic signal based on audio signals received by the microphones when the video camera 108 has stopped and/or is no longer aiming at and/or focusing on the single speaker. A receiving system can output the same monophonic signal from all speakers, and can output a first signal from the stereophonic signal from a first speaker (or more speakers), and a second signal from the stereophonic signal from a second speaker (or more speakers). - In some examples, the monophonic signal can be based on signals received from a set of microphones, which can include some or all of the
microphones 110. In some examples, the stereophonic signal can include a first audio signal received from a first microphone and/or first set of microphones from the microphones 110, and a second audio signal received from a second microphone and/or second set of microphones from the microphones 110, the first set being different from the second set. - In some examples, the videoconferencing system can generate monophonic signals by focusing audio signals in a specific direction, such as based on first beamformed audio signals that are beamformed in a direction that the
video camera 108 is focusing and/or aiming, such as in a direction of a single speaker that the video camera 108 is focusing on and/or aiming at. If and/or when the video camera 108 stops focusing on and/or aiming at an object, and/or is no longer focusing on and/or aiming at an object, the videoconferencing system can generate stereophonic signals, such as by generating a second (or more) beamformed signal and combining the second beamformed signal with the first beamformed signal that focuses in the direction that the video camera 108 is focusing and/or aiming. The generation of stereophonic signals based on multiple beamformed signals can cause noise from more parts of the location 102 to be transmitted to remote participants of the videoconference along with audio signals transmitted from the speaker. - In some examples, the videoconferencing system can generate multiple focused and/or beamformed audio signals by beamforming audio signals in multiple directions. The videoconferencing system can, for example, generate a first beamformed signal focusing on a first direction based on beamforming, in a first direction of a human speaker, audio signals received from the first direction. The videoconferencing system can also generate a second beamformed signal focusing on a second direction based on beamforming, in a second direction of a noise source, different from the first direction, audio signals received from the second direction. The videoconferencing system can generate a combined signal and/or stereophonic signal based on combining the first beamformed signal and the second beamformed signal. The first beamformed signal can have greater weight within the combined and/or stereophonic signal, making the voice of the human speaker in the first direction easily audible, but still providing some of the background noise from the noise source to create a sound that is more similar to that experienced by a person actually in the
location 102 and near the video camera 108 and microphones 110. - The
video camera 108, microphones 110, and/or computing device 112 can be combined into one apparatus, or can be set up in the location 102 as standalone components that communicate with each other via wired or wireless interfaces. The computing device 112 can be in the same location 102 as the video camera 108 and microphones 110, or can be outside the location 102 and communicate with the video camera 108 and microphones 110 via wired or wireless interfaces. The videoconference system can also include a display and/or speakers in the location 102, so that the persons 114A, 114B, 114C from whom the video camera 108 and microphones 110 are capturing video and audio input can view and listen to persons in remote locations, such as a second location 106. - The
computing device 112 can communicate with a computing device 132 in a remote, second location 106, and/or a remote server 122, via a network 104. The network 104 can include multiple interfaces and/or devices facilitating communication between computing devices, such as the Internet or, in the example of a videoconference system maintained within a corporate or college campus, a local area network (LAN). The server 122 can perform any combination of the functions, methods, and/or techniques described herein, such as controlling the focus, aim, and/or direction of the video camera 108, beamforming audio signals received by the microphones 110, and/or combining the beamformed signals and/or signals from different microphones to generate stereophonic signals, or may simply transmit the video and audio data between computing devices 112, 132. While two locations 102, 106 are shown in the videoconference system of FIG. 1, any number of locations may be included in the videoconference system, with persons in each location viewing and listening to one or more human speakers in a remote location(s) on a display and from electronic speakers. - The
second location 106 can be remote from the first location 102. The second location 106 can include the computing device 132. The computing device 132 in the second location 106 can receive video and audio signals from the computing device 112 in the first location 102 and/or the server 122. The computing device 132 in the second location 106 can transmit the video and audio signals to a display 128 and electronic speakers 130A, 130B, respectively, to present the video and audio to a person 124 in the second location 106. In some examples, such as when the video camera 108 has stopped and/or is not focusing on and/or aiming at a single speaker, a first electronic speaker 130A can, based on a combined and/or stereophonic signal received from the computing device 112 via the computing device 132, output a first monophonic and/or audio signal such as words spoken by a human speaker, and a second electronic speaker 130B can, based on the combined and/or stereophonic signal received from the computing device 112 via the computing device 132, output a second monophonic and/or audio signal such as noise generated by a noise source other than the speaker. In some examples, such as when the video camera 108 is focusing on and/or aiming at the single speaker and the computing device 112 transmits a monophonic signal, both speakers 130A, 130B can output the same monophonic signal. The computing device 132 in the second location 106, in which the video and audio of the speaker are presented rather than recorded and/or captured, can be considered a remote computing device. - The
second location 106 can include a display 128 and one or more speakers 130A, 130B. The display 128 can present images based on the video data received by the display 128 from the computing device 132 in the second location 106, which may be the video captured by the video camera 108. The display 128 can include a traditional screen that generates images by projecting light toward the viewers, such as a cathode ray tube (CRT) display, a plasma display, a light-emitting diode (LED) display, a liquid crystal display (LCD), a projector that projects images onto a screen, or a holographic system that creates a holographic image of the speaker and/or other persons in the first location 102, as non-limiting examples. - The speaker(s) 130A, 130B can output sound based on audio signals received from the
computing device 132 in the second location 106, which may be based on the combined signal(s) generated by the computing device 112 in the first location 102 and/or microphones 110. The speaker(s) 130A, 130B can output the same sound, or, in examples of receiving stereophonic signals, the speakers 130A, 130B can each output different sounds, such as sounds based on different audio signals generated by beamforming in different directions or audio signals received by different sets of microphones. - A
person 124 can be in the second location 106, watching and listening to the person(s) 114A, 114B, 114C who are in the first location 102, on the display 128 and from the speaker(s) 130A, 130B. The person 124 can sit on a chair 126. In some examples, the second location 106 can also include a video camera and microphones for capturing video and audio signals from the person 124 to present and/or output to persons in other locations, such as the first location 102. -
FIG. 2 is a block diagram of a computing system 200 that can implement features of the videoconferencing system according to an example. The features of the computing system 200 described herein can be included in, and/or performed by, the computing device 112 in the first location 102, the server 122, the computing device 132 in the second location 106, or any combination of the computing device 112, server 122, and/or computing device 132. - The
computing system 200 can include an aim determiner 202. The aim determiner 202 can determine a direction of aim and/or focus of the video camera 108. The aim determiner 202 can determine that the video camera 108 is aiming at and/or focusing on a single, human speaker, and determine the direction of the single speaker from the video camera 108 and/or microphones 110. The video camera 108 can aim at and/or focus on the single, human speaker by pointing in the direction of the speaker so that the speaker is in or near the middle of an image captured by the video camera, and/or can focus on the single, human speaker by adjusting a lens of the video camera 108 so that light reflected from the speaker converges on a sensor of the video camera 108. - In some examples, the
aim determiner 202 can determine the direction of aim and/or focus by the video camera 108 based on receiving and/or processing a single speaker signal from the video camera 108. The single speaker signal can indicate that the video camera 108 is aiming at and/or focusing on the single speaker and/or is capturing an image of only a single person 114B in the location 102, and can indicate a direction of the single speaker. The video camera 108 may have determined that a single speaker is speaking based on video data, such as facial expressions of the single speaker including lip movement, body language of other persons captured by the video camera 108 such as the other persons facing or angling their bodies toward the single speaker, or the video camera 108 capturing the image of only the single person 114B and not capturing images of other persons 114A, 114C in the location 102. The direction of the single speaker can be determined based on the direction that the camera 108 is pointing, and/or based on a location of the speaker within a captured image. - When the
video camera 108 is aiming at and/or focusing on the single speaker, the computing system 200 can focus, and/or perform a beamforming operation, in the direction of the single speaker, and send a single speaker audio signal to the remote computing device 132. The single speaker audio signal can include the combined signal (discussed below) and an indication that only a single speaker is speaking, which can prompt the remote computing device 132 to output the audio as either stereophonic audio output or monophonic audio output. In some examples, the aim determiner 202 can determine that the video camera 108 is no longer aiming at and/or focusing on, and/or has stopped aiming at and/or focusing on, the single speaker, based on receiving and/or processing a multiple speaker signal from the video camera 108. The multiple speaker signal can indicate that the video camera 108 is aiming at and/or focusing on multiple speakers, and/or capturing a wide view that includes multiple persons 114A, 114B, 114C. - In some examples, the
aim determiner 202 can determine that the video camera 108 is no longer aiming at and/or focusing on the single speaker such as the person 114B, and/or that the video camera 108 has stopped aiming at and/or focusing on the single speaker. The aim determiner 202 can determine that the video camera 108 is no longer and/or has stopped aiming at and/or focusing on the single speaker based on receiving a multiple speaker signal from the video camera 108, or based on multiple persons being in the image captured by the video camera 108, according to example embodiments. When the video camera 108 is no longer aiming at and/or focused on, and/or has stopped aiming at and/or focusing on, the single speaker, the computing system 200 can send a multiple speaker audio signal to the remote computing device 132. The multiple speaker audio signal can include the combined and/or stereophonic signal (discussed below) and an indication that multiple human speakers are speaking, which can prompt the remote computing device 132 to output the audio stereophonically, such as outputting focused and/or beamformed audio signals from a first human speaker through a first electronic speaker and outputting focused and/or beamformed audio signals from a second human speaker through a second electronic speaker. In some examples, in response to the video camera 108 resuming aim and/or focus on the single speaker, and/or aiming at and/or focusing on a new single speaker, the computing system 200 can generate a monophonic signal focusing on the single speaker and transmit the generated monophonic signal to the remote computing device. - The
computing system 200 can include a direction determiner 204. The direction determiner 204 can determine one or more directions in which to focus, beamform, and/or preferentially weight audio signals. - In some examples, the
direction determiner 204 can determine that the computing system 200 should focus and/or beamform audio signals in a first direction in which the aim determiner 202 has determined that the video camera 108 is aiming and/or focusing, such as a direction of a single speaker. In some examples, the direction determiner 204 can also determine the first direction of the single speaker based on first direction audio signals received by the microphones 110 in a first direction, such as audio signals indicating human speech, and comparing times of receiving and/or processing the audio signals by the different microphones 110. The direction determiner 204 can, for example, determine delays between audio signals received and/or processed by the different microphones, and determine the direction based on the determined delays, the known speed of sound, and the known distances between microphones (for example, if the delay between two microphones is equal to the time it takes sound to travel between the microphones, then the direction would be on or near a line extending through the two microphones in the direction of the microphone that first received and/or processed the audio signal). In some examples, the direction determiner 204 can determine the first direction of the single speaker based on determining that multiple first direction audio signals in the first direction are changing as a function of time, such as by performing beamforming operations in multiple directions and determining the direction that has the greatest changes in audio amplitude over a sampling period.
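The delay geometry described above, in which a known speed of sound and a known microphone spacing determine how signals from a given direction line up in time, is also the basis of the phase-shift ("delay-and-sum") beamforming the description refers to. The following is a minimal two-microphone sketch; the function name, the integer-sample delay, and the restriction to steering angles between 0 and π/2 are simplifying assumptions, not details from the description.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air, approximate

def delay_and_sum(near_sig, far_sig, mic_distance, angle_rad, sample_rate):
    # Path difference for a plane wave arriving from `angle_rad`,
    # measured from the axis through the two microphones
    # (0 = endfire; assumes angles in [0, pi/2] for simplicity).
    extra_path = mic_distance * np.cos(angle_rad)
    delay = int(round(extra_path / SPEED_OF_SOUND * sample_rate))
    # Delay the nearer microphone's signal so that sound from the
    # steering direction lines up across both channels and adds
    # constructively; off-axis sound stays misaligned and partially cancels.
    shifted = np.concatenate([np.zeros(delay), near_sig])[:len(near_sig)]
    return 0.5 * (shifted + far_sig)
```

Steering the pair at an on-axis source leaves its amplitude essentially unchanged, while a source that reaches both microphones simultaneously (broadside) is attenuated by the same endfire steering.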
The direction determiner 204 can, for example, perform beamforming operations in multiple directions over the sampling period to generate multiple beamformed signals, and determine that the direction of the beamformed signal with the greatest change over the sampling period is the direction of the speaker, based on an assumption that human speech has a high degree of variation (for example, based on pauses between words and/or sentences). - In some examples, the
direction determiner 204 can determine a second direction in which the computing system 200 should focus and/or beamform audio signals. The second direction can be a direction of a noise source other than the single speaker. The noise source can be a second human speaker, or other types of noise such as people speaking in the background, a door opening and/or closing, or papers or chairs being moved, as non-limiting examples. The direction determiner 204 can determine the second direction of a noise source based on comparing times of receiving and/or processing second direction audio signals received by the different microphones 110 in a second direction. - In some examples, the
direction determiner 204 can determine a third direction in which the computing system 200 should focus and/or beamform audio signals. The third direction can be a direction of a noise source other than the single speaker. The noise source can be a second or third human speaker, or other types of noise such as people speaking in the background, a door opening and/or closing, or papers or chairs being moved, as non-limiting examples. The direction determiner 204 can determine the third direction of a noise source based on comparing times of receiving and/or processing the audio signals by the different microphones 110. - The
computing system 200 can include a beamformer 206. The beamformer 206 can focus on audio signals received along a path, which may be a straight line or may bend in examples of reflected audio signals, to generate focused audio signals and/or beamformed signals. The beamformer 206 can generate focused audio signals and/or beamformed signals by combining and/or modifying signals received by and/or from the microphones 110 so that audio signals and/or noises received by multiple microphones from the direction of focus and/or beamforming experience constructive interference and/or are amplified, while audio signals and/or noises received by multiple microphones from directions other than the direction of focus and/or beamforming experience destructive interference and/or are reduced in magnitude. The beamformer 206 can beamform multiple audio signals received from a direction of the single speaker 114B, and/or can beamform multiple audio signals received from a direction other than the single speaker 114B. - The
beamformer 206 can include a microphone selector 208. The microphone selector 208 can select multiple microphones 110, such as two microphones 110, for which a line intersecting the two microphones 110 is most closely parallel to the direction in which the beamforming is performed. - The
beamformer 206 can include a phase shifter 210. The phase shifter 210 can shift the phase of the audio signal received by one of the selected microphones 110 so that the audio signals received by the selected microphones 110 constructively interfere with each other, amplifying the audio signals received in the direction of beamforming. The phase shifter 210 can modify and/or shift the phase(s) of the audio signals based on a distance between the selected microphones 110 and a speed of sound, delaying the phase of the microphone 110 closer to the noise source so that, with respect to audio signals received from the noise source in the direction of focus and/or beamforming, the phase-shifted signal of the selected microphone 110 closer to the noise source matches the signal of the selected microphone 110 farther from the noise source. Noise sources in directions other than the direction of focus and/or beamforming will experience varying degrees of destructive interference between the selected microphones 110, reducing the amplitude of audio signals received from noise sources in directions other than the direction of focus and/or beamforming. - The
beamformer 206 can process signals only from the selected microphones 110 in an example in which the beamformer 206 narrowly focuses on the direction of beamforming, so that all audio signals processed by the beamformer 206 experience constructive interference in the direction of beamforming. In examples of broader beamforming, the beamformer 206 can also process signals from microphones 110 other than the selected microphones, to process audio signals from noise sources in directions other than the direction of beamforming and/or the direction of the selected noise source. The beamformer 206 can reduce the weight of signals received from the microphones 110 other than the selected microphones 110 to narrow the beamforming (and/or increase the focus in the direction of focus) when the video camera 108 zooms in on the speaker, and/or can increase the weight of signals received from the microphones 110 other than the selected microphones 110 to broaden the beamforming (and/or decrease the focus in the direction of focus) when the video camera 108 zooms out away from the speaker, according to example implementations. - In some examples, the
beamformer 206 can reduce the focus and/or beamforming by broadening the beamforming, such as by increasing the weight of signals received from the microphones 110 other than the selected microphones 110, and/or by increasing the weight of a beamformed signal(s) other than the beamformed signal focusing in the direction of the single speaker. In some examples, the beamformer 206 can reduce beamforming by ceasing beamforming, such as ceasing and/or stopping the shifting of phases of signals received from microphones 110. - The
computing system 200 can include a signal combiner 212. The signal combiner 212 can combine audio signals processed by the beamformer 206, which may be focused and/or beamformed in different directions, and/or may combine audio signals received by different sets of microphones. The signal combiner 212 can, for example, combine a first focused and/or beamformed signal, for which beamforming was performed in a direction of an active human speaker and/or a single human speaker, with a second, additional, and/or third beamformed signal(s), for which beamforming was performed in a direction(s) of a noise source(s) other than the direction of the active human speaker and/or the single human speaker. The signal combiner 212 can add the first focused and/or beamformed signal to the second focused and/or beamformed signal to generate a monophonic signal, or may include both the first focused and/or beamformed signal and the second focused and/or beamformed signal as distinct audio signals to generate a stereophonic signal that includes multiple focused and/or beamformed signals. - The
signal combiner 212 can include a signal weighter 214. The signal weighter 214 can weight the audio signals combined by the signal combiner 212. The signal weighter 214 can, for example, reduce the weight and/or amplitude of certain signals, such as the signals processed or generated by the beamformer 206 in directions other than the direction of the active human speaker and/or a single human speaker, and/or outside the path along which the beamformer 206 is focusing and/or performing beamforming. The signal weighter can preferentially weight beamformed audio signals, such as audio signals emitted along a path passing through at least one of the plurality of microphones and the speaker, as compared with sounds emitted from outside the path. If the aim determiner 202 determines that the video camera 108 is aiming at and/or focusing on the active human speaker and/or a single human speaker, then the signal weighter 214 can reduce the relative weights and/or amplitudes of signals processed or generated by the beamformer 206 in directions other than the direction of the active human speaker and/or a single human speaker, compared to the weight and/or amplitude of the signals processed or generated by the beamformer 206 in the direction of the active human speaker and/or a single human speaker.
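The relative weighting behavior described here can be sketched as follows. The 0.9/0.5 weight split and the equal division of the remaining weight among the non-speaker beams are illustrative assumptions, not values given by the description.

```python
import numpy as np

def weight_and_combine(speaker_beam, other_beams, single_speaker_focus):
    # When the camera is aimed at a single speaker, the beam in the
    # speaker's direction dominates the combined signal; when the
    # camera is not, the other directions get more relative weight.
    # The 0.9 / 0.5 values are assumptions for illustration.
    w_speaker = 0.9 if single_speaker_focus else 0.5
    w_other = (1.0 - w_speaker) / max(len(other_beams), 1)
    combined = w_speaker * np.asarray(speaker_beam, dtype=float)
    for beam in other_beams:
        combined = combined + w_other * np.asarray(beam, dtype=float)
    return combined
```

With the camera on the single speaker, the speaker-direction beam dominates the combined signal; zoomed out, the beamformed directions contribute more evenly, matching the weight shift the signal weighter 214 performs.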
If the aim determiner 202 determines that the video camera 108 is no longer aiming at and/or focusing on the active human speaker and/or a single human speaker, and/or has stopped aiming at and/or focusing on the active human speaker and/or a single human speaker, such as by zooming out to capture images of more persons 114A, 114B, then the signal weighter 214 can increase the relative weights and/or amplitudes of signals processed or generated by the beamformer 206 in directions other than the direction of the active human speaker and/or a single human speaker, compared to the weight and/or amplitude of the signals processed or generated by the beamformer 206 in the direction of the active human speaker and/or a single human speaker. - In some examples, the combined signal generated by the
signal combiner 212 can include multiple focused and/or beamformed signals, with one focused and/or beamformed signal for each direction in which beamforming was performed, forming a stereophonic signal. Each focused and/or beamformed signal can include a single beamformed signal and an indication of the direction in which the beamforming was performed. For example, the combined and/or stereophonic signal can include a first focused and/or beamformed signal including the first beamformed signal and an indicator of the first direction, and a second focused and/or beamformed signal including the second beamformed signal and an indicator of the second direction. The computing device 112 can send the combined and/or stereophonic signal to the computing device 132, and the computing device 132 can transmit one focused and/or beamformed signal to each speaker 130A, 130B, based on the indicated direction, creating a stereo effect in the second location 106. - The
computing system 200 can include at least one processor 216. The at least one processor 216 can include one or more processors, and can be included in one or more computing devices. The at least one processor 216 can execute instructions, such as instructions stored in memory, to cause the computing system 200 to perform any combination of the methods, functions, and/or techniques described herein. - The
computing system 200 can include at least one memory device 218. The at least one memory device 218 can be included in one or more computing devices. The at least one memory device 218 can include a non-transitory computer-readable storage medium. The at least one memory device 218 can store instructions that, when executed by the at least one processor 216, cause the computing system 200 to perform any combination of the methods, functions, and/or techniques described herein. The at least one memory device 218 can store data accessed to perform, and/or generated by, any combination of the methods, functions, and/or techniques described herein. - The
computing system 200 can include input/output nodes 220. The input/output nodes 220 can receive and/or send signals from and/or to other computing devices. The input/output nodes 220 can include one or more video cameras 108, microphones 110, displays 128, and/or speakers 130A, 130B. The input/output nodes 220 can include devices for receiving input from a user, such as via a keyboard, mouse, and/or touchscreen. The input/output nodes 220 can also include devices for providing output to a user, such as a screen or monitor, printer, or speaker. The input/output nodes 220 can also include devices for communicating with other computing devices, such as networking and/or communication interfaces, including wired interfaces (such as Ethernet (Institute of Electrical and Electronics Engineers (IEEE) 802.3), Universal Serial Bus (USB), coaxial cable, and/or High-Definition Multimedia Interface (HDMI)) and/or wireless interfaces (such as Wireless Fidelity (IEEE 802.11), Bluetooth (IEEE 802.15), and/or a cellular network protocol such as Long-Term Evolution (LTE) and/or LTE-Advanced), as non-limiting examples. -
FIG. 3 is a diagram showing directions 302, 304, 306 of beamforming within the location 102 from which the videoconferencing system receives input according to an example. The directions of beamforming can represent directions of focus by the computing system 200 and/or microphones 110. In some examples, the microphones 110, computing system 200, and/or videoconferencing system can focus and/or perform beamforming in a first direction 302 toward a single person 114B who is an active speaker to generate a first focused and/or beamformed signal. In some examples, the microphones 110, computing system 200, and/or videoconferencing system can focus and/or perform beamforming in a second direction 304 toward another noise source, such as a person 114A who may be speaking at the same time as the person 114B, to generate a second focused and/or beamformed signal. In some examples, the microphones 110, computing system 200, and/or videoconferencing system can focus and/or perform beamforming in a third direction 306 toward a noise source such as the doorway 120, which may allow noise to travel into the location 102 from outside the location 102 and/or may generate noise from a door in the doorway 120 opening and/or closing, to generate a third focused and/or beamformed signal. The focused and/or beamformed audio signal generated based on beamforming in the first direction 302 can be combined with the second audio signal and/or third audio signal to generate a combined signal and/or stereophonic signal. -
FIG. 4A is a diagram showing weights 410, 412 of beamformed signals when the video camera 108 is focusing on a single person 114B according to an example. In this example, the video camera 108 is focused on the single person 114B, and an image 402A generated by the video camera 108, computing device 112, computing system 200, and/or videoconference system shows, presents, and/or displays a person image 414B of the single person 114B who is the active speaker. The signal combiner 212 can generate a combined signal 404A, which can be monophonic, based on a first signal 406, which can be a beamformed signal in the first direction 302 toward the person 114B who is the active speaker, and a second signal 408 and/or additional signal, which can be a beamformed signal in the second direction toward a noise source such as a person 114A other than the person 114B who is the active speaker. The signal weighter 214 can, based on the determination that the video camera 108 and/or video system is focusing on the active and/or single speaker in the first direction 302, give the first signal 406 a greater weight 410 in the combined signal 404A than the weight 412 of the second signal 408. In examples in which the weight 412 of the second signal 408 is zero, the combined signal 404A includes only the first signal 406. In examples in which the weight 412 of the second signal 408 is greater than zero, the combined signal 404A can include both the first signal 406 and the second signal 408. -
FIG. 4B is a diagram showing weights 410, 412 of beamformed signals when the video camera 108 has zoomed out and is aiming at and/or focusing on multiple persons 114A, 114B, 114C according to an example. In this example, the video camera 108 is no longer aiming at and/or focusing on, and/or has stopped aiming at and/or focusing on, the person 114B who is the single speaker and/or the active speaker. The video camera 108 has zoomed out to present a broader image 402B, which includes three person images 414A, 414B, 414C (which are representations of the persons 114A, 114B, 114C) sitting at the desk image 418 (which is a representation of the desk 118). In some examples, based on determining that the video camera 108 is no longer aiming at and/or focusing on, and/or has stopped aiming at and/or focusing on, the single speaker, the computing system 200 can reduce the beamforming, such as by increasing the weight 412 of the second signal 408 relative to the weight 410 of the first signal 406, and/or decreasing the weight 410 of the first signal 406 relative to the weight 412 of the second signal 408, within the combined signal 404B. The first signal 406 can have less weight in the combined signal 404B after the video camera 108 has zoomed out than in the combined signal 404A when the video camera 108 was aiming at and/or focusing on the single person. In some examples, when the video camera 108 is no longer aiming at and/or focusing on, and/or has stopped aiming at and/or focusing on, the person 114B who is the single speaker and/or the active speaker, the combined signal 404B can be a monophonic signal that includes approximately equal contributions from the audio signals 406, 408, and the same combined monophonic signal can be outputted by both of the speakers 130A, 130B.
In some examples, when the video camera 108 is no longer aiming at and/or focusing on, and/or has stopped aiming at and/or focusing on, the person 114B who is the single speaker and/or the active speaker, the combined signal 404B can be a stereophonic signal that includes distinct audio signals from each of the first signal 406 and the second signal 408, and each of the first signal 406 and the second signal 408 can be outputted by a different speaker 130A, 130B. -
FIG. 4C is a diagram showing weights of beamformed signals when the video camera 108 is aiming at and/or focusing on a single person 114B and the video conferencing system is performing beamforming on the single person 114B and multiple noise sources according to another example. In this example, the video camera 108 is aiming at and/or focusing on the person 114B who is the single speaker and/or active speaker, but has zoomed out to present a broader image 402C, which includes the three person images 414A, 414B, 414C sitting at the desk image 418 and the doorway image 420 (which is a representation of the doorway 120). In this example, based on determining that the video camera 108 is aiming at and/or focusing on the single speaker, the computing system 200 can perform beamforming in the first direction 302 on the person 114B (represented by the person image 414B) to generate a first beamformed signal 406, in the second direction 304 on a first noise source such as the person 114A (represented by the person image 414A) to generate a second beamformed signal 408 and/or second additional signal, and in a third direction 306 on a second noise source such as the doorway 120 (represented by the doorway image 420) to generate a third beamformed signal 422. The second direction 304 can be away from and/or different from the first direction 302, and the third direction 306 can be away from and/or different from both the first direction 302 and the second direction 304. Based on the video camera 108 aiming at and/or focusing on the single speaker and/or person 114B, the weighted sum of the first signal 406, second signal 408, and third signal 422, used to generate a combined signal 404C, can have a greater weight 410 for the first signal 406 than the weight 412 of the second signal 408 and the weight 424 of the third signal 422.
Based on all of the signals 406, 408, 422 having weights greater than zero, the combined signal 404C can be a combined monophonic signal that will focus on the single speaker due to the emphasis on the first signal 406 but also include background noise due to the contributions from the second and third signals 408, 422. -
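The weighted-sum combination just described can be sketched in a few lines of Python. The function name, the normalization of the noise-beam weights, and the sample values below are illustrative assumptions, not details taken from the patent.

```python
def combine_mono(speaker_signal, noise_signals, speaker_weight=0.8):
    """Weighted sum of one beamformed speaker signal and N noise-source signals.

    The speaker beam receives speaker_weight; the remaining weight is split
    evenly across the noise beams, mirroring the greater-weight-for-the-first-
    signal idea above (the 0.8/0.2 split is an invented example value).
    """
    noise_weight = (1.0 - speaker_weight) / len(noise_signals)
    combined = []
    for i, s in enumerate(speaker_signal):
        sample = speaker_weight * s
        for noise in noise_signals:
            sample += noise_weight * noise[i]
        combined.append(sample)
    return combined

# One speaker beam and two noise beams (cf. signals 406, 408, and 422).
mono = combine_mono([1.0, 1.0, 1.0],
                    [[0.5, 0.5, 0.5], [0.1, 0.1, 0.1]],
                    speaker_weight=0.8)
```

With the noise weights set to zero (speaker_weight of 1.0), the sum reduces to the pure speaker beam, matching the case above in which the combined signal includes only the first signal.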
FIG. 5 is a diagram showing microphones 110 and directions 302, 304, 306 of beamforming toward different sources of audio signals according to an example. The directions 302, 304, 306 can be paths along which audio signals travel from the noise sources (such as the persons 114A, 114B and doorway 120) to the microphones 110, and/or paths along which optical beams travel from the objects 114A, 114B (and/or persons), 120 (and/or doorway), based on which the images 414A, 414B, 420 are created, toward the video camera 108. The noise sources can include the person 114B a first direction 302 from the microphones 110, the person 114A a second direction 304 from the microphones 110, and the doorway 120 a third direction 306 from the microphones 110. In this example, the multiple microphones 110 form an array of microphones 110. In this example, the array of microphones 110 includes eight microphones 110A, 110B, 110C, 110D, 110E, 110F, 110G, 110H arranged in a circular pattern. Each of the microphones 110A, 110B, 110C, 110D, 110E, 110F, 110G, 110H can be in a different location than each of the other microphones 110A, 110B, 110C, 110D, 110E, 110F, 110G, 110H. After determining a direction of a noise source in which to focus and/or beamform, the computing system 200 can determine a pair of microphones 110A, 110B, 110C, 110D, 110E, 110F, 110G, 110H that, when a line or ray is drawn through the microphones, is more closely parallel than any other pair of microphones 110A, 110B, 110C, 110D, 110E, 110F, 110G, 110H to the direction of the noise source in which to focus and/or beamform. - In the example shown in
FIG. 5, the microphones 110A, 110E form a line most closely parallel to the first direction 302. The microphone selector 208 can select the microphones 110A, 110E for focusing and/or performing beamforming in the first direction 302, and the phase shifter 210 can delay the signals from the microphone 110A (which is closer than the microphone 110E to the person 114B who is the noise source) by an amount of time sound takes to travel the distance from the microphone 110A to the microphone 110E, thereby causing audio signals received by both microphones 110A, 110E from any noise source along the line of the first direction 302 to constructively interfere with each other. - In the example shown in
FIG. 5, the microphones 110H, 110E form a line most closely parallel to the second direction 304. The microphone selector 208 can select the microphones 110H, 110E for focusing and/or performing beamforming in the second direction 304, and the phase shifter 210 can delay the signals from the microphone 110H (which is closer than the microphone 110E to the person 114A who is the noise source) by an amount of time sound takes to travel the distance from the microphone 110H to the microphone 110E, thereby causing audio signals received by both microphones 110H, 110E from any noise source along the line of the second direction 304 to constructively interfere with each other. - In the example shown in
FIG. 5, the microphones 110C, 110D form a line most closely parallel to the third direction 306. The microphone selector 208 can select the microphones 110C, 110D for performing beamforming in the third direction 306, and the phase shifter 210 can delay the signals from the microphone 110C (which is closer than the microphone 110D to the doorway 120, which is the noise source) by an amount of time sound takes to travel the distance from the microphone 110C to the microphone 110D, thereby causing audio signals received by both microphones 110C, 110D from any noise source along the line of the third direction 306 to constructively interfere with each other. -
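The pair-selection step used in the three examples above, choosing the two microphones whose connecting line is most nearly parallel to the beamforming direction, can be sketched as a search over all pairs. The circular layout, function names, and cosine test are assumptions for illustration; the patent does not specify an algorithm.

```python
import math

def mic_positions(n=8, radius=1.0):
    """n microphones evenly spaced on a circle, like the eight-element array."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def most_parallel_pair(mics, direction):
    """Indices (i, j) of the pair whose connecting line best matches direction.

    Parallelism is scored as |cos| of the angle between the mic-to-mic vector
    and the target direction, so orientation of the line does not matter.
    """
    dx, dy = direction
    dnorm = math.hypot(dx, dy)
    best, best_cos = None, -1.0
    for i in range(len(mics)):
        for j in range(i + 1, len(mics)):
            vx = mics[j][0] - mics[i][0]
            vy = mics[j][1] - mics[i][1]
            c = abs(vx * dx + vy * dy) / (math.hypot(vx, vy) * dnorm)
            if c > best_cos:
                best, best_cos = (i, j), c
    return best

mics = mic_positions()
pair = most_parallel_pair(mics, (1.0, 0.0))  # beamform along the x-axis
```

For a direction along the x-axis, the diametrically opposite microphones at angles 0° and 180° win, since their connecting line is exactly parallel to the target direction.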
FIG. 6 is a diagram showing microphones 110A, 110E and a number of wavelengths λ between the microphones 110A, 110E along a direction 302 of beamforming according to an example. In this example, the microphones 110A, 110E are four-and-a-half wavelengths apart. The distance between the microphones 110A, 110E may have been predetermined and stored in the memory 218 of the computing system 200. Distances between other pairs of microphones 110A, 110B, 110C, 110D, 110E, 110F, 110G, 110H may also have been predetermined and stored in the memory 218 of the computing system 200. When beamforming along the first direction 302, the phase shifter 210 can delay the phase of the audio signals received by the microphone 110A by an amount of time for sound to travel the distance between the microphones, in this example four-and-a-half wavelengths from the microphone 110A to the microphone 110E (or some other distance and/or number of wavelengths for other pairs of microphones 110A, 110B, 110C, 110D, 110E, 110F, 110G, 110H), and/or differences in distances between the one microphone 110A and the speaker 114B, and the distance between the microphone 110E and the single speaker 114B, such as by dividing the distance between the microphones 110A, 110E, and/or the difference in distances, by the known speed of sound. -
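The final step above, converting a distance (or a difference in source distances) into a time delay by dividing by the speed of sound, is simple to state in code. The 343 m/s figure and the 48 kHz sample rate are illustrative assumptions, not values given in the patent.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at ~20 °C

def delay_seconds(distance_m):
    """Time for sound to travel distance_m, per the division described above."""
    return distance_m / SPEED_OF_SOUND_M_S

def delay_samples(distance_m, sample_rate_hz=48000):
    """The same delay expressed in whole samples at a given sampling rate."""
    return round(delay_seconds(distance_m) * sample_rate_hz)

# Microphones 0.343 m apart along the beamforming direction:
t = delay_seconds(0.343)   # 1 ms of delay applied to the closer microphone
n = delay_samples(0.343)   # 48 samples at a 48 kHz sampling rate
```

In practice the delay rarely lands on a whole sample, and real implementations interpolate; rounding is used here only to keep the sketch short.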
FIG. 7 is a flowchart showing a method 700 according to an example. According to this example, the method 700 includes the aim determiner 202 determining that a video system is aiming at a single speaker of a plurality of people (702). The method 700 can also include the computing system 200 receiving audio signals from a plurality of microphones 110, the received audio signals including audio signals generated by the single speaker (704). The method 700 can also include the computing system 200, based on determining that the video system is aiming at the single speaker, transmitting a monophonic signal, the monophonic signal being based on the received audio signals (706). The method 700 can also include the aim determiner 202 determining that the video system is not aiming at the single speaker (708). The method 700 can also include the computing system 200, based on the determining that the video system is not aiming at the single speaker, transmitting a stereophonic signal, the stereophonic signal being based on the received audio signals. - According to an example, the monophonic signal can be based on the received audio signals and can focus on the single speaker, and the stereophonic signal can include the monophonic signal and an additional signal. The additional signal can be based on the received audio signals and can focus on a noise source other than the single speaker.
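The branch at the heart of method 700, monophonic output while the camera is aimed at a single speaker and stereophonic output otherwise, can be sketched as below. The function signature, the tuple-based return convention, and the 0.8 weight are assumptions for illustration, not the patent's interface.

```python
def select_output(aiming_at_single_speaker, speaker_beam, noise_beam,
                  speaker_weight=0.8):
    """Return ('mono', samples) or ('stereo', (ch1, ch2)) per method 700's branch."""
    if aiming_at_single_speaker:
        # Monophonic: weighted sum emphasizing the speaker beam (step 706).
        noise_weight = 1.0 - speaker_weight
        mono = [speaker_weight * a + noise_weight * b
                for a, b in zip(speaker_beam, noise_beam)]
        return ("mono", mono)
    # Stereophonic: the two beams transmitted as distinct channels.
    return ("stereo", (speaker_beam, noise_beam))

kind_focused, _ = select_output(True, [1.0, 1.0], [0.0, 0.0])
kind_wide, channels = select_output(False, [1.0, 1.0], [0.0, 0.0])
```

The same two beamformed signals feed both branches; only the mixing changes, which matches the later passage in which the stereophonic signal carries the first and second beamformed signals as distinct signals.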
- According to an example, the method 700 can further include the computing system 200 generating the monophonic signal by performing a beamforming operation on the received audio signals in a direction of the single speaker. - According to an example, the
method 700 can further include the computing system 200 generating the monophonic signal by preferentially weighting audio signals emitted along a path passing through at least one of the plurality of microphones and the speaker as compared with sounds emitted from outside the path. - According to an example, the determining that the video system is aiming at the single speaker can include processing a single speaker signal from the video system, the single speaker signal indicating that the video system is aiming at the single speaker, and the determining that the video system is not aiming at the single speaker can include processing a multiple speaker signal from the video system, the multiple speaker signal indicating that the video system is aiming at multiple speakers.
- According to an example, the stereophonic signal can include a first audio signal based on a first microphone of the plurality of microphones and a second audio signal based on a second microphone of the plurality of microphones.
- According to an example, the method 700 can further include the computing system 200 generating the monophonic signal by shifting a phase of an audio signal received from at least one microphone of the plurality of microphones relative to at least one other microphone of the plurality of microphones, the shifting being based on differences in distances between the at least one microphone and the single speaker, and the at least one other microphone and the single speaker. - According to an example, the
method 700 can further include the computing system 200 generating the monophonic signal by shifting a phase of at least a first audio signal received by a first microphone of the plurality of microphones from the single speaker so that at least a portion of the first audio signal received from the single speaker constructively interferes with at least a portion of a second audio signal received by a second microphone of the plurality of microphones, the second microphone being in a different location than the first microphone. - According to an example, the
method 700 can further include the computing system 200, based on determining that the video system is aiming at the single speaker, generating a first audio signal by beamforming multiple audio signals received by the plurality of microphones from a direction of the single speaker, generating a second audio signal by beamforming multiple audio signals received by the plurality of microphones from a direction away from the single speaker, and generating the monophonic signal based on a weighted sum of the first audio signal and the second audio signal, the first audio signal receiving a greater weight relative to the second audio signal. In this example, the transmitting the stereophonic signal can include transmitting the first audio signal and the second audio signal as distinct audio signals. - According to an example, the
computing system 200 can include a video camera configured to aim at the single speaker and capture images of the single speaker, the plurality of microphones configured to capture the received audio signals in a direction of the single speaker, and a local computing device configured to receive the captured images from the video camera, send the captured images to a remote computing device, receive the audio signals from the plurality of microphones, determine that the video camera is aiming at the single speaker, based on the determining that the video camera is aiming at the single speaker, beamform the received audio signals in the direction of the single speaker to generate a first beamformed signal, based on the beamforming, transmit the monophonic signal to the remote computing device, determine that the video camera is not aiming at the single speaker, based on the determining that the video camera is not aiming at the single speaker, beamform the received audio signals in a direction other than the direction of the single speaker to generate a second beamformed signal, and transmit the stereophonic signal to the remote computing device, the stereophonic signal including the first beamformed signal and the second beamformed signal. -
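The constructive-interference idea in the phase-shifting examples above can be demonstrated numerically: summing two out-of-phase copies of a tone nearly cancels, while shifting one copy back into phase before summing doubles the amplitude. The tone, buffer length, and shift are synthetic values chosen only for the demonstration.

```python
import math

N = 64
tone = [math.sin(2 * math.pi * 8 * n / N) for n in range(N)]  # 8 cycles in N samples
half_period = N // 16  # 4 samples = half of the tone's 8-sample period

mic1 = tone
# The second microphone hears the same tone 180 degrees out of phase.
mic2 = tone[half_period:] + tone[:half_period]

# Summing without alignment: destructive interference (near-total cancellation).
unaligned_sum = [a + b for a, b in zip(mic1, mic2)]

# Shifting mic2 back into phase before summing: constructive interference.
aligned = mic2[-half_period:] + mic2[:-half_period]
aligned_sum = [a + b for a, b in zip(mic1, aligned)]
```

The aligned sum peaks at twice the single-microphone amplitude, while the unaligned sum stays at the numerical noise floor, which is the gain a phase shifter such as the one described above is after.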
FIG. 8 is a flowchart showing a method 800 according to another example. According to this example, the method 800 can include the direction determiner 204 determining a first direction of a speaker that a video system is aiming at (802). The method 800 can also include the computing system 200 receiving audio signals from a plurality of microphones 110 (804). The method 800 can also include the beamformer 206 generating a first audio signal based on the received audio signals and focusing on the first direction (806). The method 800 can also include the direction determiner 204 determining a second direction of a noise source other than the speaker (808). The method 800 can also include the beamformer 206 generating a second audio signal based on the received audio signals and focusing on the second direction (810). The method 800 can also include the signal combiner 212 generating a combined and/or stereophonic signal based on the first audio signal and the second audio signal (812). - According to an example, the determining the first direction (802) can include determining that the first audio signal is changing as a function of time.
- According to an example, the generating the first audio signal (806) can include beamforming the received audio signals in the first direction, and the generating the second audio signal (810) can include beamforming the received audio signals in the second direction.
- According to an example, the generating the stereophonic signal (812) can include generating the stereophonic signal based on a weighted sum of the first audio signal and the second audio signal, the first audio signal receiving a greater weight relative to the second audio signal.
- According to an example, the stereophonic signal can include the first audio signal and an indicator of the first direction, and the second audio signal and an indicator of the second direction.
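The stereophonic structure just described, each beamformed channel paired with an indicator of its direction, could be represented as simply as the sketch below. The dictionary layout, field names, and degree units are assumptions for illustration; the patent does not fix a serialization.

```python
def make_stereo(first_beam, first_direction_deg, second_beam, second_direction_deg):
    """Bundle two beamformed channels, each tagged with its direction indicator."""
    return {
        "channels": [
            {"direction_deg": first_direction_deg, "samples": first_beam},
            {"direction_deg": second_direction_deg, "samples": second_beam},
        ]
    }

# Speaker beam toward 0 degrees, noise beam toward 135 degrees (invented values).
stereo = make_stereo([0.9, 0.8], 0.0, [0.1, 0.2], 135.0)
```

Carrying the direction indicators alongside the samples would let a receiving videoconference system pan or re-mix each beam spatially rather than receiving a single pre-mixed signal.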
- According to an example, the noise source can be a first noise source. In this example, the
method 800 can further include determining a third direction of a second noise source, the third direction being different than the first direction and the second direction, the second direction being different than the first direction, and generating a third audio signal based on the received audio signals and the third direction. In this example, the generating the stereophonic signal (812) can include generating the stereophonic signal based on a weighted sum of the first audio signal, the second audio signal, and the third audio signal, the first audio signal receiving a greater weight relative to the second audio signal and the third audio signal. - According to an example, the
computing system 200 can include the video system configured to aim at the speaker in the first direction, the plurality of microphones configured to receive the audio signals, and a local computing device configured to send video signals received by the video system to a remote computing device, determine the first direction, generate the first audio signal, determine the second direction, generate the second audio signal, generate the stereophonic signal, and send the stereophonic signal to the remote computing device. - According to an example, the
method 800 can further include at least two electronic speakers 130A, 130B that are remote from the computing system 200 outputting an outputted audio signal based on the stereophonic signal. -
FIG. 9 is a flowchart showing a method 900 according to another example. The method 900 can be performed by the computing system 200. The method 900 can include the aim determiner 202 determining that a video system is aiming at a single speaker (902). The method 900 can also include the direction determiner 204 determining a first direction of the single speaker from an array of microphones 110 (904). The method 900 can also include, based on determining that the video system is aiming at the single speaker and the first direction of the single speaker, the beamformer 206 generating a first beamformed signal based on beamforming, in the first direction, multiple first direction audio signals received by the array of microphones 110 (906). The method 900 can also include the direction determiner 204 determining a second direction of a noise source other than the single speaker (908). The method 900 can also include the beamformer 206 generating a second beamformed signal based on beamforming, in the second direction, multiple second direction audio signals received by the array of microphones in the second direction (910). The method 900 can also include the signal combiner 212 generating a monophonic signal based on the first beamformed signal and the second beamformed signal, the first beamformed signal having greater weight relative to the second beamformed signal (912). The method 900 can also include the aim determiner 202 determining that the video system is not aiming at the single speaker (914). The method 900 can also include the signal combiner 212, based on determining that the video system is not aiming at the single speaker, generating a stereophonic signal, the stereophonic signal including the first beamformed signal and the second beamformed signal as distinct signals (916). - According to an example, the
method 900 can also include sending the monophonic signal to a videoconference system that is remote from the computing system, and sending the stereophonic signal to the videoconference system. - According to an example, the generating the first beamformed signal (906) can include modifying phases of audio signals received by the array of microphones, the modifications being based on differences in distances between microphones in the array of microphones and the single speaker.
-
FIG. 10 shows an example of a generic computer device 1000 and a generic mobile computer device 1050, which may be used with the techniques described here. Computing device 1000 is intended to represent various forms of digital computers, such as laptops, desktops, tablets, workstations, personal digital assistants, televisions, servers, blade servers, mainframes, and other appropriate computing devices. Computing device 1050 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, and other similar computing devices. The components shown here, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed in this document. -
Computing device 1000 includes a processor 1002, memory 1004, a storage device 1006, a high-speed interface 1008 connecting to memory 1004 and high-speed expansion ports 1010, and a low speed interface 1012 connecting to low speed bus 1014 and storage device 1006. The processor 1002 can be a semiconductor-based processor. The memory 1004 can be a semiconductor-based memory. Each of the components 1002, 1004, 1006, 1008, 1010, and 1012 are interconnected using various busses, and may be mounted on a common motherboard or in other manners as appropriate. The processor 1002 can process instructions for execution within the computing device 1000, including instructions stored in the memory 1004 or on the storage device 1006 to display graphical information for a GUI on an external input/output device, such as display 1016 coupled to high speed interface 1008. In other implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices 1000 may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system). - The
memory 1004 stores information within the computing device 1000. In one implementation, the memory 1004 is a volatile memory unit or units. In another implementation, the memory 1004 is a non-volatile memory unit or units. The memory 1004 may also be another form of computer-readable medium, such as a magnetic or optical disk. - The
storage device 1006 is capable of providing mass storage for the computing device 1000. In one implementation, the storage device 1006 may be or contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations. A computer program product can be tangibly embodied in an information carrier. The computer program product may also contain instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 1004, the storage device 1006, or memory on processor 1002. - The
high speed controller 1008 manages bandwidth-intensive operations for the computing device 1000, while the low speed controller 1012 manages lower bandwidth-intensive operations. Such allocation of functions is exemplary only. In one implementation, the high-speed controller 1008 is coupled to memory 1004, display 1016 (e.g., through a graphics processor or accelerator), and to high-speed expansion ports 1010, which may accept various expansion cards (not shown). In the implementation, low-speed controller 1012 is coupled to storage device 1006 and low-speed expansion port 1014. The low-speed expansion port, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter. - The
computing device 1000 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 1020, or multiple times in a group of such servers. It may also be implemented as part of a rack server system 1024. In addition, it may be implemented in a personal computer such as a laptop computer 1022. Alternatively, components from computing device 1000 may be combined with other components in a mobile device (not shown), such as device 1050. Each of such devices may contain one or more of computing devices 1000, 1050, and an entire system may be made up of multiple computing devices 1000, 1050 communicating with each other. - Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
- These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms “machine-readable medium” and “computer-readable medium” refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor.
- To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
- The systems and techniques described here can be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), and the Internet.
- The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
- A number of embodiments have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention.
- In addition, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. In addition, other steps may be provided, or steps may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems. Accordingly, other embodiments are within the scope of the following claims.
Claims (21)
Priority Applications (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/872,450 US10356362B1 (en) | 2018-01-16 | 2018-01-16 | Controlling focus of audio signals on speaker during videoconference |
| EP19703476.2A EP3741135A1 (en) | 2018-01-16 | 2019-01-14 | Controlling focus of audio signals on speaker during videoconference |
| PCT/US2019/013505 WO2019143565A1 (en) | 2018-01-16 | 2019-01-14 | Controlling focus of audio signals on speaker during videoconference |
| CN201980008718.6A CN111602414B (en) | 2018-01-16 | 2019-01-14 | Controlling audio signal focused speakers during video conferencing |
| US16/430,946 US10805575B2 (en) | 2018-01-16 | 2019-06-04 | Controlling focus of audio signals on speaker during videoconference |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/872,450 US10356362B1 (en) | 2018-01-16 | 2018-01-16 | Controlling focus of audio signals on speaker during videoconference |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/430,946 Continuation US10805575B2 (en) | 2018-01-16 | 2019-06-04 | Controlling focus of audio signals on speaker during videoconference |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US10356362B1 US10356362B1 (en) | 2019-07-16 |
| US20190222804A1 true US20190222804A1 (en) | 2019-07-18 |
Family
ID=65279688
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/872,450 Active US10356362B1 (en) | 2018-01-16 | 2018-01-16 | Controlling focus of audio signals on speaker during videoconference |
| US16/430,946 Active US10805575B2 (en) | 2018-01-16 | 2019-06-04 | Controlling focus of audio signals on speaker during videoconference |
Family Applications After (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/430,946 Active US10805575B2 (en) | 2018-01-16 | 2019-06-04 | Controlling focus of audio signals on speaker during videoconference |
Country Status (4)
| Country | Link |
|---|---|
| US (2) | US10356362B1 (en) |
| EP (1) | EP3741135A1 (en) |
| CN (1) | CN111602414B (en) |
| WO (1) | WO2019143565A1 (en) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10984789B2 (en) * | 2017-10-05 | 2021-04-20 | Harman Becker Automotive Systems Gmbh | Apparatus and method using multiple voice command devices |
| EP4084003A1 (en) * | 2021-04-28 | 2022-11-02 | Mitel Networks Corporation | Adaptive noise cancelling for conferencing communication systems |
| US20240098441A1 (en) * | 2021-01-15 | 2024-03-21 | Harman International Industries, Incorporated | Low frequency automatically calibrating sound system |
| WO2024232229A1 (en) * | 2023-05-10 | 2024-11-14 | ソニーグループ株式会社 | Information processing device and information processing method |
Families Citing this family (20)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11227588B2 (en) * | 2018-12-07 | 2022-01-18 | Nuance Communications, Inc. | System and method for feature based beam steering |
| CN110234043B (en) * | 2019-05-31 | 2020-08-25 | 歌尔科技有限公司 | Sound signal processing method, device and equipment based on microphone array |
| KR20210058152A (en) * | 2019-11-13 | 2021-05-24 | 엘지전자 주식회사 | Control Method of Intelligent security devices |
| US11923997B2 (en) * | 2020-06-18 | 2024-03-05 | Latesco Lp | Methods and systems for session management in digital telepresence systems using machine learning |
| US11289089B1 (en) * | 2020-06-23 | 2022-03-29 | Amazon Technologies, Inc. | Audio based projector control |
| CN111856402B (en) * | 2020-07-23 | 2023-08-18 | 海尔优家智能科技(北京)有限公司 | Signal processing method and device, storage medium, electronic device |
| US11714595B1 (en) * | 2020-08-07 | 2023-08-01 | mmhmm inc. | Adaptive audio for immersive individual conference spaces |
| CN112466327B (en) * | 2020-10-23 | 2022-02-22 | 北京百度网讯科技有限公司 | Voice processing method and device and electronic equipment |
| US11581004B2 (en) * | 2020-12-02 | 2023-02-14 | HearUnow, Inc. | Dynamic voice accentuation and reinforcement |
| CN112887557B (en) * | 2021-01-22 | 2022-11-11 | Vivo Mobile Communication Co., Ltd. | Focus tracking method and device and electronic equipment |
| CN113707165B (en) * | 2021-09-07 | 2024-09-17 | Lenovo (Beijing) Co., Ltd. | Audio processing method and device, electronic equipment and storage medium |
| US11889188B1 (en) * | 2022-08-25 | 2024-01-30 | Benjamin Slotznick | Computer program product and method for auto-focusing one or more cameras on selected persons in a venue who are performers of a performance occurring at the venue |
| US11902659B1 (en) | 2022-08-25 | 2024-02-13 | Benjamin Slotznick | Computer program product and method for auto-focusing a lighting fixture on a person in a venue who is wearing, or carrying, or holding, or speaking into a microphone at the venue |
| US11877058B1 (en) * | 2022-08-25 | 2024-01-16 | Benjamin Slotznick | Computer program product and automated method for auto-focusing a camera on a person in a venue who is wearing, or carrying, or holding, or speaking into a microphone at the venue |
| US11889187B1 (en) * | 2022-08-25 | 2024-01-30 | Benjamin Slotznick | Computer program product and method for auto-focusing one or more lighting fixtures on selected persons in a venue who are performers of a performance occurring at the venue |
| US11601731B1 (en) | 2022-08-25 | 2023-03-07 | Benjamin Slotznick | Computer program product and method for auto-focusing a camera on an in-person attendee who is speaking into a microphone at a hybrid meeting that is being streamed via a videoconferencing system to remote attendees |
| CN115662385A (en) * | 2022-09-09 | 2023-01-31 | Beijing China United Ultra High-Definition Collaborative Technology Center Co., Ltd. | Audio processing method and device for curling game and electronic equipment |
| CN116055985B (en) * | 2023-02-06 | 2025-11-25 | Suzhou Keda Technology Co., Ltd. | Methods and related equipment for converting mono to multichannel audio based on audio processing |
| EP4443901A1 (en) * | 2023-04-06 | 2024-10-09 | Koninklijke Philips N.V. | Generation of an audio stereo signal |
| EP4462769A1 (en) * | 2023-05-08 | 2024-11-13 | Koninklijke Philips N.V. | Generation of an audiovisual signal |
Family Cites Families (24)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7852369B2 (en) * | 2002-06-27 | 2010-12-14 | Microsoft Corp. | Integrated design for omni-directional camera and microphone array |
| US7190775B2 (en) * | 2003-10-29 | 2007-03-13 | Broadcom Corporation | High quality audio conferencing with adaptive beamforming |
| US7667728B2 (en) * | 2004-10-15 | 2010-02-23 | Lifesize Communications, Inc. | Video and audio conferencing system with spatial audio |
| JP2007078545A (en) * | 2005-09-15 | 2007-03-29 | Yamaha Corp | Object detection system and voice conference system |
| EP1983799B1 (en) * | 2007-04-17 | 2010-07-07 | Harman Becker Automotive Systems GmbH | Acoustic localization of a speaker |
| JP2009156888A (en) | 2007-12-25 | 2009-07-16 | Sanyo Electric Co Ltd | Speech corrector and imaging apparatus equipped with the same, and sound correcting method |
| EP2058797B1 (en) * | 2007-11-12 | 2011-05-04 | Harman Becker Automotive Systems GmbH | Discrimination between foreground speech and background noise |
| US8503653B2 (en) * | 2008-03-03 | 2013-08-06 | Alcatel Lucent | Method and apparatus for active speaker selection using microphone arrays and speaker recognition |
| US8189807B2 (en) * | 2008-06-27 | 2012-05-29 | Microsoft Corporation | Satellite microphone array for video conferencing |
| US8358328B2 (en) * | 2008-11-20 | 2013-01-22 | Cisco Technology, Inc. | Multiple video camera processing for teleconferencing |
| EP2211564B1 (en) * | 2009-01-23 | 2014-09-10 | Harman Becker Automotive Systems GmbH | Passenger compartment communication system |
| US9888335B2 (en) * | 2009-06-23 | 2018-02-06 | Nokia Technologies Oy | Method and apparatus for processing audio signals |
| CN102763432B (en) * | 2010-02-17 | 2015-06-24 | 诺基亚公司 | Processing of multi-device audio capture |
| US20120016606A1 (en) * | 2010-02-25 | 2012-01-19 | Emmanuel Petit | Power Profiling for Embedded System Design |
| US8587631B2 (en) * | 2010-06-29 | 2013-11-19 | Alcatel Lucent | Facilitating communications using a portable communication device and directed sound output |
| GB2496660B (en) * | 2011-11-18 | 2014-06-04 | Skype | Processing audio signals |
| US20150146078A1 (en) * | 2013-11-27 | 2015-05-28 | Cisco Technology, Inc. | Shift camera focus based on speaker position |
| WO2016034454A1 (en) | 2014-09-05 | 2016-03-10 | Thomson Licensing | Method and apparatus for enhancing sound sources |
| EP3411873B1 (en) * | 2016-02-04 | 2022-07-13 | Magic Leap, Inc. | Technique for directing audio in augmented reality system |
| CN105812717A (en) * | 2016-04-21 | 2016-07-27 | 邦彦技术股份有限公司 | Multimedia conference control method and server |
| CN105915798A (en) * | 2016-06-02 | 2016-08-31 | 北京小米移动软件有限公司 | Camera control method in video conference and control device thereof |
| US10587978B2 (en) * | 2016-06-03 | 2020-03-10 | Nureva, Inc. | Method, apparatus and computer-readable media for virtual positioning of a remote participant in a sound space |
| US10219095B2 (en) * | 2017-05-24 | 2019-02-26 | Glen A. Norris | User experience localizing binaural sound during a telephone call |
| EP3477964B1 (en) * | 2017-10-27 | 2021-03-24 | Oticon A/s | A hearing system configured to localize a target sound source |
2018
- 2018-01-16 US US15/872,450 patent/US10356362B1/en active Active

2019
- 2019-01-14 WO PCT/US2019/013505 patent/WO2019143565A1/en not_active Ceased
- 2019-01-14 CN CN201980008718.6A patent/CN111602414B/en active Active
- 2019-01-14 EP EP19703476.2A patent/EP3741135A1/en not_active Withdrawn
- 2019-06-04 US US16/430,946 patent/US10805575B2/en active Active
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10984789B2 (en) * | 2017-10-05 | 2021-04-20 | Harman Becker Automotive Systems GmbH | Apparatus and method using multiple voice command devices |
| US20240098441A1 (en) * | 2021-01-15 | 2024-03-21 | Harman International Industries, Incorporated | Low frequency automatically calibrating sound system |
| EP4084003A1 (en) * | 2021-04-28 | 2022-11-02 | Mitel Networks Corporation | Adaptive noise cancelling for conferencing communication systems |
| US11657829B2 (en) | 2021-04-28 | 2023-05-23 | Mitel Networks Corporation | Adaptive noise cancelling for conferencing communication systems |
| WO2024232229A1 (en) * | 2023-05-10 | 2024-11-14 | Sony Group Corporation | Information processing device and information processing method |
Also Published As
| Publication number | Publication date |
|---|---|
| US10805575B2 (en) | 2020-10-13 |
| US20190289259A1 (en) | 2019-09-19 |
| CN111602414A (en) | 2020-08-28 |
| CN111602414B (en) | 2023-03-10 |
| WO2019143565A1 (en) | 2019-07-25 |
| US10356362B1 (en) | 2019-07-16 |
| EP3741135A1 (en) | 2020-11-25 |
Similar Documents
| Publication | Title |
|---|---|
| US10805575B2 (en) | Controlling focus of audio signals on speaker during videoconference |
| US11991315B2 (en) | Audio conferencing using a distributed array of smartphones |
| US10917612B2 (en) | Multiple simultaneous framing alternatives using speaker tracking |
| US8848028B2 (en) | Audio cues for multi-party videoconferencing on an information handling system |
| US11082662B2 (en) | Enhanced audiovisual multiuser communication |
| US8441515B2 (en) | Method and apparatus for minimizing acoustic echo in video conferencing |
| EP2352290B1 (en) | Method and apparatus for matching audio and video signals during a videoconference |
| US20230021918A1 (en) | Systems, devices, and methods of manipulating audio data based on microphone orientation |
| US8411126B2 (en) | Methods and systems for close proximity spatial audio rendering |
| US20210382672A1 (en) | Systems, devices, and methods of manipulating audio data based on display orientation |
| US20240340605A1 (en) | Information processing device and method, and program |
| US12481479B2 (en) | Identifying co-located devices within a teleconferencing session |
| US20250039008A1 (en) | Conferencing session facilitation systems and methods using virtual assistant systems and artificial intelligence algorithms |
| US12133064B2 (en) | Video and audio splitting that simulates in-person conversations during remote conferencing |
| WO2017211447A1 (en) | Method for reproducing sound signals at a first location for a first participant within a conference with at least two further participants at at least one further location |
| JP2023043497A (en) | Remote conference system |
| US20250016452A1 (en) | Video fencing system and method |
| CN121548988A (en) | Video fence systems and methods |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | AS | Assignment | Owner name: GOOGLE LLC, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:RUDBERG, TORE;SCHULDT, CHRISTIAN;SIGNING DATES FROM 20180112 TO 20180114;REEL/FRAME:044661/0175 |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4 |