DE102013211283B4 - Playback of audio data using distributed electroacoustic transducers in networked mobile devices

Info

Publication number: DE102013211283B4
Application number: DE102013211283.1A
Authority: DE
Inventors: Karim Helwani; Herbert Buchner
Original assignee: Technische Universitaet Berlin; Deutsche Telekom AG
Current assignee: Deutsche Telekom AG
Priority date: 2013-06-17
Filing date: 2013-06-17
Publication date: 2018-01-11
Anticipated expiration: 2033-06-18
Also published as: DE102013211283A1

Abstract

A method for reproducing a sound field emanating from a virtual source using a plurality of mobile terminals, each having at least one electroacoustic transducer, in a local room, comprising the steps of: connecting the terminals through a data network; determining the respective positions of the terminals in the local room using a video-based localization method; determining driving functions for the electroacoustic transducers for reproducing the sound field on the basis of a signal sent over the data network, as a function of the position of the virtual source and the determined positions of the terminals in the local room; and reproducing the sound field through the electroacoustic transducers in the terminals.

Description

The present invention relates to the synthesis of a sound field for spatial audio reproduction by means of several networked terminals that are equipped with loudspeakers and preferably also microphones, and in particular to their use in spatial full-duplex hands-free communication.

In multichannel audio reproduction, the loudspeakers are generally driven such that a spatial auditory impression is created in a predefined region. Numerous methods for audio reproduction or for the physical synthesis of a sound field are known. Examples include stereophony, wave field synthesis (see, e.g., A. J. Berkhout, D. de Vries, and P. Vogel, Acoustic control by wave field synthesis, Journal of the Acoustical Society of America, vol. 93(5): 2764–2778, May 1993), and higher-order Ambisonics (J. Daniel, Representation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimedia, PhD thesis, Université Paris 6, 2001). These multichannel reproduction methods assume fixed, predefined loudspeaker positions.

Methods for echo suppression or echo cancellation and for signal enhancement, in particular in full-duplex communication using hands-free equipment, are described, for example, in E. Hänsler, G. Schmidt, Topics in acoustic echo and noise control: selected methods for the cancellation of acoustical echoes, the reduction of background noise, and speech processing, Springer-Verlag, Berlin 2006.

US 2009/0264114 A1 relates to the use of location information for audio signal amplification in a multiply distributed network.

US 2009/0116652 A1 relates to the spatial manipulation of sound that is played back to a listener via a set of output transducers, e.g. headphones.

US 2008/0160976 A1 describes the configuration of teleconferences on the basis of proximity information.

US 2012/0129543 A1 describes the selective formatting of media in a group communication session.

US 6 850 496 B1 describes a virtual conference room for voice conferencing.

US 6 408 327 B1 relates to synthetic stereo audio conferencing with multiple users over a LAN or WAN.

The present invention provides an improved method and a device for reproducing a sound field, which can be used to advantage in particular in full-duplex communication. According to the invention, several mobile, interconnected terminals, in particular smartphones, are used for sound field synthesis.

The present invention is defined by the independent claims. The dependent claims define embodiments of the invention.

The present invention provides a method for reproducing a sound field emanating from a virtual source using a plurality of mobile terminals, each having an electroacoustic transducer, in a local room, the terminals being connected by a data network. First, the respective positions of the terminals in the local room are determined. On the basis of a signal sent over the data network, driving functions for the electroacoustic transducers for reproducing the sound field are determined as a function of the position of the virtual source and the determined positions of the terminals in the local room. Using the driving functions determined in this way, the sound field is reproduced by the electroacoustic transducers in the terminals.
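As a rough illustration of how driving functions might depend on the position of the virtual source and on the terminal positions, the following sketch uses a simple point-source model: each loudspeaker plays the source signal delayed by the extra propagation time relative to the nearest terminal and attenuated in proportion to 1/r. The model itself, the function name `driving_delays_and_gains`, the 2-D coordinates, the speed of sound of 343 m/s and the minimum-distance clamp are assumptions made for illustration only; the description does not prescribe a particular driving function.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed value for air at room temperature
MIN_DISTANCE = 0.1       # m, assumed clamp to avoid division by zero


def driving_delays_and_gains(source_pos, terminal_positions, sample_rate=48000):
    """Return {terminal_id: (delay_in_samples, gain)} for a virtual point source.

    source_pos and the values of terminal_positions are (x, y) tuples in metres.
    Delays are relative to the terminal nearest to the virtual source, so that
    all delays are non-negative. This is a hypothetical simplified model.
    """
    distances = {
        tid: max(math.dist(source_pos, pos), MIN_DISTANCE)
        for tid, pos in terminal_positions.items()
    }
    nearest = min(distances.values())
    result = {}
    for tid, r in distances.items():
        # Extra travel time relative to the nearest terminal, in samples.
        delay = round((r - nearest) / SPEED_OF_SOUND * sample_rate)
        # 1/r amplitude decay, normalised so the nearest terminal has gain 1.
        gain = nearest / r
        result[tid] = (delay, gain)
    return result
```

With a virtual source at the origin and two terminals at 1 m and 2 m, the farther terminal would receive a delay of about 140 samples at 48 kHz and half the gain of the nearer one.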

A video-based localization method is used to determine the positions of the terminals. Additionally or alternatively, the positions of the terminals can also be determined by manual input from the user.

The terminals can be interconnected by a star network with an external or a local server, or by a fully meshed network. The network is preferably a wireless network, in particular a mobile network such as GSM, GPRS, UMTS or LTE, or WLAN or Bluetooth.

To compensate for the delay caused by the transmission of the signal, the playback by the individual terminals is preferably synchronized.
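The description does not specify how the synchronization is achieved. One conceivable approach, sketched here purely as an assumption, is to estimate each terminal's clock offset from a timestamped request/reply exchange (the calculation used by NTP) and then to convert a common start time on the server clock into each terminal's local clock. The function names and the four-timestamp interface are hypothetical.

```python
def estimate_offset_and_delay(t0, t1, t2, t3):
    """NTP-style clock offset and round-trip delay estimate.

    t0: request sent      (terminal clock)
    t1: request received  (server clock)
    t2: reply sent        (server clock)
    t3: reply received    (terminal clock)
    Assumes roughly symmetric network delay in both directions.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2.0   # server clock minus terminal clock
    delay = (t3 - t0) - (t2 - t1)            # round-trip time on the network
    return offset, delay


def local_start_time(server_start_time, offset):
    """Convert a start time given on the server clock to the terminal clock."""
    return server_start_time - offset
```

Each terminal would then start playback when its own clock reaches `local_start_time(...)`, so that transmission delay no longer determines when the audio is heard.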

The content of the sound field to be reproduced can be selected by one of the terminals. This content can in particular be represented by the signal sent over the data network. This is the case in particular when the content of the sound field to be reproduced is an audio signal from a remote room, for example the audio signal of a telephone call (e.g. via GSM or VoIP). Alternatively, content that is available on all terminals, for example in the form of a stored audio file, can be reproduced. In that case, a time pointer into the audio file can be sent over the data network.

In the method according to the invention for full-duplex communication between a remote room and a local room, the method described above is used to reproduce the audio signal in the form of a sound field emanating from a virtual source. In addition, the plurality of mobile terminals record the acoustic signal to be transmitted from the local room to the remote room. The local server, the external server and/or the individual mobile terminals preferably have a device for echo cancellation.

The invention further provides a system for carrying out the method according to the invention, the system comprising several mobile terminals, for example smartphones, each having an electroacoustic transducer, which are connected via a data network.

The invention is described in more detail below on the basis of exemplary embodiments with reference to the accompanying figures.

Fig. 1 shows various network topologies used in accordance with embodiments of the present invention, by which the mobile terminals can be connected, namely (a) a star network with an external server, (b) a star network with a local server, and (c) a fully meshed network.

Fig. 2 shows an example of a user interface on a mobile terminal during an active session using a method according to an embodiment of the invention.

Audio reproduction according to an embodiment of the present invention is described in detail below on the basis of the course of a typical session, where a session can be, for example, a telephone conference or the playback of a predetermined audio file that is transmitted over the network or stored on all terminals (smartphones) used.

When such a session is started, the method assumes in principle two network topologies, one of which can be divided into two variants depending on the server location:

1A. A star network in which all local participants are connected with their smartphones via a central main node. The connection to the server is established by dialing into a central service (see Fig. 1a).
1B. A star network in which all local participants are connected with their smartphones via a local main node. Here, for example, one of the smartphones serves as a hotspot (see Fig. 1b).
2. A fully meshed network in which all local participants are connected to one another via a local bidirectional network, for example a Bluetooth network (see Fig. 1c).
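One practical difference between these topologies is the number of connections that have to be maintained. The following sketch counts them under the assumption that every connection is a single bidirectional link; the function name and the topology labels are hypothetical and serve only as an illustration.

```python
def link_count(num_terminals, topology):
    """Number of bidirectional links needed to connect num_terminals terminals.

    topology is one of the hypothetical labels:
      "star_external": every terminal connects to an external server (case 1A)
      "star_local":    one terminal acts as the hub for the others (case 1B)
      "full_mesh":     every terminal connects to every other terminal (case 2)
    """
    if topology == "star_external":
        return num_terminals
    if topology == "star_local":
        return max(num_terminals - 1, 0)
    if topology == "full_mesh":
        return num_terminals * (num_terminals - 1) // 2
    raise ValueError(f"unknown topology: {topology}")
```

For five terminals this gives 5, 4 and 10 links respectively, which indicates why a fully meshed network scales less favorably as the number of participants grows.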

A session can be started from any terminal. To register for a session, each participant is assigned a recognizable identification number (ID). After registering, the participants transmit their current positions, which they can compute using common localization hardware (e.g. GPS). Positioning can also be performed or corrected manually, with the users sequentially entering their approximate relative positions. For manual positioning, the users can, for example, be shown an empty map of adjacent cells on the display, in which they select the cell in which they are located.
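The cell-based manual positioning could, for instance, be turned into approximate coordinates by taking the center of the selected cell. The sketch below assumes a regular grid of square cells; the function name, the row/column indexing and the default cell size of 0.5 m are illustrative assumptions and are not taken from the description.

```python
def cell_to_position(row, col, cell_size_m=0.5):
    """Return the (x, y) center of a grid cell in metres.

    Rows and columns are counted from 0; x grows with the column index and
    y with the row index. The cell size is an assumed parameter.
    """
    x = (col + 0.5) * cell_size_m
    y = (row + 0.5) * cell_size_m
    return (x, y)
```

The resulting positions are only as accurate as the cell size, which matches the approximate nature of manual input.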

Another type of localization can be carried out by all users except one placing their smartphones display-up on a surface (for example a meeting room table); the remaining user then invokes the calibration function. In the calibration function, different patterns and/or colors are shown on the displays of the participants' smartphones and are captured and evaluated by the camera of the user who started the calibration function. The positions of the participating smartphones are thus determined by means of standard pattern recognition methods and sent centrally to all participants.
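A heavily simplified sketch of the evaluation step is given below. It assumes that the camera image has already been segmented so that each pixel carries either the label of the pattern or color it belongs to, or `None` for background, and it takes the centroid of each label as the position of the corresponding smartphone in pixel coordinates. The segmentation, any perspective correction and the conversion from pixels to metres are outside this sketch, and the function name is hypothetical.

```python
def locate_patterns(image):
    """Return {label: (x, y)} centroids in pixel coordinates.

    image is a list of rows; each entry is a pattern label (e.g. a color name
    shown on one smartphone display) or None for background pixels.
    """
    sums = {}
    for y, row in enumerate(image):
        for x, label in enumerate(row):
            if label is None:
                continue
            sx, sy, n = sums.get(label, (0, 0, 0))
            sums[label] = (sx + x, sy + y, n + 1)
    # Average the accumulated coordinates per label.
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}
```

Because every display shows a distinct pattern or color, each centroid can be attributed to exactly one participant ID.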

In addition to this video-based localization, localization based on audio data can also be carried out. For this purpose, the participants' terminals are prompted to play predefined, mutually different tone sequences. These are recorded with the microphones of the terminal on which the calibration function is executed and are processed locally or on an external server by means of standard audio localization methods. A method suitable for this purpose is described, for example, in H. Buchner, R. Aichner, and W. Kellermann, "TRINICON-based Blind System Identification with Application to Multiple-Source Localization and Separation", in Blind Speech Separation, S. Makino, H. Sawada, and T.-W. Lee, eds., Springer Netherlands, 2007, pp. 101–147.
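The cited TRINICON framework is considerably more elaborate than can be shown here. As a minimal stand-in, and not as the cited method, the following sketch estimates the arrival delay of a known tone sequence in a recording by cross-correlation and converts it into a distance. It assumes that the emission time coincides with the start of the recording (that is, a common time base), a speed of sound of 343 m/s, and hypothetical function names.

```python
def estimate_delay(reference, recording):
    """Return the lag (in samples) at which reference best matches recording.

    Uses a plain cross-correlation over all lags at which the reference fits
    completely inside the recording.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(recording) - len(reference) + 1):
        score = sum(r * recording[lag + i] for i, r in enumerate(reference))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_distance(lag, sample_rate, speed_of_sound=343.0):
    """Convert a propagation delay in samples into a distance in metres."""
    return lag / sample_rate * speed_of_sound
```

Using mutually different sequences per terminal, as described above, allows the delays of several terminals to be estimated from the same recording.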

To improve the position determination, a combination of the localization methods presented can also be used.

During a session, an active map showing the participants is displayed, as shown schematically in Fig. 2. The user interface should preferably have a function for adding virtual sources. Such a virtual source can be a telephone number that is dialed to start a telephone conference. It can, however, also be a recorded audio signal (e.g. a .wav or .mp3 file, etc.) that is either stored on all participating terminals or transmitted over the wireless network.

For reproduction by synthesis of the desired sound field, the following steps are performed:

- Determination of the driving functions of the loudspeakers as a function of the desired position of the virtual source and the determined geometric arrangement. Preferably, a set of loudspeakers (smartphones) that are to become active during reproduction of this virtual source is selected for each source.
- For the two reproduction cases mentioned above, the following must be observed:
  - Content stored on only one device: in this case, the content is sent to each participant that, according to the selection, is to be active in reproducing a particular source.
  - All participants have the entire (musical) content to be reproduced: here, only the time pointer into the active file is determined.
- Synchronization of the terminals. It must be ensured that the delay of the data caused by the wireless transmission is compensated.
- Fast convolution of the content to be reproduced with each terminal's own driving functions.
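The fast convolution in the last step is commonly realized with the fast Fourier transform: both sequences are zero-padded, transformed, multiplied element-wise and transformed back. The sketch below shows this with a small radix-2 FFT written in plain Python so that it is self-contained; a real implementation would more likely use an optimized FFT library and block-wise (overlap-add or overlap-save) processing. The function names are hypothetical, and the driving function is assumed to be available as a finite impulse response.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two.

    The inverse transform is returned without the 1/N scaling.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def fast_convolve(signal, impulse_response):
    """Linear convolution of two non-empty real sequences via the FFT."""
    out_len = len(signal) + len(impulse_response) - 1
    size = 1
    while size < out_len:          # next power of two that avoids wrap-around
        size *= 2
    a = fft([complex(v) for v in signal] + [0j] * (size - len(signal)))
    b = fft([complex(v) for v in impulse_response]
            + [0j] * (size - len(impulse_response)))
    product = [p * q for p, q in zip(a, b)]
    result = fft(product, inverse=True)
    return [v.real / size for v in result[:out_len]]
```

For example, convolving a signal with the impulse response `[0, 0.5]` delays it by one sample and halves its amplitude, which corresponds to a simple delay-and-gain driving function.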

For a conference call with full-duplex communication, a distinction is made below between playback and recording:
For playback, three cases are distinguished, again depending on the network topology:

1A. The local participants can call a participant at a far end via the external main node. In this case, an acoustic echo canceller (AEC; see, for example, E. Hänsler, G. Schmidt, Topics in acoustic echo and noise control: selected methods for the cancellation of acoustical echoes, the reduction of background noise, and speech processing, Springer-Verlag, Berlin 2006, or H. Buchner, J. Benesty, and W. Kellermann, Generalized multichannel frequency-domain adaptive filtering: efficient realization and application to hands-free speech communication, Signal Processing, vol. 85, no. 3, pp. 549–570, 2005) or an acoustic echo suppressor (AES; see, for example, C. Faller and C. Tournery, Stereo acoustic echo control using a simplified echo path model, in Proc. IWAENC, 2006, or EP 1 715 669 A1) is preferably implemented on the server, so that the server transmits only an echo-free signal from the near end to the far end.
1B. Only the participant whose smartphone or terminal serves as the server can call a participant at the far end. The echo suppressor (or canceller) is then preferably implemented on this device.
2. Each participant can call a participant at a far end via their own mobile network operator and pass on (broadcast) that participant's speech signal at the near end over the local network, depending on the desired position of the virtual source. In this scenario, the echo suppressor (or canceller) should be implemented on each of the participating terminals.
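To indicate what an echo canceller does in principle, the following sketch implements a single-channel normalized least-mean-squares (NLMS) adaptive filter: it estimates the echo of the far-end signal contained in the microphone signal and subtracts it. This is a textbook baseline chosen for illustration, not the multichannel frequency-domain method of the cited references; the function names, the filter length and the step size are assumptions.

```python
import random


def white_noise(num_samples, seed=0):
    """Reproducible test signal, uniformly distributed in [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]


def nlms_echo_canceller(far_end, microphone, num_taps=4, step=0.5, eps=1e-8):
    """Return (weights, residual) of an NLMS echo canceller.

    far_end:    loudspeaker (reference) signal
    microphone: recorded signal containing the echo of far_end
    residual:   microphone signal after subtracting the estimated echo
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps           # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, microphone):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate            # error = echo-reduced output sample
        norm = sum(h * h for h in history) + eps
        # Normalized gradient step towards the true echo path.
        weights = [w + step * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return weights, residual
```

In the absence of near-end speech and noise, the weights converge to the impulse response of the echo path and the residual tends to zero; in practice, adaptation has to be controlled during double talk.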

Just as on the playback side, several electroacoustic transducers are available on the recording side thanks to the terminals distributed in the local room. In principle, the multichannel recording can likewise be used for spatial processing, in particular for suppressing interfering signals. For recording, the invention therefore provides for the use of known multichannel adaptive methods for blind source separation (see, e.g., H. Buchner, R. Aichner, and W. Kellermann, "TRINICON-based Blind System Identification with Application to Multiple-Source Localization and Separation", in Blind Speech Separation, S. Makino, H. Sawada, and T.-W. Lee, eds., Springer Netherlands, 2007, pp. 101–147), interference suppression (beamforming, described for example in Brandstein and D. Ward, Microphone arrays: signal processing techniques and applications, Birkhäuser 2001), and single-channel and multichannel noise suppression (likewise described in the above-mentioned publications by Brandstein and D. Ward and by E. Hänsler and G. Schmidt), where the already determined positions of the terminals (see above) can serve as prior information about the positions of the microphones. Furthermore, a simple selection of the microphones to be used for recording is possible, e.g. based on estimates of signal-to-noise power ratios.
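The simple microphone selection mentioned last could look roughly as follows. The sketch assumes that a signal power and a noise power estimate are already available for each terminal (how they are obtained is not covered here) and keeps the terminals whose signal-to-noise ratio reaches a threshold, best first. The function names and the 10 dB default threshold are hypothetical.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise power ratio in decibels."""
    return 10.0 * math.log10(signal_power / noise_power)


def select_microphones(power_estimates, min_snr_db=10.0):
    """Return the terminal IDs whose SNR is at least min_snr_db, best first.

    power_estimates maps terminal_id -> (signal_power, noise_power),
    with both powers strictly positive.
    """
    selected = [
        tid for tid, (s, n) in power_estimates.items()
        if snr_db(s, n) >= min_snr_db
    ]
    return sorted(selected,
                  key=lambda tid: snr_db(*power_estimates[tid]),
                  reverse=True)
```

Such a selection could also serve as the first stage of a hierarchical scheme, with more elaborate processing applied only to the selected microphones.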

The invention also provides for a possible combination of the signal enhancement methods mentioned. For example, a hierarchical approach can be pursued in which a cluster of participants is first formed by simple selection and blind source separation is then performed within this cluster.

When processing the audio data, a parallelization strategy is preferably pursued on the computer or terminal serving as the server, for reasons of complexity given the potentially large number of channels. For example, processing in a transform domain (see H. Buchner and S. Spors, A General Derivation of Wave-Domain Adaptive Filtering and Application to Acoustic Echo Cancellation, Proc. Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, Oct. 2008, or K. Helwani, H. Buchner, and S. Spors, Source-domain adaptive filtering for MIMO systems with application to acoustic echo cancellation, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2010), MapReduce (see J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters", 6th Symposium on Operating Systems Design and Implementation, pp. 107–113, 2004), or a combination of these can be used.

Although the invention has been illustrated and described in detail by means of the figures and the associated description, this illustration and detailed description are to be understood as illustrative and exemplary and not as limiting the invention. It is understood that those skilled in the art can make changes and modifications without departing from the scope of the following claims. In particular, the invention also encompasses embodiments with any combination of features that are mentioned or shown above in relation to different aspects and/or embodiments.

The invention also encompasses individual features in the figures, even if they are shown there in connection with other features and/or are not mentioned above.

Furthermore, the term "comprise" and derivatives thereof do not exclude other elements or steps. Likewise, the indefinite article "a" or "an" and derivatives thereof do not exclude a plurality. The functions of several features recited in the claims may be fulfilled by a single unit. The terms "substantially", "about", "approximately" and the like, in connection with a property or a value, in particular also define exactly that property or exactly that value.

Claims

A method of reproducing a sound field emanating from a virtual source using a plurality of mobile terminals, each having at least one electroacoustic transducer, in a local space, comprising the steps of: connecting the terminals through a data network; determining the respective positions of the terminals in the local space using a video-based localization method; determining driving functions for the electroacoustic transducers for reproducing the sound field on the basis of a signal transmitted over the data network, as a function of the position of the virtual source and the determined positions of the terminals in the local space; and reproducing the sound field through the electroacoustic transducers in the terminals.

The method of claim 1, wherein the terminals are interconnected by a star-shaped network with an external or local server or by a fully meshed network.

The method of claim 1 or 2, wherein the terminals are connected by a radio network.

The method of claim 3, wherein the radio network is a cellular network, WLAN or Bluetooth.

The method of any one of the preceding claims, wherein the positions of the terminals are determined by manual input by the user.

The method of any one of the preceding claims, further comprising the step of synchronizing the playback by the individual terminals to compensate for the delay caused by the transmission of the signal.

The method of any one of the preceding claims, further comprising the step of selecting, by one of the terminals, the content of the sound field to be reproduced.

The method of claim 7, wherein the signal transmitted over the data network represents the content of the sound field to be reproduced.

The method of claim 7 or 8, wherein the content of the sound field to be reproduced is the content of an audio signal from a remote room.

A method according to claim 9, wherein the content of the sound field to be reproduced is the content of the audio signal during a telephone call.

A method according to claim 7, wherein the content of the sound field to be reproduced is an audio file stored in all the terminals.

The method of claim 11, wherein the signal sent over the data network is a time pointer to the audio file.

A method for full-duplex communication between a remote room and a local room using a plurality of mobile terminals, each having an electroacoustic transducer, in the local room, comprising the steps of: reproducing a sound field emanating from a virtual source with the method according to claim 9; and picking up, by the electroacoustic transducers of the terminals, the acoustic signal to be transmitted from the local room to the remote room.

The method of claim 13, wherein the local server, the external server and/or the individual mobile terminals have a device for echo cancellation.

A system for performing a method according to any one of the preceding claims, wherein the system comprises a plurality of mobile terminals, each having an electroacoustic transducer, which are connected via a data network.

The system of claim 15, wherein the mobile terminals are smartphones.