
WO2014071076A1 - Conferencing for participants at different locations - Google Patents

Conferencing for participants at different locations

Info

Publication number
WO2014071076A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio data
segments
location
audio
locations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2013/067877
Other languages
English (en)
Inventor
Ronald David GUTMAN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Publication of WO2014071076A1 publication Critical patent/WO2014071076A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 - Data switching networks
    • H04L12/02 - Details
    • H04L12/16 - Arrangements for providing special services to substations
    • H04L12/18 - Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L12/1813 - Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
    • H04L12/1827 - Network arrangements for conference optimisation or adaptation

Definitions

  • the present invention relates to telecommunications networks, and more particularly to teleconferencing.
  • Teleconferencing seeks to extend the usual telephone capability to more than two people, so as to allow any number of people to communicate remotely as if they were talking face to face, e.g. in the same room.
  • Teleconferencing equipment receives the audio signals from each person (each conference participant), mixes the audio, and sends the mixed audio to each participant.
  • If video conferencing is also available, then a participant may receive images (e.g. photographs or computer screen images) from one or more other participants. The images can be displayed on a computer monitor.
  • Some teleconferencing embodiments of the present invention step away from imitating face-to-face interaction between participants, and such embodiments enhance a teleconference with features not available in face-to-face interaction.
  • the participants' audio is not mixed and hence not obscured by other participants.
  • some embodiments allow people at different locations to have a discussion using a voice network or data network in such a way that the following benefits and conveniences are provided:
  • Participation schedules need not be precisely coordinated: a person can join the conference late yet still participate and hear all of the discussion. This can be achieved, for example, by recording each participant's contribution for later reproduction by any participant including those who join late.
  • Interruptions such as a call on another phone or other distraction, also do not cause any of the discussion to be missed by any person.
  • the person being distracted can hear the other participants' contributions later during the conference if the contributions are recorded. The other participants thus do not have to wait for the distracted person; they can continue the discussion, or they can listen to earlier recorded contributions if desired.
  • a speaker can pause briefly without being interrupted by another speaker who starts speaking at the pause - both speakers can speak at the same time.
  • a moderator is not needed.
  • a moderator can help establish priorities but is not needed to achieve the previously mentioned benefits.
  • Some embodiments do not need a moderator to prioritize speakers.
  • Muting is automatically applied to reduce noise from locations where no one is speaking to make a contribution.
  • A meeting among several people of an organization when those people are in different locations. Some of them might be traveling and some working in home offices.
  • An example of a kind of meeting that might benefit significantly is a brainstorming session because each person can contribute at the time his idea occurs. Long pauses in the discussion are less burdensome because if multiple participants speak at the same time, then each participant can listen to other participants' audio during a pause.
  • Fig. 1 illustrates audio data flow in some embodiments of the present invention.
  • FIG. 2 is a block diagram of a teleconferencing system according to some embodiments of the present invention.
  • FIG. 3 is a block diagram of a central computing system used in teleconferencing according to some embodiments of the present invention.
  • FIG. 4 is a block diagram of a teleconferencing participant's system according to some embodiments of the present invention.
  • Fig. 5 is a block diagram of an audio segment according to some embodiments of the present invention.
  • Some embodiments of this invention include methods that do at least the following:
  • FIG. 1 illustrates a conference of four participants at respective four locations 110A, 110B, 110C, 110D.
  • each location has a microphone 120 and a speaker device 130.
  • the microphone converts the audio signals into electrical signals, and the speaker device performs the opposite transformation, as known in the art.
  • a segment of the discussion is defined as audio data that records one person's continuously spoken contribution.
  • the microphone 120 at each location 110X (i.e. 110A, 110B, 110C, 110D) generates a respective segment 140X as shown in the Figure.
  • a location 110X may generate multiple segments or no segments, as any participant might contribute more than one segment at any given time in the discussion, but a segment 140 (i.e. 140X) contains an uninterrupted contribution from one person. However, in some cases, where several participants share a single conference room at each location 110 (possibly with a single microphone 120), one segment 140 might contain contributions from more than one of the participants in the conference room. The end of a segment is automatically determined by a minimum length pause.
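  • Purely as an illustrative sketch (not taken from the patent), the minimum-pause rule for ending a segment could be implemented along the following lines; the frame size, energy threshold, and pause length below are hypothetical values.

```python
# Hypothetical sketch: end a segment 140 after a minimum-length pause.
# Thresholds are illustrative guesses, not values from the patent.
SILENCE_THRESHOLD = 1e-4   # mean squared amplitude treated as silence
MIN_PAUSE_FRAMES = 50      # e.g. 50 frames of 20 ms = a 1-second pause

def split_into_segments(frames):
    """frames: iterable of lists of float samples; yields segments (lists of frames)."""
    segment, silent_run = [], 0
    for frame in frames:
        energy = sum(s * s for s in frame) / len(frame)
        silent_run = silent_run + 1 if energy < SILENCE_THRESHOLD else 0
        segment.append(frame)
        # Close the segment once the pause reaches the minimum length.
        if silent_run >= MIN_PAUSE_FRAMES and len(segment) > silent_run:
            yield segment[:-silent_run]   # drop the trailing silence
            segment, silent_run = [], 0
    if any(sum(s * s for s in f) / len(f) >= SILENCE_THRESHOLD for f in segment):
        yield segment                     # flush a final, unterminated segment
```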
  • Place the segments in a sequence (e.g. sequence 150 in Figure 1). This process can be called serialization; the segments are serialized.
  • the sequence could be created on a single storage device or system (e.g. 264 in Figure 2) by a central computing device such as a computer server (e.g. 260 in Figure 2).
  • reference number 150 is used to refer both to the sequence of serialized segments and to an individual segment 140 in the sequence.
  • a user interface at each location 110 can provide users (e.g. participants) with information about the discussion such as the following information:
  • a user interface at a location 110 can also allow a participant to:
  • the limits applied can depend on the participant.
  • the limits for a lecturer for example, can be much larger.
  • the lecturer naturally contributes a much larger proportion of the discussion.
  • Another embodiment for lectures can place the lecturer's segments 140 alternately in the discussion sequence 150 since it is natural for the lecturer to answer each question from students and, when he asks a question, respond to each answer from a student.
  • the output of the serialized discussion sequence can include pauses of a desired length between the segments. This provides a participant who wishes to make a contribution with an obvious and convenient moment to do so. This is especially useful in embodiments that do not allow a user to start a new segment while listening to another segment. Such embodiments are useful where the computing devices used are not powerful enough for the speech processing needed to separate the segment being output from the new speech.
  • Video can accompany the audio and be serialized with it. This assumes that each speaker has both a microphone available and a means to produce video like a web-cam or a laptop computer.
  • a presentation includes visual media, such as Microsoft PowerPoint slides
  • the presenter can provide all of the video and the video can be stored with the serialized discussion sequence.
  • a new segment is created from a question from the audience, it can be associated with a point in time of the video provided by the presenter. Then, when that audio segment is output, it can be accompanied by the video at that point in time providing the context for the question. If the video originally comes from the presenter's computer, the presenter has the option to take control back and switch the video in real-time, but that only affects what subsequent users will see.
  • Figure 1 shows the main concept in which segments of audio ("new audio segments" 140 below) from different locations 110 are placed into a sequence 150 of segments, the "serialized audio segments" in the figure.
  • the sequence 150 of segments can be streamed back to audio speakers 130 at each location 110 to be heard by the participants at those locations.
  • the streaming to each location 110 can occur independently from the other locations.
  • FIG. 2 shows four different locations 110A, 110B, 110C, 110D each of which has participants in the same discussion.
  • Each of the four locations has a discussion client 210 which is a system dedicated to capturing contributions from the user or users (i.e. participants) at that location and delivering serialized discussion segments 150 back to the users at that location.
  • Each location 110 in the figure is set up differently:
  • Location 110A has only audio devices, namely an audio speaker 130 and a microphone 120. Other locations 110 might generate and view video as part of the discussion but location 110A does not.
  • Location 110B has a speaker device 130 and a microphone 120, and in addition has video devices including a web cam 220 and a display screen 240 for video.
  • Location 110C is a mobile phone.
  • the phone can be used for audio only as the phone includes a microphone and a speaker device (not shown), but the phone could be used for video also since the phone includes a screen and may or may not include a camera.
  • the corresponding discussion client 210 is the phone's computer (not shown).
  • Location 110D is a laptop computer which may be able to participate fully in the discussion, i.e. to provide both audio and video capture and display.
  • the discussion client is the laptop computer's processor and memory and other associated circuitry (e.g. network interface card, etc.).
  • Each of these discussion clients 210 at locations 110A-110D communicates with central computing system 260 that stores the serialized discussion segments 150 and assigns to each segment an index, e.g. a whole number greater than zero, which is a number used to identify the segment.
  • the central computing system 260 can be a computer on the Internet or can be a network of computers. New segments 140 are generated by each discussion client 210 and sent to the central computing system 260, which puts them in sequence 150. All of the segments 140 are sent back to each client 210 in the sequence 150 established by the central computing system 260.
  • the central computing system 260 also provides configuration data 268 to clients 210 and the status 272 of the discussion in progress.
  • FIG. 3 shows how the central computing system 260 works in detail. At this point, it is useful to know what each discussion segment 140 contains in addition to the audio data 510 (see Fig. 5).
  • Each segment 140 as generated by a discussion client 210 and transmitted to the central computing system 260, contains the following:
  • the time stamp 530 identifying the absolute time when the contribution was made, that is, when the participant began speaking.
  • the time stamp can be encoded according to a standard such as Unix time, which encodes time as the number of seconds after January 1, 1970.
  • the index 550 of a related segment which can be the segment played by the participant's discussion client 210 at the instant the participant began speaking or otherwise began contributing audio. If the participant began during a pause between two segments, the index can be the index of the last segment played by the participant's discussion client. If no segment of the discussion had yet been played by the participant's discussion client, then the related segment's index can be 0.
  • the relative time stamp 560 can be encoded as the number of seconds from the start of the related segment.
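  • For illustration only, a segment record could be laid out as below, using the reference numerals of Fig. 5 as field names; the patent does not prescribe an encoding, so this Python layout is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    audio_510: bytes                 # the audio data of the contribution
    contributor_520: str             # identity of the contributing participant
    abs_timestamp_530: int           # Unix time (seconds) when the participant began speaking
    related_index_550: int = 0       # index of the segment playing at that instant; 0 if none
    rel_timestamp_560: float = 0.0   # seconds into the related segment when speech began
    index: Optional[int] = None      # sequence index assigned later by the serializer 330
```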
  • Newly created segments 140 are directed to Speech processing unit 310 (Figure 3) within central computing system 260.
  • Speech processing unit 310 cleans up the sound (510) in the segment 140.
  • the speech processing unit might remove noise, but an important part of the cleanup is to remove sound from a related segment (indicated by 550) which was being played by discussion client 210 when the segment 140 was created.
  • the related segment's sound might be picked up by the microphone 120 being used to capture the new segment 140. It is desirable to remove that part of the captured sound, if any, to make the contributor's presentation clearer. If the contributor uses a head phone for microphone 120 or starts in a pause, the unwanted sound might be minor, but otherwise, it can be significant.
  • This processing is similar to echo cancellation and prior art might be used to implement it.
  • Spectral subtraction is a technique, also prior art, that might be used by the unit 310.
  • the speech processing unit 310 can obtain this audio data from the serialized segment server 320 using the related segment index 550, and may use the relative time stamp 560 to locate the sound within the related segment (550) originally produced by the discussion client 210 at the time when the new segment 140 was captured. Only a short initial portion of the audio 510 of the new segment 140 is processed for this removal because the discussion client 210 pauses play of the related segment (550) shortly after the creation of the new segment 140 begins.
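  • The patent names spectral subtraction only as one prior-art technique that might be used; as a hedged illustration of that general idea (not the patent's implementation), the sketch below subtracts an estimate of the replayed related-segment sound from the captured audio, assuming the two signals have already been time-aligned using the relative time stamp 560.

```python
import numpy as np

def spectral_subtract(captured, playback, frame=512, alpha=1.0):
    """Approximately remove playback sound picked up by the microphone.
    captured, playback: 1-D float arrays of equal length, already time-aligned.
    Illustrative sketch only; frame size and over-subtraction factor are guesses."""
    out = np.zeros_like(captured, dtype=float)
    for start in range(0, len(captured) - frame + 1, frame):
        c = np.fft.rfft(captured[start:start + frame])
        p = np.fft.rfft(playback[start:start + frame])
        # Subtract the playback magnitude, keep the captured phase, floor at zero.
        mag = np.maximum(np.abs(c) - alpha * np.abs(p), 0.0)
        out[start:start + frame] = np.fft.irfft(mag * np.exp(1j * np.angle(c)), n=frame)
    return out
```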
  • segments 140 may arrive simultaneously from different discussion clients 210, and can be processed simultaneously as they arrive.
  • the Segment Serializer 330 takes new segment information as described above and places each segment 140 into the sequence 150 by assigning a segment index to the segment.
  • the segment index is a number that is greater for segments later in the sequence 150.
  • a very simple embodiment of the serializer 330 can simply assign increasing indexes in the order that new segments initially arrive at the serializer. Other embodiments can take other factors into account as described above.
  • a new segment 140 arrives over time as it is being created and the serializer 330 can wait until it has received all of the new segment before assigning an index to the new segment, especially since some possible rules for assigning the index use the length of the segment or the time of the segment's end.
  • the segment serializer 330 accesses serialization rules 342 from the Discussion configuration unit 340.
  • the serializer 330 can work as follows:
  • 1. The serializer 330 checks the configuration 342 for privileges of the contributing participant. For example, the serialization rules may indicate that the new segment's contributor (identified by 520) has the privilege that his segments 140 appear alternately in the sequence (for example, if the contributor is a lecturer). If so, and if the last assigned segment 140 (i.e. the last segment assigned an index) was contributed by another participant, then the segment 140 is immediately assigned the next index.
  • 2. Otherwise, the new segment 140 is placed in a group of unassigned segments to which other rules 342 are applied.
  • serializer 330 can pick an incomplete new segment 140 and assign to it the next index so the segment can become available to the waiting clients 210.
  • Serializer 330 can apply a rule such as picking a segment 140 based on its absolute time stamp 530. The next index can be assigned to the segment 140 with the earliest time stamp 530.
  • Serializer 330 can also consider the priority of the contributor. The serializer can obtain current information about waiting clients from the Serialized Segment Server 320.
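  • A minimal sketch of how serializer 330 might combine these rules is shown below; the rule set, the `rules` mapping, and the `priority` field are assumptions for illustration, since the actual rules 342 are left configurable. It operates on the Segment records sketched earlier.

```python
def assign_next_index(unassigned, last_contributor, rules, next_index):
    """Pick one segment from the unassigned group and give it the next index.
    unassigned: non-empty list of Segment records (see the earlier sketch).
    rules: maps contributor identity (520) to privilege/priority settings.
    Illustrative only; the patent leaves the exact rule set open."""
    # Rule 1: a privileged contributor (e.g. a lecturer) gets alternating slots.
    for seg in unassigned:
        if rules.get(seg.contributor_520, {}).get("alternate") and \
                seg.contributor_520 != last_contributor:
            chosen = seg
            break
    else:
        # Rule 2: otherwise take the earliest absolute time stamp 530,
        # breaking ties in favour of higher contributor priority.
        chosen = min(
            unassigned,
            key=lambda s: (s.abs_timestamp_530,
                           -rules.get(s.contributor_520, {}).get("priority", 0)),
        )
    chosen.index = next_index
    unassigned.remove(chosen)
    return chosen
```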
  • the Serialized Segment Server 320 does the following:
  • Status 272 can include:
  • the Segment Serializer 330 streams new segments 140 into the Serialized Segment Server 320 to begin storing each new segment 140 before the segment is assigned an index.
  • the Segment Serializer 330 can later send the serialization sequence index to the Serialized Segment Server 320.
  • Before being assigned a sequence index, a new segment 140 can be assigned a temporary index that the serializer 330 and Serialized Segment Server 320 use to reference the segment, or the segment can be referenced by the identity 520 of the participant from which the segment comes.
  • Discussion configuration unit 340 keeps the following information that other parts of the system, such as the Segment Serializer 330 and the Discussion Clients 210, can access:
  • the privilege information may include:
  • Figure 4 shows a Discussion Client 210, which serves two purposes:
  • Video buffer 410 and audio buffer 420 in the discussion client simply capture and store, in main memory (not shown) or elsewhere, the raw data from any video capture device such as a web cam 220 and audio capture device 120 such as a microphone so that the data is not lost before it can be processed.
  • User interface 430 displays discussion status 272 described above and accepts commands from the participant in one form or another such as a voice command or touch of a button. (User interface 430 may be combined with screen 240 to display both status 272 and video segments, and/or user interface 430 can be combined with user interfaces of other devices, such as 120 or 220.)
  • VAD (Voice Activity Detection) unit 440 performs voice activity detection, which means that it detects when new speech begins to occur. The detection is performed based on the signal from audio capture device 120. When VAD 440 detects new speech, VAD 440 alerts New Segment Control unit 450 to manage creation of a new segment 140.
  • VAD unit 440 can use algorithms from prior art, such as counting zero crossings, to detect the start of new speech. The detection can err toward inferring start of speech when there is none because another unit, the speech processing unit 460, can compensate. In this approach, VAD 440 can make a quick judgment and the more complex analysis is only performed when VAD 440 detects start of speech. This design is useful when hardware is not sufficiently powerful or the playing of segments by Segment player 470 feeds sound back to microphone 120 confusing the VAD algorithm.
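  • As an illustrative sketch of the zero-crossing approach mentioned above (the thresholds are guesses, not values from the patent), a cheap per-frame test might look like this:

```python
def looks_like_speech(frame, zcr_low=0.02, zcr_high=0.35, energy_min=1e-4):
    """Quick, deliberately permissive voice-activity test on one frame of float samples.
    Illustrative only; VAD 440 could use any prior-art algorithm."""
    n = len(frame)
    energy = sum(s * s for s in frame) / n
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (n - 1)
    # Voiced speech tends to pair non-trivial energy with a moderate zero-crossing
    # rate; very high ZCR at low energy is more likely noise, which the speech
    # processing unit can still reject later.
    return energy >= energy_min and zcr_low <= zcr <= zcr_high
```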
  • Speech processing unit 460 is very similar to the speech processing unit 310 in the Central computing system 260. As stated before, an embodiment might fully implement only one of the two speech processing units 460, 310 while the other unit simply passes the new segment data 510 through. However, the unit 460 in the Discussion Client 210 may also perform the following tasks:
  • Unit 460 transmits the video stream as well as the audio. If unit 460 is directed by the New Segment Control Unit 450 not to transmit the new segment as described below (due to limit violations, for example), then unit 460 may also block transmission of the associated video, if any, captured by device 220.
  • speech processing 460 can have these special features:
  • Speech processing 460 can augment the VAD algorithm because speech processing 460 uses information about the sound from a simultaneously played segment and about how the input audio is affected by the simultaneously played segment. After removal of noise and sound of the simultaneously played segment from the audio input, speech processing 460 can test more accurately for the start of new speech. If unit 460 determines that new speech has not occurred (VAD was triggered by noise or playback), unit 460 signals the New Segment Control 450 that a new segment will not be created, and speech processing 460 does not transmit a new segment.
  • New Segment Control unit 450 directs the creation of new segments 140 as follows:
  • a. User interface 430, which might provide a means for the user to indicate creation of a new segment (via voice command, button touch, or other human interface). To the user, this is a "record" command.
  • b. VAD unit 440.
  • New Segment Control 450 applies rules based on participant status received from the Segment player 470 and on discussion configuration 268 from the Central computing system 260 to determine whether the new segment should be allowed. Possible rules are described above.
  • New Segment Control 450 can also use the rules to compute how much additional time this participant can contribute to new segments based on current discussion status 272. New Segment Control 450 might use this information to enforce the rules. New Segment Control 450 can additionally transmit the information to user interface 430 for display.
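  • One way such a per-participant allowance could be computed is sketched below; the time-budget model, field names, and defaults are assumptions for illustration, since the patent leaves the limit rules to the discussion configuration.

```python
def remaining_contribution_time(participant, status, config):
    """Seconds this participant may still contribute under a simple time budget.
    Hypothetical rule; the patent only says limits can depend on the participant."""
    budgets = config.get("time_budget_s", {})
    budget = budgets.get(participant, config.get("default_budget_s", 120))
    used = sum(seg["duration_s"] for seg in status.get("segments", [])
               if seg["contributor"] == participant)
    return max(budget - used, 0.0)

def allow_new_segment(participant, status, config):
    # New Segment Control 450 could block recording once the budget is spent.
    return remaining_contribution_time(participant, status, config) > 0
```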
  • When New Segment Control 450 determines that a new segment should be created, it signals the Segment player 470 to pause any currently playing segments and signals the Speech processing unit 460 to transmit the new segment. At the same time New Segment Control 450 sends information to the Speech processing 460 on how to format the new segment. Such information may include:
  • New Segment Control 450 accepts any signal from the Speech processing unit 460 to abort the new segment and responds by signaling the Segment player 470 to resume any playback.
  • Audio segment buffer 482 and video segment buffer 484 monitor the Central computing system 260 for serialized segments and buffer the segments as the segments become available.
  • a video segment contains the video data for the corresponding audio segment 140.
  • Audio buffer 482 includes, for each segment 140 it stores, all segment information as described above including the serialized segment index.
  • the speech processing unit 460 can access audio segment buffer 482 for a related segment 150 played during creation of a new segment 140 and can remove from the new segment 140 the sound played during creation of the new segment.
  • Segment player 470 performs the following functions:
  • segment player 470 obtains discussion status 272 from the Central computing system 260, tracks the activity of the participant using this client 210, keeps playback status, and provides related information to the user interface 430 and provides participant status, mentioned above, to the New Segment Control unit 450.
  • a. Provides the index of the currently playing segment 150 to the New Segment Control Unit 450.
  • the index can be used in formatting new segments (note field 550 in Fig. 5).
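  • For illustration, a much-simplified playback loop for Segment player 470 might look like the following; the `server` and `speaker` interfaces and the polling behaviour are hypothetical, not APIs defined by the patent.

```python
import time

def play_serialized_segments(server, speaker, report_current_index, gap_s=1.0):
    """Fetch serialized segments in index order, play each, pause between them,
    and report the currently playing index (used for field 550 of new segments).
    Illustrative sketch; a real client would also handle video, seeking, and shutdown."""
    next_index = 1
    while True:
        seg = server.get_segment(next_index)     # None if not yet serialized
        if seg is None:
            time.sleep(0.2)                      # wait for new contributions
            continue
        report_current_index(next_index)         # informs New Segment Control 450
        speaker.play(seg.audio_510)              # assumed to block until playback ends
        time.sleep(gap_s)                        # deliberate pause between segments
        next_index += 1
```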

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

In a teleconference, audio data represents audio signals (air oscillations) coming from different participants without mixing the audio signals from different locations even if participants speak simultaneously; each participant's audio is thus not obscured by the other participants. The audio data of all the participants is queued in a common queue (150) based on the time at which the audio was generated, and/or the participants' priorities, and/or other information. The audio is played at each location in queue order. Other features are also provided.
PCT/US2013/067877 2012-11-01 2013-10-31 Conferencing for participants at different locations Ceased WO2014071076A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261721032P 2012-11-01 2012-11-01
US61/721,032 2012-11-01

Publications (1)

Publication Number Publication Date
WO2014071076A1 true WO2014071076A1 (fr) 2014-05-08

Family

ID=50628064

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/US2013/067877 Ceased WO2014071076A1 (fr) 2012-11-01 2013-10-31 Conferencing for participants at different locations
PCT/US2013/068000 Ceased WO2014071152A1 (fr) 2012-11-01 2013-11-01 Teleconferencing for participants at different locations

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/US2013/068000 Ceased WO2014071152A1 (fr) 2012-11-01 2013-11-01 Teleconferencing for participants at different locations

Country Status (1)

Country Link
WO (2) WO2014071076A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10645035B2 (en) * 2017-11-02 2020-05-05 Google Llc Automated assistants with conference capabilities

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080212747A1 (en) * 2001-01-24 2008-09-04 Microsoft Corporation Method and apparatus for serializing an asynchronous communication
US20090003247A1 (en) * 2007-06-28 2009-01-01 Rebelvox, Llc Telecommunication and multimedia management method and apparatus

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5765164A (en) * 1995-12-21 1998-06-09 Intel Corporation Apparatus and method for management of discontinuous segments of multiple audio, video, and data streams
US9258337B2 (en) * 2008-03-18 2016-02-09 Avaya Inc. Inclusion of web content in a virtual environment
US8665309B2 (en) * 2009-11-03 2014-03-04 Northrop Grumman Systems Corporation Video teleconference systems and methods for providing virtual round table meetings

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080212747A1 (en) * 2001-01-24 2008-09-04 Microsoft Corporation Method and apparatus for serializing an asynchronous communication
US20090003247A1 (en) * 2007-06-28 2009-01-01 Rebelvox, Llc Telecommunication and multimedia management method and apparatus

Also Published As

Publication number Publication date
WO2014071152A1 (fr) 2014-05-08

Similar Documents

Publication Publication Date Title
CN110113316B (zh) Conference access method, apparatus, device, and computer-readable storage medium
EP2362576A1 (fr) Web teleconferencing system equipped with live video navigation
US20100220172A1 (en) Automatic Video Switching for Multimedia Conferencing
US7808521B2 (en) Multimedia conference recording and manipulation interface
JP2010507353A (ja) System and method for coordinating overlapping media messages
US20140122588A1 (en) Automatic Notification of Audience Boredom during Meetings and Conferences
US12229471B2 (en) Centrally controlling communication at a venue
Chen Conveying conversational cues through video
US20040249967A1 (en) Primary data stream communication
US20250168209A1 (en) Systems and methods for selecting a local device in a collaborative environment
JP5217877B2 (ja) Conference support device
JP7292343B2 (ja) Information processing device, information processing method, and information processing program
WO2014071076A1 (fr) Conferencing for participants at different locations
JP2006254064A (ja) Remote conference system, sound image position assignment method, and sound quality setting method
US11949727B2 (en) Organic conversations in a virtual group setting
JP6610076B2 (ja) Information processing device, information processing system, program, and recording medium
JPH11136369A (ja) Multipoint connection audio control device
JP3211786U (ja) Interactive device using live video
JP4768578B2 (ja) Video conference system and control method in video conference system
US12526387B2 (en) Systems and methods for managing audio input data and audio output data of virtual meetings
JP4531013B2 (ja) Video and audio conference system and terminal device
JP2013207465A (ja) Conference system, terminal device, and conference method
HK40071405B (en) Participation queue system and method for online video conferencing
JP2024122041A (ja) Information processing device, information processing method, and program
JP2022113375A (ja) Information processing method and monitoring system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13850060

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13850060

Country of ref document: EP

Kind code of ref document: A1