WO2008147272A1 - A conference bridge and a method for managing packets arriving therein - Google Patents
- Publication number
- WO2008147272A1 (PCT/SE2007/050395)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- packets
- arrived
- streams
- conference bridge
- packet
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/02—Details
- H04L12/16—Arrangements for providing special services to substations
- H04L12/18—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
- H04L12/1881—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast with schedule organisation, e.g. priority, sequence management
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/90—Buffering arrangements
- H04L49/901—Buffering arrangements using storage descriptor, e.g. read or write pointers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/90—Buffering arrangements
- H04L49/9023—Buffering arrangements for implementing a jitter-buffer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/90—Buffering arrangements
- H04L49/9084—Reactions to storage capacity overflow
- H04L49/9089—Reactions to storage capacity overflow replacing packets in a storage arrangement, e.g. pushout
- H04L49/9094—Arrangements for simultaneous transmit and receive, e.g. simultaneous reading/writing from/to the storage element
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/40—Support for services or applications
- H04L65/403—Arrangements for multi-party communication, e.g. for conferences
- H04L65/4038—Arrangements for multi-party communication, e.g. for conferences with floor control
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
- H04L65/765—Media network packet handling intermediate
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/80—Responding to QoS
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/02—Details
- H04L12/16—Arrangements for providing special services to substations
- H04L12/18—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
- H04L12/1813—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
- H04L12/1822—Conducting the conference, e.g. admission, detection, selection or grouping of participants, correlating users to one or more conference sessions, prioritising transmission
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M7/00—Arrangements for interconnection between switching centres
- H04M7/006—Networks other than PSTN/ISDN providing telephone service, e.g. Voice over Internet Protocol (VoIP), including next generation networks with a packet-switched transport layer
Definitions
- VADs voice activity detectors
- In the embodiment illustrated in Fig. 9 all temporally related packets are collected and thereafter analyzed with regard to speech activity. Only packets including speech activity are then mixed. In Fig. 9 packets without speech activity have been illustrated by empty boxes. Thus, in this embodiment mixing is not performed until all temporally related packets have arrived. Sometimes this means that packets that are later determined to include no speech activity have to be awaited before previously arrived packets with speech activity can be mixed. This situation has been illustrated, for example, between time instants k+1 and k+2, where a non-speech packet from user D arrives after the speech packets from users A-C. Fig. 9 also illustrates that different users may be silent at different times.
- Fig. 9 can be implemented by a conference bridge in accordance with Fig. 8, provided with voice activity detection for each stream in unit 16.
- mixing is normally performed as soon as all active packets have arrived. This is accomplished by storing and maintaining a list of active streams, typically in unit 16. For example, the three active packets from users A-C can be mixed as soon as they have all arrived between time instants k+1 and k+2, since user D is not in the list of active streams, and thus the later arriving packet from user D can be ignored in the mixing. The reason is that the previous packet from user D did not include any speech. Thus, by storing and maintaining a list of previously active streams or users, this embodiment needs to wait only for packets from users in the list of active streams before mixing can be started.
- The fact that a stream is in the list does not, however, necessarily mean that the next arriving packet from this stream will be mixed, since the next packet may be inactive. This is illustrated between instants k+2 and k+3, where the arriving packet from stream A is inactive, thereby enabling updating of the list.
- the active packet from stream D between instants k+2 and k+3 is not in the list when it arrives. However, this packet may actually be included in mixing if the list is updated with the status of all packets that have been received when the packets are released from the queue memories, in this case when the inactive packet from user A has arrived. If the active packet from user D had arrived after the inactive packet from user A, it would thus not have been included in the mixing.
- Fig. 11 is a block diagram of an embodiment of a conference bridge in accordance with the present invention that is suitable for implementing the method illustrated in Fig. 10.
- This embodiment differs from the embodiment of Fig. 8 in that selecting and mixing unit 16 has been modified into a selecting and mixing unit 30, which includes a unit 28 for maintaining a list of active streams.
- Unit 28 forwards a current list of active streams to control unit 24, which uses this list to release arrived temporally related packets to decoders 14 as soon as the last packet from a stream in the list has arrived.
- voice activity detection is assumed to be performed on decoded signals (samples). However, it is also possible to perform voice activity detection directly on the coded speech parameters (before decoders 14). This can for example be performed using the techniques described in [1] combined with the relevant parts of a standard VAD (e.g. 3GPP 26.094), or as exemplified by [2].
- the master clock for the mixing may either be a reference clock in the mixer itself or may be derived from the included participants (e.g. the median time).
- a concealment unit is typically provided in selecting and mixing unit 16, 30.
- a mechanism for estimating and correcting for the clock drift may be included.
- the clock drift preferably is handled at the mixing point, since otherwise an increasing time difference between users with clock drift would be introduced in the mixed signal.
- Methods for determining clock drift can be found in e.g. [3, 4], which are hereby incorporated by reference.
- the functionality of the conference bridge of the present invention is typically implemented by a microprocessor or micro/signal processor combination and corresponding software.
- Step S1 queues packets that have arrived in the conference bridge for each of the streams.
- Step S2 monitors the queued packets to detect arrival of temporally related packets of the streams.
- Step S3 mixes selected temporally related packets once it has been detected that they have arrived. The same steps are performed during the next time interval T.
- Fig. 13 is a flow chart illustrating an embodiment of the method in accordance with the present invention. This embodiment is suitable for the approach described in Fig. 9.
- Step S1 queues packets that have arrived in the conference bridge for each of the streams.
- Step S2 monitors the queued packets to detect arrival of temporally related packets of the streams.
- Step S4 tests whether all temporally related packets of the streams have arrived. If so, active temporally related packets are selected in step S5 and mixed in step S6. Otherwise the procedure returns to step S4. The same steps are performed during the next time interval T.
- Fig. 14 is a flow chart illustrating a further embodiment of the method in accordance with the present invention. This embodiment is suitable for the approach described in Fig. 10.
- Step S1 queues packets that have arrived in the conference bridge for each of the streams.
- Step S7 selects streams eligible for mixing from a list of currently active streams.
- Step S8 monitors the queued packets to detect arrival of temporally related packets of the streams in the list.
- Step S9 tests whether all temporally related packets of the streams in the list have arrived. If so, the list of active streams is updated in step S10 and then the received temporally related packets from streams in the list are selected and mixed in step S11. Otherwise the procedure returns to step S8. The same steps are performed during the next time interval T.
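The active-list procedure of Fig. 14 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name `release_if_ready`, the dictionary of per-stream voice-activity flags and the arrival order are all hypothetical, and the voice activity decision is assumed to be available for every queued packet. Packets that arrive after their interval has been released are simply ignored here.

```python
def release_if_ready(arrived, active_list):
    """Return (streams_to_mix, updated_list) once every listed stream has a
    packet queued for the current interval, otherwise None.

    arrived maps each stream whose packet is queued to that packet's
    voice-activity flag (hypothetical VAD result); active_list is the set of
    streams currently in the list of active streams.
    """
    # S9: keep waiting while a stream in the list has not delivered its packet.
    if not active_list.issubset(arrived):
        return None
    # S10: update the list with the status of all packets received so far.
    updated = {stream for stream, speech in arrived.items() if speech}
    # S11: select the received packets from streams in the updated list.
    return sorted(updated), updated


# Hypothetical arrival order for the interval between k+2 and k+3:
# D (active, not in the list), B, C, then an inactive packet from A.
active = {"A", "B", "C"}
arrived = {}
result = None
for stream, speech in [("D", True), ("B", True), ("C", True), ("A", False)]:
    arrived[stream] = speech                     # S1: queue the arrived packet
    result = release_if_ready(arrived, active)   # S8/S9: monitor and test
    if result is not None:
        break
to_mix, active = result
```

In this run the packet from D is mixed because it was already queued when the inactive packet from A completed the set. Had D arrived after A, only B and C would have been mixed.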
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Multimedia (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Telephonic Communication Services (AREA)
Abstract
A conference bridge for managing arriving packets of multiple packetized media streams of a non-synchronous packet network includes queue memories (22) arranged to queue arrived packets for each stream. A control unit (24) monitors the queued packets to detect arrival of temporally related packets of the streams. A mixer (16) mixes selected temporally related packets once it has been detected that they have arrived.
Description
A CONFERENCE BRIDGE AND A METHOD FOR MANAGING PACKETS ARRIVING THEREIN
TECHNICAL FIELD
The present invention relates to managing packets of multiple packetized media streams arriving in a conference bridge of a non-synchronous packet network.
BACKGROUND
Conferencing capability allows for group communication and collaboration among geographically dispersed participants (also called users below). Historically, conferencing has been achieved in the Public Switched Telephone Network (PSTN) by means of a centralized conference bridge. In such a circuit switched network, the mixing of real-time media streams from several users can usually be performed without causing any substantial additional delay. In e.g. a voice teleconference, the individual audio samples from the participants are synchronized and arrive at regular time intervals. This means that the samples can be scheduled to be processed at regular time intervals and no additional delay is added except the time needed for the processing. The processing for a voice teleconference usually consists of determining which talkers are active and summing the speech contributions from the active talkers.
Current trends point towards the migration of voice communication services from the circuit-switched PSTN to non-synchronous packet-based Internet Protocol (IP) networks. This shift is motivated by a desire to provide data and voice services on a single, packet-based network infrastructure. In a non-synchronous packet network, the audio samples (or coded parameters representing the audio samples) from the participants in e.g. a voice teleconference usually do not arrive at regular time intervals due to the jitter in the transport network.
In order to synchronize the speech contributions from the participants, and thus make it possible to mix samples corresponding to the same time from all participants, jitter buffers are typically implemented in the conference bridge on the incoming speech to cater for the varying delay of the packets.
SUMMARY
A drawback of the prior art jitter buffer approach is that the jitter buffers introduce an undesirable extra delay in the conference bridge.
An object of the present invention is to reduce the delay in a conference bridge.
This object is achieved in accordance with the attached claims.
Briefly, the present invention queues arrived packets for each stream. The queued packets are monitored to detect arrival of temporally related packets of the streams. Selected temporally related packets are mixed once it has been detected that they have arrived.
An advantage of the present invention is that it reduces the overall delay in the network. Instead of introducing a fixed delay in the conference bridge, the jitter in the incoming packets is forwarded to be handled by the receiving terminal.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention, together with further objects and advantages thereof, may best be understood by making reference to the following description taken together with the accompanying drawings, in which:
Fig. 1 is a simple block diagram of a network with a conference bridge;
Fig. 2 is a more detailed block diagram of a typical prior art non-synchronous packet network with a conference bridge;
Fig. 3 is a time diagram illustrating jitter buffering in a typical prior art conference bridge;
Fig. 4 is a time diagram illustrating time dependence of the jitter from a first user to the conference bridge;
Fig. 5 is a time diagram illustrating time dependence of the jitter from the conference bridge to a second user;
Fig. 6 is a time diagram illustrating time dependence of the combined jitter from the first user to the second user;
Fig. 7 is a time diagram illustrating an embodiment of the method in accordance with the present invention;
Fig. 8 is a block diagram of an embodiment of a conference bridge in accordance with the present invention;
Fig. 9 is a time diagram illustrating another embodiment of the method in accordance with the present invention;
Fig. 10 is a time diagram illustrating still another embodiment of the method in accordance with the present invention;
Fig. 11 is a block diagram of another embodiment of a conference bridge in accordance with the present invention;
Fig. 12 is a flow chart illustrating the principles of the method in accordance with the present invention;
Fig. 13 is a flow chart illustrating an embodiment of the method in accordance with the present invention; and
Fig. 14 is a flow chart illustrating a further embodiment of the method in accordance with the present invention.
DETAILED DESCRIPTION
In the following description elements having the same or similar functions will be provided with the same reference designations in the drawings.
Fig. 1 is a simple block diagram of a network with a conference bridge 10. Users A-E each have bidirectional connections to conference bridge 10. In one direction each user sends audio or voice packets to conference bridge 10. In the opposite direction each user receives combined or mixed packets from the other users or participants in the conference. For example, user A sends packets A to conference bridge 10 and receives a mix BCDE of packets from the other users. The purpose of the conference bridge is to manage received packets and perform the mixing that is relevant for each user.
Fig. 2 is a more detailed block diagram of a typical prior art non-synchronous packet network with a conference bridge 10. In order to simplify the description, Fig. 2 only illustrates how packets from users A-D are mixed and forwarded to user E. The other users are managed in a similar way.
Packets from users A-D reach respective jitter buffers 12 in conference bridge 10, where they are delayed. When the packets are released from jitter buffers 12, they are forwarded to respective decoders 14. Decoders 14 decode the packets into samples that are forwarded to a selecting and mixing unit 16. After mixing, the resulting samples are encoded into packets in an encoder 18 and forwarded to user E. A clock unit 20 releases packets from jitter buffers 12 at regular time instants separated by a time interval T, which corresponds to the length of a speech frame, typically 20-40 ms. The added delay in the jitter buffers is typically 1-3 time intervals T.
Fig. 3 is a time diagram illustrating jitter buffering in the conference bridge of Fig. 2. This example assumes a jitter buffer delay of one time interval T. At time instant k all temporally related packets from users A-D have arrived simultaneously at conference bridge 10. The feature that the packets are temporally related means that they are based on samples generated (at the users) at approximately the same absolute or global time, i.e. they represent approximately simultaneous events. Due to the delay in jitter buffers 12, mixing is not performed until time instant k+1, as illustrated by the arrow in the lower left corner of Fig. 3. At time instant k+1 all temporally related packets have also arrived, but this time they were not synchronous. However, the buffering until time instant k+2 makes them synchronous. Similar comments apply to the temporally related packets arriving between instants k+2 and k+3. It is noted that so far the extra delay provided by the jitter buffers was not actually needed, since all packets arrived in time for mixing at the next periodic time instant. This changes, however, at time instant k+4, since the packet from user A would have arrived too late for mixing without the extra delay. Thus, for such situations the extra delay provided for late packets by jitter buffers 12 is actually useful.
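The role of the extra delay in Fig. 3 can be illustrated with a minimal sketch. It assumes a frame length T of 20 ms (the lower end of the typical range given above) and hypothetical arrival times in milliseconds; the function names are illustrative only and are not taken from the specification.

```python
FRAME_MS = 20  # assumed frame length T in milliseconds


def clocked_mix_time(nominal_instant, buffer_intervals=1):
    """Time (ms) at which the clock unit releases packets nominally due at the
    given periodic instant, after the configured jitter buffer delay."""
    return (nominal_instant + buffer_intervals) * FRAME_MS


def arrives_in_time(arrival_ms, nominal_instant, buffer_intervals=1):
    """True if a packet has arrived no later than its release instant."""
    return arrival_ms <= clocked_mix_time(nominal_instant, buffer_intervals)


# Hypothetical arrivals for packets nominally due at instant k+4 (80 ms):
# the packet from user A is 12 ms late, the others are on time or early.
arrivals = {"A": 92, "B": 78, "C": 80, "D": 75}
with_buffer = {u: arrives_in_time(t, 4) for u, t in arrivals.items()}
without_buffer = {u: arrives_in_time(t, 4, buffer_intervals=0)
                  for u, t in arrivals.items()}
```

With one interval of buffering every packet is available at the release instant (100 ms), whereas without it the packet from user A misses the 80 ms instant.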
The jitter buffer approach can be described mathematically as follows: The transmission time from, for example, user A to user E without a jitter buffer may be expressed as:
$T_{A \to E}(k) = T_{A \to \mathrm{bridge}} + \delta_{A \to \mathrm{bridge}}(k) + T_{\mathrm{bridge} \to E} + \delta_{\mathrm{bridge} \to E}(k)$ (1)
where
$T_{A \to \mathrm{bridge}}$ is the (constant) time delay from user A to the bridge,
$T_{\mathrm{bridge} \to E}$ is the (constant) time delay from the bridge to user E,
$\delta_{A \to \mathrm{bridge}}(k)$ is the jitter in transmission time from user A to the bridge,
$\delta_{\mathrm{bridge} \to E}(k)$ is the jitter in transmission time from the bridge to user E.
The jitter can be assumed to obey some statistical distribution.
With the inclusion of a jitter buffer 12 in the conference bridge, the transmission time for packets from user A to user E will be:
$T_{A \to E}(k) = T_{A \to \mathrm{bridge}} + T_{\mathrm{jitterbuffer}} + T_{\mathrm{bridge} \to E} + \delta_{\mathrm{bridge} \to E}(k)$ (2)
Since the jitter buffer in the conference bridge is designed to compensate for the possible jitter $\delta_{A \to \mathrm{bridge}}(k)$, one can assume that:
$T_{\mathrm{jitterbuffer}} > \delta_{A \to \mathrm{bridge}}(k)$ (3)
most of the time. Thus, the transmission time for packets from A to E will, on average, be larger with the inclusion of a jitter buffer. Similar reasoning for the other paths leads to jitters $\delta_{B \to \mathrm{bridge}}(k)$, $\delta_{C \to \mathrm{bridge}}(k)$ and $\delta_{D \to \mathrm{bridge}}(k)$. Thus, the jitter buffers have to satisfy
$T_{\mathrm{jitterbuffer}} > \max\left(\delta_{A \to \mathrm{bridge}}(k), \delta_{B \to \mathrm{bridge}}(k), \delta_{C \to \mathrm{bridge}}(k), \delta_{D \to \mathrm{bridge}}(k)\right)$ (4)
most of the time, i.e. the media stream with the largest jitter will determine the jitter buffer delay for all streams. It is also known to have adaptive jitter buffer delays.
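The dimensioning rule above, where the stream with the largest jitter sets the delay for all streams, can be sketched numerically. The sketch interprets "most of the time" as a percentile of observed jitter, which is an assumption made here for illustration; the specification does not prescribe how the delay is chosen. The function names and the sample values (in ms) are hypothetical.

```python
def jitter_percentile(samples_ms, percent):
    """Nearest-rank percentile of observed jitter samples (ms)."""
    ordered = sorted(samples_ms)
    rank = max(1, (percent * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def required_buffer_delay(jitter_by_stream, percent=95):
    """Delay covering the chosen jitter percentile of every incoming stream,
    i.e. the maximum over the streams of their per-stream percentile."""
    return max(jitter_percentile(s, percent) for s in jitter_by_stream.values())


# Hypothetical jitter observations per incoming stream.
observed = {
    "A": [2, 5, 3, 30, 4, 6, 2, 3, 5, 4],
    "B": [1, 2, 1, 3, 2, 2, 1, 2, 3, 1],
    "C": [4, 4, 5, 6, 5, 4, 5, 6, 4, 5],
    "D": [0, 1, 0, 2, 1, 0, 1, 1, 0, 2],
}
delay_90 = required_buffer_delay(observed, percent=90)
delay_100 = required_buffer_delay(observed, percent=100)
```

Covering 90 % of the observations requires 6 ms here, while covering all of them requires 30 ms because of a single outlier on stream A, and that delay would then be imposed on every stream.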
From equation (1) it is noted that there is a jitter $\delta_{\mathrm{bridge} \to E}(k)$ in transmission time from the bridge to user E. This jitter is compensated by a playback jitter buffer at user E. The playback jitter buffer synchronizes the packets of the single mixed stream arriving at user E before decoding to generate a continuous stream of decoded samples at user E.
The basic concept of the invention is to exclude the jitter buffering in the conference bridge and asynchronously perform the mixing of the signals once all the packets that should be mixed have arrived. As discussed above, the purpose of the jitter buffers in a conference bridge is to enable the mixing of the correct samples from the included participants by compensating for jitter in the transmission time. However, unlike the situation for the playback in the terminal, the conference bridge does not have to produce synchronous output (e.g. for playback in a loudspeaker), but can produce asynchronous output. Instead all the necessary synchronization for playback may be performed at the jitter buffer in the receiving terminal. The present invention is based on two observations, namely:
1. The jitter buffer at user E is of the same magnitude as the jitter buffers in the conference bridge.
2. The jitters $\delta_{A \to \mathrm{bridge}}(k)$ and $\delta_{\mathrm{bridge} \to E}(k)$ are typically uncorrelated. This means that it is unlikely that the two jitters are large at the same time k. This is illustrated in Figs. 4-6. Thus, on average the combined jitter $\delta_{A \to \mathrm{bridge}}(k) + \delta_{\mathrm{bridge} \to E}(k)$ will not result in any significantly increased delay.
These observations imply that by omitting the jitter buffers in the conference bridge, the jitters resulting at user E will on average be of the same magnitude as before, but that larger jitters will occur more frequently. However, since the jitter buffer at user E is already designed to cope with such jitter, no changes are necessary at user E.
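To make the comparison concrete, the following minimal sketch evaluates the per-packet delay from user A to user E with a bridge jitter buffer, following equation (2), and without one. It assumes that equation (1) is the sum of the two constant delays and the two jitter terms defined above. All numbers (in milliseconds) and all function names are illustrative assumptions and are not taken from this application.

```python
def delay_without_buffer(t_a_bridge, t_bridge_e, jitter_in, jitter_out):
    """Per-packet delay with no bridge jitter buffer (assumed form of equation (1))."""
    return [t_a_bridge + d_in + t_bridge_e + d_out
            for d_in, d_out in zip(jitter_in, jitter_out)]


def delay_with_buffer(t_a_bridge, t_bridge_e, t_jitterbuffer, jitter_out):
    """Per-packet delay per equation (2): inbound jitter is absorbed by a fixed buffer delay."""
    return [t_a_bridge + t_jitterbuffer + t_bridge_e + d_out for d_out in jitter_out]


# Hypothetical values in milliseconds.
T_A_BRIDGE = 40
T_BRIDGE_E = 30
JITTER_IN = [0, 5, 20, 2, 0, 8]    # jitter from A to the bridge, per packet k
JITTER_OUT = [3, 18, 1, 0, 10, 2]  # jitter from the bridge to E, per packet k
T_JITTERBUFFER = max(JITTER_IN)    # buffer sized to cover the largest inbound jitter

without = delay_without_buffer(T_A_BRIDGE, T_BRIDGE_E, JITTER_IN, JITTER_OUT)
with_buf = delay_with_buffer(T_A_BRIDGE, T_BRIDGE_E, T_JITTERBUFFER, JITTER_OUT)

mean_without = sum(without) / len(without)
mean_with = sum(with_buf) / len(with_buf)
spread_without = max(without) - min(without)  # jitter the playback buffer at E must absorb
spread_with = max(with_buf) - min(with_buf)
```

With these made-up samples, the large inbound and outbound jitters do not coincide, so the spread seen at user E stays of similar magnitude in both cases while the mean delay is lower without the bridge buffer.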
Fig. 7 is a time diagram illustrating an embodiment of the method in accordance with the present invention. The packets received at the conference bridge are exactly the same as in Fig. 3. However, since there are no jitter buffers in the conference bridge, the temporally related packets are mixed as soon as they have all been received. As can be seen in Fig. 7, this results in an eliminated jitter buffer delay and a mixed stream with asynchronously transmitted packets. For example, the packets that arrive early in the time interval between k+1 and k+2 can immediately be decoded and mixed as soon as all packets have been received, instead of waiting until time instant k+3 as in the prior art of Fig. 3.
Fig. 8 is a block diagram of an embodiment of a conference bridge in accordance with the present invention. Instead of forwarding packets that arrive in the conference bridge to jitter buffers, as in the prior art, they are forwarded to queue memories or FIFOs 22. Queue memories 22 are controlled by a control unit 24. Control unit 24 monitors queue memories 22 over monitor lines 26 to determine whether temporally related packets from all streams have arrived in the queue memories. As soon as all temporally related packets representing a given time interval have arrived, control unit 24 releases these packets to decoders 14 for decoding and subsequent mixing in unit 16.
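The queue memories and control unit described above can be sketched roughly as follows. This is a simplified illustration, not the implementation of Fig. 8: the class and function names are hypothetical, packets are assumed to arrive in order within each stream and without loss, and payloads are treated as already decoded sample lists so that the decoders can be left out.

```python
from collections import deque


class AsyncBridge:
    """Per-stream FIFOs plus a control step that releases complete packet sets."""

    def __init__(self, stream_ids):
        self.queues = {sid: deque() for sid in stream_ids}

    def enqueue(self, stream_id, interval, payload):
        """Queue an arrived packet; `interval` identifies the time interval it represents."""
        self.queues[stream_id].append((interval, payload))

    def release_ready(self):
        """Release the head packets if every stream holds one for the same interval, else None."""
        if any(not q for q in self.queues.values()):
            return None  # at least one stream has not delivered its packet yet
        intervals = {q[0][0] for q in self.queues.values()}
        if len(intervals) != 1:
            return None  # heads are not temporally related (not handled in this sketch)
        return {sid: q.popleft()[1] for sid, q in self.queues.items()}


def mix(samples_by_stream):
    """Toy mixer: sample-wise sum of equally long frames."""
    return [sum(column) for column in zip(*samples_by_stream.values())]


bridge = AsyncBridge(["A", "B", "C", "D"])
bridge.enqueue("A", 0, [1, 2])
bridge.enqueue("B", 0, [10, 20])
bridge.enqueue("C", 0, [100, 200])
first_attempt = bridge.release_ready()  # D has not arrived, nothing is released
bridge.enqueue("D", 0, [1000, 2000])
mixed = mix(bridge.release_ready())     # released as soon as the last packet is in
```

The point of the sketch is that release is triggered by the arrival of the last temporally related packet, not by a clock tick.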
In the above description it has been assumed that packets from all users A-D should be mixed. However, if one or several users are silent, packets from these users may be discarded instead of mixed with packets from active talkers. The detection of silent users may be performed by one or more voice activity detectors (VADs), typically included in unit 16. This situation can be handled in different ways, as illustrated by Figs. 9 and 10.
In the embodiment illustrated in Fig. 9 all temporally related packets are collected and thereafter analyzed with regard to speech activity. Only packets including speech activity are then mixed. In Fig. 9 packets without speech activity have been illustrated by empty boxes. Thus, in this embodiment mixing is not performed until all temporally related packets have arrived. Sometimes this means that packets that are later determined to include no speech activity have to be awaited before previously arrived packets with speech activity can be mixed. This situation has been illustrated, for example, between time instants k+1 and k+2, where a non-speech packet from user D arrives after the speech packets from users A-C. Fig. 9 also illustrates that different users may be silent at different times. Thus, user D is silent between time instants k and k+2, while user A is silent between time instants k+2 and k+5. The embodiment illustrated in Fig. 9 can be implemented by a conference bridge in accordance with Fig. 8, provided with voice activity detection for each stream in unit 16.
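As a rough illustration of the selection step in the Fig. 9 approach, the sketch below takes a complete set of temporally related frames and mixes only those flagged as active. The energy threshold detector is a toy stand-in chosen for brevity and is an assumption of this example. It is not the VAD of any standard, and the names and sample values are hypothetical.

```python
def energy_vad(frame, threshold=100):
    """Toy voice activity detector: mean squared amplitude above a threshold."""
    return sum(s * s for s in frame) / len(frame) > threshold


def mix_active(frames_by_stream, vad=energy_vad):
    """Mix only the active frames once all temporally related frames are present."""
    active = {sid: f for sid, f in frames_by_stream.items() if vad(f)}
    if not active:
        return [], []
    mixed = [sum(column) for column in zip(*active.values())]
    return sorted(active), mixed


# All four frames for one time interval have arrived; D carries no speech.
frames = {
    "A": [50, -40, 30],
    "B": [20, 25, -30],
    "C": [-15, 35, 20],
    "D": [1, 0, -1],
}
talkers, mixed = mix_active(frames)
```

Note that the frame from D still had to be awaited before this function could be called, which is the cost of this embodiment.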
In the embodiment illustrated in Fig. 10, mixing is normally performed as soon as all active packets have arrived. This is accomplished by storing and maintaining a list of active streams, typically in unit 16. For example, the three active packets from users A-C can be mixed as soon as they have all arrived between time instants k+1 and k+2, since user D is not in the list of active streams, and thus the later arriving packet from user D can be ignored in the mixing. The reason is that the previous packet from user D did not include any speech. Thus, by storing and maintaining a list of previously active streams or users, this embodiment needs to wait only for packets from users in the list of active streams before mixing can be started. The fact that a stream is in the list does, however, not necessarily mean that the next arriving packet from this stream will be mixed, since the next packet may be inactive. This is illustrated between instants k+2 and k+3, where the arriving packet from stream A is inactive, thereby enabling updating of the list. The active packet from stream D between instants k+2 and k+3 is not in the list when it arrives. However, this packet may actually be included in mixing if the list is updated with the status of all packets that have been received when the packets are released from the queue memories, in this case when the inactive packet from user A has arrived. If the active packet from user D had arrived after the inactive packet from user A, it would thus not have been included in the mixing. Although late packets from streams that are not in the list are ignored for mixing purposes, they are still examined when they arrive to determine whether their inactive/active status has changed to update the list. Similarly, although arriving inactive packets from streams in the list will not be mixed, they will be used to update the list.
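The list handling described for Fig. 10 can be sketched as below. This is a minimal illustration under stated assumptions: the class name and its methods are hypothetical, the active/inactive flag of each packet is assumed to be supplied by a VAD elsewhere, frames are plain sample lists, and bookkeeping such as pruning old intervals is omitted.

```python
class ActiveListMixer:
    """Waits only for streams in the active list before mixing an interval."""

    def __init__(self, active_streams):
        self.active = set(active_streams)  # list of currently active streams
        self.pending = {}                  # interval -> {stream: (is_active, frame)}
        self.mixed_intervals = set()

    def _update(self, stream, is_active):
        if is_active:
            self.active.add(stream)
        else:
            self.active.discard(stream)

    def on_packet(self, stream, interval, is_active, frame):
        """Return (talkers, mixed_frame) when mixing is triggered, else None."""
        if interval in self.mixed_intervals:
            # Late packet: ignored for mixing, but still used to update the list.
            self._update(stream, is_active)
            return None
        arrived = self.pending.setdefault(interval, {})
        arrived[stream] = (is_active, frame)
        if not self.active.issubset(arrived):
            return None  # still waiting for a stream in the list
        # Update the list with the status of everything received so far, then mix.
        for sid, (flag, _) in arrived.items():
            self._update(sid, flag)
        del self.pending[interval]
        self.mixed_intervals.add(interval)
        talkers = sorted(sid for sid, (flag, _) in arrived.items() if flag)
        mixed = [sum(col) for col in zip(*(arrived[sid][1] for sid in talkers))]
        return talkers, mixed


mixer = ActiveListMixer({"A", "B", "C"})
# Interval 1: D is not in the list, so mixing starts when C arrives.
mixer.on_packet("A", 1, True, [1, 1])
mixer.on_packet("B", 1, True, [2, 2])
r1 = mixer.on_packet("C", 1, True, [3, 3])
late = mixer.on_packet("D", 1, False, [0, 0])
# Interval 2: active D arrives before inactive A, so D is included in the mix.
mixer.on_packet("B", 2, True, [2, 2])
mixer.on_packet("D", 2, True, [4, 4])
mixer.on_packet("C", 2, True, [3, 3])
r2 = mixer.on_packet("A", 2, False, [0, 0])
```

The second interval mirrors the case described above: the inactive packet from A is the last one the list is waiting for, and its arrival both removes A from the list and lets the already received active packet from D take part in the mix.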
Comparing the embodiments of Figs. 9 and 10, it is appreciated that mixing can often be started earlier in the embodiment of Fig. 10. The trade-off is that the active/inactive status of streams may occasionally be delayed one time interval T (due to late arriving active packets from streams not yet in the list), which may lead to exclusion of an actually active packet from mixing.
Fig. 11 is a block diagram of an embodiment of a conference bridge in accordance with the present invention that is suitable for implementing the method illustrated in Fig. 10. This embodiment differs from the embodiment of Fig. 8 in that selecting and mixing unit 16 has been modified into a selecting and mixing unit 30, which includes a unit 28 for maintaining a list of active streams. Unit 28 forwards a current list of active streams to control unit 24, which uses this list to release arrived temporally related packets to decoders 14 as soon as the last packet from a stream in the list has arrived.
In the embodiments illustrated in Figs. 8 and 11, voice activity detection is assumed to be performed on decoded signals (samples). However, it is also possible to perform voice activity detection directly on the coded speech parameters (before decoders 14). This can for example be performed using the techniques described in [1] combined with the relevant parts of a standard VAD (e.g. 3GPP 26.094), or as exemplified by [2].
The master clock for the mixing may either be a reference clock in the mixer itself or may be derived from the included participants (e.g. the median time).
In the description above it has been assumed that packets arrive with reasonable delays to be included in the mixing. However, if this is not the case, concealment strategies may be applied. For example, if an expected packet from user A has not arrived within a predetermined time-out period, for example 3-5 time intervals T, a concealment packet (for example the last received packet from user A) may be included in the mixing instead. In this case the late packet is typically discarded if it eventually arrives. The time-out period for late arriving packets should be set according to the statistics of the jitter and the desired lost frame rate. A concealment unit is typically provided in selecting and mixing unit 16, 30.
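The time-out rule can be illustrated with the short sketch below, which repeats the last received frame of a missing stream once the wait exceeds a limit. The function name, its parameters and the default of 4 intervals (picked from within the 3-5 range mentioned above) are assumptions made for the example, and frame repetition is only one possible concealment strategy.

```python
def frames_for_mixing(expected_streams, arrived, last_good,
                      waited_intervals, timeout_intervals=4):
    """Return frames to mix, or None while it is still worth waiting.

    expected_streams: streams whose packets are awaited for this interval
    arrived:          {stream: frame} received so far for this interval
    last_good:        {stream: frame} last frame received from each stream
    waited_intervals: number of time intervals T waited since the previous mix
    """
    missing = [sid for sid in expected_streams if sid not in arrived]
    if missing and waited_intervals < timeout_intervals:
        return None
    frames = dict(arrived)
    for sid in missing:
        # Concealment: substitute the last received frame, if there is one.
        if sid in last_good:
            frames[sid] = last_good[sid]
    return frames


arrived = {"B": [2, 2], "C": [3, 3]}
last_good = {"A": [1, 1], "B": [5, 5]}
still_waiting = frames_for_mixing(["A", "B", "C"], arrived, last_good, waited_intervals=1)
concealed = frames_for_mixing(["A", "B", "C"], arrived, last_good, waited_intervals=4)
```

A fuller version would also record that the interval has been mixed so that the late packet from A can be discarded if it eventually arrives.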
In order to handle possible drift between the clocks of the sample circuits of the respective user terminals, a mechanism for estimating and correcting for the clock drift may be included. The clock drift preferably is handled at the mixing point, since otherwise an increasing time difference between users with clock drift would be introduced in the mixed signal. Methods for determining clock drift can be found in e.g. [3, 4], which are hereby incorporated by reference.
The functionality of the conference bridge of the present invention is typically implemented by a microprocessor or micro/signal processor combination and corresponding software.
Fig. 12 is a flow chart illustrating the principles of the method in accordance with the present invention. Step S1 queues packets that have arrived in the conference bridge for each of the streams. Step S2 monitors the queued packets to detect arrival of temporally related packets of the streams. Step S3 mixes selected temporally related packets once it has been detected that they have arrived. The same steps are performed during the next time interval T.
Fig. 13 is a flow chart illustrating an embodiment of the method in accordance with the present invention. This embodiment is suitable for the approach described in Fig. 9. Step S1 queues packets that have arrived in the conference bridge for each of the streams. Step S2 monitors the queued packets to detect arrival of temporally related packets of the streams. Step S4 tests whether all temporally related packets of the streams have arrived. If so, active temporally related packets are selected in step S5 and mixed in step S6. Otherwise the procedure returns to step S4. The same steps are performed during the next time interval T.
Fig. 14 is a flow chart illustrating a further embodiment of the method in accordance with the present invention. This embodiment is suitable for the approach described in Fig. 10. Step S1 queues packets that have arrived in the conference bridge for each of the streams. Step S7 selects streams eligible for mixing from a list of currently active streams. Step S8 monitors the queued packets to detect arrival of temporally related packets of the streams in the list. Step S9 tests whether all temporally related packets of the streams in the list have arrived. If so, the list of active streams is updated in step S10 and then the received temporally related packets from streams in the list are selected and mixed in step S11. Otherwise the procedure returns to step S8. The same steps are performed during the next time interval T.
It will be understood by those skilled in the art that various modifications and changes may be made to the present invention without departure from the scope thereof, which is defined by the appended claims.
REFERENCES
[1] US 2002/0184010 A1
[2] US 2003/0135370 A1
[3] Tõnu Trump, "Maximum Likelihood Trend Estimation in Exponential Noise", IEEE Transactions on Signal Processing, vol. 49, no. 9, September 2001, pp. 2087-2095.
[4] Tõnu Trump, "Compensation for clock skew in voice over packet networks by speech interpolation", Proceedings of ISCAS 2004, pp. 608-611.
Claims
1. A method of managing packets of multiple packetized media streams arriving in a conference bridge of a non-synchronous packet network, including the steps of queuing (S1) arrived packets for each stream; monitoring (S2) the queued packets to detect arrival of temporally related packets of the streams; mixing (S3) selected temporally related packets once it has been detected that they have arrived.
2. The method of claim 1, including the step of selecting (S5) packets to be included in the mixing (S6) when all temporally related packets of the streams have arrived (S4).
3. The method of claim 1, including the steps of selecting (S11) packets to be included in the mixing from a list of currently active (S10) streams; starting mixing (S11) as soon as the temporally related packets of the streams in the list have arrived (S8, S9).
4. The method of any of the preceding claims, including the step of replacing an expected packet to be included in the mix by an error concealment packet if the expected packet has not arrived within a predetermined time period after a previous mix.
5. The method of claim 2 or 3, including the step of determining the active/inactive status of packets by voice activity detection.
6. A conference bridge for managing arriving packets of multiple packetized media streams of a non-synchronous packet network, including queue memories (22) arranged to queue arrived packets for each stream; a control unit (24) arranged to monitor the queued packets to detect arrival of temporally related packets of the streams; a mixer (16, 30) arranged to mix selected temporally related packets once it has been detected that they have arrived.
7. The conference bridge of claim 6, including a packet selector (16) arranged to select packets to be included in the mix when all temporally related packets of the streams have arrived.
8. The conference bridge of claim 6, including a packet selector (28) arranged to determine packets to be included in the mix from a list of currently active streams; a mixer (30) arranged to start mixing as soon as the temporally related packets of the currently active streams in the list have arrived.
9. The conference bridge of any of the preceding claims 6-8, including an error concealer (16, 30) arranged to replace an expected packet to be included in the mix by an error concealment packet if the expected packet has not arrived within a predetermined time period after a previous mix.
10. The conference bridge of claim 7 or 8, including at least one voice activity detector for determining the active/inactive status of packets.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/SE2007/050395 WO2008147272A1 (en) | 2007-06-01 | 2007-06-01 | A conference bridge and a method for managing packets arriving therein |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/SE2007/050395 WO2008147272A1 (en) | 2007-06-01 | 2007-06-01 | A conference bridge and a method for managing packets arriving therein |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2008147272A1 true WO2008147272A1 (en) | 2008-12-04 |
Family
ID=39301582
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/SE2007/050395 WO2008147272A1 (en) | 2007-06-01 | 2007-06-01 | A conference bridge and a method for managing packets arriving therein |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2008147272A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1113657A2 (en) * | 1999-12-30 | 2001-07-04 | Nortel Networks Limited | Apparatus and method for packet-based media communications |
US6735213B2 (en) * | 2001-11-28 | 2004-05-11 | Thinkengine Networks Inc. | Processing of telephony samples |
Non-Patent Citations (3)
Title |
---|
DICK M ET AL: "Network-centric music performance: practice and experiments", IEEE COMMUNICATIONS MAGAZINE, IEEE SERVICE CENTER, PISCATAWAY, US, vol. 43, no. 6, June 2005 (2005-06-01), pages 86 - 93, XP011134820, ISSN: 0163-6804 * |
OHSHIMA K ET AL: "A teleconferencing system with high-speed stream mixing for voice over IP", APPLICATIONS AND THE INTERNET, 2004. PROCEEDINGS. 2004 INTERNATIONAL SYMPOSIUM ON TOKYO, JAPAN 26-30 JAN. 2004, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 26 January 2004 (2004-01-26), pages 295 - 298, XP010682166, ISBN: 0-7695-2068-5 * |
YANG S ET AL: "Multipoint communications with speech mixing over IP network", COMPUTER COMMUNICATIONS, ELSEVIER SCIENCE PUBLISHERS BV, AMSTERDAM, NL, vol. 25, no. 1, 1 January 2002 (2002-01-01), pages 46 - 55, XP004321155, ISSN: 0140-3664 * |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9025497B2 (en) | 2009-07-10 | 2015-05-05 | Qualcomm Incorporated | Media forwarding for a group communication session in a wireless communications system |
CN102474513B (en) * | 2009-07-13 | 2015-03-18 | 高通股份有限公司 | Method for selectively mixing media during a group communication session of arbitration communication group and application server |
CN102474513A (en) * | 2009-07-13 | 2012-05-23 | 高通股份有限公司 | Selectively mixing media during a group communication session within a wireless communications system |
WO2011008789A1 (en) * | 2009-07-13 | 2011-01-20 | Qualcomm Incorporated | Selectively mixing media during a group communication session within a wireless communications system |
US9088630B2 (en) | 2009-07-13 | 2015-07-21 | Qualcomm Incorporated | Selectively mixing media during a group communication session within a wireless communications system |
KR101397266B1 (en) * | 2009-07-13 | 2014-05-20 | 퀄컴 인코포레이티드 | Selectively mixing media during a group communication session within a wireless communications system |
CN103325385B (en) * | 2012-03-23 | 2018-01-26 | 杜比实验室特许公司 | Speech communication method and device, method and device for operating jitter buffer |
WO2013142705A1 (en) * | 2012-03-23 | 2013-09-26 | Dolby Laboratories Licensing Corporation | Voice communication method and apparatus and method and apparatus for operating jitter buffer |
US9571425B2 (en) | 2012-03-23 | 2017-02-14 | Dolby Laboratories Licensing Corporation | Method and apparatus for voice communication based on voice activity detection |
CN103325385A (en) * | 2012-03-23 | 2013-09-25 | 杜比实验室特许公司 | Speech communication method and device, method and device for operating jitter buffer |
US9912617B2 (en) | 2012-03-23 | 2018-03-06 | Dolby Laboratories Licensing Corporation | Method and apparatus for voice communication based on voice activity detection |
CN107978325A (en) * | 2012-03-23 | 2018-05-01 | 杜比实验室特许公司 | Voice communication method and equipment, the method and apparatus of operation wobble buffer |
CN107978325B (en) * | 2012-03-23 | 2022-01-11 | 杜比实验室特许公司 | Voice communication method and apparatus, method and apparatus for operating jitter buffer |
WO2014004259A1 (en) * | 2012-06-28 | 2014-01-03 | Dolby Laboratories Licensing Corporation | Reduced system latency for dominant speaker |
US9426087B2 (en) | 2012-06-28 | 2016-08-23 | Dolby Laboratories Licensing Corporation | Reduced system latency for dominant speaker |
EP2913946A1 (en) * | 2014-02-26 | 2015-09-02 | Frequentis AG | Voice transmission in redundant systems |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9660887B1 (en) | Adaptive audio stream with latency compensation cross reference to other applications | |
US9641576B2 (en) | Dynamic locale based aggregation of full duplex media streams | |
US7084898B1 (en) | System and method for providing video conferencing synchronization | |
AU2008330261B2 (en) | Play-out delay estimation | |
US9654537B2 (en) | Synchronization and mixing of audio and video streams in network-based video conferencing call systems | |
US7693190B2 (en) | Lip synchronization for audio/video transmissions over a network | |
EP3616398B1 (en) | Method and apparatus for synchronizing applications' consumption of remote data | |
US6947417B2 (en) | Method and system for providing media services | |
US7161939B2 (en) | Method and system for switching among independent packetized audio streams | |
US7664057B1 (en) | Audio-to-video synchronization system and method for packet-based network video conferencing | |
US20030035444A1 (en) | Method for synchronizing a communication system via a packet-oriented data network | |
EP2868055B1 (en) | Reduced system latency for dominant speaker | |
US20090041020A1 (en) | Clock management between two endpoints | |
US20110167174A1 (en) | Method and System for In-Band Signaling of Multiple Media Streams | |
WO1997010674A1 (en) | Telecommunications multimedia conferencing system and method | |
US20080231687A1 (en) | Minimizing fast video update requests in a video conferencing system | |
WO2008147272A1 (en) | A conference bridge and a method for managing packets arriving therein | |
US20170070547A1 (en) | Low latency media mixing in packet networks | |
US6775301B1 (en) | System and method for compensating for channel jitter | |
US20180063011A1 (en) | Media Buffering | |
US7535995B1 (en) | System and method for volume indication during a communication session | |
Heyi | Voice over IP End-to-End delay measurements | |
Elliott et al. | Synchronization of Speaker Selection for Centralized Tandem Free VoIP Conferencing | |
Narbutt et al. | Adaptive Anti-jitter Mechanism for Multi-Party Conferencing in a H. 323 Multi-Point Control Unit |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 07748555; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 07748555; Country of ref document: EP; Kind code of ref document: A1 |