WO2008147272A1 - A conference bridge and a method for managing packets arriving therein - Google Patents

Info

Publication number
WO2008147272A1
Authority
WO
WIPO (PCT)
Prior art keywords
packets
arrived
streams
conference bridge
packet
Prior art date
Application number
PCT/SE2007/050395
Other languages
French (fr)
Inventor
Anders Eriksson
Tommy Falk
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/SE2007/050395
Publication of WO2008147272A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/02 Details
    • H04L12/16 Arrangements for providing special services to substations
    • H04L12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L12/1881 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast with schedule organisation, e.g. priority, sequence management
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/90 Buffering arrangements
    • H04L49/901 Buffering arrangements using storage descriptor, e.g. read or write pointers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/90 Buffering arrangements
    • H04L49/9023 Buffering arrangements for implementing a jitter-buffer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/90 Buffering arrangements
    • H04L49/9084 Reactions to storage capacity overflow
    • H04L49/9089 Reactions to storage capacity overflow replacing packets in a storage arrangement, e.g. pushout
    • H04L49/9094 Arrangements for simultaneous transmit and receive, e.g. simultaneous reading/writing from/to the storage element
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H04L65/403 Arrangements for multi-party communication, e.g. for conferences
    • H04L65/4038 Arrangements for multi-party communication, e.g. for conferences with floor control
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/75 Media network packet handling
    • H04L65/765 Media network packet handling intermediate
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80 Responding to QoS
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/02 Details
    • H04L12/16 Arrangements for providing special services to substations
    • H04L12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L12/1813 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
    • H04L12/1822 Conducting the conference, e.g. admission, detection, selection or grouping of participants, correlating users to one or more conference sessions, prioritising transmission
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M7/00 Arrangements for interconnection between switching centres
    • H04M7/006 Networks other than PSTN/ISDN providing telephone service, e.g. Voice over Internet Protocol (VoIP), including next generation networks with a packet-switched transport layer

Definitions

  • the present invention relates to managing packets of multiple packetized media streams arriving in a conference bridge of a non-synchronous packet network.
  • Conferencing capability allows for group communication and collaboration among geographically dispersed participants (also called users below).
  • PSTN Public Switched Telephone Network
  • the mixing of real-time media streams from several users can usually be performed without causing any substantial additional delay.
  • the individual audio samples from the participants are synchronized and arrive at regular time intervals. This means that the samples can be scheduled to be processed at regular time intervals and no additional delay is added except the time needed for the processing.
  • the processing for a voice teleconference usually consists of determining which talkers are active and summing the speech contributions from the active talkers.
  • IP Internet Protocol
  • jitter buffers are typically implemented in the conference bridge on the incoming speech to cater for the varying delay of the packets.
  • a drawback of the prior art jitter buffer approach is that the jitter buffers introduce an undesirable extra delay in the conference bridge.
  • An object of the present invention is to reduce the delay in a conference bridge.
  • the present invention queues arrived packets for each stream.
  • the queued packets are monitored to detect arrival of temporally related packets of the streams. Selected temporally related packets are mixed once it has been detected that they have arrived.
  • An advantage of the present invention is that it reduces the overall delay in the network. Instead of introducing a fixed delay in the conference bridge, the jitter in the incoming packets is forwarded to be handled by the receiving terminal.
  • Fig. 1 is a simple block diagram of a network with a conference bridge
  • Fig. 2 is a more detailed block diagram of a typical prior art non-synchronous packet network with a conference bridge
  • Fig. 3 is a time diagram illustrating jitter buffering in a typical prior art conference bridge
  • Fig. 4 is a time diagram illustrating time dependence of the jitter from a first user to the conference bridge
  • Fig. 5 is a time diagram illustrating time dependence of the jitter from the conference bridge to a second user
  • Fig. 6 is a time diagram illustrating time dependence of the combined jitter from the first user to the second user
  • Fig. 7 is a time diagram illustrating an embodiment of the method in accordance with the present invention.
  • Fig. 8 is a block diagram of an embodiment of a conference bridge in accordance with the present invention.
  • Fig. 9 is a time diagram illustrating another embodiment of the method in accordance with the present invention.
  • Fig. 10 is a time diagram illustrating still another embodiment of the method in accordance with the present invention.
  • Fig. 11 is a block diagram of another embodiment of a conference bridge in accordance with the present invention.
  • Fig. 12 is a flow chart illustrating the principles of the method in accordance with the present invention.
  • Fig. 13 is a flow chart illustrating an embodiment of the method in accordance with the present invention.
  • Fig. 14 is a flow chart illustrating a further embodiment of the method in accordance with the present invention.
  • Fig. 1 is a simple block diagram of a network with a conference bridge 10.
  • Users A-E each have bidirectional connections to conference bridge 10. In one direction each user sends audio or voice packets to conference bridge 10. In the opposite direction each user receives combined or mixed packets from the other users or participants in the conference. For example, user A sends packets A to conference bridge 10 and receives a mix BCDE of packets from the other users.
  • the purpose of the conference bridge is to manage received packets and perform the mixing that is relevant for each user.
  • Fig. 2 is a more detailed block diagram of a typical prior art non-synchronous packet network with a conference bridge 10. In order to simplify the description, Fig. 2 only illustrates how packets from users A-D are mixed and forwarded to user E. The other users are managed in a similar way.
  • Packets from users A-D reach respective jitter buffers 12 in conference bridge 10, where they are delayed. When the packets are released from jitter buffers 12, they are forwarded to respective decoders 14. Decoders 14 decode the packets into samples that are forwarded to a selecting and mixing unit 16. After mixing, the resulting samples are encoded into packets in an encoder 18 and forwarded to user E.
  • a clock unit 20 releases packets from jitter buffers 12 at regular time instants separated by a time interval T, which corresponds to the length of a speech frame, typically 20-40 ms. The added delay in jitter buffers is typically 1-3 time intervals T.
  • Fig. 3 is a time diagram illustrating jitter buffering in the conference bridge of Fig. 2.
  • This example assumes a jitter buffer delay of one time interval T.
  • that the packets are temporally related means that they are based on samples generated (at the users) at approximately the same absolute or global time, i.e. they represent approximately simultaneous events. Due to the delay in jitter buffers 12, mixing is not performed until time instant k+1, as illustrated by the arrow in the lower left corner of Fig. 3. At time instant k+1 all temporally related packets have also arrived, but this time they were not synchronous.
  • the jitter buffer approach can be described mathematically as follows:
  • the transmission time from, for example, user A to user E without a jitter buffer may be expressed as:
  • $T_{A \to \mathrm{bridge}}$ is the (constant) time delay from user A to the bridge
  • $T_{\mathrm{bridge} \to E}$ is the (constant) time delay from the bridge to user E
  • $\delta_{A \to \mathrm{bridge}}(k)$ is the jitter in transmission time from user A to the bridge
  • $\delta_{\mathrm{bridge} \to E}(k)$ is the jitter in transmission time from the bridge to user E.
  • the jitter can be assumed to obey some statistical distribution.
  • the transmission time for packets from user A to user E will be:
  • the transmission time for packets from A to E will, on average, be larger with the inclusion of a jitter buffer. Similar reasoning for the other paths leads to jitters $\delta_{B \to \mathrm{bridge}}(k)$, $\delta_{C \to \mathrm{bridge}}(k)$ and $\delta_{D \to \mathrm{bridge}}(k)$. Thus, the jitter buffers have to satisfy
  • $T_{\mathrm{jitterbuffer}} > \max\big(\delta_{A \to \mathrm{bridge}}(k),\ \delta_{B \to \mathrm{bridge}}(k),\ \delta_{C \to \mathrm{bridge}}(k),\ \delta_{D \to \mathrm{bridge}}(k)\big)$ (4)
  • the media stream with the largest jitter will determine the jitter buffer delay for all streams. It is also known to have adaptive jitter buffer delays.
  • the basic concept of the invention is to exclude the jitter buffering in the conference bridge and asynchronously perform the mixing of the signals once all the packets that should be mixed have arrived.
  • the purpose of the jitter buffers in a conference bridge is to enable the mixing of the correct samples from the included participants by compensating for jitter in the transmission time.
  • the conference bridge does not have to produce synchronous output (e.g. for playback in a loudspeaker), but can produce asynchronous output. Instead all the necessary synchronization for playback may be performed at the jitter buffer in the receiving terminal.
  • the present invention is based on two observations, namely:
  • the jitter buffer at user E is of the same magnitude as the jitter buffers in the conference bridge.
  • the jitters $\delta_{A \to \mathrm{bridge}}(k)$ and $\delta_{\mathrm{bridge} \to E}(k)$ are typically uncorrelated. This means that it is unlikely that the two jitters are large at the same time k. This is illustrated in Figs. 4-6. Thus, on average the combined jitter $\delta_{A \to \mathrm{bridge}}(k) + \delta_{\mathrm{bridge} \to E}(k)$ will not result in any significantly increased delay.
  • Fig. 7 is a time diagram illustrating an embodiment of the method in accordance with the present invention.
  • the packets received at the conference bridge are exactly the same as in Fig. 3.
  • the temporally related packets are mixed as soon as they have all been received.
  • the packets that arrive early in the time interval between k+1 and k+2 can immediately be decoded and mixed as soon as all packets have been received, instead of waiting until time instant k+3 as in the prior art of Fig. 3.
  • Fig. 8 is a block diagram of an embodiment of a conference bridge in accordance with the present invention.
  • instead of being forwarded to jitter buffers, packets that arrive in the conference bridge are forwarded to queue memories or FIFOs 22.
  • Queue memories 22 are controlled by a control unit 24.
  • Control unit 24 monitors queue memories 22 over monitor lines 26 to determine whether temporally related packets from all streams have arrived in the queue memories. As soon as all temporally related packets representing a given time interval have arrived, control unit 24 releases these packets to decoders 14 for decoding and subsequent mixing in unit 16.
  • VADs voice activity detectors
  • In the embodiment illustrated in Fig. 9 all temporally related packets are collected and thereafter analyzed with regard to speech activity. Only packets including speech activity are then mixed. In Fig. 9 packets without speech activity have been illustrated by empty boxes. Thus, in this embodiment mixing is not performed until all temporally related packets have arrived. Sometimes this means that packets that are later determined to include no speech activity have to be awaited before previously arrived packets with speech activity can be mixed. This situation has been illustrated, for example, between time instants k+1 and k+2, where a non-speech packet from user D arrives after the speech packets from users A-C. Fig. 9 also illustrates that different users may be silent at different times.
  • the embodiment of Fig. 9 can be implemented by a conference bridge in accordance with Fig. 8, provided with voice activity detection for each stream in unit 16.
  • mixing is normally performed as soon as all active packets have arrived. This is accomplished by storing and maintaining a list of active streams, typically in unit 16. For example, the three active packets from users A-C can be mixed as soon as they have all arrived between time instants k+1 and k+2, since user D is not in the list of active streams, and thus the later arriving packet from user D can be ignored in the mixing. The reason is that the previous packet from user D did not include any speech. Thus, by storing and maintaining a list of previously active streams or users, this embodiment needs to wait only for packets from users in the list of active streams before mixing can be started.
  • that a stream is in the list does not, however, necessarily mean that the next arriving packet from this stream will be mixed, since the next packet may be inactive. This is illustrated between instants k+2 and k+3, where the arriving packet from stream A is inactive, thereby enabling updating of the list.
  • the active packet from stream D between instants k+2 and k+3 is not in the list when it arrives. However, this packet may actually be included in mixing if the list is updated with the status of all packets that have been received when the packets are released from the queue memories, in this case when the inactive packet from user A has arrived. If the active packet from user D had arrived after the inactive packet from user A, it would thus not have been included in the mixing.
  • Fig. 11 is a block diagram of an embodiment of a conference bridge in accordance with the present invention that is suitable for implementing the method illustrated in Fig. 10.
  • This embodiment differs from the embodiment of Fig. 8 in that selecting and mixing unit 16 has been modified into a selecting and mixing unit 30, which includes a unit 28 for maintaining a list of active streams.
  • Unit 28 forwards a current list of active streams to control unit 24, which uses this list to release arrived temporally related packets to decoders 14 as soon as the last packet from a stream in the list has arrived.
  • voice activity detection is assumed to be performed on decoded signals (samples). However, it is also possible to perform voice activity detection directly on the coded speech parameters (before decoders 14). This can for example be performed using the techniques described in [1] combined with the relevant parts of a standard VAD (e.g. 3GPP 26.094), or as exemplified by [2].
  • a standard VAD e.g. 3GPP 26.094
  • the master clock for the mixing may either be a reference clock in the mixer itself or may be derived from the included participants (e.g. the median time).
  • a concealment unit is typically provided in selecting and mixing unit 16, 30.
  • a mechanism for estimating and correcting for the clock drift may be included.
  • the clock drift preferably is handled at the mixing point, since otherwise an increasing time difference between users with clock drift would be introduced in the mixed signal.
  • Methods for determining clock drift can be found in e.g. [3, 4], which are hereby incorporated by reference.
  • the functionality of the conference bridge of the present invention is typically implemented by a microprocessor or micro/signal processor combination and corresponding software.
  • Step S1 queues packets that have arrived in the conference bridge for each of the streams.
  • Step S2 monitors the queued packets to detect arrival of temporally related packets of the streams.
  • Step S3 mixes selected temporally related packets once it has been detected that they have arrived. The same steps are performed during the next time interval T.
  • Fig. 13 is a flow chart illustrating an embodiment of the method in accordance with the present invention. This embodiment is suitable for the approach described in Fig. 9.
  • Step S1 queues packets that have arrived in the conference bridge for each of the streams.
  • Step S2 monitors the queued packets to detect arrival of temporally related packets of the streams.
  • Step S4 tests whether all temporally related packets of the streams have arrived. If so, active temporally related packets are selected in step S5 and mixed in step S6. Otherwise the procedure returns to step S4. The same steps are performed during the next time interval T.
  • Fig. 14 is a flow chart illustrating a further embodiment of the method in accordance with the present invention. This embodiment is suitable for the approach described in Fig. 10.
  • Step S1 queues packets that have arrived in the conference bridge for each of the streams.
  • Step S7 selects streams eligible for mixing from a list of currently active streams.
  • Step S8 monitors the queued packets to detect arrival of temporally related packets of the streams in the list.
  • Step S9 tests whether all temporally related packets of the streams in the list have arrived. If so, the list of active streams is updated in step S10 and then the received temporally related packets from streams in the list are selected and mixed in step S11. Otherwise the procedure returns to step S8. The same steps are performed during the next time interval T.
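The flow of Fig. 14 (queueing, waiting only for streams in the list of active streams, updating the list, then mixing) can be sketched in Python as follows. This is a minimal illustration, not the claimed implementation: the class name `ActiveListBridge`, the packet layout `(frame_no, has_speech, samples)`, the integer frame numbers used to relate packets temporally, and the summing of already decoded samples are all assumptions. The sketch also assumes that packets of each stream arrive in order without loss, and that a packet arriving after its interval has been mixed only refreshes the stream's activity status.

```python
from collections import deque


class ActiveListBridge:
    """Illustrative sketch only; names and packet layout are hypothetical."""

    def __init__(self, streams, active=None):
        # Storage for step S1: one FIFO queue per incoming stream.
        self.queues = {s: deque() for s in streams}
        # List of active streams. Starting with every stream in the
        # list (when none is given) is an assumption of this sketch.
        self.active = set(streams if active is None else active)
        self.next_frame = 0  # index of the next time interval to mix

    def _update_status(self, stream, has_speech):
        if has_speech:
            self.active.add(stream)
        else:
            self.active.discard(stream)

    def on_packet(self, stream, frame_no, has_speech, samples):
        """Queue one packet and return every mix it makes possible."""
        if frame_no < self.next_frame:
            # The interval was already mixed: ignore the payload and only
            # refresh the stream's activity status (assumed behaviour).
            self._update_status(stream, has_speech)
            return []
        self.queues[stream].append((frame_no, has_speech, samples))  # S1
        released = []
        while True:
            mix = self._try_release()
            if mix is None:
                return released
            released.append(mix)

    def _try_release(self):
        # S8: find queued packets that belong to the interval to be mixed.
        arrived = {s: q[0] for s, q in self.queues.items()
                   if q and q[0][0] == self.next_frame}
        # S9: wait until every stream in the list has delivered its packet.
        if not arrived or not self.active.issubset(arrived):
            return None
        # S10: update the list with the status of all received packets.
        for s, (_, has_speech, _) in arrived.items():
            self._update_status(s, has_speech)
        # S11: select packets from streams in the list and sum the samples.
        chosen = sorted(s for s in arrived if s in self.active)
        mixed = [sum(col) for col in zip(*(arrived[s][2] for s in chosen))]
        for s in arrived:
            self.queues[s].popleft()
        frame_no, self.next_frame = self.next_frame, self.next_frame + 1
        return frame_no, chosen, mixed
```

With streams A-D and only A-C in the list, the mix for an interval is released as soon as the packets from A, B and C are queued; a later packet from D for the same interval is not mixed, but it can bring D back into the list for the following interval.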

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Multimedia (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A conference bridge for managing arriving packets of multiple packetized media streams of a non-synchronous packet network includes queue memories (22) arranged to queue arrived packets for each stream. A control unit (24) monitors the queued packets to detect arrival of temporally related packets of the streams. A mixer (16) mixes selected temporally related packets once it has been detected that they have arrived.

Description

A CONFERENCE BRIDGE AND A METHOD FOR MANAGING PACKETS ARRIVING THEREIN
TECHNICAL FIELD
The present invention relates to managing packets of multiple packetized media streams arriving in a conference bridge of a non-synchronous packet network.
BACKGROUND
Conferencing capability allows for group communication and collaboration among geographically dispersed participants (also called users below). Historically, conferencing has been achieved in the Public Switched Telephone Network (PSTN) by means of a centralized conference bridge. In such a circuit-switched network, the mixing of real-time media streams from several users can usually be performed without causing any substantial additional delay. In a voice teleconference, for example, the individual audio samples from the participants are synchronized and arrive at regular time intervals. This means that the samples can be scheduled to be processed at regular time intervals, and no additional delay is added except the time needed for the processing. The processing for a voice teleconference usually consists of determining which talkers are active and summing the speech contributions from the active talkers.
Current trends point towards the migration of voice communication services from the circuit-switched PSTN to non-synchronous packet-based Internet Protocol (IP) networks. This shift is motivated by a desire to provide data and voice services on a single, packet-based network infrastructure. In a non-synchronous packet network, the audio samples (or coded parameters representing the audio samples) from the participants in e.g. a voice teleconference usually do not arrive at regular time intervals due to the jitter in the transport network.
In order to synchronize the speech contributions from the participants and thus make it possible to mix samples corresponding to the same time from all participants, jitter buffers are typically implemented in the conference bridge on the incoming speech to cater for the varying delay of the packets.
SUMMARY
A drawback of the prior art jitter buffer approach is that the jitter buffers introduce an undesirable extra delay in the conference bridge.
An object of the present invention is to reduce the delay in a conference bridge.
This object is achieved in accordance with the attached claims.
Briefly, the present invention queues arrived packets for each stream. The queued packets are monitored to detect arrival of temporally related packets of the streams. Selected temporally related packets are mixed once it has been detected that they have arrived.
An advantage of the present invention is that it reduces the overall delay in the network. Instead of introducing a fixed delay in the conference bridge, the jitter in the incoming packets is forwarded to be handled by the receiving terminal.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention, together with further objects and advantages thereof, may best be understood by making reference to the following description taken together with the accompanying drawings, in which:
Fig. 1 is a simple block diagram of a network with a conference bridge;
Fig. 2 is a more detailed block diagram of a typical prior art non-synchronous packet network with a conference bridge;
Fig. 3 is a time diagram illustrating jitter buffering in a typical prior art conference bridge;
Fig. 4 is a time diagram illustrating time dependence of the jitter from a first user to the conference bridge;
Fig. 5 is a time diagram illustrating time dependence of the jitter from the conference bridge to a second user;
Fig. 6 is a time diagram illustrating time dependence of the combined jitter from the first user to the second user;
Fig. 7 is a time diagram illustrating an embodiment of the method in accordance with the present invention;
Fig. 8 is a block diagram of an embodiment of a conference bridge in accordance with the present invention;
Fig. 9 is a time diagram illustrating another embodiment of the method in accordance with the present invention;
Fig. 10 is a time diagram illustrating still another embodiment of the method in accordance with the present invention;
Fig. 11 is a block diagram of another embodiment of a conference bridge in accordance with the present invention;
Fig. 12 is a flow chart illustrating the principles of the method in accordance with the present invention;
Fig. 13 is a flow chart illustrating an embodiment of the method in accordance with the present invention; and
Fig. 14 is a flow chart illustrating a further embodiment of the method in accordance with the present invention.
DETAILED DESCRIPTION
In the following description elements having the same or similar functions will be provided with the same reference designations in the drawings.
Fig. 1 is a simple block diagram of a network with a conference bridge 10. Users A-E each have bidirectional connections to conference bridge 10. In one direction each user sends audio or voice packets to conference bridge 10. In the opposite direction each user receives combined or mixed packets from the other users or participants in the conference. For example, user A sends packets A to conference bridge 10 and receives a mix BCDE of packets from the other users. The purpose of the conference bridge is to manage received packets and perform the mixing that is relevant for each user.
Fig. 2 is a more detailed block diagram of a typical prior art non-synchronous packet network with a conference bridge 10. In order to simplify the description, Fig. 2 only illustrates how packets from users A-D are mixed and forwarded to user E. The other users are managed in a similar way.
Packets from users A-D reach respective jitter buffers 12 in conference bridge 10, where they are delayed. When the packets are released from jitter buffers 12, they are forwarded to respective decoders 14. Decoders 14 decode the packets into samples that are forwarded to a selecting and mixing unit 16. After mixing, the resulting samples are encoded into packets in an encoder 18 and forwarded to user E. A clock unit 20 releases packets from jitter buffers 12 at regular time instants separated by a time interval T, which corresponds to the length of a speech frame, typically 20-40 ms. The added delay in the jitter buffers is typically 1-3 time intervals T.
Fig. 3 is a time diagram illustrating jitter buffering in the conference bridge of Fig. 2. This example assumes a jitter buffer delay of one time interval T. At time instant k all temporally related packets from users A-D have arrived simultaneously at conference bridge 10. That the packets are temporally related means that they are based on samples generated (at the users) at approximately the same absolute or global time, i.e. they represent approximately simultaneous events. Due to the delay in jitter buffers 12, mixing is not performed until time instant k+1, as illustrated by the arrow in the lower left corner of Fig. 3. At time instant k+1 all temporally related packets have also arrived, but this time they were not synchronous. However, the buffering until time instant k+2 makes them synchronous. Similar comments apply to the temporally related packets arriving between instants k+2 and k+3. It is noted that so far the extra delay provided by the jitter buffers was not actually needed, since all packets arrived in time for mixing at the next periodic time instant. This, however, changes at time instant k+4, since the packet from user A would have arrived too late for mixing without the extra delay. Thus, for such situations the extra delay provided for late packets by jitter buffers 12 is actually useful.
The jitter buffer approach can be described mathematically as follows: The transmission time from, for example, user A to user E without a jitter buffer may be expressed as:
$T_{A \to E}(k) = T_{A \to \mathrm{bridge}} + \delta_{A \to \mathrm{bridge}}(k) + T_{\mathrm{bridge} \to E} + \delta_{\mathrm{bridge} \to E}(k)$ (1)
where
$T_{A \to \mathrm{bridge}}$ is the (constant) time delay from user A to the bridge,
$T_{\mathrm{bridge} \to E}$ is the (constant) time delay from the bridge to user E,
$\delta_{A \to \mathrm{bridge}}(k)$ is the jitter in transmission time from user A to the bridge,
$\delta_{\mathrm{bridge} \to E}(k)$ is the jitter in transmission time from the bridge to user E.
The jitter can be assumed to obey some statistical distribution.
With the inclusion of a jitter buffer 12 in the conference bridge, the transmission time for packets from user A to user E will be:
$T_{A \to E}(k) = T_{A \to \mathrm{bridge}} + T_{\mathrm{jitterbuffer}} + T_{\mathrm{bridge} \to E} + \delta_{\mathrm{bridge} \to E}(k)$ (2)
Since the jitter buffer in the conference bridge is designed to compensate for the possible jitter $\delta_{A \to \mathrm{bridge}}(k)$, one can assume that:
$T_{\mathrm{jitterbuffer}} > \delta_{A \to \mathrm{bridge}}(k)$ (3)
most of the time. Thus, the transmission time for packets from A to E will, on average, be larger with the inclusion of a jitter buffer. Similar reasoning for the other paths leads to jitters $\delta_{B \to \mathrm{bridge}}(k)$, $\delta_{C \to \mathrm{bridge}}(k)$ and $\delta_{D \to \mathrm{bridge}}(k)$. Thus, the jitter buffers have to satisfy
$T_{\mathrm{jitterbuffer}} > \max\big(\delta_{A \to \mathrm{bridge}}(k),\ \delta_{B \to \mathrm{bridge}}(k),\ \delta_{C \to \mathrm{bridge}}(k),\ \delta_{D \to \mathrm{bridge}}(k)\big)$ (4)
most of the time, i.e. the media stream with the largest jitter will determine the jitter buffer delay for all streams. It is also known to have adaptive jitter buffer delays.
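The arithmetic of equations (1), (2) and (4) can be illustrated with a short Python sketch. The function names and the numbers used with them are hypothetical. Rounding the jitter buffer delay up to a whole number of frame intervals T is an assumption, motivated only by the statement that the added delay is typically 1-3 time intervals T.

```python
def transit_without_buffer(t_a_bridge, d_a_bridge, t_bridge_e, d_bridge_e):
    """Equation (1): both jitter terms reach user E unchanged."""
    return t_a_bridge + d_a_bridge + t_bridge_e + d_bridge_e


def transit_with_buffer(t_a_bridge, t_jitterbuffer, t_bridge_e, d_bridge_e):
    """Equation (2): the bridge replaces the incoming jitter by a fixed delay."""
    return t_a_bridge + t_jitterbuffer + t_bridge_e + d_bridge_e


def jitter_buffer_delay(jitters_ms, frame_ms=20):
    """Equation (4): a delay strictly larger than the largest observed jitter.

    jitters_ms maps each stream to a list of observed jitter values in ms.
    The result is rounded up to whole frame intervals (an assumption).
    """
    worst = max(max(values) for values in jitters_ms.values())
    frames = worst // frame_ms + 1  # strictly greater than the worst jitter
    return frames * frame_ms
```

For example, with observed jitters of at most 12, 25, 3 and 18 ms on streams A-D and T = 20 ms, `jitter_buffer_delay` gives 40 ms for all streams, so the stream with the largest jitter sets the delay. A packet from A with 12 ms of jitter then takes 50 + 40 + 60 + 5 = 155 ms with the buffer, compared with 50 + 12 + 60 + 5 = 127 ms without it (fixed delays of 50 and 60 ms and an outgoing jitter of 5 ms are made-up values).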
From equation (1) it is noted that there is a jitter $\delta_{\mathrm{bridge} \to E}(k)$ in transmission time from the bridge to user E. This jitter is compensated by a playback jitter buffer at user E. The playback jitter buffer synchronizes the packets of the single mixed stream arriving at user E before decoding to generate a continuous stream of decoded samples at user E.
The basic concept of the invention is to exclude the jitter buffering in the conference bridge and asynchronously perform the mixing of the signals once all the packets that should be mixed have arrived. As discussed above, the purpose of the jitter buffers in a conference bridge is to enable the mixing of the correct samples from the included participants by compensating for jitter in the transmission time. However, unlike the situation for the playback in the terminal, the conference bridge does not have to produce synchronous output (e.g. for playback in a loudspeaker), but can produce asynchronous output. Instead all the necessary synchronization for playback may be performed at the jitter buffer in the receiving terminal. The present invention is based on two observations, namely:
1. The jitter buffer at user E is of the same magnitude as the jitter buffers in the conference bridge.
2. The jitters $\delta_{A \to \text{bridge}}(k)$ and $\delta_{\text{bridge} \to E}(k)$ are typically uncorrelated. This means that it is unlikely that the two jitters are large at the same time k. This is illustrated in Figs. 4-6. Thus, on average the combined jitter $\delta_{A \to \text{bridge}}(k) + \delta_{\text{bridge} \to E}(k)$ will not result in any significantly increased delay.
These observations imply that by omitting the jitter buffers in the conference bridge, the jitters resulting at user E will on average be of the same magnitude as before, but that larger jitters will occur more frequently. However, since the jitter buffer at user E is already designed to cope with such jitter, no changes are necessary at user E.
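The effect of the two observations can be explored with a small Monte Carlo sketch. It assumes, purely for illustration, that both jitters are independent and exponentially distributed with the same mean, and it approximates each jitter buffer delay by the 99th percentile of the jitter it has to absorb. None of these modelling choices come from the disclosure.

```python
import random


def percentile(samples, q):
    """Return an approximate q-quantile of the samples (0 <= q < 1)."""
    ordered = sorted(samples)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def simulate(n=20000, mean_jitter_ms=10.0, q=0.99, seed=1):
    """Compare buffering delay with and without a jitter buffer in the bridge."""
    rng = random.Random(seed)
    to_bridge = [rng.expovariate(1.0 / mean_jitter_ms) for _ in range(n)]
    from_bridge = [rng.expovariate(1.0 / mean_jitter_ms) for _ in range(n)]
    combined = [a + b for a, b in zip(to_bridge, from_bridge)]
    return {
        # Prior art: one buffer in the bridge plus one playback buffer at user E.
        "with_bridge_buffer": percentile(to_bridge, q) + percentile(from_bridge, q),
        # Asynchronous bridge: only the playback buffer absorbs the combined jitter.
        "without_bridge_buffer": percentile(combined, q),
    }
```

Under these assumptions the delay needed to absorb the combined jitter is smaller than the sum of the two separate buffer delays, because the two jitters are rarely large at the same time.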
Fig. 7 is a time diagram illustrating an embodiment of the method in accordance with the present invention. The packets received at the conference bridge are exactly the same as in Fig. 3. However, since there are no jitter buffers in the conference bridge, the temporally related packets are mixed as soon as they have all been received. As can be seen in Fig. 7, this eliminates the jitter buffer delay and yields a mixed stream with asynchronously transmitted packets. For example, the packets that arrive early in the time interval between k+1 and k+2 can be decoded and mixed as soon as all of them have been received, instead of waiting until time instant k+3 as in the prior art of Fig. 3.
Fig. 8 is a block diagram of an embodiment of a conference bridge in accordance with the present invention. Instead of forwarding packets that arrive in the conference bridge to jitter buffers, as in the prior art, they are forwarded to queue memories or FIFOs 22. Queue memories 22 are controlled by a control unit 24. Control unit 24 monitors queue memories 22 over monitor lines 26 to determine whether temporally related packets from all streams have arrived in the queue memories. As soon as all temporally related packets representing a given time interval have arrived, control unit 24 releases these packets to decoders 14 for decoding and subsequent mixing in unit 16.
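The interplay of queue memories 22 and control unit 24 can be sketched as follows. This is a minimal sketch under stated assumptions: packets of each stream arrive in order, so the heads of the FIFOs are temporally related; payloads are already lists of decoded samples (decoders 14 are omitted); and mixing is a plain sample-wise sum. The class and method names are hypothetical.

```python
from collections import deque


class AsyncBridge:
    """Releases and mixes packets as soon as one packet per stream is queued."""

    def __init__(self, streams):
        # One FIFO per stream, standing in for queue memories 22.
        self.queues = {stream: deque() for stream in streams}

    def on_packet(self, stream, samples):
        """Queue an arrived packet; return the mixed frame if a set is complete."""
        self.queues[stream].append(samples)
        # Control unit 24: an empty deque is falsy, so all() checks every FIFO.
        if not all(self.queues.values()):
            return None
        frames = [queue.popleft() for queue in self.queues.values()]
        # Sample-wise sum of the temporally related frames.
        return [sum(column) for column in zip(*frames)]
```

No timer drives the output here: the mixed frame is produced by whichever arrival completes the set, which is what makes the output asynchronous.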
In the above description it has been assumed that packets from all users A-D should be mixed. However, if one or several users are silent, packets from these users may be discarded instead of mixed with packets from active talkers. Silence may be detected by one or more voice activity detectors (VADs), typically included in unit 16. This situation can be handled in different ways, as illustrated by Figs. 9 and 10.
In the embodiment illustrated in Fig. 9 all temporally related packets are collected and thereafter analyzed with regard to speech activity. Only packets including speech activity are then mixed. In Fig. 9 packets without speech activity have been illustrated by empty boxes. Thus, in this embodiment mixing is not performed until all temporally related packets have arrived. Sometimes this means that packets that are later determined to include no speech activity have to be awaited before previously arrived packets with speech activity can be mixed. This situation has been illustrated, for example, between time instants k+1 and k+2, where a non-speech packet from user D arrives after the speech packets from users A-C. Fig. 9 also illustrates that different users may be silent at different times. Thus, user D is silent between time instants k and k+2, while user A is silent between time instants k+2 and k+5. The embodiment illustrated in Fig. 9 can be implemented by a conference bridge in accordance with Fig. 8, provided with voice activity detection for each stream in unit 16.
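The selection step of this embodiment can be sketched as below. The energy threshold detector is only a stand-in for a real voice activity detector, and the threshold value, function names and sample values are assumptions for illustration.

```python
def is_active(samples, threshold=100.0):
    """Crude stand-in for a VAD: mean energy of the frame above a threshold."""
    energy = sum(x * x for x in samples) / len(samples)
    return energy > threshold


def select_and_mix(frames):
    """Mix only the frames with speech activity.

    `frames` maps each stream to its decoded samples for one time interval and
    is assumed to be complete, i.e. all temporally related packets have arrived.
    """
    frame_length = len(next(iter(frames.values())))
    active = [samples for samples in frames.values() if is_active(samples)]
    if not active:
        return [0] * frame_length  # nobody is talking: emit silence
    return [sum(column) for column in zip(*active)]
```

Because selection happens only after the set is complete, a silent packet that arrives last still delays the mix, as described for user D between time instants k+1 and k+2.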
In the embodiment illustrated in Fig. 10, mixing is normally performed as soon as all active packets have arrived. This is accomplished by storing and maintaining a list of active streams, typically in unit 16. For example, the three active packets from users A-C can be mixed as soon as they have all arrived between time instants k+1 and k+2, since user D is not in the list of active streams, and thus the later arriving packet from user D can be ignored in the mixing. The reason is that the previous packet from user D did not include any speech. Thus, by storing and maintaining a list of previously active streams or users, this embodiment needs to wait only for packets from users in the list of active streams before mixing can be started. The fact that a stream is in the list does not, however, necessarily mean that the next arriving packet from this stream will be mixed, since the next packet may be inactive. This is illustrated between instants k+2 and k+3, where the arriving packet from stream A is inactive, thereby enabling updating of the list. The active packet from stream D between instants k+2 and k+3 is not in the list when it arrives. However, this packet may actually be included in mixing if the list is updated with the status of all packets that have been received when the packets are released from the queue memories, in this case when the inactive packet from user A has arrived. If the active packet from user D had arrived after the inactive packet from user A, it would thus not have been included in the mixing. Although late packets from streams that are not in the list are ignored for mixing purposes, they are still examined when they arrive to determine whether their active/inactive status has changed, so that the list can be updated. Similarly, although arriving inactive packets from streams in the list will not be mixed, they will be used to update the list.
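The list-based release rule can be sketched as follows. Several details are assumptions made for illustration: each packet carries an interval index (comparable to a media timestamp) and an active/inactive flag that would in practice come from a VAD, payloads are decoded sample lists, and all names are hypothetical.

```python
class ActiveListBridge:
    """Waits only for packets from streams in the list of active streams."""

    def __init__(self, initially_active):
        self.active = set(initially_active)  # list of active streams
        self.current = 0                     # interval currently being collected
        self.pending = {}                    # interval -> {stream: (flag, samples)}
        self.last_seen = {}                  # stream -> newest interval examined

    def _update(self, stream, interval, flag):
        # Ignore status from packets older than the newest one already examined.
        if interval < self.last_seen.get(stream, -1):
            return
        self.last_seen[stream] = interval
        if flag:
            self.active.add(stream)
        else:
            self.active.discard(stream)

    def _complete(self):
        arrived = self.pending.get(self.current, {})
        return bool(arrived) and self.active.issubset(arrived)

    def _release(self):
        frames = self.pending.pop(self.current)
        # Update the list with the status of every packet received so far.
        for stream, (flag, _) in frames.items():
            self._update(stream, self.current, flag)
        selected = [samples for flag, samples in frames.values() if flag]
        self.current += 1
        return [sum(column) for column in zip(*selected)]

    def on_packet(self, stream, interval, flag, samples):
        """Handle one arrival; return the list of mixed frames it released."""
        if interval < self.current:
            # Late packet: not mixed, but still used to update the list.
            self._update(stream, interval, flag)
            return []
        self.pending.setdefault(interval, {})[stream] = (flag, samples)
        released = []
        while self._complete():
            released.append(self._release())
        return released
```

Replaying the situation of Fig. 10 with A-C in the list, the mix for the first interval is released when the last of A-C arrives, the late silent packet from D is ignored, and in the next interval the early active packet from D is included because the list is updated at release time, when the inactive packet from A arrives.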
Comparing the embodiments of Figs. 9 and 10, it is appreciated that mixing can often be started earlier in the embodiment of Fig. 10. The trade-off is that the active/inactive status of streams may occasionally be delayed one time interval T (due to late arriving active packets from streams not yet in the list), which may lead to exclusion of an actually active packet from mixing.
Fig. 11 is a block diagram of an embodiment of a conference bridge in accordance with the present invention that is suitable for implementing the method illustrated in Fig. 10. This embodiment differs from the embodiment of Fig. 8 in that selecting and mixing unit 16 has been modified into a selecting and mixing unit 30, which includes a unit 28 for maintaining a list of active streams. Unit 28 forwards a current list of active streams to control unit 24, which uses this list to release arrived temporally related packets to decoders 14 as soon as the last packet from a stream in the list has arrived.
In the embodiments illustrated in Figs. 8 and 11, voice activity detection is assumed to be performed on decoded signals (samples). However, it is also possible to perform voice activity detection directly on the coded speech parameters (before decoders 14). This can for example be performed using the techniques described in [1] combined with the relevant parts of a standard VAD (e.g. 3GPP 26.094), or as exemplified by [2].
The master clock for the mixing may either be a reference clock in the mixer itself or may be derived from the included participants (e.g. the median time).
In the description above it has been assumed that packets arrive with reasonable delays to be included in the mixing. However, if this is not the case, concealment strategies may be applied. For example, if an expected packet from user A has not arrived within a predetermined timeout period, for example 3-5 time intervals T, a concealment packet (for example the last received packet from user A) may be included in the mixing instead. In this case the late packet is typically discarded if it eventually arrives. The timeout period for late arriving packets should be set according to the statistics of the jitter and the desired lost frame rate. A concealment unit is typically provided in selecting and mixing unit 16, 30.
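The timeout rule can be sketched as a small helper. The default of 4 intervals is an arbitrary pick from the 3-5 range mentioned above, repeating the last received frame is just one possible concealment, and the function and parameter names are hypothetical.

```python
def frames_with_concealment(expected, arrived, last_received,
                            waited_intervals, timeout_intervals=4):
    """Return the frames to mix, or None if the bridge should keep waiting.

    expected:         streams whose packets are awaited for this interval
    arrived:          stream -> frame for packets that have arrived
    last_received:    stream -> last frame previously received from that stream
    waited_intervals: time already waited, in units of the interval T
    """
    missing = [stream for stream in expected if stream not in arrived]
    if missing and waited_intervals < timeout_intervals:
        return None  # not timed out yet: wait for the missing packets
    frames = dict(arrived)
    for stream in missing:
        concealment = last_received.get(stream)
        if concealment is not None:
            # Substitute the last received packet for the missing one.
            frames[stream] = concealment
    return frames
```

A packet that shows up after its concealment frame has been mixed would then simply be dropped by the caller.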
In order to handle possible drift between the clocks of the sample circuits of the respective user terminals, a mechanism for estimating and correcting for the clock drift may be included. The clock drift preferably is handled at the mixing point, since otherwise an increasing time difference between users with clock drift would be introduced in the mixed signal. Methods for determining clock drift can be found in e.g. [3, 4], which are hereby incorporated by reference.
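As a simple illustration of drift estimation, the sketch below fits a straight line to the offset between receive time and send timestamp. This ordinary least-squares fit is a swapped-in technique chosen for brevity; it is not the maximum likelihood estimator of [3] and is more sensitive to one-sided network jitter. The function name and the time units are assumptions.

```python
def estimate_drift(send_times, receive_times):
    """Least-squares slope of (receive - send) versus send time.

    A slope of 50e-6 means the sender clock runs 50 ppm slow relative to the
    receiver clock; the constant part of the offset (network delay) is ignored.
    """
    offsets = [r - s for s, r in zip(send_times, receive_times)]
    n = len(send_times)
    mean_t = sum(send_times) / n
    mean_o = sum(offsets) / n
    numerator = sum((t - mean_t) * (o - mean_o)
                    for t, o in zip(send_times, offsets))
    denominator = sum((t - mean_t) ** 2 for t in send_times)
    return numerator / denominator
```

The estimated slope could then drive a correction at the mixing point, for example by occasionally inserting or dropping a sample for the drifting stream.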
The functionality of the conference bridge of the present invention is typically implemented by a microprocessor or micro/signal processor combination and corresponding software.
Fig. 12 is a flow chart illustrating the principles of the method in accordance with the present invention. Step S1 queues packets that have arrived in the conference bridge for each of the streams. Step S2 monitors the queued packets to detect arrival of temporally related packets of the streams. Step S3 mixes selected temporally related packets once it has been detected that they have arrived. The same steps are performed during the next time interval T.
Fig. 13 is a flow chart illustrating an embodiment of the method in accordance with the present invention. This embodiment is suitable for the approach described in Fig. 9. Step S1 queues packets that have arrived in the conference bridge for each of the streams. Step S2 monitors the queued packets to detect arrival of temporally related packets of the streams. Step S4 tests whether all temporally related packets of the streams have arrived. If so, active temporally related packets are selected in step S5 and mixed in step S6. Otherwise the procedure returns to step S4. The same steps are performed during the next time interval T.
Fig. 14 is a flow chart illustrating a further embodiment of the method in accordance with the present invention. This embodiment is suitable for the approach described in Fig. 10. Step S1 queues packets that have arrived in the conference bridge for each of the streams. Step S7 selects streams eligible for mixing from a list of currently active streams. Step S8 monitors the queued packets to detect arrival of temporally related packets of the streams in the list. Step S9 tests whether all temporally related packets of the streams in the list have arrived. If so, the list of active streams is updated in step S10 and then the received temporally related packets from streams in the list are selected and mixed in step S11. Otherwise the procedure returns to step S8. The same steps are performed during the next time interval T.
It will be understood by those skilled in the art that various modifications and changes may be made to the present invention without departure from the scope thereof, which is defined by the appended claims.
REFERENCES
[1] US 2002/0184010 A1
[2] US 2003/0135370 A1
[3] Tõnu Trump, "Maximum Likelihood Trend Estimation in Exponential Noise", IEEE Transactions on Signal Processing, vol. 49, no. 9, September 2001, pp 2087-2095.
[4] Tõnu Trump, "Compensation for clock skew in voice over packet networks by speech interpolation", Proceedings of ISCAS 2004, pp 608-611.

Claims

1. A method of managing packets of multiple packetized media streams arriving in a conference bridge of a non-synchronous packet network, including the steps of queuing (S1) arrived packets for each stream; monitoring (S2) the queued packets to detect arrival of temporally related packets of the streams; mixing (S3) selected temporally related packets once it has been detected that they have arrived.
2. The method of claim 1, including the step of selecting (S5) packets to be included in the mixing (S6) when all temporally related packets of the streams have arrived (S4).
3. The method of claim 1, including the steps of selecting (S11) packets to be included in the mixing from a list of currently active (S10) streams; starting mixing (S11) as soon as the temporally related packets of the streams in the list have arrived (S8, S9).
4. The method of any of the preceding claims, including the step of replacing an expected packet to be included in the mix by an error concealment packet if the expected packet has not arrived within a predetermined time period after a previous mix.
5. The method of claim 2 or 3, including the step of determining the active/inactive status of packets by voice activity detection.
6. A conference bridge for managing arriving packets of multiple packetized media streams of a non-synchronous packet network, including queue memories (22) arranged to queue arrived packets for each stream; a control unit (24) arranged to monitor the queued packets to detect arrival of temporally related packets of the streams; a mixer (16, 30) arranged to mix selected temporally related packets once it has been detected that they have arrived.
7. The conference bridge of claim 6, including a packet selector (16) arranged to select packets to be included in the mix when all temporally related packets of the streams have arrived.
8. The conference bridge of claim 6, including a packet selector (28) arranged to determine packets to be included in the mix from a list of currently active streams; a mixer (30) arranged to start mixing as soon as the temporally related packets of the currently active streams in the list have arrived.
9. The conference bridge of any of the preceding claims 6-8, including an error concealer (16, 30) arranged to replace an expected packet to be included in the mix by an error concealment packet if the expected packet has not arrived within a predetermined time period after a previous mix.
10. The conference bridge of claim 7 or 8, including at least one voice activity detector for determining the active/inactive status of packets.
PCT/SE2007/050395 2007-06-01 2007-06-01 A conference bridge and a method for managing packets arriving therein WO2008147272A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/SE2007/050395 WO2008147272A1 (en) 2007-06-01 2007-06-01 A conference bridge and a method for managing packets arriving therein

Publications (1)

Publication Number Publication Date
WO2008147272A1 true WO2008147272A1 (en) 2008-12-04

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1113657A2 (en) * 1999-12-30 2001-07-04 Nortel Networks Limited Apparatus and method for packet-based media communications
US6735213B2 (en) * 2001-11-28 2004-05-11 Thinkengine Networks Inc. Processing of telephony samples

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DICK M ET AL: "Network-centric music performance: practice and experiments", IEEE COMMUNICATIONS MAGAZINE, IEEE SERVICE CENTER, PISCATAWAY, US, vol. 43, no. 6, June 2005 (2005-06-01), pages 86 - 93, XP011134820, ISSN: 0163-6804 *
OHSHIMA K ET AL: "A teleconferencing system with high-speed stream mixing for voice over IP", APPLICATIONS AND THE INTERNET, 2004. PROCEEDINGS. 2004 INTERNATIONAL SYMPOSIUM ON TOKYO, JAPAN 26-30 JAN. 2004, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 26 January 2004 (2004-01-26), pages 295 - 298, XP010682166, ISBN: 0-7695-2068-5 *
YANG S ET AL: "Multipoint communications with speech mixing over IP network", COMPUTER COMMUNICATIONS, ELSEVIER SCIENCE PUBLISHERS BV, AMSTERDAM, NL, vol. 25, no. 1, 1 January 2002 (2002-01-01), pages 46 - 55, XP004321155, ISSN: 0140-3664 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9025497B2 (en) 2009-07-10 2015-05-05 Qualcomm Incorporated Media forwarding for a group communication session in a wireless communications system
CN102474513B (en) * 2009-07-13 2015-03-18 高通股份有限公司 Method for selectively mixing media during a group communication session of arbitration communication group and application server
CN102474513A (en) * 2009-07-13 2012-05-23 高通股份有限公司 Selectively mixing media during a group communication session within a wireless communications system
WO2011008789A1 (en) * 2009-07-13 2011-01-20 Qualcomm Incorporated Selectively mixing media during a group communication session within a wireless communications system
US9088630B2 (en) 2009-07-13 2015-07-21 Qualcomm Incorporated Selectively mixing media during a group communication session within a wireless communications system
KR101397266B1 (en) * 2009-07-13 2014-05-20 퀄컴 인코포레이티드 Selectively mixing media during a group communication session within a wireless communications system
CN103325385B (en) * 2012-03-23 2018-01-26 杜比实验室特许公司 Speech communication method and device, method and device for operating jitter buffer
WO2013142705A1 (en) * 2012-03-23 2013-09-26 Dolby Laboratories Licensing Corporation Voice communication method and apparatus and method and apparatus for operating jitter buffer
US9571425B2 (en) 2012-03-23 2017-02-14 Dolby Laboratories Licensing Corporation Method and apparatus for voice communication based on voice activity detection
CN103325385A (en) * 2012-03-23 2013-09-25 杜比实验室特许公司 Speech communication method and device, method and device for operating jitter buffer
US9912617B2 (en) 2012-03-23 2018-03-06 Dolby Laboratories Licensing Corporation Method and apparatus for voice communication based on voice activity detection
CN107978325A (en) * 2012-03-23 2018-05-01 杜比实验室特许公司 Voice communication method and equipment, the method and apparatus of operation wobble buffer
CN107978325B (en) * 2012-03-23 2022-01-11 杜比实验室特许公司 Voice communication method and apparatus, method and apparatus for operating jitter buffer
WO2014004259A1 (en) * 2012-06-28 2014-01-03 Dolby Laboratories Licensing Corporation Reduced system latency for dominant speaker
US9426087B2 (en) 2012-06-28 2016-08-23 Dolby Laboratories Licensing Corporation Reduced system latency for dominant speaker
EP2913946A1 (en) * 2014-02-26 2015-09-02 Frequentis AG Voice transmission in redundant systems

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07748555

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07748555

Country of ref document: EP

Kind code of ref document: A1